TL;DR: use marshmallow-dataclass.
Deserializing JSON into dataclasses with validation proved to be unexpectedly difficult. By “validation” here I mean type checking (this must be a valid integer), range checking (this integer must be non-negative), length checking (this string must be at least 5 characters long), and the like.
- 👎 Marshmallow is very rich in features, but it can’t do dataclasses: it produces dictionaries.
- 👎 Dataclasses-json is based on Marshmallow and can do dataclasses, but I could not find out how to do validation; the docs do not mention it.
- 👎 Pydantic offers validation, but does not seem to be compatible with type checkers, see below.
- 👍 Marshmallow-dataclass looks like the right thing.
Marshmallow-dataclass
Marshmallow-dataclass allows access to underlying Marshmallow validation as follows:

    from dataclasses import dataclass, field
    from marshmallow import validate

    @dataclass
    class Person:
        age: int = field(metadata={"validate": validate.Range(0, 150)})

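This is a little verbose, but it gives access to the entire spectrum of Marshmallow validators. The only caveat is that a validated field looks as if it has a default value, which suggests that all fields after it must have one as well. At runtime, dataclasses only count default or default_factory as a default, so a plain annotation after a validated field is accepted; if your tooling still complains, assigning just field() to the later fields keeps the style uniform: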

    from dataclasses import dataclass, field
    from marshmallow import validate

    @dataclass
    class Person:
        age: int = field(metadata={"validate": validate.Range(0, 150)})
        name: str = field()

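The ordering behavior can be checked with the standard library alone. Below is a minimal sketch, assuming nothing beyond dataclasses; the metadata value is a placeholder string rather than a real validator. At runtime, at least, a field declared with field(metadata=...) reports no default, and a plain annotation after it is accepted.

```python
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class Person:
    # Placeholder metadata; dataclasses itself never interprets it.
    age: int = field(metadata={"note": "validated field"})
    # Plain annotation with no default, placed after the field() above.
    name: str


# Look up the Field object for "age" and inspect its default.
age_field = next(f for f in fields(Person) if f.name == "age")
has_default = age_field.default is not MISSING

person = Person(age=30, name="John Doe")
print(has_default)
print(person)
```

If a linter or IDE flags the plain annotation, that is a property of the tool rather than of the dataclass machinery.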
Marshmallow-dataclass is very hands-off: the only place where you actually need it is to get a schema from the dataclass. Here’s a full example:

    from dataclasses import dataclass, field
    import json
    from typing import Optional

    from marshmallow import EXCLUDE, validate, ValidationError
    import marshmallow_dataclass

    @dataclass
    class Address:
        street: str
        city: str
        zipcode: str

    @dataclass
    class Person:
        name: str
        address: Address
        age: int = field(metadata={"validate": validate.Range(0, 150)})
        profession: Optional[str] = None

    json_str = '''
    {
        "unused": "bla-bla",
        "name": "John Doe",
        "age": 30,
        "address": {
            "street": "123 Main St",
            "city": "Springfield",
            "zipcode": "12345"
        }
    }
    '''

    data = json.loads(json_str)
    schema = marshmallow_dataclass.class_schema(Person)(unknown=EXCLUDE)  # allow extra fields

    try:
        person = schema.load(data)
        print(person)
    except ValidationError as e:
        print(f"VALIDATION ERROR: {e}")

Pydantic
Pydantic offers special field types like conint to express validation:

    from pydantic import conint
    from pydantic.dataclasses import dataclass

    @dataclass
    class Foo:
        positive_int: conint(ge=0)

This works great for validation, but type checkers like mypy are not happy about it: they complain that function calls are not allowed in type annotations. They also can’t figure out that positive_int is an int. Pydantic offers a plugin for mypy that allegedly fixes it, but I haven’t tried it.
Conclusion
Deserializing JSON into dataclasses should not be a minefield; it should be part of the standard library. Fortunately, marshmallow-dataclass provides a reasonably good solution.