Python: serializing JSON into dataclasses with validation

TL;DR: use marshmallow-dataclass.

Serializing JSON into dataclasses with validation proved to be unexpectedly difficult. By “validation” here I mean type checking (this must be a valid integer), range checking (this integer must be non-negative), length checking (this string must be at least 5 characters long), and the like.

  • 👎 Marshmallow is very rich in features, but out of the box it produces dictionaries, not dataclasses.
  • 👎 Dataclasses-json is based on Marshmallow and can do dataclasses, but I could not find out how to do validation; the docs do not mention it.
  • 👎 Pydantic offers validation, but does not seem to be compatible with type checkers, see below.
  • 👍 Marshmallow-dataclass looks like the right thing.

Marshmallow-dataclass

Marshmallow-dataclass allows access to underlying Marshmallow validation as follows:

from dataclasses import dataclass, field
from marshmallow import validate

@dataclass
class Person:
    age: int = field(metadata={"validate": validate.Range(0, 150)})

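This is a little verbose, but it gives access to the entire spectrum of Marshmallow validators. One caveat: because a validated field is written as an assignment, it can look as if it had a default value, which would force all fields after it to have a default value as well. At runtime, field() without default or default_factory does not define a default, so plain fields may still follow it. If a tool complains anyway, writing the later fields as just field() keeps the declarations uniform: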

from dataclasses import dataclass, field
from marshmallow import validate

@dataclass
class Person:
    age: int = field(metadata={"validate": validate.Range(0, 150)})
    name: str = field()

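A quick way to see how the standard library treats such a field is to inspect it with dataclasses.fields(). The sketch below uses only the standard library. The metadata value is a placeholder string, not a real validator, and this Person class is a stripped-down stand-in. It suggests that field(metadata=...) on its own does not set a default, so a plain field after it is accepted at runtime:

```python
from dataclasses import MISSING, dataclass, field, fields

@dataclass
class Person:
    # Placeholder metadata; a real schema would put a marshmallow validator here.
    age: int = field(metadata={"validate": "placeholder"})
    name: str  # plain field with no default, declared after the validated one

age_field = fields(Person)[0]
print(age_field.default is MISSING)          # no default value was set
print(age_field.default_factory is MISSING)  # no default factory either

try:
    Person()  # both arguments are still required
except TypeError as e:
    print(f"TypeError: {e}")
```

If the class definition itself raised an error about non-default arguments following default ones, that would point to a real default somewhere earlier in the field list.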
Marshmallow-dataclass is very hands-off: the only place where you actually need it is to get a schema from the dataclass. Here’s a full example:

from dataclasses import dataclass, field
import json
from typing import Optional
from marshmallow import EXCLUDE, validate, ValidationError
import marshmallow_dataclass

@dataclass
class Address:
    street: str
    city: str
    zipcode: str

@dataclass
class Person:
    name: str
    address: Address
    age: int = field(metadata={"validate": validate.Range(0, 150)})
    profession: Optional[str] = None

json_str = '''
{
    "unused": "bla-bla",
    "name": "John Doe",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "Springfield",
        "zipcode": "12345"
    }
}
'''

data = json.loads(json_str)
schema = marshmallow_dataclass.class_schema(Person)(unknown=EXCLUDE)  # EXCLUDE: ignore unknown keys such as "unused"
try:
    person = schema.load(data)
    print(person)
except ValidationError as e:
    print(f"VALIDATION ERROR: {e}")

Pydantic

Pydantic offers special field types like conint to express validation:

from pydantic import conint
from pydantic.dataclasses import dataclass

@dataclass
class Foo:
    positive_int: conint(ge=0)

This works great for validation, but type checkers like mypy are not happy about it: they complain that function calls are not allowed in type annotations, and they can’t figure out that positive_int is an int. Pydantic offers a plugin for mypy that reportedly fixes this, but I haven’t tried it.
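One alternative that may sidestep the complaint without the plugin is to keep the annotation a plain type and attach the constraint through typing.Annotated. The sketch below assumes Python 3.9+ and a Pydantic release that reads Field inside Annotated (Pydantic 2 is assumed here). Foo simply mirrors the class above, and none of this has been checked against mypy.

```python
from typing import Annotated

from pydantic import Field, ValidationError
from pydantic.dataclasses import dataclass

@dataclass
class Foo:
    # To a type checker this is just an int; Pydantic reads the ge=0 constraint.
    positive_int: Annotated[int, Field(ge=0)]

print(Foo(positive_int=5))

try:
    Foo(positive_int=-1)
    rejected = False
except ValidationError as e:
    rejected = True
    print(f"VALIDATION ERROR: {e}")
```

The annotation no longer contains a function call at the top level, which is what the type checkers were objecting to.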

Conclusion

Serialization into dataclasses should not be a minefield; it should be part of the standard library. Fortunately, marshmallow-dataclass provides a reasonably good solution.
