Home
Representing JSON Schema as python types.
The examples below use YAML notation of JSON Schema version draft-wright-json-schema-00.
JSON is a data format representing a limited collection of basic data types.
JSON Schema is a language representing a set of constraints applicable to JSON, where an empty schema means any JSON value, and set theory can be used to manipulate it.
Python model types, while able to represent the same basic types as JSON, describe what can be stored in memory, where an empty model represents no data, which is the inverse of how JSON Schema describes values.
!!! note By default, object instances in python are just dictionaries with OOP syntax, but IDEs and type checking tools treat them more akin to C structs - if a field is not declared, it's not there.
Simplified, JSON Schemas can be transformed to a python model with the following formula:
python model = any JSON type - declared JSON Schema constraints
-
Since an empty schema object validates any JSON value, let's consider it a Union type:
{}
=>
dict | list | float | int | str | bool
Most of the constraints are type-specific (they apply only to values of a single type).
The exceptions are:
- nullable: extend types by type null, but only if type is specified in that schema
- enum: only allow values specified in the list
- numeric constraints, which apply to both the number and integer types.
That means most constraints can be processed separately, which is useful when they occur together with allOf, oneOf, anyOf and not.
-
When type is present and nullable is true, the allowed types are extended with null. Three cases are possible:
- any type but null
  {}
- single type
  type: integer
  =>
  int
- single type or null
  type: integer
  nullable: true
  =>
  int | None
-
-
Any combination of types is possible with anyOf/oneOf.
anyOf:
- type: string
- type: integer
  nullable: true
=>
str | int | None
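The type, nullable and anyOf/oneOf cases above can be sketched as a small function that renders a python annotation as a string. This is only an illustration: the names `JSON_TO_PY` and `annotation_for` are hypothetical, and every keyword other than `type`, `nullable`, `anyOf` and `oneOf` is ignored.

```python
# Hypothetical helper; the names are illustrative and not part of any library.
JSON_TO_PY = {
    "object": "dict",
    "array": "list",
    "number": "float",
    "integer": "int",
    "string": "str",
    "boolean": "bool",
}


def annotation_for(schema: dict) -> str:
    """Render an annotation from `type`, `nullable`, `anyOf` and `oneOf` only."""
    branches = schema.get("anyOf") or schema.get("oneOf")
    if branches:
        parts = []
        for branch in branches:
            for part in annotation_for(branch).split(" | "):
                if part not in parts:  # keep the first occurrence only
                    parts.append(part)
        if "None" in parts:  # keep None last for readability
            parts.remove("None")
            parts.append("None")
        return " | ".join(parts)
    if "type" not in schema:
        # Empty schema: any type but null; nullable has no effect without type.
        return " | ".join(JSON_TO_PY.values())
    result = JSON_TO_PY[schema["type"]]
    if schema.get("nullable"):
        result += " | None"
    return result


print(annotation_for({"type": "integer", "nullable": True}))
```

The string form keeps the sketch free of dependencies; a real generator would more likely build an AST or typing objects.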
- If enum is in anyOf sub-schemas, the values are summed as sets.
- If enum is in oneOf sub-schemas, only the values that occur once can be validated.
- If enum is in allOf sub-schemas, only the common values can be validated.
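The three rules above map onto simple counting. The sketch below assumes that every sub-schema declares nothing but `enum`, and that the values are hashable scalars; `combine_enums` is a hypothetical name. Note that python treats `True` and `1` as the same dictionary key, so a real implementation would need to compare values together with their JSON type.

```python
from collections import Counter


def combine_enums(keyword: str, enums: list) -> list:
    """Combine the enum lists of sibling sub-schemas under the given keyword."""
    # Count in how many sub-schemas each value occurs (duplicates inside one
    # list count once); dict.fromkeys keeps the first-seen order.
    counts = Counter(value for values in enums for value in dict.fromkeys(values))
    if keyword == "anyOf":
        return list(counts)  # union
    if keyword == "oneOf":
        return [v for v, n in counts.items() if n == 1]  # occur exactly once
    if keyword == "allOf":
        return [v for v, n in counts.items() if n == len(enums)]  # intersection
    raise ValueError(f"unsupported keyword: {keyword}")


colours = [["red", "green"], ["green", "blue"]]
for keyword in ("anyOf", "oneOf", "allOf"):
    print(keyword, combine_enums(keyword, colours))
```

An empty result for allOf means the schema is a bottom type and cannot be translated.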
-
Scalar enum could be translated literally:
enum:
- true
- false
- FileNotFound
=>
Literal[True, False, 'FileNotFound']
or grouped by type:
Union[
    Literal[True, False],
    Literal['FileNotFound'],
]
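Grouping by type can be done at runtime with the standard typing module. The sketch below is an assumption about how a generator might do it, not a description of an existing tool; `literal_union` is a hypothetical name. It relies on `Literal[...]` and `Union[...]` accepting a tuple of parameters.

```python
from typing import Literal, Union


def literal_union(values: list):
    """Group scalar enum values by python type and build a Union of Literals."""
    groups = {}
    for value in values:
        # type(True) is bool, so booleans are not mixed up with integers here.
        groups.setdefault(type(value), []).append(value)
    literals = [Literal[tuple(group)] for group in groups.values()]
    return Union[tuple(literals)]


print(literal_union([True, False, "FileNotFound"]))
```

With a single group the Union collapses to the Literal itself, so the literal translation is a special case of the grouped one.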
-
Both scalar and non-scalar literals could be translated as python enums, but that would require names:
type: object
enum:
- key: value
=>
class ${schema}Enum(Enum):
    elem${idx} = $schema(key='value')
This solution could introduce unintentional breaking changes when simply changing order of enum elements, unless enum elements were named with some extension keyword.
Names derived from the values themselves would work for scalar values:
enum:
- true
- false
- FileNotFound
=>
class ${schema}Enum(Enum):
    value_true = True
    value_false = False
    value_FileNotFound = 'FileNotFound'
Non-scalar enum values don't have natural names, but a hash of stringified value could be used.
Also, creating an arbitrary number of objects that might never be used would be expensive and wasteful, so factory methods could be used:
enum:
- id: 1
  name: LoL
  slug: league-of-legends
=>
def value_5d9b08cdd67689d128f7c30f885f273c():
    return CurrentVideogame(
        id=1,
        name='LoL',
        slug='league-of-legends',
    )
The problem with this solution is that the name changes when keys or any value changes, which may or may not be desirable from the user-developer perspective.
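A minimal sketch of such a naming scheme follows. The choice of hash (SHA-256 truncated to 32 hex digits) and of canonical JSON as the stringified form are assumptions made for illustration; the text above does not say how the name in the example was produced, and `enum_factory_name` is a hypothetical helper.

```python
import hashlib
import json


def enum_factory_name(value) -> str:
    """Derive a factory-method name from the canonical JSON form of an enum value."""
    # sort_keys makes the name independent of key order in the source document.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]
    return f"value_{digest}"


original = {"id": 1, "name": "LoL", "slug": "league-of-legends"}
reordered = {"slug": "league-of-legends", "name": "LoL", "id": 1}
renamed = {"id": 1, "name": "League of Legends", "slug": "league-of-legends"}

print(enum_factory_name(original) == enum_factory_name(reordered))
print(enum_factory_name(original) == enum_factory_name(renamed))
```

Reordering keys keeps the name stable, while changing any value produces a different name, which is exactly the trade-off described above.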
-
Constraint keywords can be grouped by type (both numeric types together) and processed as such.
maximum: 10
maxLength: 10
anyOf:
- type: string
- type: integer
=>
anyOf:
- type: string
  maxLength: 10
- type: integer
  maximum: 10
=>
Union[
    Annotated[str, Field(max_length=10)],
    Annotated[int, Field(le=10)],
]
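The first step of the transformation above, pushing type-specific keywords from the parent into the matching anyOf branches, could look like the sketch below. The `KEYWORD_TYPES` table is an illustrative subset and `distribute` is a hypothetical name. The sketch assumes every branch declares a single `type` and does not redefine a keyword that the parent also sets; other parent keywords are dropped for brevity.

```python
# Illustrative subset of type-specific keywords; not the full vocabulary.
KEYWORD_TYPES = {
    "maximum": {"integer", "number"},
    "minimum": {"integer", "number"},
    "maxLength": {"string"},
    "minLength": {"string"},
    "pattern": {"string"},
    "maxItems": {"array"},
    "minItems": {"array"},
}


def distribute(schema: dict) -> dict:
    """Copy type-specific parent keywords into the anyOf branches they apply to."""
    branches = []
    for branch in schema["anyOf"]:
        merged = dict(branch)
        for keyword, value in schema.items():
            if branch.get("type") in KEYWORD_TYPES.get(keyword, ()):
                # Assumption: the branch does not set this keyword itself.
                merged.setdefault(keyword, value)
        branches.append(merged)
    return {"anyOf": branches}


print(distribute({
    "maximum": 10,
    "maxLength": 10,
    "anyOf": [{"type": "string"}, {"type": "integer"}],
}))
```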
-
There might be more than one element for a given type:
anyOf:
- type: integer
  maximum: 10
- type: integer
  minimum: 20
=>
Union[
    Annotated[int, Field(le=10)],
    Annotated[int, Field(ge=20)],
]
-
The above is different from this, which is a bottom type (no value can validate against it, since no number can be both at least 20 and at most 10):
type: integer
maximum: 10
minimum: 20
-
As somewhat a special case, this schema is alright:
maximum: 10
minimum: 20
=>
str | bool | dict | list
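The difference between the last two cases can be made explicit by computing which JSON types survive an unsatisfiable numeric range. The function below is a sketch under the assumption that only `type`, `minimum` and `maximum` matter; `surviving_types` and `ALL_TYPES` are hypothetical names.

```python
ALL_TYPES = ["object", "array", "number", "integer", "string", "boolean"]


def surviving_types(schema: dict) -> list:
    """Return the JSON types that can still validate against the schema."""
    types = [schema["type"]] if "type" in schema else list(ALL_TYPES)
    low, high = schema.get("minimum"), schema.get("maximum")
    if low is not None and high is not None and low > high:
        # No number can satisfy both bounds, so the numeric types drop out.
        types = [t for t in types if t not in ("number", "integer")]
    return types


print(surviving_types({"maximum": 10, "minimum": 20}))
print(surviving_types({"type": "integer", "maximum": 10, "minimum": 20}))
```

An empty list corresponds to the bottom type, for which no python type can be generated.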
-
allOf applies the most restrictive set of constraints.
allOf:
- maximum: 10
- maximum: 20
=>
maximum: 10
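For numeric bounds, "most restrictive" means the smallest maximum and the largest minimum. A minimal sketch, assuming the children carry only `minimum` and `maximum` (the helper name `merge_bounds` is hypothetical):

```python
def merge_bounds(schemas: list) -> dict:
    """Merge minimum/maximum from allOf children into the most restrictive pair."""
    merged = {}
    maxima = [s["maximum"] for s in schemas if "maximum" in s]
    minima = [s["minimum"] for s in schemas if "minimum" in s]
    if maxima:
        merged["maximum"] = min(maxima)  # the smallest upper bound wins
    if minima:
        merged["minimum"] = max(minima)  # the largest lower bound wins
    return merged


print(merge_bounds([{"maximum": 10}, {"maximum": 20}]))
```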
-
JSON type object could be mapped to dict or, with some limitations, to TypedDict or a model in one of the data modelling libraries like dataclasses, pydantic, msgspec, etc. Here the choice falls on pydantic, which seems the most featured.
-
In the most trivial (from python's perspective) case, properties can be translated to instance fields in a model class:
additionalProperties: false
properties:
  name:
    type: string
required:
- name
=>
class $name(BaseModel):
    name: str
-
In the case of an empty schema, it's impossible to say anything about its possible contents.
It could be mapped as dict, but then adding a property would cause an incompatible change in the python code. Instead, it can be translated to an empty model class with extra = 'allow':
{}
=>
class $name(BaseModel):
    model_config = pydantic.ConfigDict(
        extra='allow'
    )
The problem with this form is that there's no way to know what to do with the extra object values. An empty schema has the default:
additionalProperties: true
which is the same as
additionalProperties: {}
which means such a definition is infinitely recursive. We could model it as a simple dict (in union with other types) or a common model class
class AnyObject(BaseModel):
    model_config = pydantic.ConfigDict(
        extra='allow'
    )
but in either case adding a property (particularly a non-required property, which is a compatible change) leads to an incompatible change in the python model.
The value of additionalProperties is processed as a JSON Schema.
-
true
Allows extra fields of any type. See above.
-
false
Forbids extra fields:
class $name(BaseModel):
    model_config = pydantic.ConfigDict(
        extra='forbid'
    )
-
A schema definition.
Allows extra fields and, if a non-empty schema is used, generates a type for them:
type: object
additionalProperties:
  type: integer
=>
class $name(BaseModel):
    model_config = pydantic.ConfigDict(
        extra='allow'
    )
    __pydantic_extra__: dict[str, int]
The patternProperties keyword is similar to additionalProperties, except keys must match a regular expression.
This could be implemented as a simple extra field with pre-validation, except when both additionalProperties and patternProperties are present.
The example below describes an object with positive integer keys (as strings) with string values, and other keys with integer values:
type: object
additionalProperties:
  type: integer
patternProperties:
  "\\d+":
    type: string
This could translate to a pydantic model:
class $name(BaseModel):
    model_config = pydantic.ConfigDict(
        extra='allow'
    )
    __pydantic_extra__: dict[str, int | str]
    _handle_pattern_props = validate(handle_pattern_props)({"\\d+": str})
or an alternative with synthetic fields that group each pattern and schema ('model_' prefix added to decrease chances of name clashes):
class $name(BaseModel):
    model_config = pydantic.ConfigDict(
        extra='allow'
    )
    __pydantic_extra__: dict[str, int]
    model_pattern_props_xxd: dict[str, str]
    _handle_pattern_props = validate(handle_pattern_props)("\\d+", str)
The second form would offer stricter validation at the cost of introducing synthetic fields and potential name clashes, while the first form would make a model more closely resembling the JSON object, but would allow invalid instances.
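Both forms need a step that decides which extra keys fall under a pattern and which fall under additionalProperties. The sketch below shows that step with the standard re module only. `split_extra` and the group name `digits` are hypothetical, and a key is assigned to the first matching pattern, which is a simplification: per JSON Schema, a key matching several patterns must validate against all of them.

```python
import re


def split_extra(extra: dict, patterns: dict):
    """Split extra keys into pattern groups and the remaining additional keys.

    `patterns` maps a regular expression to a (hypothetical) group name.
    """
    grouped = {name: {} for name in patterns.values()}
    rest = {}
    for key, value in extra.items():
        for pattern, name in patterns.items():
            if re.search(pattern, key):  # JSON Schema patterns are not anchored
                grouped[name][key] = value
                break
        else:
            rest[key] = value  # governed by additionalProperties
    return grouped, rest


grouped, rest = split_extra({"1": "one", "42": "x", "size": 3}, {r"\d+": "digits"})
print(grouped, rest)
```

Each group can then be validated against its own value schema, which is what the synthetic-field form makes visible in the model.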
-
When the anyOf keyword is used, the instance validates as long as it validates against at least one of the child schemas, while the validation results against the other child schemas are ignored.
-
Scalar types can be implemented as Union:
type: integer
anyOf:
- maximum: 10
- minimum: 20
=>
Union[
    Annotated[int, Field(le=10)],
    Annotated[int, Field(ge=20)],
]
-
objects should probably be implemented as synthetic Union fields, instead of unions, because adding
properties:
  length:
    type: integer
anyOf:
- properties:
    height:
      type: integer
- properties:
    width:
      type: integer
=>
class $nameAnyOf1(BaseModel):
    height: int

class $nameAnyOf2(BaseModel):
    width: int

class $name(BaseModel):
    model_prop_any_of: $nameAnyOf1 | $nameAnyOf2
-
Per the specification, the oneOf keyword validates when the value validates against exactly one child schema.
In practice that's not the case, and oneOf is used as a type union, typically with disjoint sub-schemas, but sometimes erroneously with overlapping ones.
For this reason it could be processed just like anyOf, although separately: values must validate against one of the oneOf children and one of the anyOf children.
-
With only the type constraint, anyOf and oneOf are equivalent, since any value can be of only one type:
oneOf:
- type: integer
- type: string
anyOf:
- type: integer
- type: string
=>
int | str
-
With more than one constraint for the same type, interpreting them (also for python) gets more complex:
type: integer
oneOf:
- maximum: 20
- minimum: 10
is equivalent to:
type: integer
oneOf:
- allOf:
  - maximum: 20
  # not minimum: 10
  - maximum: 10
    exclusiveMaximum: true
- allOf:
  - minimum: 10
  # not maximum: 20
  - minimum: 20
    exclusiveMinimum: true
Since 10 < 20, it reduces to:
type: integer
oneOf:
# matches minimum: 10 but not maximum: 20
- minimum: 20
  exclusiveMinimum: true
# matches maximum: 20 but not minimum: 10
- maximum: 10
  exclusiveMaximum: true
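The reduction can be checked by brute force over a range of integers. The validator below is a deliberately tiny sketch that understands only the four numeric keywords used in this example, with the boolean form of exclusiveMinimum/exclusiveMaximum; the names `matches` and `one_of` are hypothetical.

```python
def matches(value: int, schema: dict) -> bool:
    """Validate an integer against minimum/maximum and their exclusive flags."""
    if "maximum" in schema:
        limit = schema["maximum"]
        if value > limit or (schema.get("exclusiveMaximum") and value == limit):
            return False
    if "minimum" in schema:
        limit = schema["minimum"]
        if value < limit or (schema.get("exclusiveMinimum") and value == limit):
            return False
    return True


def one_of(value: int, branches: list) -> bool:
    """True when the value validates against exactly one branch."""
    return sum(matches(value, branch) for branch in branches) == 1


original = [{"maximum": 20}, {"minimum": 10}]
reduced = [
    {"minimum": 20, "exclusiveMinimum": True},
    {"maximum": 10, "exclusiveMaximum": True},
]

# Both forms accept the same integers: those below 10 or above 20.
print(all(one_of(v, original) == one_of(v, reduced) for v in range(-50, 51)))
```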
When enum and type are both used, any value present in enum but whose type is not present in type wouldn't validate.
Similarly, any value of an allowed type but absent from enum wouldn't validate.
Therefore the enum keyword also determines the allowed types, and the result is a set intersection of the two.
If type is defined, constraints for types not listed are discarded.
The default type value (not defined) means all types are valid and all constraints are considered.
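The intersection of enum and type can be sketched as a filter over the enum values. The table `JSON_TYPE_CHECKS` and the function `effective_enum` are hypothetical names, and the sketch assumes `type` is a single string. One python-specific detail matters here: bool is a subclass of int, so booleans have to be excluded explicitly from the numeric checks.

```python
JSON_TYPE_CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def effective_enum(schema: dict) -> list:
    """Keep only the enum values whose JSON type is allowed by `type`."""
    values = schema["enum"]
    if "type" not in schema:
        return list(values)  # no type: every enum value stays valid
    check = JSON_TYPE_CHECKS[schema["type"]]
    return [value for value in values if check(value)]


print(effective_enum({"type": "string", "enum": [True, False, "FileNotFound"]}))
```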
-
When evaluating type with either oneOf or anyOf, the two are equivalent, since no JSON value can be of more than one type:
anyOf:
- type: integer
- type: string
or
oneOf:
- type: integer
- type: string
=>
int | str
allOf applies a set intersection to type:
# implied type: $any
allOf:
- type: integer
=>
type: integer
maximum: 20
This is a bottom type:
allOf:
- type: integer
- type: string
-
Since instances must validate against the parent schema and each of the allOf child schemas, properties from each schema can be calculated as a sum of sets, with their schemas merged:
allOf:
- properties:
    length:
      type: integer
- properties:
    width:
      type: integer
=>
properties:
  length:
    type: integer
  width:
    type: integer
An exception to this rule occurs when the parent schema or any of the child schemas has additionalProperties: false, in which case the resulting set of properties is an intersection of all properties defined in these schemas:
allOf:
- properties:
    length:
      type: integer
- properties:
    width:
      type: integer
    height:
      type: integer
  additionalProperties: false
- properties:
    length:
      type: integer
    height:
      type: integer
  additionalProperties: false
=>
properties:
  height:
    type: integer
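The two rules for property names, union by default and intersection with every schema that sets additionalProperties: false, can be sketched as follows. `merge_property_names` is a hypothetical helper; it looks only at property names and ignores patternProperties and the merging of the property schemas themselves.

```python
def merge_property_names(schemas: list) -> list:
    """Property names that can still occur after applying allOf to the schemas."""
    names = []
    for schema in schemas:
        for name in schema.get("properties", {}):
            if name not in names:
                names.append(name)  # union, in first-seen order
    for schema in schemas:
        if schema.get("additionalProperties") is False:
            # A closed schema rejects every property it does not declare.
            allowed = schema.get("properties", {})
            names = [name for name in names if name in allowed]
    return names


closed = [
    {"properties": {"length": {"type": "integer"}}},
    {"properties": {"width": {"type": "integer"}, "height": {"type": "integer"}},
     "additionalProperties": False},
    {"properties": {"length": {"type": "integer"}, "height": {"type": "integer"}},
     "additionalProperties": False},
]
print(merge_property_names(closed))
```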
-
In cases where the same property appears more than once in allOf schemas, rules apply as if allOf was defined for that property:
allOf:
- properties:
    size:
      maximum: 10
- properties:
    size:
      maximum: 20
=>
properties:
  size:
    allOf:
    - maximum: 10
    - maximum: 20
=>
properties:
  size:
    maximum: 10
-
There are many ways of declaring schemas that no value could validate against. Such schemas aren't invalid, but the part describing the single type that the keywords apply to must be discarded.
minimum: 20
maximum: 10
=>
oneOf:
- type: boolean
- type: string
- type: object
- type: array
but this schema never validates, so can't be translated to a type:
type: integer
minimum: 20
maximum: 10
-
Conflicting enum values
The enum keyword applies to all types, so a conflict here makes the schema impossible to translate into a type:
allOf:
- enum: ["red"]
- enum: ["green"]
- https://apis.guru/ - a directory of OpenAPI/swagger descriptions.
- https://www.learnjsonschema.com/2019-09/ - an extended explanation of JSON Schema keywords. Wrong version, but close enough.