This proposal is about reworking the JSON Schema compiler, implementing a virtual machine for validating JSON Schemas, and integrating multiple backends into the compilation pipeline.
The main goal is to move to a more performance-oriented execution model. Bytecode execution should be much faster than the current tree-walk interpreter, and it also becomes possible to generate Rust code when the schema is known statically.
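As a rough illustration (instruction names here are invented), a schema like `{"type": "number", "minimum": 5}` could compile from a keyword tree down to a flat sequence of instructions:

```rust
// Hypothetical compiler output for {"type": "number", "minimum": 5}:
//
//   0: TYPE_NUMBER    // fail unless the instance is a number
//   1: MINIMUM 5.0    // fail if the number is below the bound
//   2: VALID          // terminate with success
```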
Motivation
The current implementation uses superinstructions to optimize validation by combining multiple JSON Schema keywords into single operations, minimizing jumps and execution overhead.
While decently effective, this approach is bound by the limitations of direct interpretation. A virtual machine executing bytecode could significantly improve performance by:
- Better spatial locality, as instructions are packed in a single allocation.
- Avoiding lazy compilation, as the validation state becomes smaller (instruction pointer + a few iterators + a stack of call frames) and easier to handle.
- Precompiling schemas into efficient Rust code via procedural macros / build.rs.
- Allowing schemas to be serialized into bytecode, stored, and reloaded for reuse, enabling VM execution directly from a block of static memory (see the sketch below).
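As a purely illustrative sketch (not a committed format), a compiled program could be a plain byte buffer that the VM executes in place, so it can live in static memory or be loaded straight from disk:

```rust
// Hypothetical one-byte opcodes; the real encoding is undecided.
const OP_TYPE_NUMBER: u8 = 0x01;
const OP_MINIMUM: u8 = 0x02; // followed by an 8-byte little-endian f64 operand
const OP_VALID: u8 = 0x00;

// The program as a static blob: no parsing or compilation at startup.
static PROGRAM: &[u8] = &[
    OP_TYPE_NUMBER,
    OP_MINIMUM, 0, 0, 0, 0, 0, 0, 0x14, 0x40, // 5.0_f64, little-endian
    OP_VALID,
];
```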
Implementation Overview
1. Rework the compilation pipeline so it generates a JSON Schema Intermediate Representation (JSIR) first (POC done).
2. Add JSIR-specific optimization passes: canonicalization (e.g., removing redundant combinators such as single-branch allOf / anyOf), $ref inlining, loop unrolling, etc. (see the sketch after this list). This requires a bit of extra care to store location metadata properly.
3. A compilation backend for the VM, plus VM-level optimization passes (superinstructions, redundant-jump elimination, etc.).
4. A compilation backend for Rust code.
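To make step (2) concrete, here is a minimal sketch of what a JSIR canonicalization pass could look like; the node type is hypothetical and the actual IR from the POC may differ:

```rust
// Hypothetical JSIR node type, reduced to a few variants for the sketch.
enum Jsir {
    Type(String),
    Minimum(f64),
    AllOf(Vec<Jsir>),
    AnyOf(Vec<Jsir>),
}

// Canonicalization: a single-branch allOf / anyOf is equivalent to its only
// branch, so the wrapper can be removed before later passes run.
fn canonicalize(node: Jsir) -> Jsir {
    match node {
        Jsir::AllOf(mut branches) | Jsir::AnyOf(mut branches) if branches.len() == 1 => {
            canonicalize(branches.pop().unwrap())
        }
        Jsir::AllOf(branches) => Jsir::AllOf(branches.into_iter().map(canonicalize).collect()),
        Jsir::AnyOf(branches) => Jsir::AnyOf(branches.into_iter().map(canonicalize).collect()),
        other => other,
    }
}
```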
Steps (3) and (4) may require their own IR format to simplify codegen. For example, I can imagine an instruction like IF_UNPACK_NUMBER that would generate code like:

```rust
if let Value::Number(number) = value {
    // ... numeric keyword checks on `number` ...
} else {
    // ... handle the type mismatch ...
}
```
Right now I've implemented direct bytecode compilation + interpretation for a subset of JSON Schema, but I plan to split this process into multiple phases.
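For reference, here is a minimal sketch of what the bytecode interpreter's dispatch loop could look like, assuming serde_json values and a hypothetical instruction set:

```rust
use serde_json::Value;

// Hypothetical instruction set; the real one is not settled yet.
enum Instr {
    TypeNumber,
    Minimum(f64),
    Valid,
}

// A minimal dispatch loop: the whole validation state is one instruction pointer.
fn run(program: &[Instr], value: &Value) -> bool {
    let mut ip = 0;
    loop {
        match &program[ip] {
            Instr::TypeNumber => {
                if !value.is_number() {
                    return false;
                }
            }
            Instr::Minimum(bound) => match value.as_f64() {
                Some(n) if n >= *bound => {}
                _ => return false,
            },
            Instr::Valid => return true,
        }
        ip += 1;
    }
}
```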
It would be great if the bytecode format were externally accessible; that way you could bring your own interpreter to implement custom functionality, e.g. finding all JSON nodes matching a certain subschema.
It would be amazing if the interpreter were built with a public API (even if unstable) that could be reused.
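For example, if an instruction type like the `Instr` from the dispatch-loop sketch above were public, a consumer could walk a compiled program with their own logic (all names remain hypothetical; nothing like this is a real API in the crate today):

```rust
// Reuses the hypothetical `Instr` from the earlier sketch; purely illustrative.
fn count_numeric_checks(program: &[Instr]) -> usize {
    program
        .iter()
        .filter(|instr| matches!(instr, Instr::TypeNumber | Instr::Minimum(_)))
        .count()
}
```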