JSON Schema validator #254

Draft: wants to merge 10 commits into master

@snej (contributor) commented on Jan 22, 2025

The JSONSchema class parses a JSON Schema and validates Fleece Values against it. See the header file for class documentation. You may also want to look at the JSON Schema docs and tutorial.

Other Fleece API changes supporting this:

  • Added slice::UTF8Length(), which counts the number of UTF-8 code points
  • Added FLEvalJSONPointer(), a wrapper around fleece::impl::Path::evalJSONPointer()
  • Added FLDictIterator_BeginSK(), exposing the existing ability to pass a SharedKeys to a Dict::iterator
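For reference, counting code points in UTF-8 only requires skipping continuation bytes (bit pattern `10xxxxxx`). The C++ sketch below is not the PR's implementation: `utf8Length` is a hypothetical stand-in for `slice::UTF8Length()`, and it assumes the input is already valid UTF-8. A count like this is presumably needed because JSON Schema's `minLength`/`maxLength` are defined in characters rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for slice::UTF8Length(): counts code points by
// counting every byte that is NOT a continuation byte (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; malformed sequences are not detected.
std::size_t utf8Length(std::string_view s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80)   // ASCII byte or lead byte: starts a new code point
            ++count;
    }
    return count;
}
```

For example, `"h\xC3\xA9llo"` ("héllo") is 6 bytes long but contains 5 code points.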

Bug fixes:

  • Fixed some limitations of fleece::impl::Path::evalJSONPointer() (it didn't handle escaped keys)
  • Fixed a SharedKeys issue with empty-string keys

Optimizations:

  • Optimized Dict lookup from a Dict::key (aka FLDictKey), mostly by avoiding locating the Dict's SharedKeys unless we have to, since that's very slow.
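One way to picture avoiding the SharedKeys lookup is a lookup key that runs a cheap eligibility check first and caches the outcome in a three-state `int8_t`, similar to the `_hasNumericKey` change described in the commits below. This C++ sketch is illustrative only: `canBeSharedKey`, `CachedKey`, `kMaxSharedKeyLength` and the `lookup` callback are hypothetical, and the eligibility rule (short ASCII alphanumerics) is an assumption that may not match Fleece's actual criteria.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical length limit, for illustration only.
constexpr std::size_t kMaxSharedKeyLength = 16;

// Assumed eligibility rule: short, ASCII alphanumeric keys only.
// A free function, so it can run without any SharedKeys instance.
bool canBeSharedKey(std::string_view key) {
    if (key.size() > kMaxSharedKeyLength)
        return false;
    for (char c : key) {
        bool alnum = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
                     (c >= 'A' && c <= 'Z');
        if (!alnum)
            return false;
    }
    return true;
}

// Lookup key that caches its shared-key status in an int8_t:
//   0 = unknown, 1 = numericKey is valid, -1 = can never be a shared key.
struct CachedKey {
    std::string name;
    std::int16_t numericKey = 0;
    std::int8_t state = 0;

    // `lookup` is a hypothetical callback standing in for the slow SharedKeys
    // lookup; it returns the key's integer encoding if it has one.
    template <class Lookup>
    std::optional<std::int16_t> numeric(Lookup lookup) {
        if (state == 0) {
            if (!canBeSharedKey(name)) {
                state = -1;                  // cheap check failed: never look up again
            } else if (std::optional<std::int16_t> n = lookup(name)) {
                numericKey = *n;             // cache it; no more lookups needed
                state = 1;
            }                                // else: might be added later, so retry next time
        }
        if (state == 1)
            return numericKey;
        return std::nullopt;
    }
};
```

Only the ineligible case is cached as -1 here; a key that is eligible but not yet in the table is looked up again, on the assumption that it could be added to the SharedKeys table later.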

@snej changed the title from JS to JSON Schema validator on Jan 22, 2025
@snej marked this pull request as draft on January 22, 2025 20:14
@snej force-pushed the feature/schema branch 3 times, most recently from 4073d2c to f3f59a6 on January 23, 2025 20:35
@snej added 4 commits on January 29, 2025 10:05
I did this so the JSON Schema benchmark could be fully optimized.
Code adapted from LiteCore's StringUtils.cc.
- FLEvalJSONPointer() exposes impl::Path::evalJSONPointer().
- It now properly handles an empty path.
- It now properly handles a path with a trailing "/".
- It now properly handles those weird "~" escapes (see RFC 6901.)
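The RFC 6901 rules listed above can be sketched as a standalone tokenizer. This is a hypothetical `parseJSONPointer` function for illustration, not Fleece's `Path::evalJSONPointer()`: an empty pointer refers to the whole document, `~1` decodes to `/` and `~0` to `~`, and a trailing `/` yields a final empty-string token (a key named `""`).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch: splits an RFC 6901 JSON Pointer into unescaped
// reference tokens. Returns std::nullopt if the pointer is malformed.
std::optional<std::vector<std::string>> parseJSONPointer(std::string_view ptr) {
    std::vector<std::string> tokens;
    if (ptr.empty())
        return tokens;                   // "" refers to the whole document
    if (ptr.front() != '/')
        return std::nullopt;             // non-empty pointers must start with '/'
    ptr.remove_prefix(1);
    while (true) {
        std::size_t slash = ptr.find('/');
        std::string_view raw = ptr.substr(0, slash);
        std::string token;
        for (std::size_t i = 0; i < raw.size(); ++i) {
            if (raw[i] != '~') {
                token += raw[i];
            } else if (i + 1 < raw.size() && (raw[i + 1] == '0' || raw[i + 1] == '1')) {
                token += (raw[i + 1] == '0') ? '~' : '/';   // "~0" -> "~", "~1" -> "/"
                ++i;
            } else {
                return std::nullopt;     // '~' must be followed by '0' or '1'
            }
        }
        tokens.push_back(std::move(token));
        if (slash == std::string_view::npos)
            break;
        ptr.remove_prefix(slash + 1);    // a trailing '/' leaves "" for one more empty token
    }
    return tokens;
}
```

Decoding in a single left-to-right pass is equivalent to the RFC's "replace `~1` first, then `~0`" order, so `/~01` correctly yields the key `~1` rather than `/`.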
Exposes existing internal method.
@snej added 5 commits on January 29, 2025 11:24
As an optimization, allow a Dict iterator to be created with its
SharedKeys already known; saves a lookup.
The jsonsl parser we use follows an older version of the JSON spec
which didn't allow a document to be a scalar value, only an array or
object.

I've run into this issue a few times lately -- for example, JSON
Schema considers `true` or `false` a valid schema, but we can't
parse it, so a few tests in their test suite break.

I made a small change to JSONConverter to detect when the input isn't
an array or object and wrap it in `[...]` so that jsonsl can parse
it. (The converter then ignores that outer array while parsing.)
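A rough sketch of the wrapping step, assuming a parser that only accepts a top-level array or object. `wrapIfScalar` is a hypothetical helper, not JSONConverter's actual code; empty or whitespace-only input is passed through unchanged so the parser itself reports the error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: if the first non-whitespace character isn't '[' or '{',
// the document is a scalar, so wrap it in brackets that an array/object-only
// parser will accept. The caller must then skip that outer array when
// converting. Returns true if the input was wrapped.
bool wrapIfScalar(std::string_view json, std::string &out) {
    std::size_t i = json.find_first_not_of(" \t\r\n");   // JSON's four whitespace chars
    bool scalar = (i != std::string_view::npos) && json[i] != '[' && json[i] != '{';
    if (scalar) {
        out = "[";
        out.append(json);
        out += ']';
    } else {
        out.assign(json);   // arrays, objects, and empty input pass through
    }
    return scalar;
}
```

With this, a JSON Schema consisting of just `true` becomes `[true]` for the parser, and the caller unwraps the single element afterwards.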
A weird edge case that showed up when running the JSON Schema test
suite, which includes some JSONPointer paths with empty-string
components.

The bug is that `_table.find(str)` finds the entry for the empty
string, but the following test `entry.key != nullslice` fails,
because an empty slice compares equal to nullslice (both are empty).
There are two proper ways to test whether a slice is not nullslice:
`if (entry.key)`, which doesn't work here because `__usuallyTrue`
requires a boolean, and `entry.key.buf != nullptr`.
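To illustrate the pitfall, here is a minimal pointer-plus-length type. `Slice` and `nullSlice` are hypothetical stand-ins, not Fleece's `slice` class: `==` compares contents, so an empty slice equals the null slice, while the boolean conversion checks `buf` and can tell them apart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal slice: a pointer and a length.
struct Slice {
    const void *buf = nullptr;
    std::size_t size = 0;

    // Content equality: any two empty slices are equal, even if one has a
    // non-null buf. This is why `key != nullSlice` can't detect an empty key.
    bool operator==(const Slice &other) const {
        return size == other.size &&
               (size == 0 || std::memcmp(buf, other.buf, size) == 0);
    }
    bool operator!=(const Slice &other) const { return !(*this == other); }

    // Null-ness: only a slice with no buffer at all is "null".
    explicit operator bool() const { return buf != nullptr; }
};

constexpr Slice nullSlice{};
```

The `size == 0` short-circuit also avoids calling `memcmp` with a null pointer.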
There's no reason for it to be an instance method, and the optimizations
I'm making to Dict require calling it without having a SharedKeys
instance.
These came about from profiling the JSONSchema validator, and speed
up the travel-sample validation benchmark by about 10%.

(1) `DictImpl::get(int)` uses a binary search, but for small Dicts
it's faster to just scan all the keys, especially since we can
precompute what the two bytes of the matching key must be.

(2) `DictImpl::get(slice,SharedKeys*)` can skip looking up the Dict's
SharedKeys (which is slow) if the key isn't alphanumeric, since such
a key can never be a shared key.

(3) `DictImpl::get(Dict::key&)` shouldn't waste any time looking up
SharedKeys if it already knows the numeric key.
I also changed `Dict::key`'s boolean `_hasNumericKey` flag into an
`int8_t` with three states: 0 for unknown, 1 for true, and -1 for
"can't be a shared key". This last state avoids trying to look up the
key every single time when the lookup will never succeed.
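The small-Dict strategy from (1) can be sketched with a plain sorted array of 2-byte keys standing in for a Dict's key slots (a real Dict interleaves keys and values). `findKey` and the cutoff `kLinearScanLimit` are hypothetical, and the target is assumed to be already encoded, like the precomputed two bytes mentioned in (1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical cutoff between linear scan and binary search; not Fleece's value.
constexpr std::size_t kLinearScanLimit = 8;

// Returns the index of `target` in the sorted array keys[0..count), or -1.
long findKey(const std::uint16_t keys[], std::size_t count, std::uint16_t target) {
    if (count <= kLinearScanLimit) {
        // Small dict: compare every slot against the pre-encoded target.
        for (std::size_t i = 0; i < count; ++i)
            if (keys[i] == target)
                return static_cast<long>(i);
        return -1;
    }
    // Larger dict: binary search over the sorted keys.
    const std::uint16_t *end = keys + count;
    const std::uint16_t *it = std::lower_bound(keys, end, target);
    return (it != end && *it == target) ? static_cast<long>(it - keys) : -1;
}
```

For a handful of keys, the branch-free-ish sequential compare avoids the unpredictable branches of binary search, which is the usual reason a linear scan wins at small sizes.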
@snej force-pushed the feature/schema branch 2 times, most recently from 3037b5b to 85bd947 on January 29, 2025 23:16