Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Enable format validation in validating BIDS schemas #2077

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

candleindark
Copy link
Contributor

@candleindark candleindark commented Mar 4, 2025

The stable version of schema.json fails to validate against the stable version of metaschema.json if format validation is enabled.

Specifically, running check-jsonschema --schemafile metaschema.json schema.json produces the following output

Schema validation errors were encountered.
  schema.json::$.objects.formats.participant_relative.pattern: '(?!/)(?!sub-)[0-9a-zA-Z/\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.participant_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.bids_uri.pattern: 'bids:[0-9a-zA-Z/#:\\?\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.bids_uri: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.stimuli_relative.pattern: '(?!/)(?!stimuli/)[0-9a-zA-Z/\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.stimuli_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.file_relative.pattern: '(?!/)[0-9a-zA-Z/\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.file_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.dataset_relative.pattern: '(?!/)[0-9a-zA-Z/\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.dataset_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.hed_version.pattern: '^(?:[a-zA-Z]+:)?(?:[a-zA-Z]+_)?(?:0|[1-9]\\d*)\\.(?:0|[1-9]\\d*)\\.(?:0|[1-9]\\d*)\\ (?:-(?:(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?\\ (?:\\+(?:[0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$' is not a 'regex'
  schema.json::$.objects.formats.hed_version: Unevaluated properties are not allowed ('pattern' was unexpected)

where check-jsonschema is at 0.31.2, and by default, it validates with format validation enabled.

My guess is that BIDS schemas are intended to be validated against the meta schema with format validation enabled. This PR provides a solution to ensure validations of BIDS schemas are done with format validation enabled. Specifically, this PR includes the following changes.

  1. Provide a helper function to create jsonschema validators
  2. Provide a helper function to create the jsonschema validator for validating BIDS schemas with format validation enabled. Additionally, this function also caches the validator upon the return of the first execution to improve performance.
  3. Use the validator from previous step to validate BIDS schemas
  4. Include the format extra in the jsonschema package as a dependency to enable format validation when using the package. (See https://python-jsonschema.readthedocs.io/en/latest/validate/#validating-formats for details)

@candleindark candleindark force-pushed the jsonschema-validator branch from 7597bb7 to f8043f5 Compare March 4, 2025 06:12
Copy link

codecov bot commented Mar 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.07%. Comparing base (78213dc) to head (280ac9a).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2077      +/-   ##
==========================================
+ Coverage   82.92%   83.07%   +0.14%     
==========================================
  Files          17       17              
  Lines        1511     1524      +13     
==========================================
+ Hits         1253     1266      +13     
  Misses        258      258              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@candleindark candleindark marked this pull request as ready for review March 4, 2025 06:16
@yarikoptic yarikoptic requested review from effigies and tsalo March 5, 2025 19:34
@yarikoptic
Copy link
Collaborator

ping @effigies and @tsalo -- let's get "kosher" again?

@effigies
Copy link
Collaborator

effigies commented Mar 5, 2025

There aren't any changes to the schema or metaschema being proposed, so I don't see how this addresses the stated problem.

I note that this is having problems with the schema in /stable/. If the problem is addressed in latest, then what's there to do?

@candleindark
Copy link
Contributor Author

There aren't any changes to the schema or metaschema being proposed, so I don't see how this addresses the stated problem.

You are right. This PR doesn't correct the problems in the BIDS schema. It only ensures subsequent validations of the BIDS schemas are done with format validation enabled.

I don't think I am the right person to correct any existing problems in the BIDS schema at this point.

@effigies
Copy link
Collaborator

effigies commented Mar 5, 2025

I ran check-jsonschema locally, and this reproduces now. I believe that these are new errors as of python-jsonschema/check-jsonschema#511.

Given that you're not fixing the regexes, shouldn't the goal be for this PR be for CI to fail until they're fixed?

I have a patch that will fix the regexes, but it doesn't make much sense to apply it before we have a test that fails.

@effigies effigies closed this Mar 5, 2025
@effigies effigies reopened this Mar 5, 2025
@effigies
Copy link
Collaborator

effigies commented Mar 5, 2025

(Clicked the wrong comment button.)

@candleindark
Copy link
Contributor Author

I ran check-jsonschema locally, and this reproduces now. I believe that these are new errors as of python-jsonschema/check-jsonschema#511.

You are right. The errors are result of 'regular expression interpretation in "pattern", "patternProperties", and "format": "regex" usages now uses unicode-mode JS regular expressions by default' in check-jsonschema. The following commands produced no error.

(tmp) ➜  tmp check-jsonschema --regex-variant python --schemafile metaschema.json schema.json
ok -- validation done
(tmp) ➜  tmp check-jsonschema --regex-variant nonunicode --schemafile metaschema.json schema.json
ok -- validation done

Given that you're not fixing the regexes, shouldn't the goal be for this PR be for CI to fail until they're fixed?

I just tried using the jsonschema package to validate the BIDS schema against the metaschema, and I was not able to reproduce the errors output by the check-jsonschema CLI. The validation was successful and didn't produce any error at all. In fact, according to python-jsonschema/check-jsonschema#511, to get the default behavior of the check-jsonschema, one has to build a customer validators using the jsonschema package's extend() API. Since this is just a matter of choice regex variant, I don't think it is a good idea, or add much value, to deviate from the default behavior of jsonschema package.

There is an existing test that tests the validation of the BIDS schema against the metaschema, test_valid_schema. It will fail should the BIDS schema fail to validate (now including format validation with this PR).

Though the observed errors that initiated this PR are no longer applicable. I still think the changes in this PR should be included since without them format validation would not run at all in validation of BIDS schemas.

@effigies
Copy link
Collaborator

effigies commented Mar 6, 2025

Though the observed errors that initiated this PR are no longer applicable.

They are still applicable:

󰅂 uv run bst export | uvx check-jsonschema --schemafile src/metaschema.json -
Schema validation errors were encountered.
  -::$.objects.formats.participant_relative.pattern: '(?!/)(?!sub-)[0-9a-zA-Z+/\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.participant_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.stimuli_relative.pattern: '(?!/)(?!stimuli/)[0-9a-zA-Z+/\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.stimuli_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.bids_uri.pattern: 'bids:[0-9a-zA-Z/#:\\?\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.bids_uri: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.dataset_relative.pattern: '(?!/)[0-9a-zA-Z+/\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.dataset_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.hed_version.pattern: '^(?:[a-zA-Z]+:)?(?:[a-zA-Z]+_)?(?:0|[1-9]\\d*)\\.(?:0|[1-9]\\d*)\\.(?:0|[1-9]\\d*)\\ (?:-(?:(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?\\ (?:\\+(?:[0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$' is not a 'regex'
  -::$.objects.formats.hed_version: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.file_relative.pattern: '(?!/)[0-9a-zA-Z+/\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.file_relative: Unevaluated properties are not allowed ('pattern' was unexpected)

That's why I was hoping we could get to the point of having them appear before I fixed them, so we know the check works.

@candleindark
Copy link
Contributor Author

I will give it another shot, possibly sometime next week though.

@effigies effigies added schema-code Updates or changes to the code used to parse, filter, and render the schema. exclude-from-changelog This item will not feature in the automatically generated changelog labels Mar 7, 2025
This func returns a jsonschema validator that
is bound to the meta schema for validating
 BIDS schemas. Format validation is
enabled in this validator. Additionally, the
validator is cached.
This schema validator have format validation
enabled which is needed to correctly validate
a BIDS schema against the meta schema
To make format validation available in the
`jsonschema` package, the `format` extra
must be included.
See https://python-jsonschema.readthedocs.io/en/latest/validate/#validating-formats
for more details.
@candleindark candleindark force-pushed the jsonschema-validator branch from 2b64162 to 280ac9a Compare March 25, 2025 05:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-from-changelog This item will not feature in the automatically generated changelog schema-code Updates or changes to the code used to parse, filter, and render the schema.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants