Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior around failed $ref resolution is not defined #1276

Closed
gregsdennis opened this issue Aug 16, 2022 · 22 comments · Fixed by #1439
Closed

Behavior around failed $ref resolution is not defined #1276

gregsdennis opened this issue Aug 16, 2022 · 22 comments · Fixed by #1439
Assignees

Comments

@gregsdennis
Copy link
Member

gregsdennis commented Aug 16, 2022

(By "is not defined" I don't mean that it's considered and left as "undefined" behavior; I mean it's not addressed at all.)

In Slack, we had a short discussion around the behavior of this schema:

{
  "anyOf": [
    { "$ref": "https://unresolvable.uri/schema" },
    true
  ]
}

An implementation could handle this in one of two ways:

  1. Produce a validation error, meaning that /anyOf/0 merely fails validation and the implementation continues to /anyOf/1, ultimately passing an instance.
  2. Produce a runtime error, meaning that the implementation halts evaluation and no validation result is given either way.

The discussion landed on the latter: produce a runtime error, halting evaluation.

We need to add language to spec to properly prescribe this behavior.

For reference:

@handrews
Copy link
Contributor

@gregsdennis I see what you're doing with the anyOf to potentially still produce a valid outcome, but just to make sure I understand, even this:

{
   "$ref": "https://uresolvable.uri/schema"
}

is under-specified.

My vote is for runtime error (as you know but folks who weren't on Slack wouldn't).

@gregsdennis
Copy link
Member Author

gregsdennis commented Aug 16, 2022

Right, my example would either pass validation (because of the anyOf: [ ... , true ]) or it would halt. It's a pathalogical example to illustrate the point.

Given the two optional behaviors, yours would either fail validation or halt.

@jdesrosiers
Copy link
Member

The way I see it, if the referenced schema doesn't exist, then the result of validation of an instance against the schema isn't true or false, it's indeterminate. We don't have the referenced schemas, so we don't know if it would have resulted in true or false.

Therefore, the result of validating a schema against an instances shouldn't be true or false because it contains an indeterminate result. It doesn't necessarily need to be a runtime error, but it needs to be a result other than true or false. Generally that means throwing an error, but I don't think it must be. If the implementation has another way of representing an indeterminate result, that would be fine too.

@gregsdennis
Copy link
Member Author

I think the issue is that the output in the specification assumes that evaluation completed successfully, so it doesn't provision a way to represent an indeterminate result. Maybe that's something that needs consideration.

Regardless, it seems we agree that halting evaluation is the right action; just how to report it to the consumer is the remaining question. Perhaps this is fine to leave as implementation-defined (as much as that's bitten us before). This leaves the door open for implementations to decide in their context whether an exception is necessary vs some other mechanism. Either way, they should document it.

@handrews
Copy link
Contributor

@gregsdennis I think this gets into some really important design questions that we've touched on in our IM discussions.

my mental model for JSON Schema evaluation is something like:

evaluator = JSONSchemaImplementationThing()
evaluator.load(schema1)  # might raise an exception if schema cannot be scanned for identifiers, etc.
evaluator.load(schema2, retrieval_uri=uri2, media_type_params={schema: meta_schema2)
result = evaluator.evaluate(instance, schema_uri_1)  # might raise an exception for unresolvable reference or whatever
if result.valid:
    annotations = result.output('basic')
    do_stuff(annotations)
else:
    errors = result.output('hierarchical')
    report_errors(errors)

So I would not expect the output format to handle the indeterminate result. I've mentioned before that I'm wary of making the output structure the One True Way to interact with schema evaluation. It is an interoperable report of the evaluation, and that's important, but we shouldn't lose the flexibility of native programming language idioms either.

As noted in the last community call, generating a complete output structure can be very expensive in both performance and memory consumption. In many cases, a more interactive, idiomatic approach would be preferred.

@Julian has a really awesome exception interface for working with errors, including a programmatically queryable tree interface. (I have not checked to see if the way you invoke his implementation is similar to what I wrote above, but it fits in the sense that there's a thing you can interact with after evaluation).

There are places where we need the interoperability: For example, a microservice architecture where one service runs JSON Schema in Python and another in .NET, but it's the front-end code in JavaScript that needs to interpret the results.

But we don't want to get so focused on "output" that we preclude other modes of interaction. Even in the microservices environment, a runtime problem such as an unresolved reference might be better handled by returning an HTTP error in application/problem-detail+json format than returning a JSON Schema output structure.

@handrews
Copy link
Contributor

(let me just note that I am aware that @gregsdennis was not suggesting that output was the only way to handle this, and I should have made that more clear before talking about possible alternatives that I wanted to get into this conversation)

@gregsdennis
Copy link
Member Author

But we don't want to get so focused on "output" that we preclude other modes of interaction. Even in the microservices environment, a runtime problem such as an unresolved reference might be better handled by returning an HTTP error in application/problem-detail+json format than returning a JSON Schema output structure.

As discussed elsewhere, the output as defined by the specification isn't necessarily what an application (e.g. a microservice API) need return; it's what is returned from the implementation to the application. It's perfectly fine for the application to translate that into application/problem-detail+json or whatever reporting mechanism it wants for consumption by its clients.

@handrews
Copy link
Contributor

I was making a distinction between using the full error output (in which case I would probably embed it in an application/problem-detail+json response, assuming the likely required action was the user correcting something, e.g. form input), vs how to report a problem that prevented evaluation from happening.

But I can see how runtime errors could fit into the output format given that the location information is helpful either way, in which case we'd just need a 3rd value (null?) for valid to express that. 🤔

I am absolutely not against using output for this, I'm just trying to figure out where we want interoperability (which comes at an efficiency cost) vs where we want efficiency.

@gregsdennis
Copy link
Member Author

I agree that output (at least as it currently exists) isn't right for this kind of error. I think an implementation should use whatever native language support exists.

I've made an attempt at covering this in my data v2 PR (section 4.4). This is what spawned the question in the first place.

@gregsdennis
Copy link
Member Author

So we've agreed that unresolvable reference failures should be handled in-language. @Julian do you think there's a way we can test this with the suite?

@Julian
Copy link
Member

Julian commented Aug 19, 2022

We've had this come up a few times on the test suite, most recently here and here. (n.b. both of those cover slightly different variations on the theme. I'm not sure it's worth your time to read through them, though obviously feel free to if you'd like).

I'm OK with us adding another folder and sticking some of these in them. The "issue" if there is one is that there isn't really anything interesting we can say about the result (i.e. having an equivalent of the valid property would be pointless). And for some implementations, even mine, it'd be quite annoying to run them even, because I raise different error classes for some of these, and the test wouldn't be able to say which of them it's saying will get raised.

So I think the best first step is simply to have a file with an array of the schemas and descriptions themselves, and no assertion. But yeah feel free to send over a small PR (or otherwise stick it in an issue).

@handrews
Copy link
Contributor

So I think the best first step is simply to have a file with an array of the schemas and descriptions themselves, and no assertion.

@Julian I'm not sure I understand the "no assertion" part. Why wouldn't we have a value (e.g. "valid": null or "error": true) that indicates that we expect an error rather than a pass/fail result? I don't think it's necessary or plausible to specify the nature of the error.

@Julian
Copy link
Member

Julian commented Aug 19, 2022

"valid": null overloads a property with other meaning, so best to avoid that, and "error": true just seems pointless to introduce, all the examples in the whole file are going to be error: true, so until/unless we have some other information to convey it doesn't seem needed.

@handrews
Copy link
Contributor

So people writing test executors would write something like:

if 'valid' in test_case:
    # run validation and check against the valid field
else:
    # run valid and fail the test if there's no error

@Julian
Copy link
Member

Julian commented Aug 19, 2022

I think it'll be even easier to match based on the directory and we not mix test types between the two -- so errors/ contains tests with different structure.

@Julian
Copy link
Member

Julian commented Aug 19, 2022

(and then no existing runners break, since they're not looking in the new directory anyhow)

@handrews
Copy link
Contributor

I think it'll be even easier to match based on the directory and we not mix test types between the two -- so errors/ contains tests with different structure.

Oh I see what you mean, that wasn't clear. Could you file the issue about the proposed new directory structures soon, please? Directory-based behavior keeps coming up and there's nothing to point to for context about where you're going with it right now.

@karenetheridge
Copy link
Member

I agree, an "unreconcilable" error should be produced.

This is what I do:

$; json-schema-eval 
enter data instance, followed by ^D:
1
enter schema, followed by ^D:
{"anyOf":[true,{"$ref":"https://unresolvableschema"}]}
^D
{
  "errors" : [
    {
      "error" : "EXCEPTION: unable to find resource https://unresolvableschema",
      "instanceLocation" : "",
      "keywordLocation" : "/anyOf/1/$ref"
    }
  ],
  "valid" : false
}

Essentially, a normal error is produced for the appropriate keyword+dataLocation, and then an exception is thrown to be caught by the top level of evaluation (before the recursive processing of the schema begins), so it cannot be paired with an "anyOf":[true, ... ], wrapped in a not, etc.

The result object, which is returned from the evaluation call and contains all the error objects, also has an exception flag which is set to boolean true indicating that the result is not the same as a normal validation error.

FWIW, the other ways in which such exceptional errors can be generated _at runtime are:

  • similar failure to resolve a reference in $recursiveRef or $dynamicRef
  • unknown vocabulary referenced in $vocabularies
  • maximum evaluation depth exceeded (to detect infinite recursion)
  • infinite loop detected (same location evaluated twice - another form of infinite recursion)
  • unknown media type referenced in contentEncoding or contentMediaType

@gregsdennis
Copy link
Member Author

@karenetheridge Does your implementation halt execution when that's encountered, or do you continue and just report it similar to a validation error?

Unknown vocabs should only be an error if the value for that vocab is false. Thus I think this should be reported in the standard output, not as a runtime error.

Unknown media types (and the content* keywords in general) are a different animal. My implementation just reports these as annotations so that the host app can decide what to do with it. I'll never have a runtime error from these.

@handrews
Copy link
Contributor

handrews commented Sep 5, 2022

unknown media type referenced in contentEncoding or contentMediaType

This is no longer a concern in 2020-12, as these keywords are just annotations.

unknown vocabulary referenced in $vocabularies

This, and certain situations involving $schema or its absence are "refusal to process" errors. They are either valid, nor invalid, nor a runtime error. They are "I looked at this and decided not to do it." Which is different from "I thought I could do this and started to do it and then something blew up."

@karenetheridge
Copy link
Member

Does your implementation halt execution when that's encountered, or do you continue and just report it similar to a validation error?

Evaluation is halted, in the sense that an exception is thrown to the top of the evaluation scope, but an exception is not thrown right back to the caller of the evaluator. The evaluation returns a result object as normal, with the exception represented as an error. (Any errors encountered earlier in the evaluation, for example from a sibling properties: { foo: false }, may also be reported in the result object as well.) But it is not possible to return a valid result if such an exception is encountered.

Unknown vocabs should only be an error if the value for that vocab is false.

I think you mean true here. Yes, vocabularies that are false are defined in the spec as "okay to ignore if not found":

If the value is false, implementations that do not recognize the vocabulary SHOULD proceed with processing such schemas. But if the vocabulary is true, I think this qualifies as an exception on the same level as an unresolveable $ref: "we cannot proceed without this URI resolving, therefore, we must stop with an indeterminate result." -- just as a $schema URI being unrecognized.

@gregsdennis
Copy link
Member Author

I think you mean true here.

Yes, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment