[BUG] xml-to-json() ignores namespace requirement #5543

Open
djbpitt opened this issue Nov 10, 2024 · 4 comments
djbpitt commented Nov 10, 2024

Describe the bug

The description of the xml-to-json() function in the XPath and XQuery Functions and Operators 3.1 spec states that:

The first argument $input is a node; the subtree rooted at this node will typically be the XML representation of a JSON document as defined in 17.4.2 XML Representation of JSON.

That definition says that:

the phrase "an element named N" is to be interpreted as meaning "an element node whose local name is N and whose namespace URI is http://www.w3.org/2005/xpath-functions".

The xml-to-json() function in eXist-db transforms XML map and array markup to JSON even when the namespace is omitted. <oXygen/> raises an error if the XML elements to be transformed are not in the required namespace.

Expected behavior

I expect eXist-db to raise an error if the XML elements to be transformed by the xml-to-json() function are not in the required namespace.

To Reproduce

Create the following XQuery and run it in eXide and in the <oXygen/> XQuery debugger. It will succeed in both. Then remove the namespace declaration and rerun. The code will succeed (incorrectly) in eXide and raise an error (correctly) in <oXygen/>:

let $doc :=
<array
  key="stooges"
  xmlns="http://www.w3.org/2005/xpath-functions">
  <string>Curly</string>
  <string>Larry</string>
  <string>Moe</string>
</array>
return
  fn:xml-to-json($doc)
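
For comparison, here is a sketch of the second step: the same query with the namespace declaration removed, wrapped in try/catch so that the outcome shows up as a value rather than as a halted query. The explicit err prefix declaration and the message text are additions for illustration and are not part of the original report. Going by the behavior described in this issue, a conforming processor should take the catch branch, while eXist-6.2.0 returns the JSON string instead.

```xquery
xquery version "3.1";

(: Declared explicitly; some processors predeclare this prefix. :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Same data as above, but the elements are in no namespace. :)
let $doc :=
<array
  key="stooges">
  <string>Curly</string>
  <string>Larry</string>
  <string>Moe</string>
</array>
return
  try {
    fn:xml-to-json($doc)
  } catch err:FOJS0006 {
    (: Reached only if the processor enforces the namespace requirement. :)
    "FOJS0006 raised: " || $err:description
  }
```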

Context (please always complete the following information)

Build: eXist-6.2.0 (c8fa495)
Java: 1.8.0_333 (Oracle Corporation)
OS: Mac OS X 12.7.6 (x86_64)

Additional context

  • How is eXist-db installed? [e.g. JAR installer, DMG, … ]: DMG
  • Any custom changes in e.g. conf.xml?: No
@joewiz joewiz added the bug issue confirmed as bug label Nov 11, 2024
@joewiz joewiz added this to the eXist-6.3.1 milestone Nov 11, 2024
Member

joewiz commented Nov 11, 2024

Error in Saxon:

xml-to-json: element found in wrong namespace: Q{}array

Error in BaseX:

[FOJS0006] Element 'array' has invalid namespace: ''.

The QT3 test suite contains a test for this condition, but the addition of this function to eXist predated the integration of the test runner with eXist's CI. In addition, some of the tests for this function in eXist lack the required namespace on the tested elements.

I think the fix is likely to update the conditions here - https://github.com/eXist-db/exist/blob/develop/exist-core/src/main/java/org/exist/xquery/functions/fn/FunXmlToJson.java#L77-L79 - to include a check for the namespace of the element.
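
Until such a check exists in eXist, the same condition can be approximated at the query level. The following is only a sketch of the idea, not the proposed Java change: local:strict-xml-to-json and $local:FN-NS are hypothetical names, and the error message merely imitates the BaseX wording quoted above. It tests only the namespace of each element and does not attempt the rest of the validation that the spec describes.

```xquery
xquery version "3.1";

(: Namespace required by the spec for the XML representation of JSON. :)
declare variable $local:FN-NS := "http://www.w3.org/2005/xpath-functions";

(: Hypothetical wrapper: raise FOJS0006 if any element under $input
   is outside the required namespace, otherwise delegate. :)
declare function local:strict-xml-to-json($input as node()) as xs:string? {
  let $bad := $input/descendant-or-self::*[namespace-uri() ne $local:FN-NS]
  return
    if (exists($bad)) then
      error(
        QName("http://www.w3.org/2005/xqt-errors", "err:FOJS0006"),
        "Element '" || local-name($bad[1]) ||
        "' has invalid namespace: '" || namespace-uri($bad[1]) || "'."
      )
    else
      fn:xml-to-json($input)
};

(: Usage: this input is in no namespace, so the wrapper raises the error. :)
local:strict-xml-to-json(
  <array><string>Curly</string></array>
)
```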


dariok commented Nov 11, 2024

The important part of the spec is the definition of $input:

The node supplied as $input must be one of the following: [err:FOJS0006]

  1. An element node whose name matches the name of a global element declaration in the schema given in C.2 Schema for the result of fn:json-to-xml ("the schema") and that is valid as defined below:

    a. If the type annotation of the element matches the type of the relevant element declaration in the schema (indicating that the element has been validated against the schema), then the element is considered valid.

    b. Otherwise, the processor may attempt to validate the element against the schema, in which case it is treated as valid if and only if the outcome of validation is valid.

    c. Otherwise (if the processor does not attempt validation using the schema), the processor must ensure that the content of the element, after stripping all attributes (at any depth) in namespaces other than http://www.w3.org/2005/xpath-functions, is such that validation against the schema would have an outcome of valid.

    Note:

    The process described here is not precisely equivalent to schema validation. For example, schema validation will fail if there is an invalid xsi:type or xsi:nil attribute, whereas this process will ignore such attributes.

  2. An element node E having a key attribute and/or an escaped-key attribute provided that E would satisfy one of the above conditions if the key and/or escaped-key attributes were removed.

  3. A document node having exactly one element child and no text node children, where the element child satisfies one of the conditions above.

Furthermore, $input must satisfy the following constraint (which cannot be conveniently expressed in the schema).
Every element M that is a descendant-or-self of $input and has local name map and namespace URI http://www.w3.org/2005/xpath-functions must satisfy the following rule: there must not be two distinct children of M (say C1 and C2) such that the normalized key of C1 is equal to the normalized key of C2. The normalized key of an element C is as follows:

If C has the attribute value escaped-key="true", then the value of the key attribute of C, with all JSON escape sequences replaced by the corresponding Unicode characters according to the JSON escaping rules.

Otherwise (the escaped-key attribute of C is absent or set to false), the value of the key attribute of C.
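
To make the duplicate-key constraint concrete, here is a small sketch (my own example, not taken from the spec). The two children have different key attributes as written, but the second is marked escaped-key="true", so its normalized key is also a. Under the rule quoted above, a conforming processor should raise FOJS0006 for this input.

```xquery
xquery version "3.1";

(: Declared explicitly; some processors predeclare this prefix. :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";

let $doc :=
<map xmlns="http://www.w3.org/2005/xpath-functions">
  <string key="a">first</string>
  <!-- \u0061 is the JSON escape for "a", so the normalized key is "a" -->
  <string key="\u0061" escaped-key="true">second</string>
</map>
return
  try {
    fn:xml-to-json($doc)
  } catch err:FOJS0006 {
    "FOJS0006 raised: two children share a normalized key"
  }
```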

In the schema, array is declared with type j:arrayType, which in turn is defined as:

<xs:complexType name="arrayType">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="j:map"/>
        <xs:element ref="j:array"/>
        <xs:element ref="j:string"/>
        <xs:element ref="j:number"/>
        <xs:element ref="j:boolean"/>
        <xs:element ref="j:null"/>
    </xs:choice>
    <xs:anyAttribute processContents="skip" namespace="##other"/>
</xs:complexType>

Furthermore, we have:

Nodes in the input tree are handled by applying the following rules, recursively. In these rules the term "an element named N" means "an element node whose local name is N and whose namespace URI is http://www.w3.org/2005/xpath-functions".

And the error is defined:

A dynamic error is raised [err:FOJS0006] if the value of $input is not a document or element node or is not valid according to the schema for the XML representation of JSON, or if a map element has two children whose normalized key values are the same.


The problem now arises from the fact that the rules and definitions contradict each other:

The definition of $input as quoted above, in No. 1, refers only to the element’s name, not to its fully qualified name, and 1a refers to matching type annotations without requiring full validation (which the note expressly confirms!). Having the same type annotation, though, does not mean being in the same namespace!

A schema identical to the one quoted, but without the fn-namespace defined as its targetNamespace, would still contain the type definition of j:array, and hence 1a would actually fit the example.
Hence, by the definition, the example without the fn-namespace will pass the input “validation” as defined.

There is, though, no defined way to process such data (as the processing rules require the namespace), and as for the error: well, the error and the definition still contradict each other…

Hence, I suggest discussing this issue in the context of how the spec should be phrased, as the real intention is not really clear (in all practicality, if a node tree validates against the schema except for not being in the fn-namespace, no actual problem would arise).

Author

djbpitt commented Nov 11, 2024

I am puzzled by the phrase, in the last sentence above, that "the real intention is not really clear". I don't have a lot of experience parsing the language of specifications, so perhaps there is a genuine uncertainty about intention that only my own inexperience prevents me from recognizing, but I don't see any ambiguity at all about "the real intention": the citation I reproduce from the spec says plainly that the elements to be converted from XML to JSON by the xml-to-json() function must be in the http://www.w3.org/2005/xpath-functions namespace. The Saxon and BaseX implementations seem to agree that this is the real intention, since it is the behavior they enforce.

With that said, I agree that the current QT4 revision of the specs provides an opportunity to revisit the wording where it might lead to confusion—not because the real intention is unclear, but because the language may not be as clear as the intention.


dariok commented Nov 11, 2024

The catch is in the contradiction I pointed out – the definition of $input does not require a namespace while the remainder of the text does make it a requirement.

I’d like to also draw your attention to one word in your first quote: “typically”.
This means that the input might be something other than “the XML representation of a JSON document as defined in 17.4.2 XML Representation of JSON.”

Also, as far as processing goes: as long as the data types match, nothing precludes a node tree from being parsed. That might well be the reason for the simplified “validation” described in my quote above (which requires the type annotations to match without recourse to the fn-namespace).
The lengthy description of a simplified “validation” would not be necessary as the spec could simply point out that $input must validate against the schema.

These last 2 points together allow for the interpretation that the intention might have been to allow such a node tree to actually be an acceptable (“is considered valid”) input with the more restrictive text simply having been copied from the definition of the inverse function.

That’s the reason for my claim that the intention is not completely clear.
