Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whitespace versus colon as a namespace separator #146

Closed
mozfreddyb opened this issue Jun 10, 2022 · 10 comments
Closed

Whitespace versus colon as a namespace separator #146

mozfreddyb opened this issue Jun 10, 2022 · 10 comments

Comments

@mozfreddyb
Copy link
Collaborator

@annevk and I noticed that there are pieces of spec text where we separate (or join) namespace + element name with a whitespace character while most other bits use the colon :.

@annevk pointed out that some poor mind could create a HTML element named svg:svg by using document.createElement('svg:svg'), just because it's syntactically correct HTML.

We either need to use the whitespace consistently or rigorously avoid these kinds of element names.. Relatedly, our handling of names that re neither in the baseline nor in the defaults is potentially not explicit enough.

@otherdaniel
Copy link
Collaborator

@annevk and I noticed that there are pieces of spec text where we separate (or join) namespace + element name with a whitespace character while most other bits use the colon :.

Those should be editing mistakes, where I didn't clean up the spec properly after changing to colon. We should certainly fix them.

@annevk pointed out that some poor mind could create a HTML element named svg:svg by using document.createElement('svg:svg'), just because it's syntactically correct HTML.

Yes, that's explicitly called out in a note. That was originally the reason for using whitespace as the namespace designator, and got dropped for simplicity. I suspect "simplicity" is still the right call here, but am happy to change it if we don't feel that was a good call.

We either need to use the whitespace consistently or rigorously avoid these kinds of element names.. Relatedly, our handling of names that re neither in the baseline nor in the defaults is potentially not explicit enough.

@annevk
Copy link
Collaborator

annevk commented Jun 13, 2022

The main reason I would prefer whitespace here is that using : means we cannot express all names within the namespaces we choose to support. For a generic API this seems problematic. We've gotten pretty consistent feedback that document.createElement() and the HTML parser mismatch is problematic. This would then mismatch both of those in a new way. (See whatwg/dom#1079 for work on unifying the HTML parser and document.createElement(). That still allows whitespace to be used for something else.)

There is some merit to : in that it resembles "qnames", though on the flipside it's not actually a "qname" in the usual sense as the prefixes are limited and hard-coded.

@domenic curious if you stumbled across this part of the sanitizer API and thought about the implications.

@domenic
Copy link

domenic commented Jun 13, 2022

IMO the best API is when local name and namespace (usually in URL form) are given separately. Using a string microsyntax seems pretty bad, no matter what kind of separator is used.

@Andrew-Cottrell
Copy link

HTML and CSS are both using a string microsyntax for this purpose (although with different separators), so I think use of a string microsyntax is defendable here on familiarity and ease-of-use grounds. I wouldn't prefer anything like the following style of API

new Sanitizer({allowAttributes: {"style": [
    "span",
    ["svg", "line"],  // or {ns: "svg",  ln: "line"}  or {namespace: "svg",  localName: "line"}
    ["math", "mfrac"] // or {ns: "math", ln: "mfrac"} or {namespace: "math", localName: "mfrac"}
]}});

Is the option of using the | character (like CSS type selectors) a possibility? Other potential options (aside from ASCII whitespace) are / and >, which I understand to be invalid in names.

  • "svg|line"
  • "svg line"
  • "svg/line"
  • "svg>line"

@domenic
Copy link

domenic commented Jun 13, 2022

I just don't think adding to the existing confusion and mess is worth saving a bit of typing on the configuration side. { namespaceURI: "http://www.w3.org/2000/svg", localName: "line" } matches existing DOM APIs the best, by far. You could let simple strings, e.g. "span", be a shorthand for { namespaceURI: "http://www.w3.org/1999/xhtml", localName: "span" }, but I wouldn't go further than that in terms of adding syntactic sugar.

@otherdaniel
Copy link
Collaborator

My reading of https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 (table at the end) is that for attributes the colon separator is part of the name, and not something we can or should change. Is this so?

This was part of the motivation for going with ":" for elements, in that it struck me as rather surprising to use "svg line" for an element, but "xlink:href" for an attribute. With similar surprise for any of the alternatives, I guess. I can justify any of these choices from a standards perspective; but I don't think I can reasonably explain them to anyone that just wants to use the API.

In any case, having "div" yield the expected result (match an HTML div element) would be highly desirable.

@Andrew-Cottrell
Copy link

Maybe we can find an API that avoids confusion & mess, while also being somewhat reasonably explainable.

How about something like the following?

new Sanitizer({allowAttributes: {"style": [
    "span",
    Sanitizer.svg("line"),
    Sanitizer.math("mfrac")
]}});

where Sanitizer.svg and Sanitizer.math return something like { namespaceURI: "http://www.w3.org/2000/svg", localName: "line" }.

@domenic
Copy link

domenic commented Jun 13, 2022

is that for attributes the colon separator is part of the name, and not something we can or should change. Is this so?

No. That's just a table explaining how authors need to write their attributes to pass the HTML checker and get a "valid HTML" badge. The actual underlying model is a (namespace URL, local name) tuple, per the first two columns.

Maybe we can find an API that avoids confusion & mess, while also being somewhat reasonably explainable.

I don't think introducing a new convention such as Sanitizer.svg("line") is any more explainable than using the existing one, which all the existing DOM APIs use, of namespaceURI and localName properties.

@mozfreddyb
Copy link
Collaborator Author

We're proposing a newer config syntax that will hopefully address @domenic's concerns
Discussion is in #181 and I'd like to close this issue here, unless anyone objects.

@mozfreddyb
Copy link
Collaborator Author

Follow-up conversations (if any) should be in 181 as mentioned above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants