Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for mechanism to determine if children changed steps are called as a result of script or parser #1288

Open
lukewarlow opened this issue May 16, 2024 · 7 comments
Labels
addition/proposal New features or enhancements needs concrete proposal Moving the issue forward requires someone to figure out a detailed plan topic: nodes

Comments

@lukewarlow
Copy link
Member

What problem are you trying to solve?

In TrustedTypes we have a need to distinguish between a script's children being changed by an API (e.g. script.appendChild) or changed by the parser.

This is specifically needed as we treat all parsed scripts as trusted by default. However, it's possible for a script to be manipulated before it's finished parsing, and in those cases it shouldn't be trusted.

What solutions exist today?

Both Chromium and WebKit implement a childrenChanged function on their ContainerNode type which I believe translates to the children changed steps concept defined in the DOM algorithm. However, unlike the spec Chromium and WebKit both provide an argument to the function with details, such as the type (what changed and how it changed), and importantly the source of the change (parser vs API).

We can (and do) use this to invalidate scripts in the above described situation.

How would you solve it?

It would be good if the DOM spec matched the reality of at least 2 implementations (unsure about Gecko), and provided details about the change, at the very least for this use case the source of the change.

Anything else?

See w3c/trusted-types#499 (comment) for a long discussion that lead to this issue being raised.

I'm open to any alternative spec mechanisms we can use to spec this behaviour, so if there's a way today or a simpler change we can use let me know.

@lukewarlow lukewarlow added needs implementer interest Moving the issue forward requires implementers to express interest addition/proposal New features or enhancements labels May 16, 2024
@annevk
Copy link
Member

annevk commented May 16, 2024

@annevk annevk added topic: nodes needs concrete proposal Moving the issue forward requires someone to figure out a detailed plan and removed needs implementer interest Moving the issue forward requires implementers to express interest labels May 16, 2024
@lukewarlow
Copy link
Member Author

cc @otherdaniel as he did the Chromium implementation

@domfarolino
Copy link
Member

I am interested in two things here.

  1. The integrity of the "is changed by API" bit that Chromium maintains. Are we positive that it is accurate in the ways we expect it to be here? I did just a little bit of digging and I found that unconditionally in the ContainerNode::RemoveChildren() path, the ChildrendChangedSource is always kAPI. But this is a very broad method that covers a ton of cases, and I'm wondering if it sometimes covers parser-only cases that get treated as kAPI cases incorrectly. For example, I found that when an HTMLInputElement gets its type updated, possibly as a result of a parser(I think), we go through this flow: HTMLInputElement::ParseAttribute --> HTMLInputElement::UpdateType --> HTMLInputElement::SetInnerEditorValue() --> ReplaceChildrenWithText --> ContainerNode::RemoveChildren. This kind of seems like innocent parser changes can trigger the children changed flow with the kAPI flag set incorrectly. I'll be honest I haven't spent too much time debugging this case to know whether my speculation is right or not, but specifically I'm interested in understanding if all browsers (that implement this extra non-specified bit) match in all scenarios.
  2. The point made by @annevk in Update HTML Parser steps for script element to set "script text" w3c/trusted-types#499 (comment) about whether we derive this bit of information based on (a) "invalidation" (I am actually not sure what this means. Is there a common dfn somewhere), or (b) comparing pre-insert and post-insert children contents to detect mid-parsing API intervention 1.

Footnotes

  1. One point I am not clear on is: is it possible for a node's children to be changed during parser as a result of a non-API action? Like, can the parser itself be responsible for under any scenario?

@annevk
Copy link
Member

annevk commented Jun 17, 2024

By "invalidation" I meant having the script element maintain a boolean that gets set to false (invalidated) when non-parser manipulation has taken place. This seems similar to the "is changed by API" you describe in 1 above.

Such a setup seems vastly preferable over maintaining a complete separate copy of the contents of the script element, to the extent that is even viable to begin with. For instance, is there a clear difference between pre-insert and post-insert? There's a clear point when we hit </script>, but until that code points can be streaming in from various places.

@lukewarlow
Copy link
Member Author

I'll take some more time to properly read through and respond but regarding how correct the "is changed by API" bit is.

Chromium used to (and right now WebKit still does) have at least 1 situation where a parser change is marked as API and so the script is untrusted. The XML parser would append nodes as if they were an API change. This is because a legacy deprecated SVG element (tref) is built on mutation events, and so it went down the path that fired mutation events. Chromium simply removed that element and fixed the parser, WebKit is slightly more complicated because it might be used in non-web places, BUT that's looking like the end result too.

This is the only (relevant) place that's wrong from my testing but it's possible there could be other mismatches.

I'm wondering if it sometimes covers parser-only cases that get treated as kAPI cases incorrectly.

The main thing is we need all API changes to show as such, if some parser steps pretends it's an API that's not the end of the world, especially because in it's current form it's just used during script element parse time.

One point I am not clear on is: is it possible for a node's children to be changed during parser as a result of a non-API action? Like, can the parser itself be responsible for under any scenario?

Yes, my understanding is that the parser itself will fire children changed steps during the course of parsing.

@domfarolino
Copy link
Member

This is the only (relevant) place that's wrong from my testing but it's possible there could be other mismatches.

That is good to know. Web platform tests confirming this across the two implementations would be lovely to see though.

Yes, my understanding is that the parser itself will fire children changed steps during the course of parsing.

Right but I guess the question is: do these "children changed" steps in any specs actually modify the children of the node they are being called on? I don't think <script> processing uses the children changed steps today. After whatwg/html#10188 though, the "children changed steps" will be used for scripts (see this example), which will run scripts, but any further children modification done as a result of that I guess will be marked as an "API-initiated change" Anyways, I realize my initial question here was very dumb, since yes of course the parser can add children to nodes. That's like its only job when parsing a page.

@lukewarlow
Copy link
Member Author

lukewarlow commented Jul 9, 2024

Following up on this I've opened a draft PR on Trusted Types which tries to advaced the script protection. I've moved to a new model after discussions with Anne and others which be simpler to understand.

The idea is that a script element has two flags, one to say it's trusted which is initially true (so parsed scripts are trusted), and one to say it's been modified by a trusted sink.

The model relies on the children changed steps and a new flag "parser change" for those steps which can be used to differentiate between API or Parser. See w3c/trusted-types#533 for the specifics.

So the big missing piece of the puzzle is how to get that new "parser change" flag specced. It seems that the children changed steps aren't neccessarily called directly from the parser so it might require piping the flag through multiple layers of indirection?

I also discussed with @smaug---- about how feasible this information is to get in Gecko and while the implementation is different from Webkit and Chromium, it's sounds like it should be easy enough to get the neccessary information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addition/proposal New features or enhancements needs concrete proposal Moving the issue forward requires someone to figure out a detailed plan topic: nodes
Development

No branches or pull requests

3 participants