otherMarkup #240

otherdaniel · 2024-09-13T16:20:01Z

Comment on #237 (comment): "We should not need otherMarkup"

[Splitting this out from discussion on #237]

I re-introduced otherMarkup, because I wanted a way to express the behaviour of setHTMLUnsafe without parameters. The requirements, I take it, are:

parseHTMLUnsafe/setHTMLUnsafe("...") (without passing in a config) should not modify the parse result.
the empty options dict should behave like no options dict
- setHTMLUnsafe("...") == setHTMLUnsafe("...", {}) == setHTMLUnsafe("...", {sanitizer: {})
setHTML should behave mostly like setHTMLUnsafe, except for the known-XSS-unsafe stuff.

Generally, I see 3 options to satisfy these:

Have otherMarkup (or somesuch), so that we can have an unsafe-default config that will pass through all markup.
Have the behaviour of otherMarkup, but tie it to the method invocation.
- I.e., setHTML("<bla>", {sanitizer: s}) would behave differently from setHTMLUnsafe("<bla>", {sanitizer: s})
Special case the absent and empty config to behave in a way that a Sanitizer instance can't.
- I.e., setHTMLUnsafe("...") does something setHTMLUnsafe("...", {sanitizer: s}) can't, for any value of s.

I have a mild preference for the first; but can live with any of them.

The text was updated successfully, but these errors were encountered:

otherdaniel · 2024-09-18T14:24:10Z

[Follow-up from meeting today:]

I removed otherMarkup from #237. The key sentence is https://pr-preview.s3.amazonaws.com/otherdaniel/purification/pull/237.html#sanitize-core, 2.4.2.

An empty allow list now allows everything. That is:

.setHTMLUnsafe("<bla>") => <bla>
.setHTML("<bla>") => (Because .setHTML default has an allow-list based default)
.setHTML("<bla>", {sanitizer: {}}) => <bla> (Because explicit config is given; and explicit config has no allow list)

One oddity this has is that removing elements can allow more of them. E.g.:

s = new Sanitizer({elements: ["div"]); # Allows only <div>, but not e.g. <p>
s.removeElement("div"); # Allows everything except <div>. Does allow <p>.

Does this reflect group consensus?

One item that came up when discussing this with Guillaume: Can we distinguish between a missing list and an empty one? That is, {} would allow everything, while {elements: []} would allow nothing?

annevk · 2024-09-18T14:46:23Z

I think the policy manipulation methods having some weird side effects is okay, although it does make me wonder whether we need to think even harder about the names and also whether we might need more fine-grained manipulation methods in the future.

Like if you do addElement() twice, and the second call includes or omits details about attributes, I guess it ends up replacing the first call entirely? Almost makes me think they should all be prefixed with set(). setElement(), setRemoveElement(), etc.

It's certainly possible IDL-wise to distinguish between an empty list and a missing list.

otherdaniel · 2024-09-18T15:29:19Z

Like if you do addElement() twice, and the second call includes or omits details about attributes, I guess it ends up replacing the first call entirely?

Good point. The way it's currently written (https://pr-preview.s3.amazonaws.com/otherdaniel/purification/pull/237.html#sanitizer-allowelement, step 2 + 3) is that is always adds to the per-element lists.

s = new Sanitizer({elements: ["img"]}).
s.get();  // {elements: [{name: "img", namespace: ..html..}]}
s.allowElement({"img", attributes: ["src"], removeAttributes: ["srcset"]});
s.allowElement({"img", attributes: ["alt"]});
s.get();  // {elements: [{name: "img", namespace: ...html..., attributes: ["src", "alt"], removeAttributes: ["srcset"]}]}

In order to set the per-element attributes, one would have to remove it and then allow it again.

Not sure if that's what we want... I admittedly didn't think too much about that. :/

It's certainly possible IDL-wise to distinguish between an empty list and a missing list.

Ah. In that case, I think I'd like to change it to: empty list allowing nothing; missing list meaning no constraint (aka, allowing everything). That seems more consistent.

annevk · 2024-09-18T15:39:24Z

Given that we don't have fine-grained operations I think replacing (set semantics) is probably what we want. You don't want multiple of the same element in the list and have to deal with deduplication later and you also don't really want more complicated merge semantics than replace I'd think.

I think I'd like to change it to: empty list allowing nothing; missing list meaning no constraint (aka, allowing everything).

That sounds reasonable. And then elements : [], removeElements: [ aValidRemoveElementEntry ] would throw due to ambiguity?

otherdaniel · 2024-09-19T14:04:25Z

Given that we don't have fine-grained operations I think replacing (set semantics) is probably what we want. You don't want multiple of the same element in the list and have to deal with deduplication later and you also don't really want more complicated merge semantics than replace I'd think.

Done.

I think I'd like to change it to: empty list allowing nothing; missing list meaning no constraint (aka, allowing everything).

That sounds reasonable. And then elements : [], removeElements: [ aValidRemoveElementEntry ] would throw due to ambiguity?

In the current PR, nothing throws. The old spec had an elaborate "canonicalization" step, and would throw if that encountered ambiguities. That makes sense for an immutable object, where the construction is also the final state of the object. In this PR, we define modifier methods, that would presumably be called a number of times. These methods just "make it fit" instead of throwing. E.g., if you add an element to the allow list, it'll simply remove it from the remove-list. Initialization from a dictionary is now re-written in terms of those manipulation methods. The consequence of that is that it doesn't throw anymore at all. (There's an ISSUE: about that in the draft.)

If we don't want that, I can see several ways:

We could make the manipulation methods throw. (E.g., removeElement would throw if that particular element is in the allow list.) I'm skeptical here, because I think that makes it very easy to write code that will fail in practice if the underlying sanitizer has changed from what the code writer was expecting.

We could keep the manipulation methods als always succeeding, but throw when constructing from a dictionary. I guess if we have an allow-list then any removeElement entry is redundant. I guess we could throw for any config that has both allow and remove lists.

annevk · 2024-09-19T14:09:41Z

I was expecting that dictionary construction would still throw for the cases we discussed:

Duplicate entries in the same list (perhaps with mismatching attribute lists).
Lists that are redundant with each other (such as attributes and removeAttributes).
Actual syntax errors (although maybe we don't have any limitations on the strings themselves, that's prolly for the best).

I agree that ideally the manipulation methods always succeed, even if they might not do what you expect. But they would still guarantee some kind of "canonical model" underneath and not allow for duplicates or the ability to end up with redundant lists internally.

otherdaniel · 2024-09-19T15:36:11Z

I now added checks & throwing back in for the constructor.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

otherMarkup #240

otherMarkup #240

otherdaniel commented Sep 13, 2024

otherdaniel commented Sep 18, 2024

annevk commented Sep 18, 2024

otherdaniel commented Sep 18, 2024

annevk commented Sep 18, 2024

otherdaniel commented Sep 19, 2024

annevk commented Sep 19, 2024

otherdaniel commented Sep 19, 2024

otherMarkup #240

otherMarkup #240

Comments

otherdaniel commented Sep 13, 2024

otherdaniel commented Sep 18, 2024

annevk commented Sep 18, 2024

otherdaniel commented Sep 18, 2024

annevk commented Sep 18, 2024

otherdaniel commented Sep 19, 2024

annevk commented Sep 19, 2024

otherdaniel commented Sep 19, 2024