Early design review: Sanitizer API #619

mozfreddyb · 2021-03-23T13:41:37Z

I'm requesting an early design review of the Sanitizer API

Provide a browser-maintained "ever-green", safe, and easy-to-use library for user input sanitization as part of the general web platform.

Explainer: https://github.com/WICG/sanitizer-api/#sanitization-explainer
Security and Privacy self-review: https://github.com/WICG/sanitizer-api/blob/master/security-questionnaire.md
GitHub repo: https://github.com/WICG/sanitizer-api/
Primary contacts:
- Frederik Braun (@mozfreddyb), Mozilla, Co-Editor
- Daniel Vogelheim (@otherdaniel), Google, Co-Editor
- Yifan Luo (@iVanlIsh), Google, Co-Editor
Organization/project driving the design:
- Prototypes exist in Mozilla Firefox and Google Chrome and we intend to continue driving this.
External status/issue trackers for this feature (publicly visible, e.g. Chrome Status):
- Meta bug for Firefox
- Chrome Status

Further details:

We have reviewed the TAG's Web Platform Design Principles.
The group where the incubation/design work on this is being done:
- WICG
The group where standardization of this work is intended to be done:
- We intend to move this to the webappsec working group. The draft is in scope for the current charter
  and to-be-renewed charter and has been discussed in a recent meeting.
Existing major pieces of multi-stakeholder review or discussion of
this design: N/A
Major unresolved issues with or opposition to this design:
Currently none
This work is being funded by: Mozilla and Google

We'd prefer the TAG provide feedback as:

Please open issues in our GitHub repo for each point of feedback

LeaVerou · 2021-03-29T20:44:14Z

I'm happy to see work in this direction. This is so frequently needed, and requiring authors to seek and include a library to provide basic security for any app that handles HTML was not ideal.

I took a look at the proposed API and will outline my thoughts below.

I like the sensible defaults approach, and the fact that "simple things should be easy" is an explicit goal. It's good that there is an optional config to customize these defaults, to cater to more specialized use cases. However, I noticed that even though there's a very long default config, it does not describe or afford customization of all rules required to properly sanitize HTML. For example, neither <a> elements, nor href attributes are dropped, but javascript: URLs need to be removed, which I imagine is part of the default sanitization. However, there is no way to specify or remove this behavior in the config, or any behavior relating to values of attributes. It would be good if the config could expose and afford customization on all sanitization rules. This could also allow for a more granular way to avoid DOM clobbering than stripping all ids. Perhaps this is what you mean by additional configuration options for 2.0?

It is unclear how authors can remove an element from the list of allowed elements, without re-specifying the entire allowlist of the default configuration. My understanding is that if they specify it in a block list, the behavior of the sanitizer changes to "anything goes except the blocklist". Exposing the config on Sanitizer instances (possibly read-only) could provide an easy solution for this use case. Exposing the default config as a static property on Sanitizer might be helpful too.

While I understand the reasons behind making the main sanitize() method return a DocumentFragment, it does feel a little unexpected from a DX perspective. Also, given that sanitizeToString() is merely a helper for serializing document fragments, I wonder if it would be better for the Web Platform if a property/method was added to DocumentFragment for serialization, instead of defining ad hoc helpers on more specialized APIs.

otherdaniel · 2021-04-01T15:36:22Z

Hi Lea, Thank you for the feedback! A quick round of replies:

javascript:-URLs and expressiveness: This is an open issue. My current thinking -- reflected in the current spec draft -- is that simplicity is key to getting web devs to use the Sanitizer API and that consequently we should try hard to keep this simple and usable and to cover the basics first. This is corroborated by several authors of existing sanitizer libraries, who warned us that we'd almost instantly get into a "feature race", and that it's worth clamping down on feature requests in order to keep overall complexity at bay.

The way the current spec draft handles this is:

- There are no ways to specify content-dependent sanitization. Elements or attributes are allowed, or blocked, but the config provides no way to be more granular.
- There are a handful of cases that don't fit into this framework. In these cases, we'll add special case handling to the spec. This is a little unsatisfying, because this creates "magic" behaviour that a user couldn't re-create with a config.
- You mention javascript:-URLs, which are the perfect example. This is handled in handle funky elements (rules 2-4).
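
To make the "content-dependent" case above concrete, here is a minimal TypeScript sketch of the kind of attribute-value check that an allow/block config cannot express. It is not the spec's algorithm: `isJavaScriptUrl` and `filterHref` are hypothetical names, and the normalization only approximates what a URL parser does before it reads the scheme.

```typescript
// Hypothetical helper, not part of the Sanitizer API or its spec.
// Approximates URL-parser preprocessing: strip leading/trailing C0 control
// characters and spaces, then remove ASCII tab, LF and CR anywhere in the value.
function isJavaScriptUrl(value: string): boolean {
  const normalized = value
    .replace(/^[\u0000-\u0020]+|[\u0000-\u0020]+$/g, "")
    .replace(/[\t\n\r]/g, "");
  // Scheme matching is case-insensitive.
  return /^javascript:/i.test(normalized);
}

// Hypothetical attribute filter: returns null when the href should be
// dropped, otherwise returns the value unchanged.
function filterHref(value: string): string | null {
  return isJavaScriptUrl(value) ? null : value;
}
```

The decision here depends on the attribute's value rather than on its name, which is why a config that only lists allowed or blocked elements and attributes has no place to state it.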

Several proposals for more expressive configs are discussed in WICG/sanitizer-api#26 . We marked this issue as "v2".

"unclear how authors can remove an element from the list of allowed elements": This is true, I'm afraid. The idea is that authors can either supply their own allow list (which makes most sense if they have a specific goal in mind, e.g. only formatting), or they supply a block list (with which they can effectively eliminate individual elements from the defaults). So I think the capability is there, but it's arguably laborious to discover it.
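
The reading described in this reply (a block list subtracts from whatever allow list is in effect) can be sketched as a toy model. This is an illustration only, not the API or the spec text: `ToyConfig`, `effectiveAllowList` and the five-element `DEFAULT_ALLOW_ELEMENTS` are hypothetical placeholders. Only the option names `allowElements` and `blockElements` are meant to mirror the draft config.

```typescript
// Toy model only: a tiny placeholder default, NOT the spec's built-in list.
const DEFAULT_ALLOW_ELEMENTS: readonly string[] = ["a", "b", "em", "p", "span"];

// Hypothetical config shape for this sketch.
interface ToyConfig {
  allowElements?: string[];
  blockElements?: string[];
}

// Start from the author's allow list if one is given, otherwise from the
// defaults, then remove every element named in the block list.
function effectiveAllowList(config: ToyConfig = {}): string[] {
  const base = config.allowElements ?? DEFAULT_ALLOW_ELEMENTS;
  const blocked = config.blockElements ?? [];
  return base.filter((name) => blocked.indexOf(name) === -1);
}

// Removing a single element from the defaults without restating them:
const withoutLinks = effectiveAllowList({ blockElements: ["a"] });
```

Under this model the author never has to copy the full default list. Under the other reading raised earlier in the thread ("anything goes except the blocklist"), the same config would allow far more than the defaults.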

Exposing the default config is a good idea; we should do that.

Exposing a Sanitizer's config is something we've had early on, but then removed it for lack of a clear use case. We should reconsider this.

general readability: This is arguably a follow-on to the previous point. But I think this is also touched on in the "very long default config" thought. I think of this as an editorial problem: The problem I'm having as spec editor is that I'm trying very hard to be unambiguous, and to emulate the "algorithms" writing style of the HTML family of specs. This does, unfortunately, make the spec very dense and quite hard to read. This has come up repeatedly. For the specific audience of "spec implementors" this is a good choice; but arguably not for anyone else.

I'm a bit at a loss as to how to improve this. My current thinking is to continue specifying this as we currently do and to assume spec implementors are the primary audience of the spec, and to provide more & better additional, developer-focused documentation separately.

If you know of any specs that do a particularly good job of providing dense & precise information, while still remaining very accessible to multiple audiences, I'll gladly take a look.

sanitizeToString vs. serializing a DocumentFragment directly: I agree. I have no experience with those parts of the spec universe, though, so I'm not sure whether this is easy to enact.

otherdaniel · 2021-04-21T14:45:33Z

We have added a feature for configuration introspection. (Sanitizer config introspection. WICG/sanitizer-api#77).
We have also added prose to explain how we arrived at the built-in constants (Justify the built-ins. WICG/sanitizer-api#78), although the constants remain as the normative text.

LeaVerou · 2021-05-12T15:59:15Z

@hober, @hadleybeeman and I looked at this during a breakout today.

We are glad to see the recent improvements based on earlier feedback. Note that I wasn't saying that the "very long default config" wasn't readable, just that the longer it is, the more tedious it would be for authors to replicate it. However, now that it's exposed, that wouldn't be a problem.

We all thought that Document Fragment serialization should be exposed via a more general method instead of residing in this API. I did open an issue on this a while ago, and it is ongoing.

Overall, we're happy with the direction this is going, and are considering closing this review. Thank you for doing this very valuable and highly needed work!

mozfreddyb · 2021-05-17T11:43:47Z

Thank you for concluding this review! Just one quick question:

> We all thought that Document Fragment sanitization should be exposed via a more general method instead of residing in this API. I did open an issue on this a while ago, and it is ongoing.

Am I right to assume this is a typo and you were saying that Document Fragment serialization should ideally be exposed someplace else?

LeaVerou · 2021-05-17T11:54:20Z

Yes, fixed now, thanks!

mozfreddyb added Progress: untriaged, Review type: CG early review labels Mar 23, 2021

mozfreddyb mentioned this issue Mar 23, 2021

Redirect 'purification' to 'sanitizer-api' WICG/wicg.github.io#11

Merged

plinss assigned LeaVerou Mar 24, 2021

LeaVerou added Progress: in progress, Venue: WICG and removed Progress: untriaged labels Mar 24, 2021

rhiaro self-assigned this Mar 24, 2021

plinss added this to the 2021-03-29-week milestone Mar 24, 2021

LeaVerou added Topic: HTML, Topic: security features, Topic: universal JavaScript labels Mar 24, 2021

This was referenced Apr 1, 2021

Expose document fragment serialization algorithm whatwg/dom#965

Open

APIs should allow introspection w3ctag/design-principles#300

Open

This was referenced Apr 12, 2021

Sanitizer config introspection. WICG/sanitizer-api#77

Closed

Justify the built-ins. WICG/sanitizer-api#78

Merged

plinss modified the milestones: 2021-03-29-week, 2021-05-10-F2F-Arakeen Apr 26, 2021

hadleybeeman self-assigned this May 12, 2021

LeaVerou added Progress: propose closing and removed Progress: in progress labels May 12, 2021

hadleybeeman closed this as completed May 12, 2021

torgo added Resolution: satisfied and removed Progress: propose closing labels May 14, 2021

otherdaniel mentioned this issue May 18, 2021

Revisit config and defaultConfig as getters. WICG/sanitizer-api#92

Closed

el3um4s mentioned this issue Jun 20, 2022

Use Markdown for notes el3um4s/gest-dashboard#35

Closed

annevk mentioned this issue Nov 9, 2022

Sanitizer API WebKit/standards-positions#86

Closed
