Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early design review: Sanitizer API #619

Closed
mozfreddyb opened this issue Mar 23, 2021 · 6 comments
Closed

Early design review: Sanitizer API #619

mozfreddyb opened this issue Mar 23, 2021 · 6 comments
Assignees
Labels
Resolution: satisfied The TAG is satisfied with this design Review type: CG early review An early review of general direction from a Community Group Topic: HTML Topic: security features Topic: universal JavaScript Features that work on the web and non-web (e.g. node.js) Venue: WICG

Comments

@mozfreddyb
Copy link

mozfreddyb commented Mar 23, 2021

I'm requesting an early design review of the Sanitizer API

Provide a browser-maintained "ever-green", safe, and easy-to-use library for user input sanitization as part of the general web platform.

Further details:

  • We have reviewed the TAG's Web Platform Design Principles.

  • The group where the incubation/design work on this is being done:

    • WICG
  • The group where standardization of this work is intended to be done:

  • Existing major pieces of multi-stakeholder review or discussion of
    this design: N/A

  • Major unresolved issues with or opposition to this design:
    Currently none

  • This work is being funded by: Mozilla and Google

We'd prefer the TAG provide feedback as:

@LeaVerou
Copy link
Member

LeaVerou commented Mar 29, 2021

I'm happy to see work in this direction. This is so frequently needed, and requiring authors to seek and include a library to provide basic security for any app that handles HTML was not ideal.

I took a look at the proposed API and will outline my thoughts below.

I like the sensible defaults approach, and the fact that "simple things should be easy" is an explicit goal. It's good that there is an optional config to customize these defaults, to cater to more specialized use cases. However, I noticed that even though there's a very long default config, it does not describe or afford customization of all rules required to properly sanitize HTML. For example, neither <a> elements, nor href attributes are dropped, but javascript: URLs need to be removed, which I imagine is part of the default sanitization. However, there is no way to specify or remove this behavior in the config, or any behavior relating to values of attributes. It would be good if the config could expose and afford customization on all sanitization rules. This could also allow for a more granular way to avoid DOM clobbering than stripping all ids. Perhaps this is what you mean by additional configuration options for 2.0?

It is unclear how authors can remove an element from the list of allowed elements, without re-specifying the entire allowlist of the default configuration. My understanding is that if they specify it in a block list, the behavior of the sanitizer changes to "anything goes except the blocklist". Exposing the config on Sanitizer instances (possibly read-only) could provide an easy solution for this use case. Exposing the default config as a static property on Sanitizer might be helpful too.

While I understand the reasons behind making the main sanitize() method return a DocumentFragment, it does feel a little unexpected from a DX perspective. Also, given that sanitizeToString() is merely a helper for serializing document fragments, I wonder if it would be better for the Web Platform if a property/method was added to DocumentFragment for serialization, instead of defining ad hoc helpers on more specialized APIs.

@otherdaniel
Copy link

Hi Lea, Thank you for the feedback! A quick round of replies:

javascript:-URLs and expressiveness: This is an open issue. My current thinking -- reflected in the current spec draft -- is that simplicity is key to get web devs to use the Sanitizer API and that consequently we should try hard to keep this simple and usable and to cover the basics first. This is corrobated by several authors from exisiting sanitizer libraries, who warned us that we'd almost instantly get into a "feature race", and that it's worth clamping down on feature requests in order to keep overall complexity at bay.

The way the current spec draft handles this is:

  • There are no ways to specify content-dependent sanitization. Elements or attributes are allowed, or blocked, but the config provides no way to be more granular.
  • There are a handful of cases that don't fit into this framework. In these cases, we'll add special case handling to the spec. This is a little unsatisfying, because this creates "magic" behaviour that a user couldn't re-create with a config.
  • You mention javascript:-URLs, which are the perfect example. This is handled in handle funky elements (rules 2-4).

Several proposals for more expressive configs are discussed in WICG/sanitizer-api#26 . We marked this issue as "v2".

"unclear how authors can remove an element from the list of allowed elements": This is true, I'm afraid. The idea is that authors can either supply their own allow list (which makes most sense if they have a specific goal in mind, e.g. only formatting), or they supply a block list (with which they can effectively eliminate individual elements from the defaults). So I think the capability is there, but it's arguably laborious to discover it.

Exposing the default config is a good idea; we should do that.

Exposing a Sanitizer's config is something we've had early on, but then removed it for lack of a clear use case. We should reconsider this.

general readability: This is arguably a follow-on to the previous point. But I think this is also touched in the "very long default config" thought. I think of this as an editorial problem: The problem I'm having as spec editor is that I'm trying very hard to be unambigous, and to emulate the "algorithms" writing style of the HTML family of specs. This does, unfortunately, make the spec very dense and quite hard to read. This has come up repeatedly. For the specific audience of "spec imlementors" this is a good choice; but arguably not for anyone else.

I'm a bit at a loss of how to improve this. My current thinking is to continue specifying this as we currently do and to assume spec implementors are the primary audience of the spec, and to provide more & better additional, developer-focused documentation seperately.

If you know of any specs that do a particularly good job of providing dense & precise information, while still remaining very accessible to multiple audiences, I'll gladly take a look.

sanitizeToString vs. serializing a DocumentFragment directly: I agree. I have no experience with those parts of the spec universe, though, so I'm not sure whether this is easy to enact.

@otherdaniel
Copy link

@LeaVerou
Copy link
Member

LeaVerou commented May 12, 2021

@hober, @hadleybeeman and I looked at this during a breakout today.

We are glad to see the recent improvements based on earlier feedback. Note that I wasn't saying that the "very long default config" wasn't readable, just that the longer it is, the more tedious it would be for authors to replicate it. However, now that it's exposed, that wouldn't be a problem.

We all thought that Document Fragment serialization should be exposed via a more general method instead of residing in this API. I did open an issue on this a while ago, and it is ongoing.

Overall, we're happy with the direction this is going, and are considering closing this review. Thank you for doing this very valuable and highly needed work!

@LeaVerou LeaVerou added Progress: propose closing we think it should be closed but are waiting on some feedback or consensus and removed Progress: in progress labels May 12, 2021
@torgo torgo added Resolution: satisfied The TAG is satisfied with this design and removed Progress: propose closing we think it should be closed but are waiting on some feedback or consensus labels May 14, 2021
@mozfreddyb
Copy link
Author

mozfreddyb commented May 17, 2021

Thank you for concluding this review! Just one quick question..

We all thought that Document Fragment sanitization should be exposed via a more general method instead of residing in this API. I did open an issue on this a while ago, and it is ongoing.

Am I right to assume this is a typo and you were saying that Document Fragment serialization should ideally be exposed someplace else?

@LeaVerou
Copy link
Member

Yes, fixed now, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: satisfied The TAG is satisfied with this design Review type: CG early review An early review of general direction from a Community Group Topic: HTML Topic: security features Topic: universal JavaScript Features that work on the web and non-web (e.g. node.js) Venue: WICG
Projects
None yet
Development

No branches or pull requests

7 participants