Swapping HTML character entities before spell checking #4071

nschonni · 2025-02-09T20:53:26Z

I was running into some split words when spell checking French text, since it used HTML entities in some escaped code blocks.
I was thinking that, at least for the letter-based entities (https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references), it might make sense to set up a small repMap array for the HTML dictionary.
I thought I'd open an issue before trying any coding, to see whether this makes sense and whether it would even cascade out when the html dictionary is referenced in a downstream cspell.json.

Jason3S · 2025-02-10T07:22:51Z

@nschonni,

You are seeing one of the current limitations of the spell checker. repMap happens too late, just before the word is checked against the dictionary that defines it.

What is needed in this case is a preprocessing step, one that transforms the document before it is spell checked. That is the purpose of: cspell/rfc/rfc-0003 parsing
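As a rough illustration of why the ordering matters, here is a minimal Python sketch. It is not cspell code: the `words` tokenizer and both function names are hypothetical stand-ins, and cspell's real word splitting is more involved. It only shows that splitting text into words before the entities are decoded breaks a word apart, while decoding first keeps it whole.

```python
import html
import re

# Hypothetical tokenizer: a "word" is a run of Unicode letters.
# This is an assumption for illustration, not how cspell splits words.
WORD_RE = re.compile(r"[^\W\d_]+")


def words(text: str) -> list[str]:
    """Split text into words without any preprocessing."""
    return WORD_RE.findall(text)


def words_after_decoding(text: str) -> list[str]:
    """Decode HTML character references first, then split into words."""
    return words(html.unescape(text))


sample = "Le fran&ccedil;ais est parl&eacute; ici."

# Without preprocessing, '&' and ';' act as word breaks, so the entity
# name itself shows up as a separate "word".
print(words(sample))

# With the decoding step, the accented words survive intact.
print(words_after_decoding(sample))
```

A replacement applied per word, after the split, only ever sees fragments such as `fran` and `ccedil`, so it has nothing left to map back to `français`.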
