
Swapping HTML character entities before spell checking #4071

Open
nschonni opened this issue Feb 9, 2025 · 1 comment

Comments

@nschonni
Collaborator

nschonni commented Feb 9, 2025

I was running into split words when spell checking some French text, because it used HTML character entities (for example `&eacute;` for "é") inside some escaped code blocks.
I was thinking that, at least for the letter-based entities (https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references), it might make sense to set up a small repMap array for the HTML dictionary.
I thought I'd open an issue before writing any code, to check whether this makes sense, and whether it would even carry through to projects that reference the HTML dictionary in a downstream cspell.json.
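
A minimal sketch of the proposed idea: a small table of letter-based named entities paired with the characters they stand for, applied to a single word the way a per-dictionary replacement map would be. The table is a hypothetical subset of the WHATWG list, and `htmlLetterRepMap` / `applyRepMap` are illustrative names only. cspell's real repMap is set in its configuration, and its matching rules may differ from the plain string replacement used here.

```typescript
// Hypothetical subset of the letter-based named character references from the
// WHATWG list; a real table would be generated from the full list.
const htmlLetterRepMap: ReadonlyArray<readonly [string, string]> = [
  ["&eacute;", "é"],
  ["&egrave;", "è"],
  ["&ecirc;", "ê"],
  ["&agrave;", "à"],
  ["&ccedil;", "ç"],
  ["&ocirc;", "ô"],
];

// Apply every [from, to] pair to one word, mimicking a per-word replacement
// step that runs just before a dictionary lookup (illustrative, not cspell code).
function applyRepMap(word: string, map = htmlLetterRepMap): string {
  let result = word;
  for (const [from, to] of map) {
    // split/join replaces every literal occurrence without needing a regex
    result = result.split(from).join(to);
  }
  return result;
}

console.log(applyRepMap("&eacute;t&eacute;")); // été
console.log(applyRepMap("fran&ccedil;ais")); // français
```

Note that this only helps if the whole entity, `&` through `;`, reaches the replacement step inside a single word.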

@Jason3S
Collaborator

Jason3S commented Feb 10, 2025

@nschonni,

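You are seeing one of the current limitations of the spell checker: repMap is applied too late, just before a word is checked against the dictionary that defines it. By that point the text has already been split into words, so an entity that broke a word apart cannot be repaired.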

What is needed in this case is a preprocessing step: one that transforms the document before it is spell checked. That is the purpose of cspell/rfc/rfc-0003 (parsing).
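
To illustrate why the order matters, here is a minimal sketch with a hypothetical letter-only tokenizer (not cspell's actual word splitter) and a hypothetical entity table. Replacing entities per token after splitting cannot rejoin the fragments, whereas decoding the document text first, as a preprocessing step would, keeps each word intact. The design in the RFC may look quite different; `tokenize` and `decodeEntities` are illustrative names only.

```typescript
// Hypothetical entity table (small subset of the WHATWG named references).
const entities: Record<string, string> = {
  "&eacute;": "é",
  "&egrave;": "è",
  "&agrave;": "à",
  "&ccedil;": "ç",
};

// Hypothetical tokenizer: a "word" is a run of Unicode letters, so "&" and ";"
// act as separators. It only mimics the splitting effect, not cspell's rules.
function tokenize(text: string): string[] {
  return text.match(/\p{L}+/gu) ?? [];
}

// Replace known named entities; unknown ones are left untouched.
function decodeEntities(text: string): string {
  return text.replace(/&[a-zA-Z]+;/g, (m) => entities[m] ?? m);
}

const source = "r&eacute;sum&eacute; fran&ccedil;ais";

// Too late: split first, then replace inside each token (like a per-word repMap).
const tooLate = tokenize(source).map(decodeEntities);

// Preprocessing: transform the document first, then split into words.
const preprocessed = tokenize(decodeEntities(source));

console.log(tooLate.join(" ")); // r eacute sum eacute fran ccedil ais
console.log(preprocessed.join(" ")); // résumé français
```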
