Support UTF-16 in LSP #656
Obligatory link to microsoft/TypeScript#38078
Yeah, so I am clearly doing the LSP thing here, and I can't find a reason why we shouldn't just do that regardless for everything else, honestly.
character = position - start
} else {
// We need to rescan the text as UTF-16 to find the character offset.
for _, r := range scriptInfo.Text()[start:position] {
Didn't realize that range does "the right thing" over a string.
Yes, it's actually faster than manually calling the utf8 lib, IIRC.
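For readers following the thread, here is a minimal sketch of the rescan idea discussed above: `range` over a Go string decodes UTF-8 into runes, so counting UTF-16 code units is a matter of adding 2 for runes outside the Basic Multilingual Plane and 1 otherwise. The function name `utf16Column` and its signature are hypothetical, chosen for illustration; this is not the PR's actual code.

```go
package main

import "fmt"

// utf16Column is a hypothetical helper (not from the PR). It returns the
// number of UTF-16 code units in line[:byteOffset], where line is UTF-8
// text and byteOffset is a byte index that falls on a rune boundary.
func utf16Column(line string, byteOffset int) int {
	col := 0
	// range decodes one rune per iteration; invalid bytes come back as
	// U+FFFD, which counts as a single UTF-16 code unit below.
	for _, r := range line[:byteOffset] {
		if r >= 0x10000 {
			col += 2 // outside the BMP: encoded as a surrogate pair
		} else {
			col++ // BMP rune: one code unit
		}
	}
	return col
}

func main() {
	line := "a😀b"
	byteOffset := len("a😀") // 5 bytes: 1 for 'a', 4 for the emoji
	fmt.Println(byteOffset, utf16Column(line, byteOffset))
}
```

For pure-ASCII spans the byte offset and the UTF-16 column coincide, which is why the quoted diff can take the `position - start` shortcut in one branch and only rescan in the other.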
func positionToLineAndCharacter(scriptInfo *project.ScriptInfo, position core.TextPos) lsproto.Position {
// UTF-8 offset to UTF-8/16 0-indexed line and character

lineMap := scriptInfo.LineMapLSP()
Does this ever get cached anywhere? Is the work being done on every call?
Yes, scriptInfo caches this.
Can you update the source file's line map to be the same if the file contains only ASCII? That way the two will be deduplicated.
Or possibly just track whether a non CR/LF line ending was encountered here and do it then.
By that point we'll have constructed the whole thing, so I'm not totally sure if it's helpful, but I guess we could save a little memory sometimes... If we've even requested the line map for other reasons, which we might not have at all.
I feel like we can just standardize on the LSP-compatible line map as long as we don’t use it for parsing/grammar considerations around IOW, we have to respect ECMAScript’s conception of a line terminator, but we don’t have to use it in our own reporting of line numbers, which exists to help humans find their code in an editor.
The line map is also used for source maps. I do genuinely wonder if source maps actually use JS's definition, though.
https://tc39.es/ecma426/2024/#extraction-javascript does split across ECMAScript code points, so it is a bit implied.
Yay.
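To make the two notions of "line" in this discussion concrete, here is a rough sketch that builds line-start offsets under both definitions. LSP 3.17 treats `\n`, `\r\n`, and `\r` as end-of-line sequences; ECMAScript's LineTerminator additionally includes U+2028 and U+2029. The function `lineStarts` and its boolean flag are hypothetical and are not the line-map code from this repository.

```go
package main

import "fmt"

// lineStarts is a hypothetical helper (not from the PR). It returns the byte
// offset at which each line of text begins. With ecmaScript set, U+2028
// (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) also end a line.
func lineStarts(text string, ecmaScript bool) []int {
	starts := []int{0}
	for i, r := range text {
		switch r {
		case '\r':
			// Treat "\r\n" as one terminator: the '\n' starts the next line.
			if i+1 < len(text) && text[i+1] == '\n' {
				continue
			}
			starts = append(starts, i+1)
		case '\n':
			starts = append(starts, i+1)
		case '\u2028', '\u2029':
			if ecmaScript {
				starts = append(starts, i+3) // each is 3 bytes in UTF-8
			}
		}
	}
	return starts
}

func main() {
	text := "a\r\nb\u2028c\nd"
	fmt.Println("LSP:       ", lineStarts(text, false))
	fmt.Println("ECMAScript:", lineStarts(text, true))
}
```

The two maps are identical unless the text contains U+2028 or U+2029, so any text without those two characters (ASCII-only files included) yields one map that serves both purposes.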
Before:
After:
Includes #653
Per https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments
The eldritch horror is: