Incorrect handling of double-width characters #14
Comments
Actually I think this might be one of the few reports that's not futile. It wouldn't be too hard to figure out the width of each character if the Unicode data tells you that directly. My one concern is that in Sublime Text, double-width characters aren't quite double-width. It would probably work fine if you just had one or two and wide tabs, but with a large number of characters there may be an offset. Do you feel comfortable hacking Python? Want to give it a shot? Or I can look at it sometime soon.
I think the offset is not a problem, since ST's double-width characters take up slightly less than two places, and that should be perfectly compensated by the increased width of the following tab. It'd be a problem if wide characters took more space :) I might give it a try, should be simple...
Right, it would only be a problem if you had quite a few characters in a row combined with a relatively narrow tab width: at a certain point you might be able to lose enough space so that you end up at an earlier tabstop. Give it a shot and let me know how it goes! I'm excited to see the PR :)
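To make the tabstop concern concrete, here is a rough sketch of the arithmetic. It is not the plugin's code: the function names are made up for illustration, and the figure of 1.8 cells per wide glyph is an invented stand-in for "slightly less than two places", not a measured Sublime Text value.

```python
from fractions import Fraction


def next_tabstop(pos, tab_width):
    """Return the first tabstop strictly to the right of pos (in cells)."""
    return (pos // tab_width + 1) * tab_width


def tab_lands_at(n_wide, n_spaces, tab_width, wide_cells):
    """Where a tab lands after n_wide wide glyphs plus n_spaces spaces.

    wide_cells is the rendered width of one wide glyph, in cells.
    """
    pos = n_wide * wide_cells + n_spaces
    return next_tabstop(pos, tab_width)


# Hypothetical rendered width of a wide glyph: 1.8 cells (assumption).
NARROW = Fraction(9, 5)

for tab_width in (4, 8):
    nominal = tab_lands_at(6, 1, tab_width, 2)
    rendered = tab_lands_at(6, 1, tab_width, NARROW)
    print(tab_width, nominal, rendered)
```

Under these assumed numbers, six wide glyphs lose 1.2 cells in total. With a tab width of 4 that is enough to drop the tab to an earlier stop than the nominal one, while with a tab width of 8 both positions fall in the same interval.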
Well @adzenith, turns out you were quite right: I couldn't get this to work with a tab width less than 5! But otherwise, I'm quite satisfied with the result. It looks excellent in the terminal, what more could one want :) PR incoming, any comments welcome.
Take this TSV:
It looks aligned in ST+elastic tabstops, complete with column headers. But in any other text viewer (`less` or this Markdown view above) the column headers are not aligned, because of an extra space inserted between the double-width characters 東南西北 and the following tab character separator.
For clarity, I'll visualize the whitespace characters involved:
In a fixed-width environment like a terminal (e.g. `less`), the string `|P東` takes 4 character places to render (even though it's a 3-character string: `|`, `P`, `東`). This is exactly the width that the `john` and `mary` cells have. But, and this is the bug, `john` and `mary` have 3 U+20's after them, while `|P東` has 4. This is what breaks alignment in monospace viewers that are not elastic-tabstop-aware.
Conceptually, this is easily fixed by using "em width" (which is 1 or 2 for a character `C` where `unicodedata.east_asian_width(C)=='Na'` or `unicodedata.east_asian_width(C)=='W'`, respectively) instead of plain character count when computing the number of spaces that the plugin inserts for compatibility alignment.
Whew. I do realize that this report is futile, but still, it's here for the record.
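A minimal sketch of the "em width" idea, not the plugin's actual code. The helper names `em_width`, `display_width` and `pad_cell` are hypothetical. Treating the `'F'` (fullwidth) class as two cells is an added assumption; the report above only names `'Na'` and `'W'`. Combining characters are not handled here.

```python
import unicodedata


def em_width(ch):
    """Return 2 for wide characters, 1 for everything else."""
    # 'W' comes from the report; counting 'F' (fullwidth) as wide
    # is an assumption made for this sketch.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text):
    """Number of fixed-width cells the text occupies."""
    return sum(em_width(ch) for ch in text)


def pad_cell(text, target_width):
    """Pad with U+20 so the cell occupies target_width cells."""
    return text + " " * max(0, target_width - display_width(text))


for cell in ("john", "mary", "|P東"):
    padded = pad_cell(cell, 7)
    # len() counts code points, display_width() counts cells.
    print(repr(padded), len(cell), display_width(cell))
```

With a target of 7 cells, all three cells get 3 trailing spaces, because `|P東` is measured as 4 cells wide rather than 3 characters long. Padding by `len()` instead would give it 4 spaces, which is the misalignment described above.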