Incorrect handling of double-width characters #14

ulidtko · 2015-10-20T13:33:32Z

Take this TSV:

date        |P東       |東 score  |P南       |南 score  |P西       |西 score  |P北       |北 score  |comment
2015-04-04  john    35100       bob     32100       mary    12000       katy    20800
2015-04-04  mary    33500       bob     49500       katy    21600       john    -4600

It looks aligned in ST with elastic tabstops, complete with column headers. But in any other text viewer (less, or the Markdown view above), the column headers are not aligned, because an extra space is inserted between the double-width characters 東南西北 and the following tab separator.

For clarity, I'll visualize the whitespace characters involved:

date······↦   |P東····↦   |東·score↦   |P南····↦   |南·score↦   |P西····↦   |西·score↦   |P北····↦   |北·score↦   |comment
2015-04-04↦   john···↦   35100···↦   bob····↦   32100···↦   mary···↦   12000···↦   katy···↦   20800
2015-04-04↦   mary···↦   33500···↦   bob····↦   49500···↦   katy···↦   21600···↦   john···↦   -4600

In a fixed-width environment like a terminal (e.g. less), the string |P東 takes 4 columns to render, even though it is a 3-character string (|, P, 東). This is exactly the width of the john and mary cells. But, and this is the bug, john and mary are followed by three U+0020 spaces, while |P東 is followed by four. This is what breaks alignment in monospace viewers that don't support elastic tabstops.
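A minimal Python sketch of the mismatch, assuming (hypothetically) that every cell in this column is padded to 7 characters by character count before its tab. `TARGET`, `naive_pad` and `rendered_columns` are illustrative names, not the plugin's code:

```python
import unicodedata

TARGET = 7  # hypothetical: each cell in this column is padded to 7 characters

def naive_pad(cell):
    # Pads by character count (len), as described above: 東 counts as one
    # character even though a terminal draws it two columns wide.
    return cell + " " * (TARGET - len(cell))

def rendered_columns(text):
    # Columns occupied in a monospace terminal: wide ('W') and fullwidth ('F')
    # characters take two columns, everything else one.
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)

for cell in ["john", "mary", "|P東"]:
    padded = naive_pad(cell)
    print(f"{cell!r}: {len(padded)} chars, {rendered_columns(padded)} columns")
```

All three padded strings are 7 characters long, but the |P東 one spans 8 columns, which is the one-column drift visible in less.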

Conceptually, this is easily fixed by using the "em width" (1 for a character C where unicodedata.east_asian_width(C) == 'Na', and 2 where it is 'W') instead of the plain character count when computing the number of spaces the plugin inserts for compatibility alignment.
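A sketch of that rule using Python's standard unicodedata module. `em_width` and `pad_cell` are hypothetical helpers, not taken from the plugin; treating 'F' (fullwidth) as 2 columns and 'A' (ambiguous) as 1 are assumptions beyond what the report states:

```python
import unicodedata

def em_width(text):
    """Number of monospace columns `text` occupies.

    'W' (wide) and 'F' (fullwidth) characters count as 2, everything else as 1.
    Assumption: ambiguous ('A') characters are treated as narrow; combining
    marks and zero-width characters are not handled.
    """
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)

def pad_cell(cell, column_width):
    # Hypothetical helper: add spaces so the cell spans column_width columns,
    # measured by display width instead of len().
    return cell + " " * max(0, column_width - em_width(cell))

# With width-aware padding, every cell ends at the same terminal column.
for cell in ["john", "mary", "|P東"]:
    padded = pad_cell(cell, 7)
    print(repr(padded), em_width(padded))
```

Here |P東 gets three spaces, the same as john and mary, so the following tab starts at the same column in all rows.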

Whew. I do realize that this report is futile, but still, it's here for the record.


adzenith · 2015-10-20T14:09:24Z

Actually, I think this might be one of the few reports that's not futile. It wouldn't be too hard to figure out the width of each character if Unicode provides it like that. My one concern is that in Sublime Text, double-width characters aren't quite double width. It would probably work fine with just one or two of them and wide tabs, but with a large number of such characters there may be an offset.

Do you feel comfortable hacking Python? Want to give it a shot? Or I can look at it sometime soon.

ulidtko · 2015-10-21T11:51:34Z

I think the offset is not a problem: since ST's double-width characters are slightly narrower than two places, that should be perfectly compensated by the increased width of the following tab. It would be a problem if wide characters took more space :)

I might give it a try, should be simple...

adzenith · 2015-10-21T16:01:06Z

Right, it would only be a problem if you had quite a few characters in a row combined with a relatively narrow tab width: at a certain point you might be able to lose enough space so that you end up at an earlier tabstop.
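To make the tabstop concern concrete, here is a small arithmetic sketch. It assumes a tab advances to the next multiple of the tab width, and uses a hypothetical `wide_ratio` (the rendered width of one wide character in Sublime Text, e.g. 1.8 columns); the real value depends on the font and is not known from this thread:

```python
import math

def next_tabstop(column, tab_width):
    # A tab advances to the next multiple of tab_width strictly after column.
    return (math.floor(column / tab_width) + 1) * tab_width

def landing_stop(intended_column, wide_chars, wide_ratio, tab_width):
    """Tabstop actually reached when wide characters render narrower than 2.

    intended_column: where the padded text should end (in monospace columns).
    wide_ratio: hypothetical rendered width of one wide character.
    """
    shortfall = wide_chars * (2 - wide_ratio)  # columns lost before the tab
    return next_tabstop(intended_column - shortfall, tab_width)

# Text meant to end at column 9, tab width 4: the intended stop is 12.
print(landing_stop(9, 2, 1.8, 4))  # small shortfall, still reaches 12
print(landing_stop(9, 6, 1.8, 4))  # shortfall crosses column 8, lands on 8
```

The shortfall is harmless until it pushes the pre-tab position back across the previous multiple of the tab width; after that, the tab stops one tabstop early, which is the offset described above.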

Give it a shot and let me know how it goes! I'm excited to see the PR :)


ulidtko · 2015-10-23T16:12:10Z

Well @adzenith, turns out you were quite right: I couldn't get this to work with tab width less than 5!

But otherwise, I'm quite satisfied with the result. It looks excellent in the terminal; what more could one want :)

PR incoming; any comments welcome.

ulidtko added a commit to ulidtko/ElasticTabstops that referenced this issue on Oct 23, 2015: f2e9d63 "Add support for double-width characters" (Fix SublimeText#14)

ulidtko mentioned this issue on Oct 23, 2015 in "Add support for double-width characters #15" (Open)
