Combining characters in identifiers are not parsed correctly #101

Open
expipiplus1 opened this issue Jul 12, 2023 · 1 comment

Comments

@expipiplus1

An objectionable file and the treesitter tree:

a = () -- single 'a'
â = () -- single 'a with circumflex' character
â = () -- single 'a' with combining circumflex u770
function [0, 0] - [0, 6]
  name: variable [0, 0] - [0, 1]
  rhs: exp_literal [0, 4] - [0, 6]
    con_unit [0, 4] - [0, 6]
comment [0, 7] - [0, 20]
function [1, 0] - [1, 7]
  name: variable [1, 0] - [1, 2]
  rhs: exp_literal [1, 5] - [1, 7]
    con_unit [1, 5] - [1, 7]
comment [1, 8] - [1, 47]
function [2, 0] - [2, 8]
  name: variable [2, 0] - [2, 1]
  ERROR [2, 1] - [2, 3]
    ERROR [2, 1] - [2, 3]
  rhs: exp_literal [2, 6] - [2, 8]
    con_unit [2, 6] - [2, 8]
comment [2, 9] - [2, 53]
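
The columns in this tree look like UTF-8 byte offsets rather than character counts, which would explain why the precomposed â covers [1, 0] - [1, 2] while the decomposed form covers three columns (one for the `a`, two for the ERROR node). A minimal sketch in JavaScript that checks those byte lengths; the variable and helper names are made up for illustration and are not part of the grammar:

```javascript
// Precomposed U+00E2 versus 'a' followed by U+0302 (decimal 770),
// the combining circumflex used on the third line of the file.
const precomposed = "\u00E2";
const decomposed = "a\u0302";

// Number of bytes a string occupies when encoded as UTF-8.
const utf8Length = (s) => new TextEncoder().encode(s).length;

console.log(utf8Length(precomposed)); // 2
console.log(utf8Length(decomposed)); // 3 (1 for 'a' + 2 for U+0302)

// The two spellings are canonically equivalent: NFC composes the pair.
console.log(decomposed.normalize("NFC") === precomposed); // true
```

Under that reading, the `variable` node on the third line ends after the plain `a`, and the two-byte ERROR node is exactly the combining mark.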

Thank you for all the hard work maintaining this library btw!

@tek
Contributor

tek commented Jul 12, 2023

For varids, we use this regex:

varid_pattern = /[_\p{Ll}](\w|')*#?/u

The first character has to be in the Ll class of lowercase letters, and it's unclear to me whether that would match the combined codepoints or just the a without the diacritic. But since it also fails when the combined character is at a later position, I would assume that the \w class is insufficient.
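
As a rough way to probe this, the pattern can be run through JavaScript's own regex engine. This is only a sketch: tree-sitter compiles grammar regexes with its own lexer generator, so its behaviour may differ from what Node prints here. The `^` anchor, the `matchedPrefix` helper, and `widenedPattern` are assumptions added for illustration; the widened pattern is a hypothetical variant, not a tested fix for the grammar.

```javascript
// Anchored copy of the varid regex quoted above; the ^ is added only so
// that this sketch matches from the start of the string.
const varidPattern = /^[_\p{Ll}](\w|')*#?/u;

// Hypothetical variant that additionally accepts combining marks (\p{M}).
const widenedPattern = /^[_\p{Ll}](\w|\p{M}|')*#?/u;

// Return the prefix of `source` that the pattern matches, or null.
const matchedPrefix = (pattern, source) => {
  const m = pattern.exec(source);
  return m === null ? null : m[0];
};

const precomposed = "\u00E2 = ()"; // single U+00E2 character
const decomposed = "a\u0302 = ()"; // 'a' followed by U+0302

// U+00E2 is itself in Ll, so the first character class accepts it.
console.log(matchedPrefix(varidPattern, precomposed) === "\u00E2"); // true

// Only the plain 'a' matches; U+0302 is a mark, which \w does not cover.
console.log(matchedPrefix(varidPattern, decomposed) === "a"); // true

// With \p{M} allowed in the tail, the whole decomposed identifier matches.
console.log(matchedPrefix(widenedPattern, decomposed) === "a\u0302"); // true
```

In this engine at least, the Ll class matches only the bare `a` of the decomposed form, and the match stops at the combining mark, which lines up with the `variable [2, 0] - [2, 1]` node followed by an ERROR in the tree above.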

Gonna investigate later, but if you have more useful insights, please let me know!
