Crashes during escaped Unicode surrogate pairs parsing #855
Comments
I can't reproduce it locally:
$ ruby -v bin/ruby-parse --32 -E -e '"\\u{D800}"'
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
"\\u{D800}"
^~~~~~~~~~~ tSTRING "\\u{D800}" expr_end [0 <= cond] [0 <= cmdarg]
"\\u{D800}"
^ false "$eof" expr_end [0 <= cond] [0 <= cmdarg]
(str "\\u{D800}")
$ ruby -ve 'p "\\u{D800}"'
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
"\\u{D800}"
Is it related to an old version of Ruby? Could you try it on a version of Ruby that is still supported (i.e. at least 2.7)? My hunch is that old Ruby has old Unicode support that doesn't know about these codepoints. |
This is the default Ruby on macOS. I'm not sure whether you support it. |
No, Ruby 2.6 has been deprecated since 2022-04-12. We do run tests for I'm closing it, but feel free to reopen it if the error appears again for you with maintained Ruby versions (>= 2.7). |
Am I still doing something wrong?
|
Same, but using current master:
|
Sorry, bash escaping issue, I should've checked this code in a separate file. My bad.
$ /bin/cat test.rb
"\u{D800}"
$ ruby -v test.rb
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
test.rb:1: invalid Unicode codepoint
"\u{D800}"
$ ruby -v bin/ruby-parse --32 test.rb
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
Failed on: test.rb
/Users/ilyabylich/Work/parser/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)
...
stacktrace
...
This is a bug and it should be fixed, reopening. The error comes from this line:
=> "D800".to_i(16).chr(Encoding::UTF_8)
RangeError (invalid codepoint 0xD800 in UTF-8)
I'm pretty sure we need to catch a I'll fix it next week, thanks for reporting. |
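To illustrate the failure mode described above, here is a minimal sketch of converting a hex codepoint while rescuing the `RangeError` that `Integer#chr` raises for surrogates. The helper name `codepoint_to_utf8` is hypothetical and this is not the parser's actual lexer code or the fix that was eventually applied; it only shows one way the exception could be contained.

```ruby
# Hypothetical helper, for illustration only (not taken from lib/parser/lexer.rb).
# Converts a hex string such as "D800" into a UTF-8 character.
# Returns nil when the codepoint cannot be encoded in UTF-8.
def codepoint_to_utf8(hex)
  hex.to_i(16).chr(Encoding::UTF_8)
rescue RangeError
  # Integer#chr raises RangeError for surrogates (U+D800..U+DFFF) and for
  # values above U+10FFFF. A lexer could emit a diagnostic here instead.
  nil
end

p codepoint_to_utf8("41")   # a valid codepoint
p codepoint_to_utf8("D800") # a surrogate, handled without crashing
```

Returning `nil` is just an assumption made for the sketch; a real lexer would more likely report an "invalid Unicode codepoint" diagnostic at the offending range.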
Sure, no problem. I was running it in Fish and didn't even think about shell escaping differences. |
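The escaping mix-up can be reproduced from Ruby itself. The sketch below is an illustration only, with hypothetical names (`escaped_src`, `raw_src`, `parses?`): it holds both source texts in raw heredocs so no shell is involved, then checks which one Ruby accepts.

```ruby
# Source text with a doubled backslash: a literal backslash followed by "u{D800}".
escaped_src = <<~'SRC'.chomp
  "\\u{D800}"
SRC

# Source text with a single backslash: a real \u escape naming a surrogate.
raw_src = <<~'SRC'.chomp
  "\u{D800}"
SRC

# Hypothetical helper: true if Ruby can evaluate the snippet, false on a syntax error.
def parses?(src)
  eval(src)
  true
rescue SyntaxError
  false
end

p parses?(escaped_src) # the doubled backslash is an ordinary 8-character string
p parses?(raw_src)     # the surrogate escape is rejected by Ruby
```

Only the second form exercises the Unicode escape path, which is why the first command in the thread did not reproduce the crash.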
I would assume that U+D800...U+DFFF should be ignored.
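As a rough sketch of what treating that range specially could look like, the check below tests whether a codepoint falls in the surrogate block before any conversion is attempted. `SURROGATES` and `surrogate?` are hypothetical names introduced here; how the parser should react to a match (skip it, or report an error) is left open.

```ruby
# Hypothetical constant: the UTF-16 surrogate block, which has no UTF-8 encoding.
SURROGATES = (0xD800..0xDFFF).freeze

# Hypothetical predicate: true when the codepoint is a surrogate.
def surrogate?(codepoint)
  SURROGATES.cover?(codepoint)
end

p surrogate?(0xD800) # first codepoint of the block
p surrogate?(0xE000) # first codepoint after the block
```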