Supports Chinese Docstring #6281
@@ -589,7 +589,7 @@
     // catch-all for styles except reST
     const hasArguments =
-        !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\w+(\s*\(.*?\))*\s*:\s*\w+/g);
+        !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\w+(\s*\(.*?\))*\s*:[\s\S]*/g);
Check failure (Code scanning / CodeQL): Inefficient regular expression. Severity: High
@Aruelius, note the issue detected above. This will need to either be addressed or an explanation given for why it's OK.
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
@@ -589,7 +589,7 @@ class DocStringConverter {
     // catch-all for styles except reST
     const hasArguments =
-        !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\w+(\s*\(.*?\))*\s*:\s*\w+/g);
+        !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\w+(\s*\(.*?\))*\s*:\s*[\u4e00-\u9fa5\w+]*/g);
@bschnurr, do we have existing unit tests that we can expand to cover this?
@bschnurr, any thoughts here?
@@ -589,7 +589,7 @@ class DocStringConverter {
     // catch-all for styles except reST
     const hasArguments =
-        !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\w+(\s*\(.*?\))*\s*:\s*\w+/g);
+        !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\w+(\s*\(.*?\))*\s*:\s*[\u4e00-\u9fa5\w+]*/g);
Is this really the only regex that needs to be updated for us to properly support Chinese characters? I understand that it fixes this one scenario, but are there others we should update at the same time?
Hi, this one scenario is the biggest issue for me. For now, I have to add the `\n` newline character after the docstring text, for example `a: 中文\n`.
I think it needs to be fixed. Thank you for your help, it's very helpful for me.
Co-authored-by: Erik De Bonte <[email protected]>
@@ -589,7 +589,7 @@ class DocStringConverter {
     // catch-all for styles except reST
     const hasArguments =
-        !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\w+(\s*\(.*?\))*\s*:\s*\w+/g);
+        !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\w+(\s*\(.*?\))*\s*:\s*\p{L}+/gu);
We probably want to add some tests covering the issue this fixes, to prevent regressions in the future?
That would be good, thank you.
@Aruelius, are you planning to add tests to this PR?
I actually have a fix. I used this line:
    !line?.endsWith(':') && !line?.endsWith('::') && !!line.match(/^\s*.*?\S+(\s*\(.*?\))*\s*:\s*\S+/g);
@bschnurr, do you mean that you're going to update this PR? Or that you're going to fix the problem in a separate PR and we should close this one?
`\S` includes non-letter characters like `<`, `>`, `+`, `$`, `,`, `'`, etc., whereas `\p{L}` only includes characters that Unicode says are letters. Is `\S` what we want? I'm not familiar with the docstring format requirements, but going from `\w` to `\S` seems strange to me unless we should always have been including symbol characters.
Btw, my initial suggestion of `\p{L}` only includes letters, not numbers. So if we wanted to use that approach, I believe `[\p{L}\p{N}]` would be better.
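To make the trade-off in the comment above concrete, here is a minimal sketch comparing the four character classes under discussion against a few sample values. The constant names and sample values are assumptions chosen for illustration, not code from this PR, and each pattern only checks how the value starts.

```typescript
// Candidate classes for the text that follows the colon in a docstring line.
const wordClass = /^\w+/;                          // ASCII letters, digits, underscore
const letterClass = /^\p{L}+/u;                    // Unicode letters only (needs the `u` flag)
const letterOrNumberClass = /^[\p{L}\p{N}]+/u;     // Unicode letters and numbers
const nonSpaceClass = /^\S+/;                      // anything that is not whitespace

// Hypothetical sample values: ASCII word, digits, Chinese text, symbols.
const sampleValues = ['int', '42', '中文', '<T>'];

for (const value of sampleValues) {
    console.log(
        value,
        'w:', wordClass.test(value),
        'L:', letterClass.test(value),
        'L+N:', letterOrNumberClass.test(value),
        'S:', nonSpaceClass.test(value)
    );
}
```

In this sketch, `\p{L}` rejects `42` while `[\p{L}\p{N}]` accepts it, and only `\S` accepts `<T>`.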
I thought using `\S` would be the most flexible when it comes to user naming stuff.
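A small sketch of the naming point: the `\p{L}+` variant still requires `\w+` before the colon, so a non-ASCII parameter name is rejected, while the `\S` line accepts it. The helper `matchesArgumentLine` and the sample lines are hypothetical, and the `g` flag is dropped because only a boolean result is needed.

```typescript
// Variant from the review suggestion: Unicode letters after the colon, \w+ before it.
const lettersAfterColon = /^\s*.*?\w+(\s*\(.*?\))*\s*:\s*\p{L}+/u;
// Variant from the follow-up comment: \S on both sides of the colon.
const nonSpaceBothSides = /^\s*.*?\S+(\s*\(.*?\))*\s*:\s*\S+/;

// Hypothetical helper mirroring the condition in the diff.
function matchesArgumentLine(line: string, pattern: RegExp): boolean {
    return !line.endsWith(':') && !line.endsWith('::') && pattern.test(line);
}

// ASCII name with Chinese description, then Chinese name with Chinese description.
const namedLines = ['a: 中文', '参数: 中文'];

for (const line of namedLines) {
    console.log(
        line,
        matchesArgumentLine(line, lettersAfterColon),
        matchesArgumentLine(line, nonSpaceBothSides)
    );
}
```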
Closing. New PR here with a test: #6307
`\w+` only matches one or more word characters (same as `[a-zA-Z0-9_]+`), so when the docstring text is Chinese (or another non-ASCII script) it will not be matched. microsoft/pylance-release#4840
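The behavior described above can be reproduced with a short sketch that runs the original pattern and the `\p{L}+` variant from this PR against an ASCII line and a Chinese line. The helper name `hasArgumentsWith` and the sample lines are assumptions made for illustration; the `g` flag is omitted since only a yes/no answer is needed.

```typescript
// Original pattern from the diff: the text after the colon must start with \w.
const asciiOnlyPattern = /^\s*.*?\w+(\s*\(.*?\))*\s*:\s*\w+/;
// Variant from the review: any Unicode letter after the colon (needs the `u` flag).
const unicodeLetterPattern = /^\s*.*?\w+(\s*\(.*?\))*\s*:\s*\p{L}+/u;

// Hypothetical helper mirroring the hasArguments condition in the diff.
function hasArgumentsWith(pattern: RegExp, line: string): boolean {
    return !line.endsWith(':') && !line.endsWith('::') && pattern.test(line);
}

const docstringLines = ['a: int', 'a: 中文'];

for (const line of docstringLines) {
    console.log(
        line,
        'original:', hasArgumentsWith(asciiOnlyPattern, line),
        'unicode:', hasArgumentsWith(unicodeLetterPattern, line)
    );
}
```

With these inputs the original pattern accepts `a: int` but rejects `a: 中文`, while the Unicode variant accepts both.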