When a literal ] appears in square brackets in a regular expression, base R functions find nothing within the range unless perl=TRUE (R for Data Science could mention this)
#1629
Open
markpurver opened this issue
Feb 8, 2024
· 0 comments
Section 15.4.3 in R for Data Science (https://r4ds.hadley.nz/regexps.html#character-classes) says this about regular expressions: \ escapes special characters, so [\^\-\]] matches ^, -, or ].
But this specific example does not seem to be true when using base R, unless perl=TRUE is chosen (I am using R 4.2.1).
The general issue of slight differences between base R and stringr is noted in section 15.7.2, but perhaps this particular quirk is worth mentioning in 15.4.3 as the example contains one of these differences.
For example: grepl("[\\^\\-\\]]", "]")
returns FALSE.
And: grepl("[\\^\\-\\]]", "^-]")
also returns FALSE, indicating that none of the characters in the character class is found in the string.
But only the ] symbol appears to cause this. So: grepl("[\\^\\-\\[]", "^-]")
returns TRUE, seemingly because the ] is not there (in this example it has been replaced by [ but it could just as well be replaced by nothing).
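One possible reading of these results, assuming TRE follows the POSIX rule that a backslash is an ordinary character inside a bracket expression: the class would end at the first ], so the pattern would mean "a \ or a ^, immediately followed by a literal ]". This is an assumption, not something confirmed above. A minimal sketch to probe it (the variable names are only illustrative):

```r
# Same pattern as in the examples above; the regex engine sees [\^\-\]]
pattern <- "[\\^\\-\\]]"

# Assumption being probed: the class closes at the first ], leaving the
# second ] as a literal that must follow one of the class characters.
caret_then_bracket <- grepl(pattern, "^]")
backslash_then_bracket <- grepl(pattern, "\\]")

# Reported above: a lone ] does not match with the default engine.
lone_bracket <- grepl(pattern, "]")

print(c(caret_then_bracket, backslash_then_bracket, lone_bracket))
```

If the first two results are TRUE while the third is FALSE, that would suggest the backslashes are being treated as literal characters rather than as escapes.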
This issue seems to go away entirely when perl=TRUE is used, so: grepl("[\\^\\-\\]]", "]", perl=TRUE)
and grepl("[\\^\\-\\]]", "-", perl=TRUE)
both return TRUE.
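A form that avoids escaping altogether may be worth noting alongside perl=TRUE. It relies on the placement rules described in ?regex: a literal ] goes first in the class, ^ goes anywhere but first, and - goes first or last. A sketch, with x as made-up test input:

```r
# Hypothetical test input: three characters that should match, one that should not
x <- c("]", "^", "-", "a")

# ] first, ^ not first, - last: no backslashes needed inside the class
default_engine <- grepl("[]^-]", x)
perl_engine <- grepl("[]^-]", x, perl = TRUE)

print(default_engine)
print(perl_engine)
```

Both calls are expected to flag the first three elements and reject "a". Whether stringr accepts the same pattern is not checked here.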
Perhaps there could be a note in the book to reflect this, or perhaps it is an issue with base R or the TRE engine.