Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape path prefix and restrict it to be a pathname in Speculation Rules #951

Merged
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,12 @@ class PLSR_URL_Pattern_Prefixer {
*/
public function __construct( array $contexts = array() ) {
if ( $contexts ) {
$this->contexts = array_map( 'trailingslashit', $contexts );
$this->contexts = array_map(
static function ( string $str ): string {
return self::escape_pattern_string( trailingslashit( $str ) );
},
$contexts
);
} else {
$this->contexts = self::get_default_contexts();
}
Expand Down Expand Up @@ -69,12 +74,23 @@ public function prefix_path_pattern( string $path_pattern, string $context = 'ho
return $path_pattern;
}

// If the path already starts with the context path (including '/'), there is nothing to prefix.
if ( str_starts_with( $path_pattern, $this->contexts[ $context ] ) ) {
return $path_pattern;
// In the event that the context path contains a :, ? or # (which can cause the URL pattern parser to
// switch to another state, though only the latter two should be percent encoded anyway), we need to
// additionally enclose it in grouping braces. The final forward slash (trailingslashit ensures there is
// one) affects the meaning of the * wildcard, so is left outside the braces.
$context_path = $this->contexts[ $context ];
$escaped_context_path = $context_path;
if ( strcspn( $context_path, ':?#' ) !== strlen( $context_path ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever. I would have done str_contains( $context_path, ':' ) || str_contains( $context_path, '?' ) || str_contains( $context_path, '#' ) but clearly what you've done is faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather:

Suggested change
if ( strcspn( $context_path, ':?#' ) !== strlen( $context_path ) ) {
if ( preg_match( '/[:?#]/', $context_path ) ) {

This would be the more common way to do it (and perhaps more readable?) There are 31 uses of strcspn() in core, whereas there are 635 of preg_replace().

I don't feel strongly either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably being a C++ developer by day, but my mild preference is for the most efficient option if the options are similar in readability (though obviously neither is going to have a massive effect on your WordPress performance).

Happy to change if you felt more strongly; it seems not, so leaving this as-is for now.

$escaped_context_path = '{' . substr( $context_path, 0, -1 ) . '}/';
}

// If the path already starts with the context path (including '/'), remove it first
// since it is about to be added back.
if ( str_starts_with( $path_pattern, $context_path ) ) {
$path_pattern = substr( $path_pattern, strlen( $context_path ) );
}

return $this->contexts[ $context ] . ltrim( $path_pattern, '/' );
return $escaped_context_path . ltrim( $path_pattern, '/' );
}

/**
Expand All @@ -86,8 +102,32 @@ public function prefix_path_pattern( string $path_pattern, string $context = 'ho
*/
public static function get_default_contexts(): array {
return array(
'home' => trailingslashit( wp_parse_url( home_url( '/' ), PHP_URL_PATH ) ),
'site' => trailingslashit( wp_parse_url( site_url( '/' ), PHP_URL_PATH ) ),
'home' => self::escape_pattern_string( trailingslashit( wp_parse_url( home_url( '/' ), PHP_URL_PATH ) ) ),
'site' => self::escape_pattern_string( trailingslashit( wp_parse_url( site_url( '/' ), PHP_URL_PATH ) ) ),
Comment on lines -89 to +106
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This escapes the "path prefixes", which makes sense. But wouldn't we also want to escape any other parts of the path string in the same way?

Any string that a 3P developer may provide via the plsr_speculation_rules_href_exclude_paths filter below could also contain one of those characters that would need to be escaped as far as I can tell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, such strings may in fact be patterns and not literal paths, so we probably do want the special characters to have their special meanings there.

For example, the readme.txt example is '/cart/*', and in this case the developer probably did mean "all paths within /cart/", not "the literal path /cart/*".

However, if WordPress is serving from a directory named /* (however odd a choice that may be), then we need to escape the asterisk because it's not intended as a wildcard.

);
}

/**
* Escapes a string for use in a URL pattern component.
* https://urlpattern.spec.whatwg.org/#escape-a-pattern-string
*
* @since n.e.x.t
adamsilverstein marked this conversation as resolved.
Show resolved Hide resolved
*
* @param string $str String to be escaped.
* @return string String with backslashes added where required.
*/
private static function escape_pattern_string( string $str ): string {
$replacements = array(
'+' => '\\+',
'*' => '\\*',
'?' => '\\?',
':' => '\\:',
'{' => '\\{',
'}' => '\\}',
'(' => '\\(',
')' => '\\)',
'\\' => '\\\\',
);
return strtr( $str, $replacements );
jeremyroman marked this conversation as resolved.
Show resolved Hide resolved
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,8 @@ public function data_prefix_path_pattern() {
// Missing trailing slash still works, does not consider "cut-off" directory names.
array( '/subdir', '/subdirectory/my-page/', '/subdir/subdirectory/my-page/' ),
array( '/subdir', 'subdirectory/my-page/', '/subdir/subdirectory/my-page/' ),
// A base path containing a : must be enclosed in braces to avoid confusion.
array( '/scope:0/', '/*/foo', '{/scope\\:0}/*/foo' ),
);
}

Expand All @@ -45,25 +47,30 @@ public function test_get_default_contexts() {
$this->assertSame( '/', $contexts['site'] );
}

public function test_get_default_contexts_with_subdirectories() {
add_filter(
'home_url',
static function () {
return 'https://example.com/subdir/';
}
);
/**
* @dataProvider data_default_contexts_with_subdirectories
*/
public function test_get_default_contexts_with_subdirectories( $context, $unescaped, $expected ) {
add_filter(
'site_url',
static function () {
return 'https://example.com/subdir/wp/';
$context . '_url',
static function () use ( $unescaped ) {
return $unescaped;
}
);

$contexts = PLSR_URL_Pattern_Prefixer::get_default_contexts();

$this->assertArrayHasKey( 'home', $contexts );
$this->assertArrayHasKey( 'site', $contexts );
$this->assertSame( '/subdir/', $contexts['home'] );
$this->assertSame( '/subdir/wp/', $contexts['site'] );
$this->assertArrayHasKey( $context, $contexts );
$this->assertSame( $expected, $contexts[ $context ] );
}

public function data_default_contexts_with_subdirectories() {
return array(
array( 'home', 'https://example.com/subdir/', '/subdir/' ),
array( 'site', 'https://example.com/subdir/wp/', '/subdir/wp/' ),
// If the context URL has URL pattern special characters it may need escaping.
array( 'home', 'https://example.com/scope:0.*/', '/scope\\:0.\\*/' ),
array( 'site', 'https://example.com/scope:0.*/wp+/', '/scope\\:0.\\*/wp\\+/' ),
);
}
}