From 239f27ccf5daacf6d4ae94b98091544f423dc028 Mon Sep 17 00:00:00 2001
From: Richard Gibson
Date: Thu, 30 Jan 2025 10:17:18 -0500
Subject: [PATCH 1/4] Editorial: Make ResolveLocale parameter localeData optional

---
 spec/negotiation.html | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/spec/negotiation.html b/spec/negotiation.html
index b1a38c9d..65f0a914 100644
--- a/spec/negotiation.html
+++ b/spec/negotiation.html
@@ -227,7 +227,7 @@

   _requestedLocales_: a Language Priority List,
   _options_: a Record,
   _relevantExtensionKeys_: a List of Strings,
-  _localeData_: a Record,
+  optional _localeData_: a Record,
 ): a Record

@@ -235,6 +235,7 @@

It performs "lookup" as defined in BCP 47 at RFC 4647 section 3, determining the best element of _availableLocales_ for satisfying _requestedLocales_ using either the LookupMatchingLocaleByBestFit algorithm or LookupMatchingLocaleByPrefix algorithm as specified by _options_.[[localeMatcher]], ignoring Unicode locale extension sequences, and returns a representation of the match that also includes corresponding data from _localeData_ and a resolved value for each element of _relevantExtensionKeys_ (defaulting to data from the matched locale, superseded by data from the requested Unicode locale extension sequence if present and then by data from _options_ if present). If the matched element from _requestedLocales_ contains a Unicode locale extension sequence, it is copied onto the language tag in the [[Locale]] field of the returned Record, omitting any keyword Unicode locale nonterminal whose key value is not contained within _relevantExtensionKeys_ or type value is superseded by a different value from _options_.

+  1. Assert: If _relevantExtensionKeys_ is not empty, then _localeData_ is present.
   1. Let _matcher_ be _options_.[[localeMatcher]].
   1. If _matcher_ is *"lookup"*, then
     1. Let _r_ be LookupMatchingLocaleByPrefix(_availableLocales_, _requestedLocales_).
@@ -242,8 +243,11 @@

     1. Let _r_ be LookupMatchingLocaleByBestFit(_availableLocales_, _requestedLocales_).
   1. If _r_ is *undefined*, set _r_ to the Record { [[locale]]: DefaultLocale(), [[extension]]: ~empty~ }.
   1. Let _foundLocale_ be _r_.[[locale]].
-  1. Let _foundLocaleData_ be _localeData_.[[<_foundLocale_>]].
-  1. Assert: _foundLocaleData_ is a Record.
+  1. If _localeData_ is present, then
+    1. Let _foundLocaleData_ be _localeData_.[[<_foundLocale_>]].
+    1. Assert: _foundLocaleData_ is a Record.
+  1. Else,
+    1. Let _foundLocaleData_ be ~empty~.
   1. Let _result_ be a new Record.
   1. Set _result_.[[LocaleData]] to _foundLocaleData_.
   1. If _r_.[[extension]] is not ~empty~, then

From 70accc60ec9f1bf665c58549142f6caacc6acb1b Mon Sep 17 00:00:00 2001
From: Richard Gibson
Date: Thu, 30 Jan 2025 10:20:09 -0500
Subject: [PATCH 2/4] Normative: Update String toLocale{Lower,Upper}Case to ResolveLocale with best-fit matching

Fixes #896
---
 spec/locale-sensitive-functions.html | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/spec/locale-sensitive-functions.html b/spec/locale-sensitive-functions.html
index 284efb2b..9d2a7c29 100644
--- a/spec/locale-sensitive-functions.html
+++ b/spec/locale-sensitive-functions.html
@@ -79,13 +79,11 @@

   1. Let _requestedLocales_ be ? CanonicalizeLocaleList(_locales_).
-  1. If _requestedLocales_ is not an empty List, then
-    1. Let _requestedLocale_ be _requestedLocales_[0].
-  1. Else,
-    1. Let _requestedLocale_ be DefaultLocale().
   1. Let _availableLocales_ be an Available Locales List which includes the language tags for which the Unicode Character Database contains language-sensitive case mappings. If the implementation supports additional locale-sensitive case mappings, _availableLocales_ should also include their corresponding language tags.
-  1. Let _match_ be LookupMatchingLocaleByPrefix(_availableLocales_, « _requestedLocale_ »).
-  1. If _match_ is not *undefined*, let _locale_ be _match_.[[locale]]; else let _locale_ be *"und"*.
+  1. Let _opt_ be the Record { [[localeMatcher]]: *"best fit"* }.
+  1. Let _relevantExtensionKeys_ be a new empty List.
+  1. Let _r_ be ResolveLocale(_availableLocales_, _requestedLocales_, _opt_, _relevantExtensionKeys_).
+  1. Let _locale_ be _r_.[[Locale]].
   1. Let _codePoints_ be StringToCodePoints(_S_).
   1. If _targetCase_ is ~lower~, then
     1. Let _newCodePoints_ be a List whose elements are the result of a lowercase transformation of _codePoints_ according to an implementation-derived algorithm using _locale_ or the Unicode Default Case Conversion algorithm.

From 389a885667139e38da2c97545e90e127838b2efe Mon Sep 17 00:00:00 2001
From: Richard Gibson
Date: Tue, 18 Feb 2025 12:09:19 -0500
Subject: [PATCH 3/4] fixup! Normative: Update String toLocale{Lower,Upper}Case to ResolveLocale with best-fit matching

---
 spec/locale-sensitive-functions.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/spec/locale-sensitive-functions.html b/spec/locale-sensitive-functions.html
index 9d2a7c29..2d018991 100644
--- a/spec/locale-sensitive-functions.html
+++ b/spec/locale-sensitive-functions.html
@@ -79,7 +79,7 @@

   1. Let _requestedLocales_ be ? CanonicalizeLocaleList(_locales_).
-  1. Let _availableLocales_ be an Available Locales List which includes the language tags for which the Unicode Character Database contains language-sensitive case mappings. If the implementation supports additional locale-sensitive case mappings, _availableLocales_ should also include their corresponding language tags.
+  1. Let _availableLocales_ be an Available Locales List which includes all language tags for which the implementation supports at least one case mapping transformation (even if that transformation applies to all known locales).
   1. Let _opt_ be the Record { [[localeMatcher]]: *"best fit"* }.
   1. Let _relevantExtensionKeys_ be a new empty List.
   1. Let _r_ be ResolveLocale(_availableLocales_, _requestedLocales_, _opt_, _relevantExtensionKeys_).
@@ -94,7 +94,7 @@

-  Code point mappings may be derived according to a tailored version of the Default Case Conversion Algorithms of the Unicode Standard. Implementations may use locale-sensitive tailoring defined in the file SpecialCasing.txt of the Unicode Character Database and/or CLDR and/or any other custom tailoring. Regardless of tailoring, a conforming implementation's case transformation algorithm must always yield the same result given the same input code points, locale, and target case.
+  Implementations are required to support case transformation for each language tag that appears in the condition_list of an entry defined in the file SpecialCasing.txt of the Unicode Character Database. Code point mappings from the Default Case Conversion Algorithms of the Unicode Standard may be tailored according to CLDR and/or any other custom adjustments. Regardless of tailoring, a conforming implementation's case transformation algorithm must always yield the same result given the same input code points, locale, and target case.

From d1fb28f398a86a8d9a7215487a4940b8b21b085f Mon Sep 17 00:00:00 2001
From: Richard Gibson
Date: Tue, 18 Feb 2025 12:10:06 -0500
Subject: [PATCH 4/4] Editorial: Add TransformCase note about efficiency

---
 spec/locale-sensitive-functions.html | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/spec/locale-sensitive-functions.html b/spec/locale-sensitive-functions.html
index 2d018991..669753ab 100644
--- a/spec/locale-sensitive-functions.html
+++ b/spec/locale-sensitive-functions.html
@@ -97,6 +97,10 @@

Implementations are required to support case transformation for each language tag that appears in the condition_list of an entry defined in the file SpecialCasing.txt of the Unicode Character Database. Code point mappings from the Default Case Conversion Algorithms of the Unicode Standard may be tailored according to CLDR and/or any other custom adjustments. Regardless of tailoring, a conforming implementation's case transformation algorithm must always yield the same result given the same input code points, locale, and target case.

+
+  Locale-sensitive case transformations have been rare in practice—the Unicode Character Database has historically defined only two sets, one for Lithuanian and one for Turkic languages (specifically Turkish and Azeri). This algorithm is defined to remain valid even if that changes, but permits conforming implementations to use more efficient approaches that preserve the specified semantics. For example, an implementation might scan for the first “known” locale in _requestedLocales_ (or ultimately DefaultLocale()) and check it against a static collection of prefixes encompassing all locales subject to locale-sensitive case transformation.
+
+
   The case mapping of some code points may produce multiple code points, and therefore the result may not be the same length as the input. Because both `toLocaleUpperCase` and `toLocaleLowerCase` have context-sensitive behaviour, the functions are not symmetrical. In other words, `s.toLocaleUpperCase().toLocaleLowerCase()` is not necessarily equal to `s.toLocaleLowerCase()` and `s.toLocaleLowerCase().toLocaleUpperCase()` is not necessarily equal to `s.toLocaleUpperCase()`.
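
Review note (not part of the patch): the shortcut described in the new TransformCase note could look roughly like the TypeScript sketch below. Everything in it is an assumption made for illustration: the names `TAILORED_LANGUAGES` and `selectCaseTailoring` are hypothetical, the `isKnown` callback stands in for whatever supported-locale check an implementation already has, and returning `"und"` for "no tailoring" is a convention of this sketch rather than something the patch specifies. It is not a substitute for ResolveLocale and may not match every best-fit matcher.

```typescript
// Language subtags that the note names as having locale-sensitive case
// mappings (Lithuanian, Turkish, Azeri). Hard-coding them is an assumption
// of this sketch; the list would need revisiting if the Unicode Character
// Database ever adds more.
const TAILORED_LANGUAGES: string[] = ["lt", "tr", "az"];

// Hypothetical helper: pick the case-mapping tailoring for a request.
// `isKnown` is a stand-in for the implementation's supported-locale check.
function selectCaseTailoring(
  requestedLocales: string[],
  defaultLocale: string,
  isKnown: (tag: string) => boolean
): string {
  // Scan for the first "known" requested locale, else use the default.
  let chosen = defaultLocale;
  for (const tag of requestedLocales) {
    if (isKnown(tag)) {
      chosen = tag;
      break;
    }
  }
  // Compare only the primary language subtag against the static list.
  const language = chosen.split("-")[0].toLowerCase();
  return TAILORED_LANGUAGES.indexOf(language) >= 0 ? language : "und";
}
```

With an `isKnown` that accepts every tag, `selectCaseTailoring(["tr-TR"], "en", isKnown)` selects the Turkish tailoring, while an empty request list with a default of `"en-US"` falls through to the untailored mappings.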