Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-11570 Update Ancient Greek writing systems #4058

Merged
merged 4 commits into from
Nov 6, 2024

Conversation

conradarcturus
Copy link
Contributor

@conradarcturus conradarcturus commented Sep 17, 2024

CLDR-11570

Ancient Greek is only written in Greek alphabet rather than Linear B or Cypriot Syllabary -- so this change corrects that. In fact, Linear B was never used for Ancient Greek -- rather it was used by its predecessor Mycenaean Greek (ancient greek to the ancient greeks). The Cypriot Syllabary was used primarily for a different language, Eteocypriot -- its not related to Greek or Indo-European. There is some Ancient Greek text (particularly the Arcadocypriot dialect) in the Cypriot Syllabary, and it's even present in the Idalion Tablet and the Idalion bilingual but its pretty rare.

Unfortunately I cannot list "Cypriot Syllabary" as secondary without it bumping out the Greek Alphabet because Cprt > Grek alphabetically. Thereby I made a separate Jira ticket to fix ambiguities in the xml https://unicode-org.atlassian.net/browse/CLDR-18087. This ticket just focused on the Ancient Greek fix so we can fix the bug.

  • This PR completes the ticket.

Scripts ran:

mvn package
java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData 
java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelySubtags
java -jar tools/cldr-code/target/cldr-code.jar GenerateTestData
java -jar tools/cldr-code/target/cldr-code.jar GenerateScriptMetadata
java -jar tools/cldr-code/target/cldr-code.jar GenerateValidityXML

ALLOW_MANY_COMMITS=true

@conradarcturus conradarcturus marked this pull request as draft September 17, 2024 23:57
@srl295 srl295 added the ddl DDL-SC specific work label Oct 25, 2024
srl295
srl295 previously approved these changes Oct 25, 2024
@@ -1628,7 +1629,7 @@ XXX Code for transations where no currency is involved
<language type="gos" scripts="Latn"/>
<language type="got" scripts="Goth" alt="secondary"/>
<language type="grb" scripts="Latn"/>
<language type="grc" scripts="Cprt Grek Linb" alt="secondary"/>
<language type="grc" scripts="Cprt Grek" alt="secondary"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be primary Grek and secondary Cprt (which as far as i can tell isn't even 100% historically certain)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list here is alphabetical not ordered by frequency. Another reason I want to redo tag/attribute hierarchy here.

I'll look into removing the auto-alphabetical ordering for the script attribute. It may not be too tricky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I cannot find the right piece of code to do that.

Now that UTW's done I can work more on my proposal to redo this section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that was the job of language_script.tsv but i really don't know any more-

yeah

grc	Ancient Greek	secondary	Cprt	Cypriot
grc	Ancient Greek	secondary	Grek	Greek
grc	Ancient Greek	secondary	Linb	Linear B

@macchiati why is Grek secondary here with no primary? ^

@@ -324,6 +324,7 @@ public static void main(String[] args) throws IOException {
"unr_Beng_IN",
"ttt_Latn_AZ",
"pnt_Grek_GR",
"grc_Grek_GR",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may need to add grc_Cprt_CY too

@srl295
Copy link
Member

srl295 commented Oct 25, 2024

Ugh, more cryptic error messages:

Error:  (LikelySubtagsTest.java:218)  Error: : checkAdding: max(und_Cprt)->grc_Cprt_CY, however max(grc_Cprt)->: expected "grc_Cprt_CY", got "grc_Cprt_GR"

C'mon, tell us why it was 'expected'. anyway, i'm not sure if there's another fix besides a manual override

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • common/supplemental/likelySubtags.xml is different
  • common/supplemental/supplementalData.xml is different
  • common/testData/localeIdentifiers/likelySubtags.txt is now changed in the branch
  • common/testData/localeIdentifiers/localeDisplayName.txt is now changed in the branch
  • tools/cldr-code/src/main/java/org/unicode/cldr/tool/GenerateLikelySubtags.java is different
  • tools/cldr-code/src/main/java/org/unicode/cldr/util/SupplementalDataInfo.java is now changed in the branch
  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/language_script.tsv is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@conradarcturus conradarcturus force-pushed the CLDR-11570-Ancient-Greek-Writing-System branch from 8b2f7b2 to 5649abe Compare October 31, 2024 00:20
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • common/properties/scriptMetadata.txt is now changed in the branch
  • common/supplemental/likelySubtags.xml is different
  • common/supplemental/supplementalData.xml is different
  • common/testData/localeIdentifiers/likelySubtags.txt is different
  • common/testData/localeIdentifiers/localeDisplayName.txt is no longer changed in the branch
  • tools/cldr-code/src/main/java/org/unicode/cldr/tool/GenerateLikelySubtags.java is different
  • tools/cldr-code/src/main/java/org/unicode/cldr/util/SupplementalDataInfo.java is no longer changed in the branch
  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/country_language_population.tsv is now changed in the branch
  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/language_script.tsv is different
  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/Script_Metadata.csv is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@@ -1629,7 +1631,7 @@ XXX Code for transations where no currency is involved
<language type="gos" scripts="Latn"/>
<language type="got" scripts="Goth" alt="secondary"/>
<language type="grb" scripts="Latn"/>
<language type="grc" scripts="Cprt Grek Linb" alt="secondary"/>
<language type="grc" scripts="Grek" alt="secondary"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secondary still… but, better than before

srl295
srl295 previously approved these changes Oct 31, 2024
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/external/langtags.json is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@@ -123,13 +123,13 @@ Dupl; 33; 1BC20; FR; 1; EXCLUSION; NO; NO; NO; YES; NO
Egyp; 33; 13153; EG; 3; EXCLUSION; NO; NO; YES; YES; NO
Elba; 33; 10500; AL; 1; EXCLUSION; NO; NO; NO; NO; NO
Elym; 33; 10FF1; IR; 1; EXCLUSION; YES; NO; NO; NO; NO
Gara; 33; 10D5D; SN; 1; EXCLUSION; YES; NO; YES; NO; YES # provisional data for future Unicode 16.0 script
Gara; 33; 10D5D; SN; 1; EXCLUSION; YES; NO; YES; NO; YES
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer provisional since Unicode 16.0 is out now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say to put the age here but that's kind of pointless because age data is a unicode property.

@conradarcturus conradarcturus marked this pull request as ready for review October 31, 2024 00:32
Comment on lines 42450 to 42451
"tag": "grc",
"tags": [ "grc-GR", "grc-Grek" ],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no - please don't change langtags.json! it's an external file (hence /external/ … ) - fyi @mhosken @DavidLRowe @emily-roth

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I also submitted a PR to their project silnrsi/langtags#21

But I don't have to get ahead of myself, I'll undo the json changes. I wasn't sure whether to be this aggressive or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to override with code or other data than to modify external files- makes it hard to reproduce the situation

srl295
srl295 previously requested changes Oct 31, 2024
Copy link
Member

@srl295 srl295 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

langtags.json shouldn't be patched.

@srl295
Copy link
Member

srl295 commented Oct 31, 2024

sorry i didn't see that langtags.json itself was modified and shouldn't be.

CLDR-11570 Retain order when using ConvertLanguageData

When building the file, it changes the order of supplementalData's scripts. This changing a TreeSet (orders nodes by keys) to a LinkedHashSet (serialized output stays in originally inputed order).

Scripts to run: ` mvn package -DskipTests=true &&  java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData &&  java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelySubtags `

CLDR-11570 Make grc_Cprt map to CY

CLDR-11570 Fix test for grc_Cprt

Last commit I edited a file but made a slight mistake. This fixes that and hopefully addresses the test issue.

Code ran:

mvn package -DskipTests=true &&  java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData &&  java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelySubtags &&  java -jar tools/cldr-code/target/cldr-code.jar GenerateTestData

Note `ht` Changes look like they come from changes after adding the locale in a different pull request.
Jeez, if it exists at all I get the bug.

Thereby, I'm removing it again and created https://unicode-org.atlassian.net/browse/CLDR-18087 for the long-term fix.
macchiati
macchiati previously approved these changes Nov 6, 2024
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • common/supplemental/likelySubtags.xml is different
  • common/supplemental/supplementalData.xml is different
  • common/testData/localeIdentifiers/likelySubtags.txt is no longer changed in the branch
  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/country_language_population.tsv is different
  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/language_script.tsv is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295
Copy link
Member

srl295 commented Nov 6, 2024

@conradarcturus did you mean to include French changes or is that just because of merging in main?

@srl295
Copy link
Member

srl295 commented Nov 6, 2024

french changes seem to be from already merged #3985

@conradarcturus conradarcturus merged commit af54e7e into main Nov 6, 2024
16 checks passed
@conradarcturus conradarcturus deleted the CLDR-11570-Ancient-Greek-Writing-System branch November 6, 2024 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ddl DDL-SC specific work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants