Language coverage details #19

Open
thlinard opened this issue Feb 23, 2017 · 10 comments

@thlinard

In Charset Coverage Details, I'd like to see the old and new (2016) Google sets separated. Seeing Latin, Cyrillic and Greek next to Cyrillic Plus, Latin Expert, etc. is a bit confusing.

Same for Charset Coverage Details.

Language support by all available Char Sets is sometimes erroneous. Greek Core doesn't include Latin Plus, for example.

@graphicore
Owner

In Charset Coverage Details, I'd like to see the old and new (2016) Google sets separated. Seeing Latin, Cyrillic and Greek next to Cyrillic Plus, Latin Expert, etc. is a bit confusing.

I see. The easiest for me would be to just sort the legacy sets to the bottom, but I can also make a clean separation. I'm not sure how the GF API will handle the legacy encodings next to the new ones, but that will be important to the users of the font specimen in the end.

Language support by all available Char Sets is sometimes erroneous. Greek Core doesn't include Latin Plus, for example

Aha, OK. We should probably discuss this at google/fonts. If Greek Core doesn't include Latin Plus, where is it taking its (standard) punctuation from? I have some similar questions on my list. The discussion in google/fonts#624 is related.

Note that the online version uses the files of google/fonts#642, where I included Latin Plus in Greek Core, which surely could be wrong.

@thlinard
Author

We should probably discuss this at google/fonts. If Greek Core doesn't include Latin Plus, where is it taking its (standard) punctuation from?

Hum… From Latin Core? But Latin Core seems to exist only virtually. Probably this should be clarified.

@graphicore
Owner

graphicore commented Feb 24, 2017

Yeah, I'm in the process of writing something up. There are a few issues I have with this charset analysis. As a matter of fact, at the moment you posted I had just created this list:

0x0021 ! EXCLAMATION MARK
0x0022 " QUOTATION MARK
0x0026 & AMPERSAND
0x0028 ( LEFT PARENTHESIS
0x0029 ) RIGHT PARENTHESIS
0x002A * ASTERISK
0x002C , COMMA
0x002D - HYPHEN-MINUS
0x002E . FULL STOP
0x002F / SOLIDUS
0x003A : COLON
0x003B ; SEMICOLON
0x0040 @ COMMERCIAL AT
0x005B [ LEFT SQUARE BRACKET
0x005C \ REVERSE SOLIDUS
0x005D ] RIGHT SQUARE BRACKET
0x00A7 § SECTION SIGN
0x00AB « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00BB » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x0301 ́ COMBINING ACUTE ACCENT
0x0308 ̈ COMBINING DIAERESIS
0x2010 ‐ HYPHEN
0x2013 – EN DASH
0x2014 — EM DASH
0x2026 … HORIZONTAL ELLIPSIS

These are the chars missing from Greek Core when asking the CLDR.

But Latin Core seems to exist only virtually. Probably this should be clarified.

In google/fonts#624 I came to the same conclusion :-) The question for me is whether to pack this into the google/fonts#642 PR or do it in a new PR. Also, from 642 I should probably remove the inclusion of Latin Plus in Greek Core, heh?

(just updated the charlist above: removed duplicates, sorted)

@thlinard
Author

These are the chars missing from Greek Core when asking the CLDR.

And not the Arabic numerals?

Also, from 642 I should probably remove the inclusion of Latin Plus in Greek Core, heh?

Yes, probably. One set (GF Greek Pro) needs some characters from GF Latin Plus and Pro sets, as stated in the README.md, but unless Latin Pro is intended as a prerequisite for all GF, it's too much for Greek coverage.

@graphicore
Owner

And not the Arabic numerals?

Interesting. I'm using this: https://github.com/unicode-cldr/cldr-misc-modern/blob/master/main/el/characters.json
Of that file I'm using the keys main.characters.exemplarCharacters and main.characters.punctuation. I'm also applying the JavaScript String.prototype.toUpperCase function to all chars, which should do the right thing: change the char if Unicode defines an uppercase, otherwise leave it. There are no numerals in this document, though. Similarly, for Arabic no numerals are defined either: https://github.com/unicode-cldr/cldr-misc-modern/blob/master/main/ar/characters.json
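
The lookup described above could be sketched roughly as follows. This is not the code behind the specimen tool, only a minimal illustration under stated assumptions: the sample object is a made-up, shortened stand-in for the CLDR data, and parseExemplars, requiredChars and missingChars are hypothetical helper names. The parser is deliberately naive and ignores ranges ("a-z") and brace sequences ("{ch}") in the exemplar notation.

```javascript
// Hypothetical, shortened sample shaped like the two keys mentioned above.
// It is NOT the real content of the CLDR "el" characters.json file.
const sampleCharacters = {
  exemplarCharacters: "[α ά β γ]",
  punctuation: "[\\- , ; \\: !]",
};

// Naive reader for the "[a b c]" notation: strips the outer brackets,
// splits on whitespace and drops a leading backslash escape.
// Ranges and multi-character sequences are not handled (assumption).
function parseExemplars(pattern) {
  return pattern
    .replace(/^\[|\]$/g, "")
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) =>
      token.length > 1 && token.startsWith("\\") ? token.slice(1) : token
    );
}

// Collects every listed char plus its uppercase form.
function requiredChars(characters) {
  const required = new Set();
  for (const key of ["exemplarCharacters", "punctuation"]) {
    for (const ch of parseExemplars(characters[key] || "[]")) {
      required.add(ch);
      // toUpperCase may return several code points for one input char,
      // so each code point of the result is added separately.
      for (const upper of ch.toUpperCase()) required.add(upper);
    }
  }
  return required;
}

// Chars required by the language data but absent from the charset.
function missingChars(required, charset) {
  return [...required].filter((ch) => !charset.has(ch)).sort();
}

// A stand-in charset holding only the Greek letters of the sample.
const greekOnlyCharset = new Set(
  parseExemplars(sampleCharacters.exemplarCharacters).flatMap((ch) => [
    ch,
    ch.toUpperCase(),
  ])
);

const missing = missingChars(requiredChars(sampleCharacters), greekOnlyCharset);
console.log(missing); // [ '!', ',', '-', ':', ';' ]
```

One detail worth knowing: for a few letters toUpperCase returns more than one code point. U+0390 (ΐ), for example, uppercases to U+0399 followed by U+0308 and U+0301, which is possibly why the two combining marks show up in a list generated this way.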

Good find, thanks!

The information should be somewhere, maybe in https://github.com/unicode-cldr/cldr-numbers-modern? But at first glance it seems to define number formatting rather than the numerals themselves. Do you know where to look for the numerals in the CLDR?

Also, from 642 I should probably remove the inclusion of Latin Plus in Greek Core, heh?

Yes, probably.

Will do.

One set (GF Greek Pro) needs some characters from GF Latin Plus and Pro sets

I've seen that. This needs a decision. Either we create a kind of "technical" Namelist files, so that we don't repeat ourselves (if this is feasible; it would be quite a bummer to end up with one Namelist per char), or we just include these chars in GF Greek Pro. "Technical" Namelist files wouldn't be available via the Fonts API; they would just be for us to define charsets.

I wrote something yesterday for Dave to look at, it's interesting for this discussion as well, sort of:

#20 It suggests that we can support languages even if we don't support the whole GF-charset. This could have implications on how we define GF-charsets.

@thlinard
Author

Do you know where to look for the numerals in the CLDR?

It seems to be https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/numberingSystems.json

@graphicore
Owner

graphicore commented Feb 24, 2017

Ah, great, thanks. It's linked to the locales via cldr-numbers-modern:

excerpt

      "numbers": {
        "defaultNumberingSystem": "arab",
        "otherNumberingSystems": {
          "native": "arab"
        },

for el:

      "numbers": {
        "defaultNumberingSystem": "latn",
        "otherNumberingSystems": {
          "native": "latn",
          "traditional": "grek"
        },

@thlinard
Author

Oh, they called "latn" the Arabic numerals, I suppose… And "arab" the Indic numerals used in the Arabic script.

@thlinard
Author

The list still lacks basic characters, like # % < > + = × ÷

@graphicore
Owner

graphicore commented Feb 24, 2017

Oh, they called "latn" the Arabic numerals, I suppose… And "arab" the Indic numerals used in the Arabic script.

Yeah, right, but it seems to do the right thing anyways:

      "latn": {
        "_digits": "0123456789",
        "_type": "numeric"
      },

Though they make it more complicated for me sometimes, "_type": "algorithmic" … :

      "grek": {
        "_rules": "greek-upper",
        "_type": "algorithmic"
      },
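
Putting the two excerpts together, the digits for a locale could be collected along these lines. This is only a sketch: the data objects repeat the fragments quoted in this thread with the surrounding JSON nesting left out, localeNumbers and digitsForLocale are hypothetical names, and skipping "algorithmic" systems is an assumption about how one might deal with them, not a decision made here.

```javascript
// Fragments quoted in this thread; the real files nest these deeper.
const numberingSystems = {
  latn: { _digits: "0123456789", _type: "numeric" },
  grek: { _rules: "greek-upper", _type: "algorithmic" },
};

const localeNumbers = {
  el: {
    defaultNumberingSystem: "latn",
    otherNumberingSystems: { native: "latn", traditional: "grek" },
  },
};

// Hypothetical helper: gathers the digits of every "numeric" numbering
// system a locale refers to. "algorithmic" systems carry no _digits
// string, so they are skipped in this sketch.
function digitsForLocale(locale) {
  const numbers = localeNumbers[locale];
  if (!numbers) return [];
  const names = new Set([
    numbers.defaultNumberingSystem,
    ...Object.values(numbers.otherNumberingSystems || {}),
  ]);
  const digits = new Set();
  for (const name of names) {
    const system = numberingSystems[name];
    if (system && system._type === "numeric") {
      for (const digit of system._digits) digits.add(digit);
    }
  }
  return [...digits];
}

console.log(digitsForLocale("el").join("")); // 0123456789
```

Whether the "traditional" system should count towards language support at all is a separate question; the sketch simply ignores it because its type is algorithmic.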

The list still lacks basic characters, like # % < > + = × ÷

I guess the question is whether these are needed to write the language. I'm not really deep into the concepts of CLDR.
