Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autogenerate locale data from CLDR #3268

Draft
wants to merge 19 commits into
base: master
Choose a base branch
from
Draft

Autogenerate locale data from CLDR #3268

wants to merge 19 commits into from

Conversation

atjn
Copy link
Contributor

@atjn atjn commented Mar 16, 2024

The data you are creating for the locales.js file already exists as part of the Unicode Common Locale Data Repository (CLDR). I suggest using the CLDR data as the default for all locales and only defining your own data when the CLDR data is undesirable.

The advantage would be that you can spend much less time validating and maintaining locale data. If you ever need to add a new data type, it is also much easier to apply to all locales because you don't need to manually define the data for each locale.

I moved all the locale data from locales.js to locales-meta.js, I then deleted most of the data since it was identical to the data in CLDR. Then I created a script create-locales-js.mjs that can read the limited locale data in locales-meta.js and autogenerate the locales.js file with all the same data that was in it before.

If there were only minor differences between the original locale data and the CLDR data, I have used the CLDR data. I also found a few things that seem to be mistakes, which I have then changed to the correct CLDR version. You can see the full diff here: e4f5d36 (*this diff does not include the change to standalone formatting made in 048aee4).

Some of the locales use very long date formats. I have used the shorter CLDR version there, because most apps seem to be designed to work with the relatively short en_US date format and everything breaks when the date string becomes twice as long.

I noticed two issues that I would like to suggest changes for, I put them in separate issues to make the discussion a bit easier:

I hope you like it!

I have changed the file to the new format, but used the existing data. This will allow me to use git diff to ensure the new locale system creates data that is identical to the old data.
@atjn
Copy link
Contributor Author

atjn commented Mar 16, 2024

I am not sure if the locales.js file needs to be included in the repository, or if we can just make the linter ignore this and then everything just works?

@atjn
Copy link
Contributor Author

atjn commented Mar 17, 2024

Btw this PR also applies the fix in #3271

@jordimas
Copy link
Contributor

jordimas commented Mar 17, 2024

Thanks for your work.
Some regressions that this PR introduces for Catalan language locale:

  • The names generated for the months in Catalan (CA_ES) locale (e4f5d36) are incorrect. The current ones are correct. For example, for April you're defining "d'abr." but it should be "abr." as it is now. There are two versions of the month names: standalone and formatting. See https://www.unicode.org/cldr/charts/44/summary/ca.html. We should use the standalone version like we do now.
  • Flags are not great to represent languages, since they trend to represent territories. For the case of Catalan, please provide the current option, the European flag (probably an exception to your library).

@atjn
Copy link
Contributor Author

atjn commented Mar 17, 2024

@jordimas thank you for the insightful comment! I will look into standalone formatting.

As for the flag, I will fix that by adding an override :)

@atjn
Copy link
Contributor Author

atjn commented Mar 17, 2024

@jordimas so if I understand it correctly format is used when the date is used in a larger context, such as 17 de març 2024 (de març), whereas standalone is used when you are only stating the month, març. The problem is that we use the date and month names in our date and time formatting. If you read the unicode specification, almost all the default formats use the format version.

So the only real solution is to extend the data format to allow both format and standalone versions. Is that possible?

Failing that, we need to find the least bad solution. Maybe you are right that the least bad solution is standalone, but I would probably argue that the formatted time strings are more important to get right than correctly showing standalone months.

@jordimas
Copy link
Contributor

jordimas commented Mar 17, 2024

Hello. My recommendation is that abmonth and month use the standalone version. You can also provide new entries for the formatting forms but do not change the default behavior since you may be breaking already existing applications.

For many years, locales only provided standalone. The forms in formatting were introduced on the last years to improve the writing in some languages (e.g. Catalan, Spanish, etc). But for many years we used standalone only. Additionally, in the context of watch (e.g. faces) you need most of the times standalone versions.

@atjn
Copy link
Contributor Author

atjn commented Mar 17, 2024

Okay well if you say that the time formats are still pretty good despite using standalone when format is supposed to be used, then I am less worried about this. I can change to using standalone and we can maybe discuss in #3272 if we can make an extension to include the format version. Thanks :)

@jordimas
Copy link
Contributor

jordimas commented Mar 18, 2024

Thanks @atjn

One important observation is that before this change "locale" app will work out of the box (just git clone the repo). Now, you need to run create-locales-js.mjs to be able to install and use the locale app. This is because by default there is no locales.js which the app in the store can find. Actually if you try to install the locale app, it shows no locales.

In my view, it will be important than this works out of the box like it was doing before.

@atjn
Copy link
Contributor Author

atjn commented Mar 18, 2024

@jordimas it depends on how you initialize the app store. You also need to generate the app index, otherwise the app store can't find any apps.

I have tried to set up the system such that the locale.js file is always generated when you start the app store. For example if you launch it with npm run local, everything works as expected. But I am not sure if I have covered all cases. There are multiple questions around how and when exactly the generator should run, which I am sure @gfwilliams has an opinion on. I am happy to adapt the script to work the way you want it to.

@atjn
Copy link
Contributor Author

atjn commented Mar 18, 2024

..or feel free to edit this PR directly if that is easier.

@gfwilliams
Copy link
Member

Sorry, I really don't like this. Any thoughts from others?

There have been very few issues with the Locale that needed changing. On the whole everything 'just worked' and you can see locale file really hasn't had many changes needed over time.

Sometimes the date formatting isn't exactly what a country might have - for instance there are characters that aren't supported in the Bangle's built-in font, or it needed making smaller to fit on the watch screen when other watch faces had been designed for the default UK/US formatting - and I imagine this undoes that?

As far as I remember the icon much wasn't needed in most cases since the way it was handled in unicode you could just use the two letter char code - so I'm not sure adding it is a huge improvement: https://github.com/espruino/BangleApps/blob/master/apps/locale/locale.html#L256

As far as I know, nobody asked for this or had any issues with locale? and what you're proposing is yet again causing a big headache for maintainers who have to check all this refactoring, and will likely break some stuff for users

It's great that you're wanting to get involved - but please can you stop refactoring everything? Especially where nobody had problems before. There are loads of open issues that you could work on fixing, or you could contribute new apps.

But right now I think it's fair to say the effect of your work on most users is not positive. There might have been some fixes from the linting but I don't think anyone noticed those - but people do notice stuff getting broken, and they did notice having to install all the updates.

@atjn
Copy link
Contributor Author

atjn commented Mar 18, 2024

But right now I think it's fair to say the effect of your work on most users is not positive. There might have been some fixes from the linting but I don't think anyone noticed those - but people do notice stuff getting broken, and they did notice having to install all the updates.

@gfwilliams that is very hurtful to read. I advised against changing existing apps exactly because of the issues you outline, but you asked me to change them anyways. That work took me more than 20 hours to complete.

@gfwilliams
Copy link
Member

Sorry - I didn't mean to be hurtful - it's hard to get the right tone in text.

It's great that you did that work, and I appreciate it took you a long time and that it did find some issues. I personally spent several hours going through it as well, and I believe @bobrippling and others spent quite a while too though.

I asked you to change the apps because I don't think we can realistically add rules that dictate how code should be written and then not have most of the apps in the repo (which people use as a guide) conform, as that will stop people contributing (or they will contribute their code with the lint exception copied in). So it was either that, or not have the rules. I still think that's fair - the whole point is contributing new apps should be easy, and saying "hey before you contribute you've got to fix these warnings that were in the example code you found" doesn't seem a positive step in that direction.

You're obviously really good at software development, and there are so many things that could be fixed and improved in apps in this repository, or there are apps/functionality that could be added - it'd just be nice if you'd work on some of those or at least ask, rather than changing things that weren't really problems anyone had.

I feel bad that you've obviously spent a lot of time on this PR, but that I don't like it or particularly want to merge it. And I was just trying to explain my point of view so if you do decide you want to contribute more, it's stuff that will be more in line with the way most people are working with and contributing to this repository.

I know it's all a bit unclear exactly what we're aiming for, but generally:

  • avoiding large changes to the way things work unless it's for a very good reason
  • try and keep friction for contributors as low as possible
  • make sure the app loader will 'just work' when statically hosted (I'm pretty sure this change doesn't allow that right now - if I committed this and did a release without some changes on my side, it'd break locale on banglejs.com/apps, and I believe the github-pages hosted version too)
  • make it easy for new users to debug - so if they see some text somewhere, ideally they could search in this repo and find the place where it's defined and what needs fixing
  • as we don't have any proper CI app testing framework, ideally test apps on-device as much as possible

It know that's all a bit hand-wavey though - we should maybe try and come up with a more specific set of pointers for something like https://github.com/espruino/BangleApps/wiki/App-Contribution

@atjn
Copy link
Contributor Author

atjn commented Mar 18, 2024

Thank you for the reply :) I understand why you thought it was best to fix lint errors, and I also understand that there were negative aspects to the change I made. I do however hope that you only asked me to continue working on it because you, like me, believe there are more pros than cons. Otherwise it feels like a waste of time, and I would have fully understood if you closed the PR.

I am fully aware that when I make an unsolicited PR like this, there is no guarantee that it will be merged. I only started making this because I thought it was interesting to learn about CLDR. So if it is never merged, I should not and will not be mad about it :)

I will however say that I still think it is a positive for you. With this generator, you can literally generate all languages right now, and you will never have to review a PR for a new language again. That seems like the opposite of a reviewer headache to me.

Also to address your comment that this refactor is a major review headache - I can just put back in all the locale data that was defined before, that way there are no regressions, and you can just use the tool for new locales only.

To make it easier to debug and statically host, the generated locales.js file can be included in the git repository. Even if outside contributors make changes directly in it, it should be simple for a maintainer to move those changes to locales-meta.js.

Idk I think this will make your life easier and make the end user experience better, but as I mentioned, you are free to close it.

@gfwilliams
Copy link
Member

In general, I get quite a lot of negativity... People notice things breaking (or even changing) a lot more than they notice them getting better - so there's a really big motivation from my side to only change things when it's actually fixing a bug...

So yes, I thought the lint PRs were useful, but also merging them brings in a slight feeling of dread. I'd really prefer not to have too many more like that to merge at the moment :)

If there are automatic checks you can add that do pick up real errors without requiring changes to apps that were otherwise functioning ok, those kind of things are awesome though.

For this PR, are there specific new locales that need adding that you think people would use?

My concern with your changes at e4f5d36 is that most of these locales have been created by hand, by users of the watch. Often if they don't agree with CLDR I think there's a high probability it was done for a reason (eg leaving out meridian).

If we were dealing with something that changed often then I'd totally agree that we should have a way to generate it automatically, but these locales are basically never going to change, and I think are tweaked enough for the watch that in most cases they shouldn't be pulled from CLDR.

For instance your change:

-"abmonth": "gen.,febr.,març,abr.,maig,juny,jul.,ag.,set.,oct.,nov.,des.",
+"abmonth": "de gen.,de febr.,de març,d'abr.,de maig,de juny,de jul.,d'ag.,de set.,d'oct.,de nov.,de des.",

The fact 'short month' should be very short probably isn't documented properly anywhere (I guess it should be) but people now rely on the fact it is (although I'm sure some locales do cause problems).

As an example the calendar widget (https://github.com/espruino/BangleApps/blob/master/apps/widcal/widget.js) draws the short month in a 4x6 font into ideally a 20px area, so that's 5 chars absolute max. Many of these translations would overflow that.

... but it shouldn't really be our job to go through PRs finding out why they break things and then argue about why things should be left as they are.

So really I'd like to see locales.js still in the repo, and generated by the script but being basically identical apart from the cases like anv to janv where there was obviously some issue.

But I think if we're doing that, we'd probably find that apps/locale/locales-meta.js ended up not much different from locales.js, but with this slight added layer of complexity that would make things more difficult.

So maybe instead bin/create-locales-js.mjs should not exist as a way to create locales.js but just to automatically produce a sections of JSON for new locales that could be copied in?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants