Decide on formatting to something other than text #272
Comments
Some thoughts on this topic. When implementing support for MF2 in a commonly used library like ICU, we should consider that ICU is used in many, many places. It is now the base for i18n support in Windows, macOS, iOS, Android, several Linux UI frameworks, browsers, and many applications. So the functionality we expose should allow ALL of these customers to implement what they need on top of this ICU implementation. And some of these users have their own ways to model rich UIs that might not map to a linear view of the world (a sequence of parts) or a tree view of the world (DOM). For example, text-to-speech support often requires the generation of parallel "tracks" of text. Depending on the expressiveness of the TTS available, one might do that by tagging a sequence of text with semantic info:
Or the TTS might be very basic, and you might want ICU to add an "explicit spellout text stream"
The second approach is especially handy when you have custom formatters and you know the underlying TTS engine is not rich enough to know how to read their output. Another form of "overlapping parts": imagine you format an interval. The result might be plain text ("...between Aug 27-Sep 9, 2022."). Or you might want it to look like the one above, but with clicks in various areas invoking various pickers. So a library should be able to convey that various ranges represent different concepts, and that they might overlap:
Which means that a simple placeholder ( Some existing frameworks that don't use HTML for formatting, but still format things in UI:
TLDR: what should a library return to support ALL of these use cases? |
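To make the "overlapping ranges" idea above concrete, here is a minimal sketch of one possible shape for such output: the plain string plus a flat list of labeled ranges that are allowed to overlap. The names `Span`, `span_of`, `kinds_at`, and the labels (`interval`, `start-date`, `end-date`, `year`) are all hypothetical and chosen for illustration only; nothing here is an ICU or MF2 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive index into the formatted text
    end: int    # exclusive index into the formatted text
    kind: str   # hypothetical semantic label for the range


def span_of(text, fragment, kind):
    """Build a Span covering the first occurrence of fragment in text."""
    start = text.index(fragment)
    return Span(start, start + len(fragment), kind)


def kinds_at(spans, index):
    """Return the labels of every span covering the given character index."""
    return [s.kind for s in spans if s.start <= index < s.end]


# The formatted interval from the comment above, as plain text.
text = "between Aug 27-Sep 9, 2022"

# Ranges may nest or overlap: the whole interval contains both dates,
# and the year is a separate range inside the interval.
spans = [
    span_of(text, "Aug 27-Sep 9, 2022", "interval"),
    span_of(text, "Aug 27", "start-date"),
    span_of(text, "Sep 9", "end-date"),
    span_of(text, "2022", "year"),
]

# A UI could use this lookup to decide which picker a click should open.
clicked = kinds_at(spans, text.index("Sep"))
```

A consumer that only wants text ignores `spans`; a UI or TTS layer can query them by position. Whether a real implementation should expose ranges, a tree, or parallel streams is exactly the open question in this thread.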
That's a great in-depth analysis! I agree with your question, and your summary captures cases that I see emerging. Thank you for writing it down. |
One addition: I am coming to the conclusion that there will be a significant use case for a system composed of two parts: a template engine that generates a partially resolved message, and something I dubbed a Grammatical Correctness Engine (GCE) that takes that message as input and uses rules+ML to formulate the final sentence. The GCE is something George has advocated for quite a while, and I see it emerging out of Alexa TTS and Amazon Retail needs, both as a way to resolve complex phrases without combinatorial explosion and as a way to allow for message variations (a common requirement in VA systems) without losing grammatical correctness. If MF2 were to aspire to be a fit for the templating part of such a system (which I hope it would), then the schema of the output must be semantically meaningful for the GCE to reason about. |
My suspicion is that the exact shape of the output should be an implementation question, rather than one answered by the MF2 spec. For instance, the current JS Intl.MessageFormat proposal provides a It is rather likely of course that this JS API will internally rely on an ICU implementation, but I would still think that even its corresponding interface should be outside the scope of MF2 itself. So rather than deciding what such non-string interfaces ought to look like, could it be sufficient at the MF2 spec level to ensure that the needs of these "consumers" of the spec are satisfied, and that a conformant implementation is able to define its own "parts" output? |
Then I would expect that the format returned by MF2 is something that can be transformed into the form specified by Nobody is asking for MF2 to return a Spanned, or an If we don't do that, the only other options I can think of:
About Anyway, this issue is not even about deciding a |
Consensus: it's not a blocker for Tech Preview; it can be added in a different phase. |
Following on from the discussions in #28 and #315, I realise that I should note that my preferences on this topic are not compatible with the premise presented in the above. While I do agree that we should fully support the formatting of messages to non-string targets and that we should prototype this in the JS and ICU4J implementations, I do not think that such a non-string target should be explicitly specified in the MF2 spec. Rather, we should develop potentially multiple implementations of such non-string formatters and through that exercise help ensure that the text of the underlying spec supports them. So I would support for example the ICU4J Tech Preview experimenting with and implementing a formatted-parts API, but I do not see why the specification of that instance's implementation should be encoded in the MF2 spec. |
Replying to @zbraniecki in #28 (comment):
I think we need to build some of these layers in practice in order to get an idea of what they should really look like and how they could support the features we need of them. As far as I know, the only such attempts so far are the ECMA-402 proposal (spec, polyfill) and unicode-org/icu4x#2272. This interface will need to be well-specified at least in ECMA-402. There, the interface will need to be able to support a variety of different architectures and implementations built on top of it while aligning with existing JS Hence my suspicion that we could more efficiently reach alignment on this by working on the implementations, rather than pre-emptively trying to figure out one right answer that satisfies everyone. |
Assuming we land #463, will that address the need for this? |
As far as I know, all parties agree that we need to format to "something that is not text".
We need to decide what that is, or decide that it is not for the Tech Preview.
Something like format-to-parts, or the "paradigm change" that Zibi advocates, or the (Fluent-inspired?) "binding" of formatting functions with variables (I think it was called "formatable"?).
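
For readers unfamiliar with the format-to-parts option mentioned above, the following is a rough sketch of the general idea, not of any proposed MF2, ICU, or ECMA-402 interface. The `Part` type, the list-based `pattern` representation, and the functions `format_to_parts`, `to_string`, and `to_markup` are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    type: str         # "literal" or "value" (hypothetical part types)
    value: str        # the formatted text of this part
    source: str = ""  # name of the variable that produced it, if any


def format_to_parts(pattern, args):
    """Resolve a toy pattern into a list of parts instead of a string.

    The pattern is a list whose items are either literal strings or
    dicts of the form {"var": name}. This is NOT MF2 syntax.
    """
    parts = []
    for item in pattern:
        if isinstance(item, str):
            parts.append(Part("literal", item))
        else:
            name = item["var"]
            parts.append(Part("value", str(args[name]), source=name))
    return parts


def to_string(parts):
    """Plain-text consumer: concatenate every part."""
    return "".join(p.value for p in parts)


def to_markup(parts):
    """Non-string consumer: wrap formatted values in a tag."""
    return "".join(
        f"<b>{p.value}</b>" if p.type == "value" else p.value for p in parts
    )


pattern = ["Hello, ", {"var": "user"}, "!"]
parts = format_to_parts(pattern, {"user": "Ana"})
plain = to_string(parts)
marked = to_markup(parts)
```

The point of the sketch is only that the same list of parts can feed several targets; it says nothing about whether parts, overlapping ranges, or bound "formattable" values are the better design.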