Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Design) Formatted Parts #463
(Design) Formatted Parts #463
Changes from 4 commits
8933d22
e973b3b
a3e6b8c
cf1b5aa
ac84d25
0ff1a33
2d96e77
dbd626a
a8d966f
b7c88ba
1645aea
0b5debd
9c59af9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These need more flesh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aphillips This section has been expanded since this comment. Sufficiently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make sense? Or would it be better to say:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion would make the proposed MessageFallbackPart invalid, as it does not include either
value
orparts
. It's conceivable for other parts to also exist which do not include either, such as open/close expressions without an annotation.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would an empty
value
or emptyparts
satisfy that? Or a fallback could have a string expression? Empty strings don't result in the erroneous emission of the stringnull
😉I understand that it would "break" the current definition: we should decide what the shapes should be and make consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fallback when formatting to a string the
{...}
make sense as a visual indicator, but for a formatted-parts consumer some different representation could be better. So including an explicitvalue
orparts
would be misleading.For open/close, it doesn't make sense to define their explicit parts shapes in this spec, but for JS I have them as:
There, the
value
would be'b'
for{b +html}
, but it would not be set for{+html.b}
. Setting it to an empty string would be misleading, as{+foo}
and{|| +foo}
could easily mean different things.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #463 (comment) I'm suggesting to define separate interfaces for single-valued and multi-valued parts. This could extend to fallback parts and markup, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear to me why have
dir
andlocale
at this level.We don't need a locale to format anything, because the parts should be already formatted.
The whole proposal is called "Formatted Parts"
The locale might be needed to render things.
Or to process the formatting result (fix grammatical agreements as a post-step, fix "a apple" to "an apple" (en) or "La abeille" => "L'abbeile" (fr), or to sentence case the result of "{item} is foo"
But then that is something that is needed for the whole collection of parts, not on
MessageExpressionPart
only.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are needed to allow embedding content in a message that uses a different script or locale than the surrounding message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of clear what they can be used for.
But this info is on the
MessageExpressionPart
, which comes from an expression.And an expression can't create this info out of nothing, it is probably something we passed as a parameter.
So if I already know the info (because I passed it to the expression), having it on the
MessageExpressionPart
is useless duplication.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this message, intended for consumption by a text-to-speech system:
How would you propose that the locale information is transmitted, if not as a field on the formatted parts?
As context, the
fr
number would be "quatre-vingt-dix-huit", while infr-BE
it's "nonante-huit".There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Language and direction are needed for placeholders because they represent values being inserted into the overall string. The language (locale) is used to ensure proper rendering and processing (such as line breaking, text transforms, or spell-checking). The direction is used to enable bidi isolation and get the direction of the substring correct.
The language and direction of a formatted part might not match that of the message due to resource fallback when looking up the message. Or because values passed in have different language or direction. (And we want bidi isolation even if the directions match!!!!)
Providing the fields in the formatted part structure allows the user to easily access the values, e.g. it makes it easy to do something like this, resulting in proper isolation of the formatted parts (not shown is decorating the parts separately):
That is done on portions of the message. The whole message also has a language (locale) and base paragraph direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? But since the parts have a direction, the bidi isolation parts are not necessarily required. Let's consider pattern:
In locale
ar-AE
, the output of the:datetime
function might be "30/08/2023" with adir
ofrtl
. Note that the string looks bad here because it is an LTR context and not bidi isolated. There are U+200F RLMs after08
and30
to assist, but the string needs to be wrapped.Ignoring that the date has interior parts, getting a MessageExpressionPart:
... could result in a formatted string (in an HTML context) like:
This draws correctly without any further intervention.
If this became a list of parts it still works:
The caller could use this to produce:
Test
You wouldn't want Unicode isolates in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that in some cases bidi isolation of formatted-parts output may be achieved e.g. via
<span dir=...>
tags, but I don't think that's universally true for every use case for formatted parts.This is one reason why e.g. in the JS Intl.MF proposal my proposal for bidi handling includes the
"none"
strategy as an alternative to the default (and required)"compatibility"
strategy.Effectively, I think we should include the MessageBiDiIsolationPart by default, but allow for an implementation to provide a way to get rid of them, or just have a consumer of the output ignore all the
type: "bidiIsolation"
parts.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that span is not the only way that bidi will be implemented.
none
is easy: one can just ignore the bidi/direction. Obviously sometimes folks will want bidi controls. Or they might want to make an attributed string, I suppose, in some UI frameworks.I'm reluctant to create a "bidi isolation part" because it separates the direction information from the value it is associated with. It's harder if one is doing decoration, such as building some control using various tags, not all of which are span, some of which have classes or other attributes. The only place where the isolation part is easier is stringifying the message (since one just consumes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing as you propose would require implementations to provide explicit
bidiStrategyImpl.start()
andbidiStrategyImpl.end()
interfaces, or the behaviour of format-to-parts-to-string could not be guaranteed to match the behaviour of format-to-string.That seems like a much bigger ask than defining what the bidi isolation parts would look like if they are to be included in the parts output by the requested bidi isolation strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format-to-string's output would depend on what the bidi isolation strategy was, no?
The start/end thing was just an example. Note that with isolation parts you have to "read ahead" to find out what is being isolated if what one is doing with the parts is decorating a control or generating HTML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
Sure, but either the bidi isolation parts need to be included, or some way of obtaining their equivalent is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need for the distinction? It is simply the source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to
source
thedir
andlocale
are not present on MessageTextPart.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't text parts have a direction and language? Are they inheriting from the message?
Aside: we don't necessarily want to span every section of text, that is, this is better:
than this:
... which is why text parts would inherit language and base paragraph direction from the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Text parts do inherit the language and direction from the message, so they'll be whatever the formatter was called with. But they're not necessary to be repeated on every part, because the caller already knows what they asked for, and they may well be formatting the message in a context with e.g.
<body lang=foo dir=ltr>
setting them at a much higher level.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we provide the fallback value? I think we have some text in the spec that allows implementations or functions to supply their own fallback.
Question: should a goal be that the string output of a message be equivalent to concatenating the string representation of its parts? Or at least that a test be that one can assemble the string output from the parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fallback customization is only available for syntax and data model errors, which default to
�
. That would be used as thesource
value here.I think the latter. With the current proposal, it's doable like this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the parts are technically not formatted yet?
I can't get from the proposal what a
sub.value
is.Can it be a date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we define the "parts" so that they are generic rather than each type having its own special part type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the MessageExpressionPart definition above, which these interfaces also match. The definitions here are giving more specificity about what e.g.
:datetime
and:number
end up producing, i.e. that they have explicittype
identifiers and defineparts
rather thanvalue
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice in my example earlier in this review that I exposed the field name as a field in the parts. This becomes important when trying to decorate an Iterable whose contents shift around due to the locale/localized formatting. Dates have this feature (YMD, DMY, MDY). So do currency values (which may or may not have a decimal part, may have the symbol first or last, and may or may not have a space around the symbol). That's how the screen shots of currency values (from amazon.com and amazon.fr) get decorated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the
"type"
field here. I picked that rather than"name"
because it's used by the JS Intl formatters.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was super non-obvious to me (hence this conversation!), particularly since the
type
fields in the examples seemed to be focused on the "type" of formatter (datetime, number) rather than on the parts field within them. Admittedly, the MF2-level parts will be at the placeholder level. Interior parts are the problem of the formatter. But this was not at all clear and probably could use an example.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure that it's explained in more detail in the spec PR, should this design doc be accepted.