Proposed spec changes for string literal aliases #1082
At Meta, our GraphQL clients for both web and mobile care a lot about local data consistency. Local data consistency is when the client is able to reconcile data from multiple GraphQL responses in a local client-side cache which can be subscribed to. When data changes, we issue updates to these subscribers, which allows us to keep various screens "in-sync" and show the same view of the data.
The canonical example of this on Facebook is if you navigate to the Groups page from a Feed post and join the group, then when you navigate back to Feed we should update the "Join Group" button in the post to "Joined".
Local data consistency is supported today by Relay for JS clients (widely adopted within industry) and by Meta's internal mobile GraphQL client for Android/iOS clients.
To support consistency, we need to be able to remap aliases/field names into their "canonical names". The canonical name is what we use to keep two fields consistent, as it represents the true definition of a field.
Given a field selection:
alias: field(arg: "foo")
the canonical name would be:
field(arg:"foo")
Internally, we've explored various ways of doing this. We currently need to embed a lot of information about the schema and query in our clients; for example Relay creates a "normalization AST" with this information, and our internal mobile client requires all the information to be pre-compiled into the app binary. This leads to additional costs, e.g. binary size bloat or wire size costs.
We've found that we can more efficiently deliver the canonical name information by embedding it into our response instead, which removes the need for pre-compiled metadata. We run a transform on our queries to add aliases for each field, and use the canonical name as the alias. However since the canonical name may contain non-supported characters (e.g. (, ), $), we are proposing a spec change to allow StringValue tokens as the alias.
This would allow syntax such as:
"field(arg:\"foo\")": field(arg: "foo")
We've attempted alternative ways of implementing this, such as doing this transform during server-side execution (a spec violation, and high server overhead!) or delivering encoded aliases to abide by the NameValue specification (client parsing overhead for decoding!).
Potential side effects / outcomes of this change:
Selection set conflict validation should still work out of the box, even if one selection is a string literal and the other is a normal NameValue.
Here we parse the StringValue into a NameNode (same as when parsing a NameValue), which doesn't have any requirements on the name itself as far as I can tell.
We allow string literal aliases here but not number literals, which abides by the JSON specification. According to the JSON spec, map/object keys must be strings.
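As a rough illustration of the alias transform described above, the sketch below derives a canonical name from a field selection and prints the selection with that name as a string-literal alias. The field shape (`name`, `args` as a map of already-printed GraphQL literals), the helper names, and the choice to sort arguments by name are all assumptions made for this example; they are not taken from Relay or from Meta's mobile client.

```javascript
// Hypothetical field shape for illustration:
//   { alias?: string, name: string, args?: { [argName]: printedGraphQLValue } }
// Argument values are assumed to be printed already, e.g. '"foo"' or '$id'.

// Builds the canonical name, e.g. field(arg:"foo").
function canonicalName(field) {
  const args = Object.entries(field.args ?? {});
  if (args.length === 0) return field.name;
  // Assumption: sort by argument name so argument order does not change the key.
  const printed = args
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, value]) => `${name}:${value}`)
    .join(',');
  return `${field.name}(${printed})`;
}

// Prints the selection using the canonical name as a string-literal alias.
function printWithCanonicalAlias(field) {
  const args = Object.entries(field.args ?? {});
  const printedArgs = args.length
    ? `(${args.map(([name, value]) => `${name}: ${value}`).join(', ')})`
    : '';
  // JSON.stringify is used here to quote and escape the alias; this sketch
  // assumes JSON string escaping is acceptable for a GraphQL StringValue.
  return `${JSON.stringify(canonicalName(field))}: ${field.name}${printedArgs}`;
}

const selection = { alias: 'alias', name: 'field', args: { arg: '"foo"' } };
console.log(canonicalName(selection));           // field(arg:"foo")
console.log(printWithCanonicalAlias(selection)); // "field(arg:\"foo\")": field(arg: "foo")
```

With aliases rewritten this way, each response key already is the canonical name, so a client could key its store on response keys directly instead of consulting pre-compiled metadata.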
Interesting problem and well described! Thanks for opening this discussion.
Another concern I have is that in some situations (e.g. when a field is referenced multiple times with different arguments) it makes sense to use the field alias to store a precalculated result when performing "lookahead". Currently "internal" data can be stored to attributes with prefixes such as
Finally, governed by the "Favor no change" guiding principle, I find myself wondering why you can't use a hash of your string as the alias:
Where
So we actually do need the raw string, so a one-way hash like SHA256 doesn't work. The alias isn't actually a canonical name yet; it's a canonical name template. Specifically, it can include variables like
There's an alternative question: what about a two-way encoding? The tl;dr is that we've actually tried this too! We ran a test where we used base32 encoding to fit the canonical name into the allowed character set, but this had pretty steep performance regressions. The encoded payload doesn't compress as well, and more bytes meant that we spent more time parsing and used more memory during parse. Concretely, we saw a 5%-15% regression in parse times, 20%-60% regressions in payload sizes, and a 10% regression in OOMs.
We're running a test right now to escape unsupported characters, by using
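To make the size cost of the base32 approach mentioned above concrete, here is a minimal sketch of a reversible, name-safe encoding. It uses the standard RFC 4648 base32 alphabet without padding; the leading `_` (so the alias never starts with a digit) and the function names are assumptions for illustration, not the format used in the experiment described in this comment.

```javascript
// RFC 4648 base32 alphabet; every character is legal inside a GraphQL name.
const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Encodes the UTF-8 bytes of `text` as unpadded base32 (5 bits per character).
function base32Encode(text) {
  const bytes = new TextEncoder().encode(text);
  let bits = 0;
  let value = 0;
  let out = '';
  for (const byte of bytes) {
    value = (value << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
    value &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) out += BASE32_ALPHABET[(value << (5 - bits)) & 31];
  return out;
}

// Reverses base32Encode; leftover padding bits at the end are ignored.
function base32Decode(encoded) {
  let bits = 0;
  let value = 0;
  const bytes = [];
  for (const ch of encoded) {
    const index = BASE32_ALPHABET.indexOf(ch);
    if (index === -1) throw new Error(`invalid base32 character: ${ch}`);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bytes.push((value >>> (bits - 8)) & 255);
      bits -= 8;
    }
    value &= (1 << bits) - 1;
  }
  return new TextDecoder().decode(Uint8Array.from(bytes));
}

// Assumption: prefix with "_" so the alias can never start with a digit.
const toNameSafeAlias = (canonical) => '_' + base32Encode(canonical);
const fromNameSafeAlias = (alias) => base32Decode(alias.slice(1));

const canonical = 'field(arg:"foo")';
const encodedAlias = toNameSafeAlias(canonical);
console.log(canonical.length, encodedAlias.length); // 16 27
```

Every 5 input bytes become 8 output characters, so the alias is at least 1.6 times the size of the canonical name before compression, and the client has to decode each key to recover it.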
Yeah, it's interesting because we also do this. We reserve certain characters as prefixes in the field name that we know are illegal in the current syntax. For example, we may prefix the name with
Our thought on how to mitigate this is to have compile-time validation against using those characters. Since it's a client-specific limitation (e.g. not every client reserves these characters!), these clients should ship with validation to avoid conflicts.
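The compile-time validation suggested above could be as small as the following sketch. The reserved prefixes listed here are purely hypothetical placeholders (the actual characters a given client reserves are not stated in this thread), and the function name is likewise made up for illustration.

```javascript
// Hypothetical reserved prefixes; substitute whatever a given client reserves.
const RESERVED_PREFIXES = ['$', '@'];

// Returns the user-written aliases that collide with a reserved prefix.
function findReservedAliasConflicts(aliases, reservedPrefixes = RESERVED_PREFIXES) {
  return aliases.filter((alias) =>
    reservedPrefixes.some((prefix) => alias.startsWith(prefix)));
}

// A build step could fail the compile when any conflict is reported.
const conflicts = findReservedAliasConflicts(['name', '$internal', 'field(arg:"foo")']);
if (conflicts.length > 0) {
  console.error(`Aliases use a reserved prefix: ${conflicts.join(', ')}`);
}
```

Because the check is client-specific, clients that reserve nothing can pass an empty prefix list and skip it entirely.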
This is a real concern, and I don't have a better answer than similarly suggesting compile-time validation for JS-based clients.
Thinking about it…
To be clear, I was referring to server-side concerns when you have a system that looks into
It feels like it wouldn't be too difficult to turn:
into
Where
Would this be sufficient for your needs? If you need to reverse every facet you could build your own character-constrained format for it; using the numeric prefixes will help with parsing performance because you'll know exactly how many characters to parse... I wouldn't be surprised if you could build a parser for a format like this that's faster than the parser you'd use on the literal
// Digest an input value as a one-character tag followed by length-prefixed content.
function digestInputValue(iv) {
  switch (iv.kind) {
    case 'variable': {
      return `v` + iv.name.length + iv.name;
    }
    case 'inputObject': {
      const entries = Object.entries(iv.value);
      // Per-entry format inferred from the example in the comment below:
      // `_<keyLength><key>_<digest of value>`
      return `o` + entries.length + entries
        .map(([key, value]) => `_` + key.length + key + `_` + digestInputValue(value))
        .join('');
    }
    default: {
      const val = String(iv.value);
      return `r` + val.length + val;
    }
  }
}
// Digest a field; `field.args` is assumed to be an array of { name, value }.
function digestField(field) {
  const args = field.args;
  return `_`
    + field.name.length + field.name   // `9updateFoo`
    + `_` + args.length                // `_1`
    + args.map(a =>
        `_` + a.name.length + a.name   // `_5input`
        + digestInputValue(a.value)    // `o2_2id_v2id_5patch_o1_4body_v4body`
      ).join('');
}
Result would be something like:
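Going the other way, the point about numeric prefixes making parsing cheap can be sketched as a small recursive-descent reader. The sample digest is assembled from the inline comments in the snippet above (`9updateFoo`, `_1`, `_5input`, `o2_2id_v2id_5patch_o1_4body_v4body`); the function names and the `raw` kind label are assumptions for this example. One caveat of the format as written: a raw value that itself starts with a digit (e.g. `r242`) cannot be told apart from its length prefix, so this sketch only round-trips raw values that start with a non-digit.

```javascript
// Reads a run of decimal digits starting at `pos`.
function readNumber(s, pos) {
  let end = pos;
  while (end < s.length && s[end] >= '0' && s[end] <= '9') end++;
  if (end === pos) throw new Error(`expected a number at offset ${pos}`);
  return { n: Number(s.slice(pos, end)), next: end };
}

// Reads `<length><text>`: the length says exactly how many characters follow.
function readSized(s, pos) {
  const { n, next } = readNumber(s, pos);
  if (next + n > s.length) throw new Error(`truncated segment at offset ${pos}`);
  return { text: s.slice(next, next + n), next: next + n };
}

// Checks for a literal character and returns the offset just past it.
function expectChar(s, pos, ch) {
  if (s[pos] !== ch) throw new Error(`expected "${ch}" at offset ${pos}`);
  return pos + 1;
}

// Parses one tagged input value: v = variable, o = input object, r = raw.
function parseInputValue(s, pos) {
  const tag = s[pos];
  if (tag === 'v') {
    const { text, next } = readSized(s, pos + 1);
    return { value: { kind: 'variable', name: text }, next };
  }
  if (tag === 'o') {
    let { n: count, next } = readNumber(s, pos + 1);
    const fields = {};
    for (let i = 0; i < count; i++) {
      const key = readSized(s, expectChar(s, next, '_'));
      const child = parseInputValue(s, expectChar(s, key.next, '_'));
      fields[key.text] = child.value;
      next = child.next;
    }
    return { value: { kind: 'inputObject', value: fields }, next };
  }
  if (tag === 'r') {
    const { text, next } = readSized(s, pos + 1);
    return { value: { kind: 'raw', value: text }, next };
  }
  throw new Error(`unknown tag "${tag}" at offset ${pos}`);
}

// Parses `_<len><name>_<argCount>` followed by `_<len><argName><value>` per argument.
function parseFieldDigest(s) {
  const name = readSized(s, expectChar(s, 0, '_'));
  let { n: argCount, next } = readNumber(s, expectChar(s, name.next, '_'));
  const args = [];
  for (let i = 0; i < argCount; i++) {
    const argName = readSized(s, expectChar(s, next, '_'));
    const parsed = parseInputValue(s, argName.next);
    args.push({ name: argName.text, value: parsed.value });
    next = parsed.next;
  }
  if (next !== s.length) throw new Error(`unexpected trailing input at offset ${next}`);
  return { name: name.text, args };
}

const parsedDigest = parseFieldDigest('_9updateFoo_1_5inputo2_2id_v2id_5patch_o1_4body_v4body');
console.log(parsedDigest.name); // updateFoo
console.log(JSON.stringify(parsedDigest.args[0].value.value.id)); // {"kind":"variable","name":"id"}
```

The reader never scans for a closing delimiter or unescapes characters; each name is sliced out in one step once its length is known.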
Another consideration is we could expand
So if you have two queries:
where
We need these two to be consistent with each other in our normalized store, so the two This has a few implications:
So I think we're pretty resigned to a two-way encoding for this to work. Two-way encodings also generally have the benefit that their output scales in length with the input (as opposed to SHA256, which would consistently be 64 characters when hex-encoded, much longer than the original keys!).
We did briefly look into base64! It wasn't possible because GraphQL names don't allow enough distinct characters, as you allude to. It shares a lot of common properties with the base32 encoded format:
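The length point is easy to check with Node's built-in `crypto` module. The sketch below is only an illustration of the fixed-size cost of a hashed alias; the `_` prefix (so the alias is a valid name even when the hex digest starts with a digit) and the function name are assumptions.

```javascript
const { createHash } = require('node:crypto');

// Hashes a canonical name into a name-safe alias: "_" plus 64 hex characters.
function sha256Alias(canonicalName) {
  return '_' + createHash('sha256').update(canonicalName, 'utf8').digest('hex');
}

// The alias length stays fixed no matter how short the original key is.
for (const key of ['id', 'name', 'field(arg:"foo")']) {
  console.log(`${key.length} -> ${sha256Alias(key).length}`);
}
```

Short keys such as `id` grow to 65 characters here, and the original name cannot be recovered from the alias without a side table.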
This is what we've done for our current experiment that's running:
The specific format we're testing is:
This has a few benefits:
Again, we're still waiting for data here so it'll be interesting to see how it performs. Our local benchmark tests have it at a 3% parse performance regression.
Been pondering this in the background a bit. Have you considered encoding the response on the server-side in a pre-normalized format, @cuhtis? Could be something that clients opt into e.g. via an