PoC: Make type checking constant time (for fast generics & DNF matching) #18189
@withinboredom I've only skimmed this PR, and interning types seems like an interesting idea. Would it be possible to achieve the same thing by interning complex types, rather than creating a new data structure? This would not require changes to the type layout; it would just make complex types point to the interned storage (since they are already pointers). The fast path can then use direct comparison of the complex type pointers, and use the existing type comparison as a fallback.
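The suggestion above can be illustrated with a minimal sketch. Everything in it is hypothetical and not taken from php-src: `complex_type`, `type_intern`, `type_equals`, and the fixed-size pool are illustration-only names, and the sketch assumes type names are already stored in a canonical order. It only shows the shape of the idea: interning returns one canonical pointer per distinct structure, so equality can try a pointer comparison first and fall back to the structural comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a complex type: a kind plus a list of class
 * names, assumed to be stored in a canonical (e.g. sorted) order. */
typedef enum { TY_UNION, TY_INTERSECTION } ty_kind;

typedef struct {
    ty_kind kind;
    size_t num_names;
    const char *names[4];
} complex_type;

/* Structural comparison: the slow path that exists without interning. */
static bool type_equals_slow(const complex_type *a, const complex_type *b) {
    if (a->kind != b->kind || a->num_names != b->num_names) {
        return false;
    }
    for (size_t i = 0; i < a->num_names; i++) {
        if (strcmp(a->names[i], b->names[i]) != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical interned storage: a small fixed pool scanned linearly.
 * A real implementation would more likely use a hash table. */
#define INTERN_CAPACITY 64
static complex_type intern_pool[INTERN_CAPACITY];
static size_t intern_count = 0;

/* Return the canonical copy of t, adding it to the pool if it is new. */
static const complex_type *type_intern(const complex_type *t) {
    for (size_t i = 0; i < intern_count; i++) {
        if (type_equals_slow(&intern_pool[i], t)) {
            return &intern_pool[i];
        }
    }
    assert(intern_count < INTERN_CAPACITY);
    intern_pool[intern_count] = *t;
    return &intern_pool[intern_count++];
}

/* Fast path: identical pointers are equal. Otherwise fall back to the
 * structural comparison, which also handles non-interned types. */
static bool type_equals(const complex_type *a, const complex_type *b) {
    if (a == b) {
        return true;
    }
    return type_equals_slow(a, b);
}
```

If both operands are known to be interned, a pointer mismatch alone would already prove inequality; the fallback is kept here because the comment proposes it for types that have not been interned.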
🤦 of course @iluuu1994! That's probably far simpler, to be honest. I was experimenting with the following logic and needed something easier to fiddle with: Given a type constraint: When we check that We can then update the type to include
This has union-find/equivalence classes vibes, might be interesting to take a look at that if you're interested in these kinds of optimizations.
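For readers unfamiliar with the structure mentioned here, the following is a generic union-find (disjoint-set) sketch. It is not part of the PR; the names `union_find`, `uf_find`, `uf_union`, and `uf_same` are hypothetical, and elements are plain integer IDs that could, under that assumption, stand for types placed into equivalence classes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UF_MAX 16

/* Each element stores its parent; a root is its own parent. */
typedef struct {
    size_t parent[UF_MAX];
    size_t count;
} union_find;

/* Start with n singleton classes. */
static void uf_init(union_find *uf, size_t n) {
    assert(n <= UF_MAX);
    uf->count = n;
    for (size_t i = 0; i < n; i++) {
        uf->parent[i] = i;
    }
}

/* Find the representative of x, shortening the path as we walk it. */
static size_t uf_find(union_find *uf, size_t x) {
    assert(x < uf->count);
    while (uf->parent[x] != x) {
        uf->parent[x] = uf->parent[uf->parent[x]]; /* path halving */
        x = uf->parent[x];
    }
    return x;
}

/* Merge the classes containing a and b. */
static void uf_union(union_find *uf, size_t a, size_t b) {
    size_t ra = uf_find(uf, a);
    size_t rb = uf_find(uf, b);
    if (ra != rb) {
        uf->parent[rb] = ra;
    }
}

/* Two elements are equivalent when they share a representative. */
static bool uf_same(union_find *uf, size_t a, size_t b) {
    return uf_find(uf, a) == uf_find(uf, b);
}
```

Once two IDs have been merged, the equivalence query compares representatives instead of comparing structure again.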
This is worth pursuing IMO, at least for the type-checking performance gain. Possibly, a runtime cache of As @iluuu1994 said you can probably get away with only interning
You will want to support Opcache: Types from cached scripts should be stored in SHM, and when a cached script is loaded, its types should be added to the tree. Two scripts may independently declare the same type (in two different requests) and then be loaded in the same request, but maybe they can be de-duplicated during script persistence.
Regarding memory management: You can probably use the arena by default. IIRC all types are allocated on the arena currently (and moved to SHM during persistence). This simplifies memory management as you don't need to free types, and you don't need to know if a type is in SHM.
This is something I wanted to try in the context of generic arrays, but didn't get to it. I'm glad you are working on it. (The use-case was that with reified generics we needed to create I think @Girgias also had the idea of interned types in the context of type aliases.
I only did a quick review and might have missed some ideas.
I see how the PR de-duplicates the complex types and saves the related memory, but at the same time it increases the memory used by every `arg_info` and `property_info`.
I didn't understand how this may increase the speed of run-time type checks, because we still have to check unions/intersections and inheritance.
The idea of interned types looks interesting. Maybe it's possible to implement it completely transparently as part of opcache persistence.
@@ -479,13 +480,15 @@ typedef struct _zend_class_constant {
typedef struct _zend_internal_arg_info {
    const char *name;
    zend_type type;
    zend_type_node *type_tree;
This increases the size of one of the core data structures, and therefore overall memory usage.
I think `zend_type_node` and `zend_type` should somehow be combined.
@@ -191,6 +192,7 @@ struct _zend_executor_globals {
    HashTable *function_table; /* function symbol table */
    HashTable *class_table; /* class table */
    HashTable *zend_constants; /* constants table */
    HashTable *type_trees /* type trees table */;
Why do we need CG(type_trees) and EG(type_trees)?
if (EXPECTED(type_tree != NULL) && type_tree->kind != ZEND_TYPE_SIMPLE) {
    switch (type_tree->kind) {
        case ZEND_TYPE_UNION: {
            for (uint32_t i = 0; i < type_tree->compound.num_types; i++) {
                if (zend_check_type_slow(type, type_tree->compound.types[i], arg, ref, cache_slot, is_return_type, is_internal)) {
                    return true;
This looks like you are replacing the traversal of one structure with the traversal of another.
Should this affect performance? Why?
I've finished constructing a PoC using I should have a PR in another week or two -- I'll be traveling for work this week and won't have as much time to finish cleaning up my implementation. How it works: Basically there is a HashTable of HashTables. To write it in PHP itself, it would look something like this:
// represent a type that could be A|B, A&B or, A>B
$constraintCache = [];
$constraintCache[canonical_hash($zend_type_list)] = [
ce_hash($class_a) => true,
ce_hash($class_b) => true,
];
// find if the constraint is satisfied with a given type
$satisfied = $constraintCache[canonical_hash($zend_type_list)][ce_hash($class_a)] ?? false;
Since we are using interned types, the "hash" is literally the pointer address of the
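A rough C counterpart of the PHP pseudocode above, as a sketch only. To stay short it flattens the "HashTable of HashTables" into a single open-addressing table keyed by the pair (interned type pointer, class entry pointer); that flattening is a substitution made for this illustration, not the author's design. All names (`constraint_cache`, `cache_mark_satisfied`, `cache_is_satisfied`) are hypothetical and none of them is a php-src API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64 /* must be a power of two */

/* One cached fact: "class `ce` satisfies the interned type `type`".
 * A NULL `type` marks an empty slot. */
typedef struct {
    const void *type;
    const void *ce;
} cache_entry;

typedef struct {
    cache_entry slots[CACHE_SLOTS];
    size_t used;
} constraint_cache;

/* With interned types the key is just the two pointer addresses. */
static size_t pair_hash(const void *type, const void *ce) {
    uintptr_t h = (uintptr_t)type * 31u + (uintptr_t)ce;
    h ^= h >> 7;
    return (size_t)h & (CACHE_SLOTS - 1);
}

/* Record that `ce` satisfies `type` (linear probing on collision). */
static void cache_mark_satisfied(constraint_cache *c, const void *type, const void *ce) {
    assert(type != NULL);
    size_t i = pair_hash(type, ce);
    while (c->slots[i].type != NULL) {
        if (c->slots[i].type == type && c->slots[i].ce == ce) {
            return; /* already cached */
        }
        i = (i + 1) & (CACHE_SLOTS - 1);
    }
    /* Always keep one empty slot so that probing terminates. */
    assert(c->used < CACHE_SLOTS - 1);
    c->slots[i].type = type;
    c->slots[i].ce = ce;
    c->used++;
}

/* Returns true only if the pair was cached as satisfied. A miss means
 * "unknown", so a caller would fall back to the full type check. */
static bool cache_is_satisfied(const constraint_cache *c, const void *type, const void *ce) {
    size_t i = pair_hash(type, ce);
    while (c->slots[i].type != NULL) {
        if (c->slots[i].type == type && c->slots[i].ce == ce) {
            return true;
        }
        i = (i + 1) & (CACHE_SLOTS - 1);
    }
    return false;
}
```

A zero-initialized `constraint_cache` is an empty cache. The lookup cost does not depend on how many members the union or intersection has, which is the property the comment is after.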
As Arnaud said I was having a think about this, but mainly in the context of compile time resolution for Being able to do a pointer equality check for a type would definitely speed up the basic case of 2 types being equal which would be useful for variance checking. I can see the benefit for this as well for type checking more complicated types like a function type. But I don't really see how this helps for type checking that an object satisfies an intersection or union type? Mainly because I don't see how a tree would cover all cases and complicated unions/intersections.
@withinboredom this is already the case for built-in types, so no, this is not an issue from our PoV. (See https://3v4l.org/qqP8E)
This PR introduces interned type trees to the engine, enabling canonical representation and deduplication of complex types such as intersections and unions. The goal is to improve memory efficiency and the runtime performance of checks involving complex type annotations.
Why
Currently, `zend_type` structures are duplicated and compared structurally throughout the engine. This leads to redundant memory usage and slower runtime checks for unions and intersections. By introducing interned type trees, we create a shared, normalized representation of each unique type structure, allowing for memory savings, faster comparisons via pointer equality, and a foundation for advanced type features in the future, such as pattern matching and generics.
This Proof-of-Concept
I suspect this can be made more backward compatible by using `oparray` over changing `arginfo`, but I wasn't able to figure out how to make it work. Ideally, we could replace `zend_type` in PHP 9.0, but with both being utilized, about 0.5-1% more memory is needed when compared to `master` in my tests.
Unresolved issues
Micro-benchmarks
I ran a small collection of microbenchmarks on type checking to see how it performs compared to `master`:
I'm not sure why benchmarking is broken on this PR (probably something I did when trying to fix a memory leak), but here's a link to the last successful run just before I tried fixing the memory leaks.
Is this worth pursuing further, or should I take a different approach altogether?
cc: @arnaud-lb