Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: Make type checking constant time (for fast generics & DNF matching) #18189

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

withinboredom
Copy link
Contributor

This PR introduces interned type trees to the engine, enabling canonical representation and deduplication of complex types such as intersections and unions. The goal is to improve memory efficiency and runtime performance involving complex type annotations.

Why

Currently, zend_type structures are duplicated and compared structurally throughout the engine. This leads to redundant memory usage and slower runtime checks for unions and intersections. By introducing interned type trees, we create a shared, normalized representation of each unique type structure, allowing for memory savings, faster comparisons via pointer equality, and a foundation for advanced type features in the future; such as pattern matching and generics.

This Proof-of-Concept

I suspect this can be made more backward compatible by using oparray over changing arginfo, but I wasn't able to figure out how to make it work. Ideally, we could replace zend_type in php 9.0, but with both being utilized, about 0.5-1% more memory is needed when compared to master in my tests.

unresolved issues

  • The way I am storing the tree and freeing it is probably disastrously wrong.
  • I changed arginfo and function-info structs -- this possibly breaks many extensions.

Micro-benchmarks

I ran a small collection of microbenchmarks on type checking to see how it performs compared to master:

comparing [type_trees_local vs. master_local]

\Acme\Tests\Benchmark\TypeCheckBench

    bench_native_scalar_type................R2 I9 - [Mo0.094μs vs. Mo0.098μs] -3.86% (±0.21%)
    bench_native_class_type.................R4 I9 - [Mo0.104μs vs. Mo0.104μs] +0.82% (±0.22%)
    bench_native_union_nullable.............R1 I1 - [Mo0.094μs vs. Mo0.098μs] -4.11% (±0.23%)
    bench_native_union_deep.................R1 I4 - [Mo0.090μs vs. Mo0.095μs] -4.74% (±0.26%)
    bench_native_deep_union_with_intersecti.R5 I7 - [Mo0.098μs vs. Mo0.104μs] -5.53% (±0.23%)

Subjects: 5, Assertions: 0, Failures: 0, Errors: 0
Storing results ... OK
Run: 134fed951f0415dcd36f841639dc64312b77dcc2
+----------------+--------------------------------------------+-----+--------+-----+----------+----------------+----------------+
| benchmark      | subject                                    | set | revs   | its | mem_peak | mode           | rstdev         |
+----------------+--------------------------------------------+-----+--------+-----+----------+----------------+----------------+
| TypeCheckBench | bench_native_scalar_type                   |     | 100000 | 10  | ERR      | 0.094μs -3.86% | ±0.21% -11.18% |
| TypeCheckBench | bench_native_class_type                    |     | 100000 | 10  | ERR      | 0.104μs +0.82% | ±0.22% +48.04% |
| TypeCheckBench | bench_native_union_nullable                |     | 100000 | 10  | ERR      | 0.094μs -4.11% | ±0.23% -13.74% |
| TypeCheckBench | bench_native_union_deep                    |     | 100000 | 10  | ERR      | 0.090μs -4.74% | ±0.26% -8.85%  |
| TypeCheckBench | bench_native_deep_union_with_intersections |     | 100000 | 10  | ERR      | 0.098μs -5.53% | ±0.23% +6.59%  |
+----------------+--------------------------------------------+-----+--------+-----+----------+----------------+----------------+

I'm not sure why benchmarking is broken on this PR (probably something I did when trying to fix a memory leak), but here's a link to the last successful run just before I tried fixing the memory leaks.

Is this worth pursuing further, or should I take a different approach altogether?

cc: @arnaud-lb

@iluuu1994
Copy link
Member

@withinboredom I've only skimmed this PR, and interning types seems like an interesting idea. Would it be possible to achieve the same thing by interning complex types, rather than creating a new data structure? This would not require changes in the type layout, but just make complex types point to the interened storage (since they are already pointers). The fast path can then use direct comparison of the complex type pointers, and use the existing type comparison as a fallback.

@withinboredom
Copy link
Contributor Author

🤦 of course @iluuu1994! That's probably far simpler, to be honest. I was experimenting with the following logic and needed something easier to fiddle with:

Given a type constraint: A|(C&D)|E and a type F = (C&D&Z) (class F extends C implements D, Z).

When we check that F satisfies A|(C&D)|E, we will find that F satisfies C&D after a search.

We can then update the type to include F so that it is A|(C&D)|E|F so that we don't need to do a search for that type anymore.

@nielsdos
Copy link
Member

When we check that F satisfies A|(C&D)|E, we will find that F satisfies C&D after a search.

This has union-find/equivalence classes vibes, might be interesting to take a look at that if you're interested in these kinds of optimizations.

@arnaud-lb
Copy link
Member

Is this worth pursuing further, or should I take a different approach altogether?

This is worth pursuing IMO, at least for the type-checking performance gain.

Possibly, a runtime cache of zend_check_user_type_slow() would bring additional gains. Currently we use the cache slots for class resolution, but we could add a slot for the last positive type check. Type checks are positive most of the time.

As @iluuu1994 said you can probably get away with only interning zend_type_list.

You will want to support Opcache: Types from cached scripts should be stored in SHM, and when a cached script is loaded, its types should be added to the tree. Two scripts may independently declare the same type (in two different requests), and then loaded in the same request, but maybe they can be de-duplicated during script persistence.

Regarding memory management: You can probably use the arena by default. IIRC all types are allocated on the arena currently (and moved to SHM during persistence). This simplifies memory management as you don't need to free types, and you don't need to know if a type is in SHM.

This is something I wanted to try in the context of generic arrays, but didn't get to it. I'm glad you are working on it. (The use-case was that with reified generics we needed to create zend_types at runtime, often by unioning two types. Interned types would allow to have a cache of union operations, to make the operation less expensive and use less memory.)

I think @Girgias also had the idea of interned types in the context of type aliases.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made just a quick review and might miss some ideas.

I see how the PR de-duplicates the complex types and saves relate memory, but at the same time it increases memory usage used by every arg_info and property_info.

I didn't understand how this may increase the speed of run-time type checks, because we still have to check unions/intersections and inheritance.

Idea of interned types looks interesting. Maybe it's possible to implement it completely transparent as a part of opcache persistence.

@@ -479,13 +480,15 @@ typedef struct _zend_class_constant {
typedef struct _zend_internal_arg_info {
const char *name;
zend_type type;
zend_type_node *type_tree;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This increases size of one of the core data structure, and therefore the common memory usage.

I think zend_type_node and zend_type should be somehow combined together.

@@ -191,6 +192,7 @@ struct _zend_executor_globals {
HashTable *function_table; /* function symbol table */
HashTable *class_table; /* class table */
HashTable *zend_constants; /* constants table */
HashTable *type_trees /* type trees table */;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need CG(type_trees) and EG(type_trees)?

Comment on lines +1149 to +1154
if (EXPECTED(type_tree != NULL) && type_tree->kind != ZEND_TYPE_SIMPLE) {
switch (type_tree->kind) {
case ZEND_TYPE_UNION: {
for (uint32_t i = 0; i < type_tree->compound.num_types; i++) {
if (zend_check_type_slow(type, type_tree->compound.types[i], arg, ref, cache_slot, is_return_type, is_internal)) {
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like you replace traversing one structures by the others.
Should this affect performance? why?

@withinboredom
Copy link
Contributor Author

withinboredom commented Apr 6, 2025

I've finished constructing a PoC using zend_type and only zend_type, it is much more different than this, but is actually O(1) type checking, when using opcache to intern the types (resulting in a ~2-5% speedup for basic type checks). When not using opcache, it still appears to speed things up, but not that much. My only concern about the new implementation is that it appears to cause a lot more page faults than the existing implementation. I think I can resolve that by learning more about how oparray works to make sure the required memory is nearby during runtime.

I should have a PR in another week or two -- I'll be traveling for work for this week and won't have as much time to finish cleaning up my implementation.

How it works:

Basically there is a HashTable of HashTables. To write it in php itself, it would look something like this:

// represent a type that could be A|B, A&B or, A>B
$constraintCache = [];

$constraintCache[canonical_hash($zend_type_list)] = [
  ce_hash($class_a) => true, 
  ce_hash($class_b) => true,
];

// find if the constraint is satisfied with a given type
$satisfied = $constrainCache[canonical_hash($zend_type_list)][ce_hash($class_a)] ?? false;

Since we are using interned types, the "hash" is literally the pointer address of the zend_type_list and/or zend_class_entry. The only issue I think it may cause is that it needs to reorder type signatures when interning them (so that A|B == B|A) which will be visible to users during error messages. I don't know if that will be an issue.

@Girgias
Copy link
Member

Girgias commented Apr 6, 2025

As Arnaud said I was having a think about this, but mainly in the context of compile time resolution for self/parent at compile. The current issue I have is with traits as changing the arginfo requires a deep copy of a function, which is not reasonable.

Being able to do a pointer equality check for a type would definitely speed up the basic case of 2 types being equal which would be useful for variance checking.

I can see the benefit for this as well for type checking more complicated types like a function type. But I don't really see how this helps for type checking that an object satisfies an intersection or union type? Mainly because I don't see how a tree would cover all cases and complicated unions/intersections.

@Girgias
Copy link
Member

Girgias commented Apr 6, 2025

Since we are using interned types, the "hash" is literally the pointer address of the zend_type_list and/or zend_class_entry. The only issue I think it may cause is that it needs to reorder type signatures when interning them (so that A|B == B|A) which will be visible to users during error messages. I don't know if that will be an issue.

@withinboredom this is already the case for built-in types, so no this is not an issue from our PoV. (See https://3v4l.org/qqP8E)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants