-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-126835: make CFG optimizer skip over NOP's when looking for const sequence construction #129703
Conversation
Good job. This problem also blocks #128802 where's NOP isn't taken into account in between. |
One thing to keep in mind about
I don't see a problem related to this here, just want you to be aware of this. |
I don't think adding complexity to the individual optimizations is the way to go. Take the constant |
The issue here is skipping NOPs that are not going to be removed (because source locations), but they are in the middle of a const sequence construction. |
Current Python version does not fold it to constant
Do you suggest this is something we will be adding? |
The example I give could be folded using AST transforms, even if it happens not to be. @iritkatriel what's your take on this? |
But this particular logic does not interfere with potential control flow simplification. We just fold inside individual basic blocks. Sometimes multiple such foldings happen together and we need to take into account intermediate |
We have some of that in |
Oh, turns out @iritkatriel merged a1417b2, so tests are failing because |
2fe4e7d
to
5866faa
Compare
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks, Yan! Just to make sure there are no refleaks, I will run the test suite locally in huntrefleak mode.
is_constant_sequence
to take NOP's into accountCo-authored-by: Irit Katriel <[email protected]>
Results: == Tests result: SUCCESS ==
22 tests skipped:
test.test_asyncio.test_windows_events
test.test_asyncio.test_windows_utils test_android test_apple
test_devpoll test_free_threading test_idle test_kqueue
test_launcher test_msvcrt test_smtpnet test_ssl test_startfile
test_tcl test_tkinter test_ttk test_ttk_textonly test_turtle
test_winapi test_winconsoleio test_winreg test_wmi
8 tests skipped (resource denied):
test_curses test_peg_generator test_pyrepl test_socketserver
test_urllib2net test_urllibnet test_winsound test_zipfile64
454 tests OK.
Total duration: 13 min 49 sec
Total tests: run=44,920 skipped=2,362
Total test files: run=476/484 skipped=22 resource_denied=8
Result: SUCCESS |
We need to account for optimizations interfering with each other when moving more and more AST optimizations to CFG. Currently
is_constant_sequence
& (fold_tuple_on_constants
&optimize_if_const_list_or_set
that use it) do not take into accountNOP
s in between. This becomes problematic when there are several optimizations performed one after another as theyNOP
out unused instructions. Example:This expression will trigger subscript & binop foldings. With current
is_constant_sequence
implementation, resulting instruction sequence cannot be optimized correctly. Here is final dis:1 RESUME 0 2 LOAD_SMALL_INT 1 LOAD_CONST 1 ((3,)) BUILD_TUPLE 2 LOAD_SMALL_INT 1 BINARY_SUBSCR LOAD_SMALL_INT 0 BINARY_SUBSCR RETURN_VALUE
When correct sequence should be:
Here is basic block dump in a state when
BUILD_TUPLE
fails to be optimized byfold_tuple_on_constants
:As you can see,
BUILD_TUPLE 2
has 2 consts before it but they are separated byNOP
s due to previous binop folding. Obvious solution would be to removeNOP
s on every iteration inoptimize_basic_block
which makes sense, but we cannot do that because we would be changing basic block size while iterating over it which breaks everything. So a different approach should be implemented. This PR presents one of the possible approaches to prepare us for the next optimizations migration to CFG. My main concern is complexity & readability as I think they suffer a bit from it. Another cleaner way I see is to usePyMem_Malloc
to store indexes of corresponding instructions which seems cleaner but should be discussed.cc @Eclips4 @tomasr8 @iritkatriel