Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-126835: make CFG optimizer skip over NOP's when looking for const sequence construction #129703

Merged
merged 12 commits into from
Feb 9, 2025

Conversation

WolframAlph
Copy link
Contributor

@WolframAlph WolframAlph commented Feb 5, 2025

We need to account for optimizations interfering with each other when moving more and more AST optimizations to CFG. Currently is_constant_sequence & (fold_tuple_on_constants & optimize_if_const_list_or_set that use it) do not take into account NOPs in between. This becomes problematic when there are several optimizations performed one after another as they NOP out unused instructions. Example:

(1, (1+2,),)[1][0]

This expression will trigger subscript & binop foldings. With current is_constant_sequence implementation, resulting instruction sequence cannot be optimized correctly. Here is final dis:

  1           RESUME                   0

  2           LOAD_SMALL_INT           1
              LOAD_CONST               1 ((3,))
              BUILD_TUPLE              2
              LOAD_SMALL_INT           1
              BINARY_SUBSCR
              LOAD_SMALL_INT           0
              BINARY_SUBSCR
              RETURN_VALUE

When correct sequence should be:

  1           RESUME                   0

  2           LOAD_SMALL_INT           3
              RETURN_VALUE

Here is basic block dump in a state when BUILD_TUPLE fails to be optimized by fold_tuple_on_constants:

Screenshot 2025-02-05 at 21 47 58

As you can see, BUILD_TUPLE 2 has 2 consts before it but they are separated by NOPs due to previous binop folding. Obvious solution would be to remove NOPs on every iteration in optimize_basic_block which makes sense, but we cannot do that because we would be changing basic block size while iterating over it which breaks everything. So a different approach should be implemented. This PR presents one of the possible approaches to prepare us for the next optimizations migration to CFG. My main concern is complexity & readability as I think they suffer a bit from it. Another cleaner way I see is to use PyMem_Malloc to store indexes of corresponding instructions which seems cleaner but should be discussed.

cc @Eclips4 @tomasr8 @iritkatriel

@Eclips4
Copy link
Member

Eclips4 commented Feb 5, 2025

Good job. This problem also blocks #128802 where's NOP isn't taken into account in between.
I'll take a closer look tomorrow.

@iritkatriel
Copy link
Member

One thing to keep in mind about NOPs is that they are not necessarily going to be removed. When we replace an instruction by a NOP it retains the source location of the instruction we overwrote. If there is no other instruction with the same line, the NOP stays so that tracing/debugging sees that an instruction executed on that line. For example:

>>> def f():
...     1
...     2
...     3
...     return 4
...     
>>> dis.dis(f)
  1           RESUME                   0

  2           NOP

  3           NOP

  4           NOP

  5           LOAD_SMALL_INT           4
              RETURN_VALUE

I don't see a problem related to this here, just want you to be aware of this.

@markshannon
Copy link
Member

I don't think adding complexity to the individual optimizations is the way to go.
All these optimizations can expose the opportunity for other optimizations, so ultimately we want to iterate to a fixed point.
We already have a pass to remove NOPs from a basic block, so we should incorporate that into the loop.

Take the constant 3 if (1 if True else 0, 1)[0 if 2 == 2 else 1] else 2 which evaluates to 3. To preform that evaluation in the CFG requires tuple folding, flow control simplification, comparison folding and subscript folding. All of which depend on each other to get the final result. The only way to fold this is iteratively.

@iritkatriel
Copy link
Member

I don't think adding complexity to the individual optimizations is the way to go.
All these optimizations can expose the opportunity for other optimizations, so ultimately we want to iterate to a fixed point.
We already have a pass to remove NOPs from a basic block, so we should incorporate that into the loop.

The issue here is skipping NOPs that are not going to be removed (because source locations), but they are in the middle of a const sequence construction.

@WolframAlph
Copy link
Contributor Author

WolframAlph commented Feb 6, 2025

@markshannon

3 if (1 if True else 0, 1)[0 if 2 == 2 else 1] else 2 which evaluates to 3

Current Python version does not fold it to constant 3. Constant folding is not done across basic block boundaries in cases where control flow present (like you posted) IIRC, and AST optimizations being moved to CFG operate on single blocks.

Constant folding is not done across basic block boundaries

Do you suggest this is something we will be adding?

@markshannon
Copy link
Member

The example I give could be folded using AST transforms, even if it happens not to be.
So I think we should do the same in the CFG optimizer, which will need iteration.

@iritkatriel what's your take on this?

@WolframAlph
Copy link
Contributor Author

But this particular logic does not interfere with potential control flow simplification. We just fold inside individual basic blocks. Sometimes multiple such foldings happen together and we need to take into account intermediate NOPs they leave so this is all this PR is about. Or am I missing something?

@iritkatriel
Copy link
Member

The example I give could be folded using AST transforms, even if it happens not to be. So I think we should do the same in the CFG optimizer, which will need iteration.

@iritkatriel what's your take on this?

We have some of that in inline_small_or_no_lineno_blocks. We can put more steps into that loop.
But as I said, the issue here is NOPs that are not going to be removed, so fixed point won't help them.

@python python deleted a comment from raymondiiii Feb 6, 2025
@WolframAlph
Copy link
Contributor Author

WolframAlph commented Feb 8, 2025

Oh, turns out @iritkatriel merged a1417b2, so tests are failing because BINARY_SUBSCR does not exist anymore. I will rebase & update the PR.

@WolframAlph
Copy link
Contributor Author

Done.

Copy link
Member

@Eclips4 Eclips4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, Yan! Just to make sure there are no refleaks, I will run the test suite locally in huntrefleak mode.

@iritkatriel iritkatriel changed the title gh-126835: Redesign is_constant_sequence to take NOP's into account gh-126835: make CFG optimizer skip over NOP's when looking for const sequence construction Feb 9, 2025
@Eclips4
Copy link
Member

Eclips4 commented Feb 9, 2025

LGTM. Thanks, Yan! Just to make sure there are no refleaks, I will run the test suite locally in huntrefleak mode.

Results:

== Tests result: SUCCESS ==

22 tests skipped:
    test.test_asyncio.test_windows_events
    test.test_asyncio.test_windows_utils test_android test_apple
    test_devpoll test_free_threading test_idle test_kqueue
    test_launcher test_msvcrt test_smtpnet test_ssl test_startfile
    test_tcl test_tkinter test_ttk test_ttk_textonly test_turtle
    test_winapi test_winconsoleio test_winreg test_wmi

8 tests skipped (resource denied):
    test_curses test_peg_generator test_pyrepl test_socketserver
    test_urllib2net test_urllibnet test_winsound test_zipfile64

454 tests OK.

Total duration: 13 min 49 sec
Total tests: run=44,920 skipped=2,362
Total test files: run=476/484 skipped=22 resource_denied=8
Result: SUCCESS

@Eclips4 Eclips4 enabled auto-merge (squash) February 9, 2025 17:48
@Eclips4 Eclips4 merged commit 91d9544 into python:main Feb 9, 2025
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants