Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move const folding to the peephole optimizer #126835

Open
Eclips4 opened this issue Nov 14, 2024 · 119 comments
Open

Move const folding to the peephole optimizer #126835

Eclips4 opened this issue Nov 14, 2024 · 119 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@Eclips4
Copy link
Member

Eclips4 commented Nov 14, 2024

Feature or enhancement

Proposal:

For additional context see #126830 (comment).

Flow graph optimizer has more information and can do better job.

The problem is that we need to convert from UNARY_OP(-, CONST(1)) to CONST(-1), still before the code generation phase, because this leads to a few problems, one of which is shown below.

x = 1

match x:
    case -0:
        y = 0
eclips4@nixos ~/p/p/cpython (remove-ast-optimizer)> ./python example.py
  File "/home/eclips4/programming/programming-languages/cpython/example.py", line 4
    case -0:
         ^^
SyntaxError: patterns may only match literals and attribute lookups

cc @markshannon

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

@Eclips4 Eclips4 added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Nov 14, 2024
@Eclips4 Eclips4 self-assigned this Nov 15, 2024
@Eclips4
Copy link
Member Author

Eclips4 commented Dec 21, 2024

We had a discussion with @tomasr8 and decided that splitting this into parts would be good. Initially, I tried to do it in one PR, but realized that it would be impossible to review, as it would do a lot of unrelated things.

To begin, we decided to start by moving tuple folding from AST optimizer to CFG.

Problem: Currently compiler emits a warning if is/is not operators are used with constants1 (because it can evaluate them to True which is an implementation detail of CPython, other implementations might evaluate const is const to False).
So, the compiler (especially the codegen part) assumes that tuples are already folded, and has (in AST terms) Constant_kind. This breaks when we fold tuples later and they have a Tuple_kind at the codegen phase.

Possible solution: Perform this in a CFG after folding. This will be slower (I have no information on how slower it will be, but I assume it will) than performing this in codegen phase.

Footnotes

  1. https://github.com/python/cpython/blob/2a66dd33dfc0b845042da9bb54aaa4e890733f54/Python/codegen.c#L1678-L1691

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 1, 2025

Is there a plan what parts/in what order should be moved from ast optimizer to cfg? Does it make sense to create sub issues to make it more granular (instead of just PRs linked to this issue)? This issue Remove the AST optimizer seems too general and broad IMO.

Eclips4 added a commit that referenced this issue Feb 1, 2025
…en to CFG (#129426)

Codegen phase has an optimization that transforms
```
LOAD_CONST x
LOAD_CONST y
LOAD_CONXT z
BUILD_LIST/BUILD_SET (3)
```
->
```
BUILD_LIST/BUILD_SET (0)
LOAD_CONST (x, y, z)
LIST_EXTEND/SET_UPDATE 1
```
This optimization has now been moved to CFG phase to make #128802 work.


Co-authored-by: Irit Katriel <[email protected]>
Co-authored-by: Yan Yanchii <[email protected]>
@tomasr8
Copy link
Member

tomasr8 commented Feb 1, 2025

I'm currently looking into moving unaryop/binop folding (though I'm at Fosdem atm so It might take a bit longer) Anything else is up for grabs I'd say, unless Kirill is working on something currently?

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 1, 2025

I am looking into binop folding as well (almost done). Do you want to pass it or should I take something else?

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 1, 2025

Problem with binop I discovered is that after moving it to cfg, we, for instance, run into problem with case matching which @Eclips4 mentioned in issue description. For example:

match x:
    case 0 + 0j:
        pass

raises

File "/Users/yyanchii/Desktop/cpython/internal/t.py", line 2
    case 0 + 0j:
         ^^^^^^
SyntaxError: patterns may only match literals and attribute lookups

It is raised from:

cpython/Python/codegen.c

Lines 6022 to 6025 in 89fe067

if (!MATCH_VALUE_EXPR(value)) {
const char *e = "patterns may only match literals and attribute lookups";
return _PyCompile_Error(c, LOC(p), e);
}

So at codegen stage, it already expects (in this case) binops to be folded. @tomasr8 did you run into this already? We would need to add more logic in here which doesn't look good to me. On the other hand, how would we move binop folding to CFG then?

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 2, 2025

@tomasr8 @WolframAlph

I think the current priority is to move unaryop folding from AST optimizer to codegen phase.
I'm looking into it.

Then we should have to think about binaryop folding :)

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 2, 2025

Placing this here, not to forget in future: #129568 (comment). Maybe in future we get rid of emitting LOAD_SMALL_INT in codegen and let CFG do it instead

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 3, 2025

cc @iritkatriel
We need to decide on what to do with the optimize parameter of ast.parse and the PyCF_OPTIMIZED_AST flag.
If we remove the AST optimizer, these two will become obsolete.
Should we deprecate them in 3.14 (after the AST optimizer is removed) and remove them in 3.19?

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 3, 2025

@Eclips4 as I am reading https://docs.python.org/3/library/ast.html#ast.parse, it is not clear to me whether feature_version also works for constant folding case.

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 3, 2025

But I guess Best-effort part makes it clear that one cannot rely on the consistent output.

@WolframAlph
Copy link
Contributor

Sorry, ignore previous comments. I just realized it is controlled by optimize argument.

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 3, 2025

Sorry, ignore previous comments. I just realized it is controlled by optimize argument.

I was also confused after reading "best effort", but I strongly feel that it does not have anything to do with constant folding.

@iritkatriel
Copy link
Member

cc @iritkatriel We need to decide on what to do with the optimize parameter of ast.parse and the PyCF_OPTIMIZED_AST flag. If we remove the AST optimizer, these two will become obsolete. Should we deprecate them in 3.14 (after the AST optimizer is removed) and remove them in 3.19?

Yes, or just deprecate now and not worry about scheduling removal. We'll just document that they don't do anything anymore.

Eclips4 pushed a commit that referenced this issue Feb 4, 2025
Move folding of constant subscription from AST optimizer to CFG.

Co-authored-by: Irit Katriel <[email protected]>
@tomasr8
Copy link
Member

tomasr8 commented Feb 4, 2025

I'm just now reading your messages - @WolframAlph I checked your binop PR and it's pretty much identical to what I had 👍
I don't know what to do about the problem with match..case though :/

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 4, 2025

@Eclips4 @tomasr8 I have noticed that CFG already does tuple folding:

cpython/Python/flowgraph.c

Lines 1357 to 1395 in 0664c1a

static int
fold_tuple_on_constants(PyObject *const_cache,
cfg_instr *inst,
int n, PyObject *consts)
{
/* Pre-conditions */
assert(PyDict_CheckExact(const_cache));
assert(PyList_CheckExact(consts));
assert(inst[n].i_opcode == BUILD_TUPLE);
assert(inst[n].i_oparg == n);
if (!is_constant_sequence(inst, n)) {
return SUCCESS;
}
/* Buildup new tuple of constants */
PyObject *newconst = PyTuple_New(n);
if (newconst == NULL) {
return ERROR;
}
for (int i = 0; i < n; i++) {
int op = inst[i].i_opcode;
int arg = inst[i].i_oparg;
PyObject *constant = get_const_value(op, arg, consts);
if (constant == NULL) {
return ERROR;
}
PyTuple_SET_ITEM(newconst, i, constant);
}
int index = add_const(newconst, consts, const_cache);
if (index < 0) {
return ERROR;
}
for (int i = 0; i < n; i++) {
INSTR_SET_OP0(&inst[i], NOP);
}
INSTR_SET_OP1(&inst[n], LOAD_CONST, index);
return SUCCESS;
}

So folding it in ast optimizer:

cpython/Python/ast_opt.c

Lines 558 to 568 in 0664c1a

static int
fold_tuple(expr_ty node, PyArena *arena, _PyASTOptimizeState *state)
{
PyObject *newval;
if (node->v.Tuple.ctx != Load)
return 1;
newval = make_const_tuple(node->v.Tuple.elts);
return make_const(node, newval, arena);
}

can be just removed and CFG is already prepared for that. Indeed, when I remove tuple folding from ast optimizer, I still get folded tuple when I dis some constant tuples. So CFG already handles that.

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 4, 2025

@Eclips4 @tomasr8 I have noticed that CFG already does tuple folding

Yes, that's true. You can check #128802, which essentially removes the fold_tuple from AST optimizer. However, there are still a couple of issues that I'll resolve soon.

@WolframAlph
Copy link
Contributor

@tomasr8 there is also string formatting left.

@tomasr8
Copy link
Member

tomasr8 commented Feb 15, 2025

@WolframAlph ok if I tackle string formatting then?

@WolframAlph
Copy link
Contributor

Go for it, I haven't started it yet.

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 15, 2025

I've been going through expression folding code in ast_opt.c and found this logic:

cpython/Python/ast_opt.c

Lines 765 to 770 in e65e9f9

case Name_kind:
if (node_->v.Name.ctx == Load &&
_PyUnicode_EqualToASCIIString(node_->v.Name.id, "__debug__")) {
LEAVE_RECURSIVE(state);
return make_const(node_, PyBool_FromLong(!state->optimize), ctx_);
}

If we don't migrate this to CFG, we would need to keep a lot of expression related recursive boilerplate code just for this case. I got it migrated locally successfully, but it requires passing optimization level + names dict from compiler unit metadata to _PyCfg_OptimizeCodeUnit. Problem is that would affect test internal c api code as this function is called from there as well (indirectly). @iritkatriel what do you think would be the best course here? Maybe we just create default values implicitly for optimization level & names for internal c api test? But then there will be no way to test this. Seems like a lot of changes required just for this special case.

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 15, 2025

@WolframAlph Do we really want to move it? Maybe we should just leave it as it is. I'm sure it's not worth the trouble of moving it to another stage

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 15, 2025

We don't have to, but then we have to keep boilerplate code in ast_opt.c, because we need to recurse into all expressions to handle this. Question is whether we are ok with that.

@iritkatriel
Copy link
Member

Yes, let's leave this in ast_opt.c. This file is not going away, and it needs to continue to perform a full AST traversal (so keep all the recursion). This pass will be used for things which are not const-folding optimisations, like detecting syntax errors. We will rename it at some point.

So leave the recursion, and leave the __debug__ business there.

@tomasr8
Copy link
Member

tomasr8 commented Feb 16, 2025

One other thing that we should also probably leave in ast_opt is the docstring optimization:

cpython/Python/ast_opt.c

Lines 619 to 626 in 798f8d3

int docstring = _PyAST_GetDocString(stmts) != NULL;
if (docstring && (state->optimize >= 2)) {
/* remove the docstring */
if (!stmt_seq_remove_item(stmts, 0)) {
return 0;
}
docstring = 0;
}

As for string formatting: I had a look at the code and I think it's safer if we keep that in the optimizer as well. As Yan pointed out, this optimization can end up with more instructions than it started with (e.g. it turns '%d %d' % (2, 3) into f'{2} {3}'). We'd need to be able to dynamically resize the basic block while running the optimizations - I think this would add a bit more complexity and be error prone. Since ast_opt is gonna stay around anyway I suggest we keep it there.

@WolframAlph
Copy link
Contributor

We'd need to be able to dynamically resize the basic block while running the optimizations

Or creating new block with enough space to optimize and then re-link old and new. Also ugly IMO.

@tomasr8
Copy link
Member

tomasr8 commented Feb 16, 2025

We'd need to be able to dynamically resize the basic block while running the optimizations

Or creating new block with enough space to optimize and then re-link old and new. Also ugly IMO.

Yeah either way I think it adds a lot of complexity for little benefit.. but I guess that's fine, not everything has to necessarily go
to flowgraph

@WolframAlph
Copy link
Contributor

WolframAlph commented Mar 10, 2025

I would like to address how we are searching & constructing constant sequences for constant foldings. Currently we collect constant sequence with:

cpython/Python/flowgraph.c

Lines 1354 to 1386 in 990ad27

static int
get_constant_sequence(basicblock *bb, int start, int size,
PyObject *consts, PyObject **seq)
{
assert(start < bb->b_iused);
*seq = NULL;
PyObject *res = PyTuple_New((Py_ssize_t)size);
if (res == NULL) {
return ERROR;
}
for (; start >= 0 && size > 0; start--) {
cfg_instr *instr = &bb->b_instr[start];
if (instr->i_opcode == NOP) {
continue;
}
if (!loads_const(instr->i_opcode)) {
break;
}
PyObject *constant = get_const_value(instr->i_opcode, instr->i_oparg, consts);
if (constant == NULL) {
Py_DECREF(res);
return ERROR;
}
PyTuple_SET_ITEM(res, --size, constant);
}
if (size > 0) {
Py_DECREF(res);
}
else {
*seq = res;
}
return SUCCESS;
}

Benefit is (arguably) that we need single pass and the code is simpler, but the drawback is that regardless of whether it will or it will not be a constant sequence, we always pre-allocate tuple of requested size. Considering that most of the cases it won't be a constant sequence, this seems to be quite wasteful.

This becomes even more concerning when it comes to (for example) binary operations. Now, this would pre-allocate tuple for every single binary operation existing, regardless of whether it is constant or not. Splitting this logic into 2 parts for 1: counting and 2: constructing resulting tuple would improve the situation a lot I believe. But there is another thing to consider.

Some of the foldings directly use resulting tuple as final constant, but some (binop, unaryop) use it only to get their operands and then deallocate it anyway, which also seems to be quite unnecessary (even if the event of folding constant is much less likely than otherwise).

Therefore I would like to propose to change the way how we search for constant sequence. Instead, we could do single pass to collect indices of instructions that load constants, something like this:

static bool
get_const_instruction_sequence(basicblock *bb, int start, int size, int *indices)
{
    assert(start < bb->b_iused);
    assert(size <= STACK_USE_GUIDLINE);

    for (; start >= 0 && size > 0; start--) {
        cfg_instr *instr = &bb->b_instr[start];
        if (instr->i_opcode == NOP) {
            continue;
        }
        if (!loads_const(instr->i_opcode)) {
            return false;
        }
        indices[--size] = start;
    }

    return size == 0;
}

And calling function would need to do something like this (binop for example):

int indices[2];
if (!get_const_instruction_sequence(bb, i-1, 2, indices)) {
    return SUCCESS;
}

Calling function would be responsible to make sure it calls get_const_instruction_sequence with correct size and enough space for indices. The obvious problem with this is that for tuples/lists/sets we don't know how many consts it would consume. But considering that codegen stage limits subsequent stack pushes to STACK_USE_GUIDLINE, we know that constant sequences beyond this length are impossible. Therefore all variable length constant foldings would need to return early if sequence_length > STACK_USE_GUIDLINE and create int indices[STACK_USE_GUIDLINE] to make sure all constants fit in if sequence_length < STACK_USE_GUIDLINE.

As a result, we would construct constant tuples only when necessary. This would also make noping out instructions faster as we already know indices and don't have to make unnecessary iterations as we do now. WDYT @iritkatriel @Eclips4 @tomasr8 ?

@iritkatriel
Copy link
Member

Makes sense to me.

@WolframAlph
Copy link
Contributor

Little update. All expressions foldings (except for string formatting and __debug__) are now happening in the peepholer.

@WolframAlph
Copy link
Contributor

WolframAlph commented Mar 25, 2025

@iritkatriel what is left to be done on this? One thing coming to my mind is cleaning up test cases, as there are cases IIRC when different tests test the same thing/some outdated tests etc... But not sure if it is related to this task.

@iritkatriel
Copy link
Member

Cleaning up the tests is a good idea, we can do it on this issue.

Another thing is renaming ast_opt.c and _PyAST_Optimize, because they are not really about optimization.
Maybe something like _PyAST_PreCompile or _PyAST_PostProcess?

@WolframAlph
Copy link
Contributor

Tests cleanup PR: #131826

@WolframAlph
Copy link
Contributor

Another thing is renaming ast_opt.c and _PyAST_Optimize, because they are not really about optimization.
Maybe something like _PyAST_PreCompile or _PyAST_PostProcess?

@iritkatriel I suppose all structs/functions using ast optimize naming should be renamed as well then?

@iritkatriel
Copy link
Member

Another thing is renaming ast_opt.c and _PyAST_Optimize, because they are not really about optimization.
Maybe something like _PyAST_PreCompile or _PyAST_PostProcess?

@iritkatriel I suppose all structs/functions using ast optimize naming should be renamed as well then?

Which ones? There is some stuff that is about optimization, so case by case basis.

@WolframAlph
Copy link
Contributor

Another thing is renaming ast_opt.c and _PyAST_Optimize, because they are not really about optimization.
Maybe something like _PyAST_PreCompile or _PyAST_PostProcess?

@iritkatriel I suppose all structs/functions using ast optimize naming should be renamed as well then?

Which ones? There is some stuff that is about optimization, so case by case basis.

For instance:

cpython/Python/ast_opt.c

Lines 18 to 26 in 3796884

typedef struct {
PyObject *filename;
int optimize;
int ff_features;
int syntax_check_only;
_Py_c_array_t cf_finally; /* context for PEP 678 check */
int cf_finally_used;
} _PyASTOptimizeState;

/* AST optimizations */
extern int _PyCompile_AstOptimize(
struct _mod *mod,
PyObject *filename,
PyCompilerFlags *flags,
int optimize,
struct _arena *arena,
int syntax_check_only);
extern int _PyAST_Optimize(
struct _mod *,
struct _arena *arena,
PyObject *filename,
int optimize,
int ff_features,
int syntax_check_only);

@iritkatriel
Copy link
Member

Those can be renamed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants