Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support storing the code that builds the code model #305

Open
wants to merge 23 commits into
base: code-reflection
Choose a base branch
from

Conversation

mabbay
Copy link
Member

@mabbay mabbay commented Jan 22, 2025

In this PR we allow the user to decide how to store the code model. The first option is TEXT, if this is selected, we store the textual representation of the code model. The second option is CODE_BUILDR, if this is selected, we store the code that builds the code model. All work done here is around the second option, because the first option is already supported.


Progress

  • Change must not contain extraneous whitespace

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/305/head:pull/305
$ git checkout pull/305

Update a local copy of the PR:
$ git checkout pull/305
$ git pull https://git.openjdk.org/babylon.git pull/305/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 305

View PR using the GUI difftool:
$ git pr show -t 305

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/305.diff

Using Webrev

Link to Webrev Comment

@mabbay mabbay self-assigned this Jan 22, 2025
@bridgekeeper
Copy link

bridgekeeper bot commented Jan 22, 2025

👋 Welcome back mabbay! A progress list of the required criteria for merging this PR into code-reflection will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jan 22, 2025

@mabbay This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Support storing the code that builds the code model

Reviewed-by: mcimadamore

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 17 new commits pushed to the code-reflection branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the code-reflection branch, type /integrate in a new comment.

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 22, 2025
@mabbay
Copy link
Member Author

mabbay commented Feb 10, 2025

To test this you can compile the program TestStoringCodeModelBuilder.java with the following configuration:

  • Program: com.sun.tools.javac.Main

  • Arguments: -XDdumpIR -XDcodeModelStorageOption=CODE_BUILDER src/jdk.incubator.code/share/classes/jdk/incubator/code/internal/TestStoringCodeModelBuilder.java

The compilation will fail with the error: no enclosing instance of type TypeElementFactory is in scope

@mabbay
Copy link
Member Author

mabbay commented Feb 23, 2025

I made CODE_BUILDER the default storage mechanism for code models, so you can test the capability by running the compiler tests.

@PaulSandoz
Copy link
Member

Getting compiler errors due to lack of imports:

=== Output from failing command(s) repeated here ===

  • For target jdk_modules_jdk.incubator.code__the.jdk.incubator.code_batch:
    /Users/sandoz/Projects/jdk/babylon/src/jdk.incubator.code/share/classes/jdk/incubator/code/Op.java:540: error: cannot find symbol
    opMethod = method.getDeclaringClass().getDeclaredMethod(opMethodName, OpFactory.class,
    ^
    symbol: class OpFactory
    location: class Op
    /Users/sandoz/Projects/jdk/babylon/src/jdk.incubator.code/share/classes/jdk/incubator/code/Op.java:542: error: cannot find symbol
    args = new Object[] {ExtendedOp.FACTORY, CoreTypeFactory.CORE_TYPE_FACTORY};
    ^
    symbol: variable CoreTypeFactory
    location: class Op
    2 errors

After fixing those locally i get many test failures.

@PaulSandoz
Copy link
Member

@mcimadamore Mourad implemented a transformation of the code model that builds a code model that adds local variables for values with more than one use, which makes it easier to generate the AST nodes. Would use of the internal LetExpr help avoid such a transformation, if so we can consider that for follow on work.

@mcimadamore
Copy link
Collaborator

@mcimadamore Mourad implemented a transformation of the code model that builds a code model that adds local variables for values with more than one use, which makes it easier to generate the AST nodes. Would use of the internal LetExpr help avoid such a transformation, if so we can consider that for follow on work.

I'm not sure I have all the context here. The problem here seems to be when you have a value that is resulting from some potentially side-effect operation. E.g. like a method call:

%2 = bar(%1)

If %2 is used multiple times, then javac has only one option -- that is, to hoist %2 in a local variable, and then replaces all references to %2 with references to the local variable. Inlining the call to bar at the use-site is not really an option, as that could change the semantics of the program.

Let expression nodes are useful when dealing with compact expressions. E.g.

List l = let x = 42 in List.of(42)

E.g. javac typically uses a let expression when it has to translate a single expression into something more complex, but it wants to do so by keeping the result as an expression (rather than turning the expression into a statement, which is not possible in all cases, such as in the case of a variable initializer).

It is true that what seems like a linear list of ops in a block can be modelled as something like more convoluted, like so:

let op1 = <op1 init> in
    let op2 = <op2 init> in
        let op3 = <op2 init> in
               ....
               <result>

This would mean to generate one let expression per op, where the "body" of the let expression is the remainder of the code model block. All this nesting is confusing, but is also avoidable -- a LetExpr node allows for more than one declaration for each body -- so you can translate the above as follows:

let (op1 = <op1 init> ;
     op2 = <op2 init> ;
     op3 = <op2 init>) in <result>

Doing something like this would probably avoid the need of generating extra local variables -- you now have one var declaration per op in the "statements" part of a LetExpr. It looks a bit odd -- visually -- that the body of the LetExpr is just the result of the code model block -- e.g. all the interesting part is in the setup code. But seems doable.

At the end of the day either adding extra variables (which can even be done as a pre-processing step, by javac), or using a more functional translation with LetExpr should work.

P.S.
I looked at the code and, at least in some cases (h) the TestAddVarsWhenNecessary seems to add intermediate Var ops, but which seem redundant - as they are initialized with some function parameter. I'd expect javac to be able to deal with references to function parameters using a JCIdent pointing at the desired parameter.


private Map<JavaType, Type> mappingFromJavaTypeToType() {
Map<JavaType, Type> m = new HashMap<>();
Symbol.ModuleSymbol jdk_incubator_code = syms.enterModule(names.jdk_incubator_code);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is effectively an extension to javac's symbol table. My preference would be to:

  • add whatever symbol/type you need in CodeReflectionSymbol
  • then set up the translation map simply, like you do for primitive types -- e.g.
Map.entry(<type element>, <javac type>)

};
}

private JCTree invokeOpToJCMethodInvocation(CoreOp.InvokeOp invokeOp) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method has some issues -- but perhaps we can keep it for now.
The main problem is that we seem to copy types from the invoked method declaration and put them "as is" in the generated AST. This works only as long as the invoked method doesn't have any generic type. For instance, consider a call to List::add. The signature of this method accepts X which is a type parameter of List<X>. I believe the code you have copies the X from the declared signature and sticks it into the javac's MethodType instance attached to the method call. Since this method type is used to describe the use-site (the call), and not the declaration, the type seems incorrect. E.g. calling List<String>:add should have a type of add(String) not add(X). (btw, a similar problem is present for field access, if the type of the accessed field is a type-variable -- so it's not just an issue with method calls).

The type of the invoked method/accessed field should always be adjusted with Types.memberType first. This method takes two parameters:

  • the receiver type you are using to call the instance method/access the instance field
  • the symbol you are trying to access

And returns the instantiated type of that symbol at that receiver type. Examples:

memberType(List<String>, add(X)->void) -> (String)->void
memberType(List<Integer>, add(X)->void) -> (Integer)->void

Once you address this, there is still another problem with generic methods -- as methods can have their own type-variables too. To figure out what is the type to be replaced for these type-variables you generally need to run inference -- which seems way too much for what we're trying to do here. The issue here is that the code model you are processing doesn't expose these details, and so generating the AST needs to "trace back" whatever steps where done when generating this model -- which is an hard problem.

I hope we can get away with just working with erased types, and maybe insert type-conversion to convert the type of the invoked method/accessed field so that it matches the expected op result type.

if (invokeOp.isVarArgs()) {
var lastParam = invokeOp.invokeDescriptor().type().parameterTypes().getLast();
Assert.check(lastParam instanceof ArrayType);
methodInvocation.varargsElement = typeElementToType(((ArrayType) lastParam).componentType());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem as described above -- here we're copying the vararg array at the declaration site into a type at the use-site -- for something like List::of this won't give the result you expect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now -- while all the points I made above apply (e.g. if you were to try and type-check the generated AST using Attr you will get several errors), the saving grace here is that you are sending this tree into javac's backend anyway. And the backend plays a lot looser with types, only inserting casts where absolutely needed. Since you are using the op result type on the MethodType instance you are generating, I believe that should be enough for the backend to at least insert 90% of the type conversions that might be required because of erasure. So, in practice, even if incorrect, the code above might work fine.

@mabbay
Copy link
Member Author

mabbay commented Mar 12, 2025

#351 need to be merged before this PR

@mcimadamore
Copy link
Collaborator

I've verified that all tests pass, except the two test which fail with a compiler crash because of #351.

.ldc(cp.constantDynamicEntry(cp.bsmEntry(bsmDataAt, List.of(cp.intEntry(1))), natMH));
MethodType mtype = quotableOpGetterInfo.getMethodType();
if (quotableOpGetterInfo.getReferenceKind() != MethodHandleInfo.REF_invokeStatic) {
mtype = mtype.insertParameterTypes(0, implClass);
}
if (quotableOpGetterInfo.getMethodType().parameterList().equals(List.of(CodeReflectionSupport.OP_FACTORY_CLASS,
CodeReflectionSupport.TYPE_ELEMENT_FACTORY_CLASS))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these fields are public/static, why don't we load them from inside the compiler-generated method, instead of injecting them here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming by the compiler-generated method you mean opMethod. The opMethod has two params (OpFactory, TypeElementFactory), so loading these fields is necessary before invoking it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can turn opMethod into one with no param and load these fields inside its body. But this removes the flexibility we have now (i.e. we can choose the OpFactory and TypeElementFactory we want to pass).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can turn opMethod into one with no param and load these fields inside its body.

That is what I was asking about -- while I agree what we have seems more flexible, we don't seem to use this flexibility anywhere -- as the method that turns a Quotable into a Quoted works with hardwired op and type factories. But maybe this has been already discussed -- in which case it's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively use a consistent the method signature with the parameters even if they are not used. Uniformity in the method signature is i think a nice property we should aim for, but could be followed up later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - I agree we can follow up later.


// TODO add VarOps in OpBuilder
funcOp = addVarsWhenNecessary(funcOp);
funcOp.writeTo(System.out);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed?

Copy link
Member Author

@mabbay mabbay Mar 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addVarsWhenNecessary is used to add necessary VarOp in the code model before transformation to AST. For example if an op result is used more than once, we introduce a VarOp to hold the value, because that's what the equivalent Java code will have. This makes the transformation from code model to AST a bit easier.
We also use it to in case we want an Op (e.g. InvokeOp) to be appended right away to the method we are building. As you now the block of a method is a list of statements, some operations maps to an expression, so they won't be appended until needed. So by putting the result of an op in a VarOp, that VarOp will be mapped to a statement which causes the tree of the op to be added immediately.
For example an InvokeOp to Block.Builder::op is something we want to be transformed to a JCTree and added immediately to the method's block, so that we preserve the order of the operations as in the original code model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, the OpBuilder should produces a model that contains the VarOp when they should be. But I decided to have a seperate transformation as a quick fix. Later I will change the OpBuilder.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies -- I meant whether the writeTo should be removed :-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I decided to have a seperate transformation as a quick fix. Later I will change the OpBuilder.

As stated in another comment, I'm not sure changing the op builder is needed -- as the javac code can easily workaround what the op builder does (e.g. what it does doesn't seem obviously wrong to me). But if the decision is to fix op builder that's fine too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to directly avoid changing OpBuilder (investing in a SSA to Var transformation, using liveness analysis, is probably the general right approach), but there might be a more focused approach to generating the AST so we can avoid this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming @PaulSandoz -- we can defer this improvement to a follow up change -- and keep the var-adding transformation in this PR for now.

na.type = typeElementToType(at);
yield na;
}
var ownerType = typeElementToType(newOp.constructorType().returnType());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a strange asymmetry in the model between constructors and methods. Methods have a descriptor, constructors do not. Also, for methods you know from the op if it is varargs, but for constructors this info is not available. My feeling is that (at least coming at this from javac), trying to lump array and class creation under one op is probably why these asymmetries crop up.

This has nothing to do with this PR of course -- but I suspect that if you had a varargs constructor call this code would not work.

Copy link
Member Author

@mabbay mabbay Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on all points. We can have a new.array operation and then model class creation as an invoke op with method name being "init" ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on all points. We can have a new.array operation and then model class creation as an invoke op with method name being "init" ?

not sure I follow -- I was talking about class construction -- arrays are a bit special (e.g. there's no array constructor descriptor, unlike for classes). But a constructor call does share a lot of similarities with a method call, and there should be reusable code between method and constructors -- as it happens inside the javac code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be deferred until we have a better op to model instance creation expression.

var nc = treeMaker.NewClass(null, null, clazz, args.toList(), null);
var argTypes = new ListBuffer<Type>();
for (Value operand : newOp.operands()) {
argTypes.add(typeElementToType(operand.type()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong -- and is related to the point I made above -- the type of the operands here are the types of the actual arguments of the constructor calls. So, if you have:

class Foo {
     Foo(String string) { ... }
 }

the method type of the constructor call has to say Foo(String). But if you "infer" the type of the constructor argument from the actual argument, if the call is like this:

new Foo(null)

We should probably use the op's constructorType to derive the type of the invoked constructor? (You use that to determine the constructor's owner type)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be deferred until we have a better op to model instance creation expression.

return treeMaker.MethodDef(ms, mb);
}

public static CoreOp.FuncOp addVarsWhenNecessary(CoreOp.FuncOp funcOp) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something obvious. But if you have something like:

func @"f" ()java.util.Map -> {
    %0 : java.util.Map<java.lang.Integer, java.lang.Integer> = new @"func<java.util.HashMap>";
    %1 : int = constant @"1";
    %2 : int = constant @"2";
    %3 : java.lang.Object = invoke %0 %1 %2 @"java.util.Map::put(java.lang.Object, java.lang.Object)java.lang.Object";
    return %0;
}

Can't we translate as:

Map $0 = new HashMap();
int $1 = 1;
int $2 = 3;    
Object $3 = $0.put($1, $2);
return $0;

Note how this translation is very 1-1 and direct, and doesn't require any adaptation step in the model. We just emit a new local variable for each op we see in the body -- and then refer to operands via their corresponding (locally declared) variable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(of course we'd only do the above translation for ops that have a non-void result type)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can follow up on this improved translation at a later point.

var name = names.fromString(fieldLoadOp.fieldDescriptor().name());
var type = typeElementToType(fieldLoadOp.resultType());
var owner = typeElementToType(fieldLoadOp.fieldDescriptor().refType());
var sym = new Symbol.VarSymbol(flags, name, type, owner.tsym);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong -- the accessed field symbol should be findable in the scope of the class in which the field is declared. It seems to me that we need some piece of code that turns a method/field descriptor into a java symbol. Look at Lower::lookupMethod which uses Resolve::resolveInternalMethod -- I think here we should do that too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is borderline, but again, we can leave it as is for now (the risk is that if we have a bad model javac will just emit a reference to some non-existent field, which will likely cause non-sensical bytecode to be emitted).

.collect(List.collector());
var restype = typeElementToType(invokeOp.resultType());
var type = new Type.MethodType(paramTypes, restype, List.nil(), syms.methodClass);
var methodSym = new Symbol.MethodSymbol(flags, name, type,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for field access (see below) -- we should not create a new method symbol from scratch but "resolve" a method in a given class symbol (using Resolve::findInternalMethod)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to add two helper methods -- one takes a FieldDescriptor and returns a VarSymbol (using Resolve::findFieldInternal -- the other takes a MethodDescriptor and returns a MethodSymbol (using Resolve::findMethodIntenal. Then you rewrite the code here (and in field access) to use these methods.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the one above for var symbol -- creating method/var symbols directly in the backend is not the way to do things -- we should lean on Resolve instead. It would be nice to fix these -- but we can also take another pass at it later if required.

@mabbay
Copy link
Member Author

mabbay commented Mar 15, 2025

I've verified that all tests pass, except the two test which fail with a compiler crash because of #351.

If we merge 351 into this, those two tests will pass.

Copy link
Member

@PaulSandoz PaulSandoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot tell what may be left to address for Maurizio's comments and what we can defer. If Maurizio is ok to integrate then i am ok.

It seems we can follow up later on the following:

  • avoiding the insertion of Vars
  • uniformity of method signatures that produce code models
  • adjust the new operation to support an invocation descriptor attribute, similar to the invoke operation.

.ldc(cp.constantDynamicEntry(cp.bsmEntry(bsmDataAt, List.of(cp.intEntry(1))), natMH));
MethodType mtype = quotableOpGetterInfo.getMethodType();
if (quotableOpGetterInfo.getReferenceKind() != MethodHandleInfo.REF_invokeStatic) {
mtype = mtype.insertParameterTypes(0, implClass);
}
if (quotableOpGetterInfo.getMethodType().parameterList().equals(List.of(CodeReflectionSupport.OP_FACTORY_CLASS,
CodeReflectionSupport.TYPE_ELEMENT_FACTORY_CLASS))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively use a consistent the method signature with the parameters even if they are not used. Uniformity in the method signature is i think a nice property we should aim for, but could be followed up later.


// TODO add VarOps in OpBuilder
funcOp = addVarsWhenNecessary(funcOp);
funcOp.writeTo(System.out);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to directly avoid changing OpBuilder (investing in a SSA to Var transformation, using liveness analysis, is probably the general right approach), but there might be a more focused approach to generating the AST so we can avoid this.

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed ready Pull request is ready to be integrated labels Mar 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready Pull request is ready to be integrated rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

3 participants