Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Generate int-to-bool conversions in C code #18490

Merged

Conversation

MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Jan 14, 2025

In C++ code we have int-to-bool conversions on myInt in examples like:

void test(int myInt) {
  if(myInt) {
   // ...
  }
}

That is, it's not myInt that is branched on, but rather the result of myInt != 0. In the IR, this manifests as the IR looking like:

r1_5(glval<int>) = VariableAddress[myInt]     :
m1_6(int)        = InitializeParameter[myInt] : &:r1_5
r2_1(glval<int>) = VariableAddress[myInt]     :
r2_2(int)        = Load[myInt]                : &:r2_1, m1_6
r2_3(int)        = Constant[0]                :
r2_4(bool)       = CompareNE                  : r2_2, r2_3
v2_5(void)       = ConditionalBranch          : r2_4

(notice the BranchNE instruction).

This information is important in guard conditions since branching on myInt doesn't tell us that myInt == 1, but rather that (myInt != 0) == 1.

This is all well and good in C++. However, in C, there are no int-to-bool conversions. So when compiling the same code with a C compiler the IR looks like:

r1_5(glval<int>) = VariableAddress[myInt]     :
m1_6(int)        = InitializeParameter[myInt] : &:r1_5
r2_1(glval<int>) = VariableAddress[myInt]     :
r2_2(int)        = Load[myInt]                : &:r2_1, m1_6
v2_5(void)       = ConditionalBranch          : r2_2

(notice that we're now branching on the value of myInt)

Historically, we've worked around this problem in IRGuards specifically (see #16364 and #16533), but upon facing a related problem somewhere else I now feel like we should simply make the IR equivalent in C and C++.

So this PR inserts IR equivalent to what would have been generated for an int-to-bool conversion if it had been present in the database in the places I'm aware of this being a problem. It then removes the workarounds in the guards library that were added to handle these.

Commit-by-commit review recommended (especially for the IR construction related commits).

DCA reveals a few new results and a few lost results:

  • The lost cpp/overrun-write and cpp/path-injection results are because both queries use any guard as a barrier, and we now recognize more barriers.
  • The cpp/redundant-null-check-simple results are new TPs 🎉

Commits 6dd1c5e to 0d7adac are IR-construction related changes, and commits 9810a4f to 5373e22 are IRGuards-related changes.

@github-actions github-actions bot added the C++ label Jan 14, 2025
@MathiasVP MathiasVP force-pushed the generate-int-to-bool-conversion-instructions-2 branch from 446f6cb to d182a41 Compare January 15, 2025 16:28
@MathiasVP MathiasVP force-pushed the generate-int-to-bool-conversion-instructions-2 branch from d182a41 to d204810 Compare January 15, 2025 16:38
@MathiasVP MathiasVP force-pushed the generate-int-to-bool-conversion-instructions-2 branch from d204810 to 5700f46 Compare January 16, 2025 00:40
@MathiasVP MathiasVP force-pushed the generate-int-to-bool-conversion-instructions-2 branch from 5700f46 to 5373e22 Compare January 16, 2025 00:45
@MathiasVP MathiasVP marked this pull request as ready for review January 16, 2025 11:37
@MathiasVP MathiasVP requested a review from a team as a code owner January 16, 2025 11:37
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Jan 16, 2025
@jketema
Copy link
Contributor

jketema commented Jan 16, 2025

It looks like the changes in 6577161 only affect syntax-zoo, and not the IR tests. I think it would make sense to add an IR test, so we can easily see that the generated IR is actually correct.

Comment on lines +283 to +289
override predicate comparesLt(Expr left, Expr right, int k, boolean isLessThan, boolean testIsTrue) {
exists(Instruction li, Instruction ri |
li.getUnconvertedResultExpression() = left and
ri.getUnconvertedResultExpression() = right and
ir.comparesLt(li.getAUse(), ri.getAUse(), k, isLessThan, testIsTrue.booleanNot())
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need some more explanation here. Does this predicate (and the ones below) every produce anything sensible given that ir is a CompareEQ?

Copy link
Contributor Author

@MathiasVP MathiasVP Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I doubt this ever produces anything, really. This was just a result of me copy/pasting the predicate 🙈 I tried to quick-eval this predicate on a couple of local DBs, but nothing came up.

Would you prefer just doing none here instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that hold for all the predicates below, or are there exceptions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to write this:

All the "less than"-related predicates are probably not relevant, no. But the "equality"-related predicates certainly are. So we could replace all the them all with none.

But I then realized that if you write:

if(!(a < b)) { }

in a C file then we'll hit this case since the result of a < b has an int type. I've added this as a testcase. A quick MRVA run shows me that we actually hit many GuardConditions that fit this. So I'm actually hesitant to delete them.

Now, as I'm looking at what kinds of bounds we actually get from it I realize that this could be improved. However, the results of this predicate are certainly not incorrect. So I'd actually prefer that we keep them if that's okay with you.

@MathiasVP
Copy link
Contributor Author

It looks like the changes in 6577161 only affect syntax-zoo, and not the IR tests. I think it would make sense to add an IR test, so we can easily see that the generated IR is actually correct.

Good point. I've added this in 2076c1c. I had to add another file since the existing ir.c is marked as --microsoft which doesn't support the binary conditional operator.

@MathiasVP
Copy link
Contributor Author

Thanks for the chat, @jketema! Some comments on the commits I just pushed:

  • I've added the reference example we saw on MRVA in 54faba2.
  • I removed the type test in d0bd6eb. This gives some new good results in the tests for C++ as well (where the operand of the not has a BoolType which meant it was previously not part of this class).

After I removed the type test requirement I reran MRVA and looked for cases where we had results for unary equality (we had lots), binary equality, unary "less than"s, and binary "less than"s. All of these cases had quite a few results, and I've added an example of each case in d5b31eb.

So, unlike what I expressed in our call earlier, I'm not actually sure we should replace the overrides with none() since doing so breaks all the tests added in d5b31eb.

What do you think?

Comment on lines 254 to 267
// When `!` is applied to an integer (such as `x`) the generated IR looks
// like:
// ```
// r1(glval<int>) = VariableAddress[myInt] :
// r2(int) = Load[x] : &:r1, m1_6
// r3(int) = Constant[0] :
// r4(bool) = CompareEQ : r2, r3
// ```
// And so the `IRGuardCondition` for an expression such as `if(!x)` is the
// `CompareEQ` instruction. However, users often expect the `x` to also
// be a guard condition. But from the perspective of the IR the `x` is just
// the left-hand side of a comparison against 0 so it's not included as a
// normal `IRGuardCondition`. So to align with user expectations we make
// that `x` a `GuardCondition`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer up-to-date, I think, as you dropped the non-boolean restriction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed in b39a932

@jketema
Copy link
Contributor

jketema commented Jan 17, 2025

DCA reveals a few new results and a few lost results:

  • The lost cpp/overrun-write and cpp/path-injection results are because both queries use any guard as a barrier, and we now recognize more barriers.
  • The cpp/redundant-null-check-simple results are new TPs 🎉

What about cpp/user-controlled-bypass?

@MathiasVP
Copy link
Contributor Author

What about cpp/user-controlled-bypass?

We just discussed this offline: There aren't any query result differences. The change is in the number of #select results, but the number of query results remain the same.

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MathiasVP MathiasVP merged commit d8ec6dd into github:main Jan 17, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants