Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix correctness bug in EffectivePredicateExtractor #23674

Merged
merged 1 commit into from
Sep 21, 2024

Conversation

aaneja
Copy link
Contributor

@aaneja aaneja commented Sep 18, 2024

Description

Fixes #23660

During predicate pushdown through a Join, we were

  • Pulling up a predicate expression that only refers to constant values
  • Incorrectly pushing down this predicate between join sides
  • This predicate expression gets optimized to FALSE, but now is pushed down to both sides of the join

This results in the optimizer Incorrectly assuming no results will be produced

Fix

Pulling up expressions happens inEffectivePredicateExtractor#pullExpressionThroughVariables. This was modified to stop pulling up expressions that only refer to constants since this is logically incorrect

Test Plan

  • New unit and E2E tests added

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

to not pull up expressions only referring to constants
@aaneja aaneja changed the title Bugfix pullExpressionThroughVariables Fix correctness bug in EffectivePredicateExtractor Sep 19, 2024
@aaneja aaneja marked this pull request as ready for review September 19, 2024 14:05
Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice discovery. Thanks!

@jaystarshot
Copy link
Member

Should we add a test from #23660?

effectiveConjuncts.add(rewritten);
}
// If equality inference has reduced the predicate to an expression referring to only constants, it does not make sense to pull this predicate up
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit confused here, as to why do we have the condition to skip if rewritten has variable references? What if the variable reference is pointing to a constant in some project below?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if it is below, then correctness should already be correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If rewritten has variable references that are in(variables) we do not skip them and pull them up (by adding them to effectiveConjuncts. The problem here is that EqaulityInference is designed to rewrite expressions to just constants (see test), even though, technically this expression does not refer to any variables

However, if the rewritten expression has no variable references, le.g mod(10,5)=1 - this predicate should not be pulled up, to say, a Join node, because it has no references in the outputs of the Join and cannot impact any predicate inferencing there

"SELECT regionkey, (select name from nation where false) from region",
"SELECT regionkey, NULL from region");
assertQuery(
"SELECT regionkey, (select name from nation where nationkey = 5 and mod(nationkey,5) = 1) from region",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaystarshot The query used here is logically identical to one from the #23660 issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this current only tests execution and not result expectations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It asserts that results are identical to those from SELECT regionkey, NULL from region

effectiveConjuncts.add(rewritten);
}
// If equality inference has reduced the predicate to an expression referring to only constants, it does not make sense to pull this predicate up
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If rewritten has variable references that are in(variables) we do not skip them and pull them up (by adding them to effectiveConjuncts. The problem here is that EqaulityInference is designed to rewrite expressions to just constants (see test), even though, technically this expression does not refer to any variables

However, if the rewritten expression has no variable references, le.g mod(10,5)=1 - this predicate should not be pulled up, to say, a Join node, because it has no references in the outputs of the Join and cannot impact any predicate inferencing there

@aaneja aaneja merged commit 95d2439 into prestodb:master Sep 21, 2024
56 of 57 checks passed
@aaneja aaneja deleted the bugfixEqualityInference branch November 8, 2024 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query with scalar subquery having mathematical function returns 0 rows
3 participants