Fix a correctness issue around referenceless expressions being evaluated as partition filters #4069

Open
brkyvz wants to merge 2 commits into master

brkyvz (Collaborator) commented Jan 17, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Fixes a data correctness issue that occurs when a non-deterministic expression that references no columns, such as rand(), is used as a filter on a Delta table. Such filters were being treated as partition filters and evaluated twice. As a result, a filter such as rand() < 0.5 filtered out ~75% of the data (because each row had to pass two independent evaluations) instead of ~50%.
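The ~75% figure follows from simple arithmetic: if the predicate is evaluated twice with independent draws, a row survives only when both draws pass, so the retained fraction is 0.5 × 0.5 = 0.25. Below is a minimal simulation of that effect in plain Python. It is not Delta or Spark code; the function name `kept_fraction`, the row count, and the seed are illustrative assumptions.

```python
import random

def kept_fraction(num_rows, evaluations, threshold=0.5, seed=0):
    """Fraction of rows kept when `rand() < threshold` is applied `evaluations` times.

    Each evaluation draws a fresh random number, mimicking a
    non-deterministic filter that is re-evaluated rather than reused.
    """
    rng = random.Random(seed)
    kept = 0
    for _ in range(num_rows):
        # A row survives only if every independent evaluation passes.
        if all(rng.random() < threshold for _ in range(evaluations)):
            kept += 1
    return kept / num_rows

single = kept_fraction(100_000, evaluations=1)  # expected to be close to 0.50
double = kept_fraction(100_000, evaluations=2)  # expected to be close to 0.25
print(f"evaluated once:  kept {single:.1%}")
print(f"evaluated twice: kept {double:.1%}")
```

With a single evaluation roughly half of the rows are kept; with two evaluations roughly a quarter are kept, which matches the ~75% of rows being filtered out as described above.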

Added a feature flag to restore the old behavior, just in case.
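The description does not spell out how a referenceless expression ends up classified as a partition filter. One plausible way, sketched here purely as an assumption, is a check of the form "every referenced column is a partition column", which is vacuously true when there are no references at all. The `Expr` class, `is_partition_filter`, and the `legacy_behavior` parameter below are hypothetical stand-ins in plain Python, not Spark or Delta APIs, and the guard shown is not necessarily the one this PR implements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    """Toy stand-in for a filter expression (hypothetical, not a Spark class)."""
    sql: str
    references: frozenset  # names of the columns the expression reads
    deterministic: bool

def is_partition_filter(expr, partition_cols, legacy_behavior=False):
    # all() over an empty collection is True, so an expression with no
    # references trivially "references only partition columns".
    only_partition_refs = all(ref in partition_cols for ref in expr.references)
    if legacy_behavior:
        # Hypothetical flag: keep the old classification.
        return only_partition_refs
    # Guarded variant (an assumption): also require at least one
    # referenced column and a deterministic expression.
    return only_partition_refs and bool(expr.references) and expr.deterministic

partition_cols = {"date"}
rand_filter = Expr("rand() < 0.5", frozenset(), deterministic=False)
date_filter = Expr("date = '2025-01-17'", frozenset({"date"}), deterministic=True)

print(is_partition_filter(rand_filter, partition_cols, legacy_behavior=True))
print(is_partition_filter(rand_filter, partition_cols))
print(is_partition_filter(date_filter, partition_cols))
```

In this sketch the old path accepts `rand() < 0.5` as a partition filter, while the guarded path rejects it and still accepts an ordinary partition-column predicate.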

How was this patch tested?

Added a unit test for the new behavior, and also tested the old behavior using the feature flag.

Does this PR introduce any user-facing changes?

Yes. Filters that use referenceless non-deterministic expressions such as rand() will no longer be evaluated twice.
