column to column comparisons for filtering file scans and row data #11152

Draft: wants to merge 16 commits into base: main

jenbaldwin

Draft PR for early feedback. (Note: otf-1500 is an internal Jira ticket number.)

Jennifer Baldwin and others added 3 commits September 17, 2024 13:51
Handle case where the VectorHolder contains a null value
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...
@github-actions github-actions bot added the arrow label Sep 17, 2024
@jenbaldwin jenbaldwin changed the title Feature/otf 1500 column comparison column to column comparisons for filtering file scans and row data Sep 18, 2024
@jenbaldwin

This PR adds support for comparisons that use column references on both the left and right sides of an expression, wherever Iceberg currently supports comparing a column reference to literal value(s). The use case we want to support is filtering on date columns within a single table. For instance:

select * from travel_table
where expected_date > travel_date;

select * from travel_table
where payment_date <> due_date;

The changes affect both row filtering and scan file filtering. The impacted jars are iceberg-api, iceberg-core, iceberg-orc, and iceberg-parquet.
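
To illustrate the two levels of filtering mentioned above, here is a minimal Java sketch of how a column-to-column predicate could be evaluated per row and how a data file could be skipped from its min/max column bounds. This is not the code in this PR and does not use the Iceberg API: `ColumnComparisonSketch`, `Bounds`, and all method names are hypothetical, and the sketch assumes SQL-style null handling (a comparison involving a null never matches) and that bounds, when present, cover all non-null values in the file.

```java
import java.time.LocalDate;

/** Hypothetical sketch; none of these names come from Iceberg or from this PR. */
final class ColumnComparisonSketch {

  /** Min/max of the non-null values of one column in one data file (assumed stand-in for file stats). */
  static final class Bounds {
    final LocalDate min;
    final LocalDate max;

    Bounds(LocalDate min, LocalDate max) {
      this.min = min;
      this.max = max;
    }
  }

  /** Row filter for: expected_date > travel_date. A null on either side never matches. */
  static boolean rowMatchesGreaterThan(LocalDate left, LocalDate right) {
    if (left == null || right == null) {
      return false;
    }
    return left.isAfter(right);
  }

  /** Row filter for: payment_date <> due_date. A null on either side never matches. */
  static boolean rowMatchesNotEqual(LocalDate left, LocalDate right) {
    if (left == null || right == null) {
      return false;
    }
    return !left.equals(right);
  }

  /**
   * File filter for left > right. Returns false only when no row in the file can match:
   * the largest left value is still not after the smallest right value.
   */
  static boolean fileMightMatchGreaterThan(Bounds left, Bounds right) {
    if (left == null || right == null) {
      return true; // missing stats: the file must be read
    }
    return left.max.isAfter(right.min);
  }

  /**
   * File filter for left <> right. The file can be skipped only when both columns
   * hold one and the same constant value across the whole file.
   */
  static boolean fileMightMatchNotEqual(Bounds left, Bounds right) {
    if (left == null || right == null) {
      return true; // missing stats: the file must be read
    }
    boolean leftConstant = left.min.equals(left.max);
    boolean rightConstant = right.min.equals(right.max);
    return !(leftConstant && rightConstant && left.min.equals(right.min));
  }
}
```

The file-level checks are deliberately conservative: they return true whenever a match is possible, so a file is skipped only when the bounds prove that no row can satisfy the predicate. Rows in the files that remain still need the row-level check.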
