[DuckDB] Support predicate pushdown via pyarrow dataset #642

Open
eddyxu opened this issue Feb 25, 2023 · 3 comments
Labels
benchmark, duckdb, good first issue, help wanted, rust

Comments

@eddyxu
Contributor

eddyxu commented Feb 25, 2023

Problem Statement

Allow the DuckDB extension to pass filters (predicates) down to Lance, so that Lance gets a chance to optimize its I/O execution plan.

SELECT col1, col2 FROM 's3://bucket/path/foo.lance' WHERE a > 1 AND b < 2

will essentially call

Dataset.scanner(columns=["col1", "col2"], filter="a > 1 AND b < 2")
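To make the intended mapping concrete, here is a minimal sketch of how the projection and the WHERE clause could be turned into the arguments of the call above. `build_scanner_kwargs` is a hypothetical helper, not part of Lance or DuckDB, and it assumes the extension already has the predicates as SQL strings.

```python
from typing import Dict, Sequence


def build_scanner_kwargs(columns: Sequence[str], predicates: Sequence[str]) -> Dict[str, object]:
    """Hypothetical helper: combine a projection and a list of conjunctive
    predicates into keyword arguments for Dataset.scanner()."""
    kwargs: Dict[str, object] = {"columns": list(columns)}
    if predicates:
        # The WHERE clause is treated as a conjunction, so join with AND.
        kwargs["filter"] = " AND ".join(predicates)
    return kwargs


kwargs = build_scanner_kwargs(["col1", "col2"], ["a > 1", "b < 2"])
print(kwargs)
# {'columns': ['col1', 'col2'], 'filter': 'a > 1 AND b < 2'}

# With the `lance` package installed, the result could then be used as:
#   import lance
#   table = lance.dataset("s3://bucket/path/foo.lance").scanner(**kwargs).to_table()
```

When there are no predicates, the `filter` key is omitted entirely, so the scan falls back to a plain projection.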
@eddyxu eddyxu added duckdb benchmark rust Rust related tasks labels Feb 25, 2023
@changhiskhan
Contributor

We're not limited to the pyarrow interface here, right? With the extension, we could push down arbitrary predicates.

@eddyxu
Contributor Author

eddyxu commented Feb 25, 2023

DuckDB seems to have its own set of limitations, though. For example, it is unlikely to push function calls down.

@eddyxu eddyxu added the good first issue Good for newcomers label Mar 9, 2023
@changhiskhan changhiskhan added the help wanted Extra attention is needed label Jul 2, 2023
@wjones127 wjones127 changed the title [DuckDB] Support predicate pushdown via duckdb extension [DuckDB] Support predicate pushdown via pyarrow dataset Sep 21, 2023
@wjones127
Contributor

DuckDB pushes down only a limited subset of expressions, so it wouldn't be hard to create a simple mapping from PyArrow expressions to Lance SQL just for those.

https://github.com/duckdb/duckdb/blob/239f51293c429168774c3943e96ddf2451253a07/tools/pythonpkg/src/arrow/arrow_array_stream.cpp#L294-L356
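A rough sketch of what such a mapping could look like, assuming the pushed-down subset is limited to constant comparisons, null checks, and AND/OR conjunctions. The `Compare`, `IsNull`, and `Conjunction` classes are hypothetical stand-ins for whatever representation is extracted from the PyArrow expression, and `to_lance_sql` is an illustrative name, not an existing API. Column names are emitted unquoted, so names with special characters would need extra handling.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Compare:
    """Hypothetical node: <column> <op> <constant>."""
    column: str
    op: str
    value: Union[int, float, str, bool]


@dataclass(frozen=True)
class IsNull:
    """Hypothetical node: <column> IS [NOT] NULL."""
    column: str
    negated: bool = False


@dataclass(frozen=True)
class Conjunction:
    """Hypothetical node: children joined by AND or OR."""
    op: str
    children: Tuple["Filter", ...]


Filter = Union[Compare, IsNull, Conjunction]

_COMPARE_OPS = {"=", "!=", "<", "<=", ">", ">="}


def sql_literal(value: Union[int, float, str, bool]) -> str:
    """Render a Python constant as a SQL literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # escape single quotes
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def to_lance_sql(node: Filter) -> str:
    """Translate a filter tree into a SQL string for a `filter=` argument."""
    if isinstance(node, Compare):
        if node.op not in _COMPARE_OPS:
            raise ValueError(f"unsupported comparison: {node.op}")
        return f"{node.column} {node.op} {sql_literal(node.value)}"
    if isinstance(node, IsNull):
        return f"{node.column} IS {'NOT ' if node.negated else ''}NULL"
    if isinstance(node, Conjunction):
        if node.op not in ("AND", "OR") or not node.children:
            raise ValueError("conjunction needs AND/OR and at least one child")
        parts = [to_lance_sql(child) for child in node.children]
        # Parenthesize so nested AND/OR keep their grouping.
        return parts[0] if len(parts) == 1 else "(" + f" {node.op} ".join(parts) + ")"
    raise TypeError(f"unsupported filter node: {type(node).__name__}")


expr = Conjunction("AND", (Compare("a", ">", 1), Compare("b", "<", 2)))
print(to_lance_sql(expr))
# (a > 1 AND b < 2)
```

Anything outside this subset raises instead of being silently dropped, so a caller can fall back to an unfiltered scan and let DuckDB apply the predicate itself.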


3 participants