add python script to compare lance performance vs parquet TPCH #749

Merged: 5 commits into lancedb:main on Apr 12, 2023


hzhang86
Contributor

@hzhang86 hzhang86 commented Apr 5, 2023

Compare Lance vs. Parquet on TPCH Q1 and Q6 using the SF1 dataset.
Steps to run the benchmark:

  1. cd lance/benchmarks/tpch
  2. mkdir dataset && cd dataset
  3. Download the lineitem Parquet file from "https://github.com/cwida/duckdb-data/releases/download/v1.0/lineitemsf1.snappy.parquet", then rename it to "lineitem_sf1.parquet"
  4. Generate the Lance dataset from the Parquet file, in the same directory
  5. cd ..
  6. python3 benchmark.py q1
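
Steps 3 and 4 above can also be scripted. The following is a minimal sketch, not necessarily what the PR's script expects: the output name `lineitem.lance` and the helper names `dataset_paths`, `download_parquet`, and `convert_to_lance` are assumptions made for illustration, and the conversion needs the `pylance` and `pyarrow` packages installed.

```python
from pathlib import Path
import urllib.request

# URL taken from step 3 of the instructions above.
PARQUET_URL = (
    "https://github.com/cwida/duckdb-data/releases/download/v1.0/"
    "lineitemsf1.snappy.parquet"
)


def dataset_paths(dataset_dir="dataset"):
    """Return (parquet_path, lance_path) inside the dataset directory.

    "lineitem.lance" is an assumed name; check benchmark.py for the
    path it actually reads.
    """
    root = Path(dataset_dir)
    return root / "lineitem_sf1.parquet", root / "lineitem.lance"


def download_parquet(parquet_path, url=PARQUET_URL):
    """Download the SF1 lineitem file unless it is already present."""
    parquet_path = Path(parquet_path)
    parquet_path.parent.mkdir(parents=True, exist_ok=True)
    if not parquet_path.exists():
        urllib.request.urlretrieve(url, str(parquet_path))
    return parquet_path


def convert_to_lance(parquet_path, lance_path):
    """Read the Parquet file into memory and write it as a Lance dataset."""
    # Imported here so the helpers above work without these packages.
    import lance
    import pyarrow.parquet as pq

    table = pq.read_table(str(parquet_path))
    lance.write_dataset(table, str(lance_path))
    return lance_path
```

Run from `lance/benchmarks/tpch`, calling `download_parquet` and then `convert_to_lance` on the two paths from `dataset_paths()` would leave both files in `dataset/`, after which step 6 applies unchanged.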

@hzhang86 hzhang86 changed the title add python script to compare lance performance against parquet TPCH add python script to compare lance performance vs parquet TPCH Apr 5, 2023
@changhiskhan
Contributor

changhiskhan commented Apr 5, 2023

Thanks @hzhang86

Transcribing some context from the email thread:

"A few improvements could make Lance faster in TPC-H, including compression, better row-group stats and pruning, and maybe partitioning. Parquet is better in that regard."
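
To illustrate the row-group pruning point: TPC-H Q6 filters `l_shipdate` to the range 1994-01-01 (inclusive) to 1995-01-01 (exclusive), so a reader that keeps min/max statistics per row group can skip any group whose range cannot overlap that window. The sketch below is a pure-Python illustration of the idea only; `RowGroupStats`, `can_skip`, and `groups_to_scan` are hypothetical names and do not reflect the internals of either Lance or Parquet.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RowGroupStats:
    """Min/max of l_shipdate for one row group (hypothetical layout)."""
    min_shipdate: date
    max_shipdate: date


def can_skip(stats, lo, hi):
    """True when no row in the group can satisfy lo <= l_shipdate < hi."""
    return stats.max_shipdate < lo or stats.min_shipdate >= hi


def groups_to_scan(groups, lo, hi):
    """Indices of the row groups that still have to be read."""
    return [i for i, g in enumerate(groups) if not can_skip(g, lo, hi)]


# Q6 shipdate window.
Q6_LO, Q6_HI = date(1994, 1, 1), date(1995, 1, 1)

example_groups = [
    RowGroupStats(date(1993, 1, 1), date(1993, 12, 31)),  # entirely before
    RowGroupStats(date(1994, 1, 1), date(1994, 12, 31)),  # entirely inside
    RowGroupStats(date(1995, 1, 1), date(1995, 12, 31)),  # entirely after
    RowGroupStats(date(1993, 6, 1), date(1994, 6, 1)),    # overlaps
]
```

Pruning like this only pays off when the data is at least roughly clustered on the filter column; on randomly ordered rows nearly every group spans the whole date range and nothing is skipped.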

See #738 for context as well.

@eddyxu
Contributor

eddyxu commented Apr 5, 2023

@hzhang86, could you add a README.md with the same content you described here in the issue?

@hzhang86
Contributor Author

hzhang86 commented Apr 5, 2023 via email

@hzhang86
Contributor Author

hzhang86 commented Apr 6, 2023

> @hzhang86, could you add a README.md with the same content you described here in the issue?

What do you mean by adding a README.md in the issue? Shall I just create an issue and copy the description?

@eddyxu
Contributor

eddyxu commented Apr 6, 2023

> > @hzhang86, could you add a README.md with the same content you described here in the issue?
>
> What do you mean by adding a README.md in the issue? Shall I just create an issue and copy the description?

Oh, I mean adding a README.md to this PR and putting the instructions for running TPCH in it, so that it is easier for other people to reproduce your benchmark later.

@changhiskhan changhiskhan merged commit e9476a8 into lancedb:main Apr 12, 2023
@changhiskhan
Contributor

Thanks @hzhang86 !

3 participants