Use patch file to use requests to download dataset for 24.12 #723

Closed
27 changes: 27 additions & 0 deletions context/cuvs-bench/cuvs_bench_get_dataset.patch
Review comment (Member):

If going this route, this patch file needs to be copied into the container, like here: https://github.com/rapidsai/docker/blob/branch-25.02/cuvs-bench/gpu/Dockerfile#L55

@@ -0,0 +1,27 @@
diff --git a/python/cuvs_bench/cuvs_bench/get_dataset/__main__.py b/python/cuvs_bench/cuvs_bench/get_dataset/__main__.py
index a6b154ef..b023fcbd 100644
--- a/python/cuvs_bench/cuvs_bench/get_dataset/__main__.py
+++ b/python/cuvs_bench/cuvs_bench/get_dataset/__main__.py
@@ -17,7 +17,7 @@ import argparse
import os
import subprocess
import sys
-from urllib.request import urlretrieve
+import requests
Review comment (Member):

Does the container have requests installed?
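One way to answer this from inside the container is to check whether the module can be located, without importing it. A minimal sketch, assuming a Python 3 interpreter is available in the image; `is_installed` is a hypothetical helper name and not part of cuvs_bench:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located by the import system."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "requests" is the dependency the patch introduces.
    print("requests installed:", is_installed("requests"))
```

If the check prints `False`, the image would need `requests` added to its environment before the patched module can be imported.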



def get_dataset_path(name, ann_bench_data_path):
@@ -29,7 +29,12 @@ def get_dataset_path(name, ann_bench_data_path):
 def download_dataset(url, path):
     if not os.path.exists(path):
         print(f"downloading {url} -> {path}...")
-        urlretrieve(url, path)
+        with requests.get(url, stream=True) as r:
+            r.raise_for_status()
+            with open(path, "wb") as f:
+                for chunk in r.iter_content(chunk_size=8192):
+                    if chunk:
+                        f.write(chunk)


def convert_hdf5_to_fbin(path, normalize):
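Note on the hunk above: the patched `download_dataset` writes directly to `path`, so an interrupted download could leave a partial file that the `os.path.exists(path)` guard then treats as complete on the next run. The following is a possible variant, not what this PR does: it streams into a temporary file and renames it into place only on success. `write_chunks_atomically` is a hypothetical helper name, and the `.part` suffix is an arbitrary choice.

```python
import os
import tempfile


def write_chunks_atomically(chunks, path):
    """Write an iterable of byte chunks to `path`, renaming into place at the end."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    f.write(chunk)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial file so a retry starts clean.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def download_dataset(url, path):
    # Imported lazily so the helper above stays usable without requests installed.
    import requests

    if not os.path.exists(path):
        print(f"downloading {url} -> {path}...")
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            write_chunks_atomically(r.iter_content(chunk_size=8192), path)
```

With this shape, `path` only ever exists once the full body has been written, so the existence check remains a valid "already downloaded" test.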
7 changes: 7 additions & 0 deletions context/cuvs-bench/get_datasets.sh
@@ -3,6 +3,13 @@

set -eo pipefail

# Locate the installed cuvs_bench package
PACKAGE_FILE_PATH=$(python -c "import cuvs_bench; print(cuvs_bench.__file__)")
PACKAGE_DIR=$(dirname "$PACKAGE_FILE_PATH")

# Apply the patch
patch "$PACKAGE_DIR/get_dataset/__main__.py" < cuvs_bench_get_dataset.patch

python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize --dataset-path /home/rapids/preloaded_datasets
python -m cuvs_bench.get_dataset --dataset fashion-mnist-784-euclidean --dataset-path /home/rapids/preloaded_datasets
python -m cuvs_bench.get_dataset --dataset glove-50-angular --normalize --dataset-path /home/rapids/preloaded_datasets
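Note on the package lookup step in this script: the same lookup can be done without importing the package, which avoids running its import-time code just to find a path. A minimal sketch, assuming the package is a regular package with an `__init__.py`; `package_dir` is a hypothetical helper name:

```python
import importlib.util
import os


def package_dir(package_name: str) -> str:
    """Return the directory of an importable package without importing it."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {package_name!r}")
    # For a regular package, spec.origin is the path to its __init__.py.
    return os.path.dirname(spec.origin)


# Example (assumes cuvs_bench is installed in the environment):
# target = os.path.join(package_dir("cuvs_bench"), "get_dataset", "__main__.py")
```

The resulting `target` path is the file that the `patch` command in the script is meant to modify.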