Use patch file to use requests to download dataset for 24.12 #723

Closed
27 changes: 27 additions & 0 deletions context/cuvs-bench/cuvs_bench_get_dataset.patch
Member

If going this route, this patch file needs to be copied into the container, like here: https://github.com/rapidsai/docker/blob/branch-25.02/cuvs-bench/gpu/Dockerfile#L55

@@ -0,0 +1,27 @@
diff --git a/python/cuvs_bench/cuvs_bench/get_dataset/__main__.py b/python/cuvs_bench/cuvs_bench/get_dataset/__main__.py
index a6b154ef..b023fcbd 100644
--- a/python/cuvs_bench/cuvs_bench/get_dataset/__main__.py
+++ b/python/cuvs_bench/cuvs_bench/get_dataset/__main__.py
@@ -17,7 +17,7 @@ import argparse
import os
import subprocess
import sys
-from urllib.request import urlretrieve
+import requests
Member

Does the container have requests installed?



def get_dataset_path(name, ann_bench_data_path):
@@ -29,7 +29,12 @@ def get_dataset_path(name, ann_bench_data_path):
def download_dataset(url, path):
    if not os.path.exists(path):
        print(f"downloading {url} -> {path}...")
-        urlretrieve(url, path)
+        with requests.get(url, stream=True) as r:
+            r.raise_for_status()
+            with open(path, "wb") as f:
+                for chunk in r.iter_content(chunk_size=8192):
+                    if chunk:
+                        f.write(chunk)


def convert_hdf5_to_fbin(path, normalize):
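
The patched `download_dataset` streams the response body to disk in 8192-byte chunks instead of calling `urlretrieve`. The sketch below shows the same streaming pattern as a standalone function. Several things in it are assumptions that are not part of the patch: the `get` parameter is a hypothetical injection point so the function can be exercised without network access, the boolean return value is added for illustration, and the temporary-file-plus-rename step is one possible way to avoid leaving a partial file at `path` when a download is interrupted (with the patch as written, the `os.path.exists` check would skip such a truncated file on the next run).

```python
import os
import tempfile


def download_dataset(url, path, get=None, chunk_size=8192):
    """Stream `url` to `path`; return False if `path` already exists.

    `get` is a hypothetical injection point (not in the patch). It must
    behave like requests.get, which is used when `get` is None.
    """
    if os.path.exists(path):
        return False
    if get is None:
        import requests  # imported lazily so the sketch loads without it

        get = requests.get

    directory = os.path.dirname(os.path.abspath(path))
    # Assumption: write to a temporary file next to the target and rename
    # it afterwards, so a failed download never leaves a file at `path`.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f, get(url, stream=True) as r:
            r.raise_for_status()
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty chunks, as the patch does
                    f.write(chunk)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Called as `download_dataset(url, path)`, this falls back to `requests.get`, so it still needs `requests` installed in the environment, which is what the Dockerfile changes in this PR add.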
7 changes: 7 additions & 0 deletions context/cuvs-bench/get_datasets.sh
@@ -3,6 +3,13 @@

set -eo pipefail

+# find cuVS-bench in the environment
+# __file__ is empty, so we use __path__
+PACKAGE_FILE_PATH=$(python -c "import cuvs_bench; print(list(cuvs_bench.__path__)[0])")
+
+# Apply the patch
+patch "$PACKAGE_FILE_PATH/get_dataset/__main__.py" < /home/rapids/cuvs-bench/cuvs_bench_get_dataset.patch
+
python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize --dataset-path /home/rapids/preloaded_datasets
python -m cuvs_bench.get_dataset --dataset fashion-mnist-784-euclidean --dataset-path /home/rapids/preloaded_datasets
python -m cuvs_bench.get_dataset --dataset glove-50-angular --normalize --dataset-path /home/rapids/preloaded_datasets
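
The script resolves the installed package directory through `cuvs_bench.__path__` because, per its comment, `__file__` is empty. A minimal Python sketch of the same lookup follows. It uses `importlib.util.find_spec` and the standard-library `json` package as a stand-in for `cuvs_bench`; the helper name `package_dir` is hypothetical and not part of this PR.

```python
import importlib.util
import os


def package_dir(name):
    """Return the first directory on a package's search path.

    Hypothetical helper: it reads the same locations that back
    `package.__path__`, so it does not depend on `__file__` being set.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError(f"{name!r} is not an importable package")
    return list(spec.submodule_search_locations)[0]


# Stand-in for the path the script passes to `patch`.
target = os.path.join(package_dir("json"), "__init__.py")
print(target)
```

For a top-level name, `find_spec` locates the package without importing it, which may be useful if importing the package has side effects.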
13 changes: 11 additions & 2 deletions cuvs-bench/cpu/Dockerfile
@@ -21,7 +21,13 @@ echo ". /opt/conda/etc/profile.d/conda.sh; conda activate base" >> /etc/bash.bas
EOF

# we need perl temporarily for the remaining benchmark perl scripts
-RUN apt-get install perl -y
+RUN <<EOF
+apt-get update
+apt-get install -y \
+patch \
+perl
+rm -rf /var/cache/apt/archives /var/lib/apt/lists/*
+EOF

# update everything before other environment changes, to ensure mixing
# an older conda with newer packages still works well
@@ -32,7 +38,8 @@ mamba install -y -n base "python=${PYTHON_VER}"
mamba update --all -y -n base
mamba install -y -n base \
"cuvs-bench-cpu=${RAPIDS_VER}.*" \
-"python=${PYTHON_VER}"
+"python=${PYTHON_VER}" \
+"requests"
conda clean -afy
chmod -R 777 /opt/conda
EOF
@@ -52,6 +59,8 @@ FROM bench-base AS cuvs-bench-cpu-datasets

SHELL ["/bin/bash", "-euo", "pipefail", "-c"]

+COPY cuvs-bench/cuvs_bench_get_dataset.patch /home/rapids/cuvs-bench/cuvs_bench_get_dataset.patch
+
COPY cuvs-bench/get_datasets.sh /home/rapids/cuvs-bench/get_datasets.sh

COPY cuvs-bench/run_benchmark.sh /data/scripts/run_benchmark_preloaded_datasets.sh
13 changes: 11 additions & 2 deletions cuvs-bench/gpu/Dockerfile
@@ -24,13 +24,20 @@ echo ". /opt/conda/etc/profile.d/conda.sh; conda activate base" >> /etc/bash.bas
EOF

# we need perl temporarily for the remaining benchmark perl scripts
-RUN apt-get install perl -y
+RUN <<EOF
+apt-get update
+apt-get install -y \
+patch \
+perl
+rm -rf /var/cache/apt/archives /var/lib/apt/lists/*
+EOF

RUN <<EOF
mamba update --all -y -n base
mamba install -y -n base \
"cuvs-bench=${RAPIDS_VER}.*" \
-"cuda-version=${CUDA_VER%.*}.*"
+"cuda-version=${CUDA_VER%.*}.*" \
+"requests"
conda clean -afy
chmod -R 777 /opt/conda
EOF
@@ -52,6 +59,8 @@ FROM cuvs-bench AS cuvs-bench-datasets

SHELL ["/bin/bash", "-euo", "pipefail", "-c"]

+COPY cuvs-bench/cuvs_bench_get_dataset.patch /home/rapids/cuvs-bench/cuvs_bench_get_dataset.patch
+
COPY cuvs-bench/get_datasets.sh /home/rapids/cuvs-bench/get_datasets.sh

COPY cuvs-bench/run_benchmarks_preloaded_datasets.sh /data/scripts/run_benchmarks_preloaded_datasets.sh