
[custom-images] deprecate gsutil command #112

Open (wants to merge 1 commit into base: main)

@cjac (Contributor) commented Feb 18, 2025

This PR includes recent changes I've made as I've iterated on the examples in the examples/secure-boot/ directory, as well as a patch to fix issue #111.

README.md
custom_image_utils/args_parser.py
tests/test_args_parser.py

  • Returned the default disk size to 30GB. It could be 20GB, since the base image occupies only 18GB, but the source disk image is already set to 30GB, so we'll keep that. If we need a larger disk, we can specify the size at image-generation time.
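  • You can see the current disk usage statistics per TensorFlow (tf) image here; the trailing comment on each line gives the size, used space, available space, and use% of the root filesystem (/):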
    "2.0-debian10" ) disk_size_gb="42" ;; # 41.11G 36.28G 3.04G 93% / # tf-pre-init
    "2.0-rocky8" ) disk_size_gb="45" ;; # 44.79G 38.43G 6.36G 86% / # tf-pre-init
    "2.0-ubuntu18" ) disk_size_gb="41" ;; # 39.55G 35.39G 4.14G 90% / # tf-pre-init
    "2.1-debian11" ) disk_size_gb="46" ;; # 45.04G 39.31G 3.78G 92% / # tf-pre-init
    "2.1-rocky8" ) disk_size_gb="48" ;; # 48.79G 41.73G 7.06G 86% / # tf-pre-init
    "2.1-ubuntu20" ) disk_size_gb="46" ;; # 44.40G 39.92G 4.46G 90% / # tf-pre-init
    "2.2-debian12" ) disk_size_gb="47" ;; # 46.03G 40.76G 3.28G 93% / # tf-pre-init
    "2.2-rocky9" ) disk_size_gb="47" ;; # 46.79G 40.86G 5.93G 88% / # tf-pre-init
    "2.2-ubuntu22" ) disk_size_gb="47" ;; # 45.37G 40.56G 4.79G 90% / # tf-pre-init
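The entries above are branches of a bash case statement. A minimal sketch of a lookup function built from them follows; the function name get_disk_size_gb and the 30GB fallback for images not in the list are assumptions for illustration, not the script's actual code.

```shell
#!/bin/bash
# Hypothetical helper: map a Dataproc image version to a disk size in GB.
# Per-image sizes are copied from the list above; the 30GB fallback mirrors
# the restored default and is an assumption for unlisted images.
function get_disk_size_gb() {
  local -r image_version="$1"
  local disk_size_gb
  case "${image_version}" in
    "2.0-debian10" ) disk_size_gb="42" ;;
    "2.0-rocky8"   ) disk_size_gb="45" ;;
    "2.0-ubuntu18" ) disk_size_gb="41" ;;
    "2.1-debian11" ) disk_size_gb="46" ;;
    "2.1-rocky8"   ) disk_size_gb="48" ;;
    "2.1-ubuntu20" ) disk_size_gb="46" ;;
    "2.2-debian12" ) disk_size_gb="47" ;;
    "2.2-rocky9"   ) disk_size_gb="47" ;;
    "2.2-ubuntu22" ) disk_size_gb="47" ;;
    * )              disk_size_gb="30" ;;  # assumed default for anything else
  esac
  echo "${disk_size_gb}"
}

get_disk_size_gb "2.2-rocky9"           # prints 47
get_disk_size_gb "some-unlisted-image"  # prints 30 (fallback)
```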

custom_image_utils/shell_script_generator.py

  • Remove some uses of execute_with_retries
  • Deprecate gsutil; prefer gcloud
  • Print red/green highlighting only for a short section, not the entire line
  • Include signing key material in the metadata attributes
  • Reduce noise from dd
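A sketch of the "short section" coloring idea: only a status tag is colored and the rest of the line stays plain, which keeps logs readable. The print_status function, the PASS/FAIL tags, and the specific ANSI color codes are illustrative assumptions, not the generator's actual output format.

```shell
#!/bin/bash
# Hypothetical helper: color only the PASS/FAIL tag, not the whole line.
readonly GREEN=$'\e[0;32m'
readonly RED=$'\e[0;31m'
readonly RESET=$'\e[0m'

function print_status() {
  local -r status="$1"   # 0 means success, anything else means failure
  local -r message="$2"
  if [[ "${status}" == "0" ]]; then
    printf '[%sPASS%s] %s\n' "${GREEN}" "${RESET}" "${message}"
  else
    printf '[%sFAIL%s] %s\n' "${RED}" "${RESET}" "${message}"
  fi
}

print_status 0 "image created"
print_status 1 "disk resize"
```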

examples/secure-boot/build-current-images.sh
examples/secure-boot/create-key-pair.sh
examples/secure-boot/dask.sh
examples/secure-boot/env.json.sample
examples/secure-boot/install_gpu_driver.sh
examples/secure-boot/pre-init.sh
examples/secure-boot/rapids.sh

  • Iterative changes to the example scripts

scripts/customize_conda.sh
startup_script/run.sh

  • Deprecate gsutil; prefer gcloud

@cjac self-assigned this Feb 18, 2025
@cjac force-pushed the deprecate-gsutil-20250218 branch 4 times, most recently from 96e2255 to 38aee32 (February 19, 2025 00:06)
@cjac (Contributor Author) commented Feb 19, 2025

All of my tests are passing, and I'm prepared to submit this change once the customer confirms that the updates do not break their workflow.

@cjac (Contributor Author) commented Feb 19, 2025

I've pushed these changes up to the 2025.02 branch of this repo if you'd like to test them. Here is an example of how one might exercise the script on all supported images:

git clone -b 2025.02 git@github.com:GoogleCloudDataproc/custom-images.git
cd custom-images
grep -B3 'bash .*/build-current-images.sh' examples/secure-boot/README.md
cp examples/secure-boot/env.json.sample env.json
vi env.json
docker build -t dataproc-dask-rapids-pre-init:latest .
docker run -it dataproc-dask-rapids-pre-init:latest /bin/bash examples/secure-boot/build-current-images.sh

@prince-cs (Collaborator) left a comment:

LGTM

@cjac changed the title from "[custom-images] deprecate gcloud command" to "[custom-images] deprecate gsutil command" on Feb 26, 2025
fi

echo 'Uploading local logs to GCS bucket.'
-gsutil -m rsync -r {log_dir}/ {gcs_log_dir}/
+gsutil rsync -r {log_dir}/ {gcs_log_dir}/

Member:

It looks like this has dropped the -m flag if the invocation is using legacy gsutil instead of gcloud storage. This might impact performance.

Contributor Author:

gcloud storage treats -m as implied and will fail if it is passed.

Member:

I am thinking of the case where this is run on an older base image with gcloud SDK < 402.0.0. In that case, prepare would fall back to gsutil, and with this change, we'd lose the -m passed to gsutil.
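One way to keep -m on the legacy path, sketched under assumptions: choose the command prefix from the installed Cloud SDK version, using gcloud storage (which rejects -m) on SDK 402.0.0 or later and gsutil -m otherwise. The 402.0.0 threshold comes from the comment above; the function name select_gcs_cmd, the example paths, and the bucket name are hypothetical.

```shell
#!/bin/bash
# True if version $1 >= version $2 (same sort -V idea as version_ge in the diff,
# with printf used instead of echo -e).
function version_ge() ( set +x ; [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" ] ; )

# Hypothetical helper: print the storage command prefix for a given SDK version.
# gcloud storage parallelizes by default and fails if -m is passed, so only
# the legacy gsutil fallback gets -m.
function select_gcs_cmd() {
  local -r sdk_version="$1"
  if version_ge "${sdk_version}" "402.0.0"; then
    echo "gcloud storage"
  else
    echo "gsutil -m"
  fi
}

# Assumption: on a real image the version could be read with
#   gcloud --version | awk '/^Google Cloud SDK/ {print $4}'
gsutil_cmd="$(select_gcs_cmd "401.0.0")"
# Print (rather than run) the resulting command; the bucket is a placeholder.
echo "${gsutil_cmd} rsync -r /tmp/logs/ gs://example-bucket/logs/"
```

Leaving ${gsutil_cmd} unquoted at the call site lets "gcloud storage" or "gsutil -m" split into a command plus its leading arguments; when forwarding a caller's arguments through a wrapper, "$@" keeps paths containing spaces intact where $* would split them.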

@@ -35,32 +35,51 @@
local -r cmd="$*"

for ((i = 0; i < 3; i++)); do
-if eval "$cmd"; then return 0 ; fi
+time eval "$cmd" > "/tmp/{run_id}/install.log" 2>&1 && retval=$? || {{ retval=$? ; cat "/tmp/{run_id}/install.log" ; }}

Member:

Why was this change made?

Contributor Author:

To reduce noise in the log from successful runs of retried tasks, and to track the duration of long-running tasks.

The log is now printed only on error.
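A standalone sketch of the retry pattern described in this thread: each attempt's output goes to a log file that is printed only when the attempt fails, and bash's time keyword reports each attempt's duration on stderr. The mktemp log location and the RETRY_SLEEP override are assumptions added so the sketch runs outside the generated script; it is not the generator's exact code.

```shell
#!/bin/bash
# Sketch: retry a command up to 3 times, showing its output only on failure.
# RETRY_SLEEP is a hypothetical override; the loop in the diff sleeps 5 seconds.
function execute_with_retries() (
  local -r cmd="$*"
  local -r log="$(mktemp)"
  local retval
  for ((i = 0; i < 3; i++)); do
    # time reports the attempt's duration on the shell's stderr;
    # the command's own stdout/stderr go to the log file.
    time eval "$cmd" > "${log}" 2>&1 && retval=$? || { retval=$? ; cat "${log}" ; }
    if [[ "${retval}" == 0 ]]; then rm -f "${log}" ; return 0 ; fi
    sleep "${RETRY_SLEEP:-5}"
  done
  rm -f "${log}"
  return 1
)

RETRY_SLEEP=0 execute_with_retries "echo quiet-on-success"
RETRY_SLEEP=0 execute_with_retries "echo shown-on-failure; false" || echo "gave up after 3 attempts"
```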

sleep 5
done
return 1
)

function gsutil() {{ ${{gsutil_cmd}} $* ; }}

function version_ge() ( set +x ; [ "$1" = "$(echo -e "$1\n$2" | sort -V | tail -n1)" ] ; )

Member:

Are all of these function definitions required? I think I only saw calls to version_lt, which in turn can call version_le.

Contributor Author:

Probably not necessary, but keeping them will let us use the functions in the future without adding them. I can remove anything that isn't presently called if you think that's the right approach.

Member:

Yes, generally it is good practice to remove unused code to simplify the codebase. If it's not covered by any tests or real-world usage, then we might not have confidence in it.
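The reviewer's suggestion in this thread, that only version_lt is needed and that it can be built on version_le, could look like the minimal pair below. It is written in the same sort -V style as the version_ge line in the diff; these exact definitions are an illustration, not the code that was merged.

```shell
#!/bin/bash
# True if version $1 <= version $2: $1 sorts first under sort -V.
function version_le() ( set +x ; [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" ] ; )
# True if version $1 < version $2: less-or-equal, excluding equality.
function version_lt() ( set +x ; [ "$1" = "$2" ] && return 1 || version_le "$1" "$2" ; )

version_lt "2.1" "2.10" && echo "2.1 < 2.10"   # sort -V compares 1 and 10 numerically
version_lt "2.2" "2.2"  || echo "2.2 is not < 2.2"
version_le "2.2" "2.2"  && echo "2.2 <= 2.2"
```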

@@ -1,5 +1,6 @@
#!/bin/bash

# Copyright 2024 Google LLC and contributors

Member:

2025?

Contributor Author:

I started writing it in 2024, and I've been advised to use only the earliest year and not to add later years in which the file was modified.

@@ -38,13 +38,26 @@ DATAPROC_VERSION=$(/usr/share/google/get_metadata_value attributes/dataproc-vers

ready=""

function version_ge() ( set +x ; [ "$1" = "$(echo -e "$1\n$2" | sort -V | tail -n1)" ] ; )

Member:

Similar comment here about whether or not all of the functions are required.

Contributor Author:

Probably not called right now, so if you're asking that I not predefine these functions, I can do that. But beware that without these functions, users will likely write ! version_ge and treat it as interchangeable with version_lt or version_le. There is currently an example of this in bdutil that is breaking the build.

Member:

Are these functions intended for any kind of reuse by users outside this context? If not, then I think it's fine to remove them.

Contributor Author:

They mirror the same functions in [1]. My concern is that, without the full set of variants, developers who update this file in the future might make a convoluted and ultimately inaccurate call to one of these functions by negating its result.

It's not a huge concern, though, and we can unbreak the build if it happens again.

I will remove the unused functions in this code.

[1] https://github.com/GoogleCloudDataproc/initialization-actions/blob/989b445b20a2be99b22f169ab9e85f8def9be534/templates/common/util_functions
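To make the pitfall discussed in this thread concrete: ! version_ge A B is true exactly when A is strictly lower than B, so it matches version_lt but disagrees with version_le when the two versions are equal. The sketch reuses the version_ge one-liner from the diff (printf substituted for echo -e) plus an assumed version_le written in the same style.

```shell
#!/bin/bash
function version_ge() ( set +x ; [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" ] ; )
function version_le() ( set +x ; [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" ] ; )

# Compare "! version_ge" with "version_le" for a few version pairs.
for pair in "2.1 2.2" "2.2 2.2" "2.3 2.2"; do
  read -r a b <<< "${pair}"
  if ! version_ge "$a" "$b"; then neg="true"; else neg="false"; fi
  if version_le "$a" "$b"; then le="true"; else le="false"; fi
  echo "$a vs $b: ! version_ge=${neg} version_le=${le}"
done
```

The "2.2 vs 2.2" row is the one where the two checks disagree, which is exactly the case a negated version_ge silently gets wrong when version_le was intended.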


3 participants