Remove dind-rootless workaround for mounting workspace #96

Open
zxiiro opened this issue Mar 27, 2024 · 3 comments

Comments

@zxiiro
Collaborator

zxiiro commented Mar 27, 2024

In dind-rootless container mode, volume mounts in Docker are always mounted as the root user, which is user-mapped to the "runner" user (id: 1000) in the ARC runner container. This requires a workaround:

# Workaround for dind-rootless userid mapping
WORKSPACE_ORIGINAL_OWNER_ID=$(stat -c '%u' "/var/lib/jenkins/workspace")
sudo chown -R jenkins /var/lib/jenkins/workspace

....

sudo chown -R "$WORKSPACE_ORIGINAL_OWNER_ID" /var/lib/jenkins/workspace

This gets around the fact that we are using a non-root user in the Docker container to run the build.
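
As a rough sketch only, the save/chown/restore pattern above could be wrapped in a small helper so that ownership is restored even when the build step fails. The function name `run_with_owner` and the `CHOWN_CMD` override are hypothetical and not part of the merged workaround; `CHOWN_CMD` exists only so the sketch can be tried without sudo. It assumes GNU `stat`, as the original snippet does.

```shell
# Hypothetical helper (not from the merged workaround): temporarily hand a
# directory tree to another owner, run a command, then restore the owner.
# CHOWN_CMD is an assumed override; it defaults to "sudo chown" as in the issue.
run_with_owner() {
    local dir="$1" owner="$2"
    shift 2
    local original_owner status=0

    # Remember who owned the directory before we touch it.
    original_owner=$(stat -c '%u' "$dir") || return 1

    # Give the tree to the build user (word splitting of CHOWN_CMD is intended).
    ${CHOWN_CMD:-sudo chown} -R "$owner" "$dir" || return 1

    # Run the wrapped command and keep its exit status instead of aborting.
    "$@" || status=$?

    # Always restore the original owner, even if the command failed.
    ${CHOWN_CMD:-sudo chown} -R "$original_owner" "$dir" || return 1

    return "$status"
}
```

With this in place, the workaround would reduce to something like `run_with_owner /var/lib/jenkins/workspace jenkins <build command>`, where the build command stands in for the elided steps above. This still performs two recursive chowns per job, so it tidies the workaround rather than removing it.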

The jenkins user also appears to be a legacy user that was likely used at a time when PyTorch was using Jenkins CI. After the ARC runner migration, when all the jobs are running on ARC, we should look into removing the need for this workaround and updating the user where appropriate to remove this legacy configuration.

@zxiiro
Collaborator Author

zxiiro commented Apr 5, 2024

Linking pytorch/pytorch#122922 which was ultimately merged to put the workaround in place.

@ZainRizvi
Contributor

Sounds like this is done @zxiiro?

@ZainRizvi ZainRizvi added this to the CPU Runners functional milestone Apr 30, 2024
@zxiiro
Collaborator Author

zxiiro commented May 2, 2024

> Sounds like this is done @zxiiro?

No, this is a future issue to track the removal of the workaround we put in place. The chown workaround was put in place to allow ARC to move forward, but I believe it is technical debt, and at some point we should find a cleaner solution that doesn't require so much chown-ing. It would likely require some big changes to the PyTorch build and test code though, so I don't expect this issue to be done in the near term.
