ARC Runners need permissions to docker pull from 308535385114 #137

Open
zxiiro opened this issue Apr 18, 2024 · 3 comments
Labels
workstream/linux-cpu Get CPU jobs working on linux

Comments

@zxiiro
Collaborator

zxiiro commented Apr 18, 2024

The ARC runners need permission to pull ECR images from AWS account 308535385114.

This affects linux-focal-py3_8-clang9-xla:

docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch-canary/308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/xla_base:v1.1-lite:4ed1110890b550bcef197e2423bde1a8eb22ee74

Reference: https://github.com/pytorch/pytorch-canary/actions/runs/8743956080/job/23995766058

@ZainRizvi
Contributor

A similar failure is also affecting the test runners, which makes the "Calculate docker image" step take a long time:

https://github.com/pytorch/pytorch-canary/actions/runs/8742048269/job/23990838178?pr=211#step:6:122

+ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch-canary/pytorch-linux-focal-py3.11-clang10:1b6850753b4c39818abf536373ed7c0ebda61f27
denied: User: arn:aws:sts::391835788720:assumed-role/KarpenterNodeRole-ghci-arc-c-runners-eks-III/i-0dbfd85460611594c is not authorized to perform: ecr:BatchGetImage on resource: arn:aws:ecr:us-east-1:308535385114:repository/pytorch-canary/pytorch-linux-focal-py3.11-clang10 because no resource-based policy allows the ecr:BatchGetImage action

Here's where the above docker manifest command gets run:
https://github.com/pytorch/test-infra/blob/327618bf4d14e9c2772b7cf5f00a02443574f57f/.github/actions/calculate-docker-image/action.yml#L105
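
To make it easier to see who is being denied and where, the `denied:` line above can be split into its parts. This is a minimal sketch: the regular expression and the `parse_denied` helper are hypothetical and only assume the message wording shown in the log above. They are not part of test-infra.

```python
import re

# Hypothetical helper: pattern assumed from the wording of the log line above.
DENIED_RE = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


def parse_denied(line):
    """Return principal, action and resource from a 'denied:' line, or None."""
    match = DENIED_RE.search(line)
    return match.groupdict() if match else None


# Sample copied from the log above.
sample = (
    "denied: User: arn:aws:sts::391835788720:assumed-role/"
    "KarpenterNodeRole-ghci-arc-c-runners-eks-III/i-0dbfd85460611594c "
    "is not authorized to perform: ecr:BatchGetImage on resource: "
    "arn:aws:ecr:us-east-1:308535385114:repository/pytorch-canary/"
    "pytorch-linux-focal-py3.11-clang10 because no resource-based policy "
    "allows the ecr:BatchGetImage action"
)

info = parse_denied(sample)
# The account ID is the fifth colon-separated field of an ARN.
caller_account = info["principal"].split(":")[4]
repo_account = info["resource"].split(":")[4]
print(info["action"], caller_account, repo_account)
```

The two account IDs differ, so this is a cross-account pull: the runner node role lives in 391835788720 and the repository lives in 308535385114.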

@ZainRizvi added the workstream/linux-cpu (Get CPU jobs working on linux) label on Apr 30, 2024
@ZainRizvi added this to the CPU Runners functional milestone on Apr 30, 2024
@ZainRizvi
Contributor

More of the logs:
https://github.com/pytorch/pytorch-canary/actions/runs/9081981203/job/24957202976?pr=215

+ login 308535385114.dkr.ecr.us-east-1.amazonaws.com
+ aws ecr get-login-password --region us-east-1
+ docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
WARNING! Your password will be stored unencrypted in /home/runner/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
+ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch-canary/pytorch-linux-jammy-py3.8-gcc11:58fe44695ea6de887dbb7bfcd983c3c120a1abd9
denied: User: arn:aws:sts::391835788720:assumed-role/KarpenterNodeRole-ghci-arc-c-runners-eks-I/i-05e42149e01a1e357 is not authorized to perform: ecr:BatchGetImage on resource: arn:aws:ecr:us-east-1:308535385114:repository/pytorch-canary/pytorch-linux-jammy-py3.8-gcc11 because no resource-based policy allows the ecr:BatchGetImage action
++ git rev-parse HEAD
+ [[ ed48ea9997c2b04736096e4b6669543ab2e627d5 = \4\2\a\0\e\0\3\e\e\e\3\7\6\4\6\4\e\8\c\1\d\6\d\d\f\2\f\b\9\c\2\0\0\c\c\6\8\4\f\8 ]]
++ git merge-base HEAD ed48ea9997c2b04736096e4b6669543ab2e627d5
+ MERGE_BASE=ed48ea9997c2b04736096e4b6669543ab2e627d5
+ [[ -z ed48ea9997c2b04736096e4b6669543ab2e627d5 ]]
+ git rev-parse ed48ea9997c2b04736096e4b6669543ab2e627d5:.ci/docker
58fe44695ea6de887dbb7bfcd983c3c120a1abd9
++ git rev-parse ed48ea9997c2b04736096e4b6669543ab2e627d5:.ci/docker
+ PREVIOUS_DOCKER_TAG=58fe44695ea6de887dbb7bfcd983c3c120a1abd9
+ [[ 58fe44695ea6de887dbb7bfcd983c3c120a1abd9 == \5\8\f\e\4\4\6\9\5\e\a\6\d\e\8\8\7\d\b\b\7\b\f\c\d\9\8\3\c\3\c\1\2\0\a\1\a\b\d\9 ]]
+ echo 'WARNING: Something has gone wrong and the previous image isn'\''t available for the merge-base of your branch'
WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch
+ echo '         Will re-build docker image to store in local cache, TTS may be longer'
+ echo rebuild=true
         Will re-build docker image to store in local cache, TTS may be longer
Run set -x
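
Both `denied:` messages say that no resource-based policy allows `ecr:BatchGetImage`, so one possible fix is a repository policy on the ECR repositories in 308535385114 that grants pull access to the runner account. The sketch below only builds such a policy document. It is an assumption, not the fix confirmed in this thread: granting the whole account 391835788720 (rather than individual `KarpenterNodeRole-*` roles), the `Sid`, and the `build_pull_policy` helper name are all illustrative choices.

```python
import json

# Assumption: runner account ID taken from the "denied" messages above.
RUNNER_ACCOUNT_ID = "391835788720"

# ECR actions commonly needed for `docker pull` and `docker manifest inspect`.
PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
]


def build_pull_policy(account_id):
    """Build an ECR repository policy that allows pulls from another account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Hypothetical statement ID, chosen for illustration.
                "Sid": "AllowCrossAccountPull",
                "Effect": "Allow",
                # Assumption: trust the whole runner account, not a single role.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": PULL_ACTIONS,
            }
        ],
    }


policy_text = json.dumps(build_pull_policy(RUNNER_ACCOUNT_ID), indent=2)
print(policy_text)
```

The resulting JSON could be passed as `--policy-text` to `aws ecr set-repository-policy` for each affected repository. The runner role also needs identity-based permission for the same actions in its own account.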

@zxiiro
Collaborator Author

zxiiro commented Aug 7, 2024

I believe this is related to #252, which has a fix for ALI that will most likely apply here as well.


2 participants