Benchmark backprop of a te.TransformerLayer. #2956
base: main
wujingyue commented on Sep 18, 2024
!build
I'm getting a weird error from jit_python_distributed_tests_17_A100 on this PR. I'll fix it before merging. Unfortunately, I haven't been able to reproduce it locally yet.
I believe the error comes from calling init_process_group and destroy_process_group in every test. Because of a race condition, one rank can call init_process_group for the second test before all ranks have finished destroying the default process group.
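To make the suspected race concrete, here is a minimal toy model using only the Python standard library, under loose assumptions: threads stand in for ranks, and a hypothetical ToyRendezvous class stands in for the shared state behind the default process group. ToyRendezvous, run_rank and simulate are illustrative names, not torch.distributed APIs. The model records an error whenever a rank joins test N's group while some rank still holds test N-1's group. It shows that a synchronization point after teardown removes that overlap.

```python
import threading
import time

WORLD_SIZE = 4
NUM_TESTS = 3


class ToyRendezvous:
    """Hypothetical stand-in for the default process group's shared state.

    Joining generation g while any rank still holds generation g - 1 is
    recorded as an error, mimicking the suspected CI race.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._holders = {}  # generation -> set of ranks still holding it
        self.errors = []

    def init(self, rank, generation):
        with self._lock:
            stale = self._holders.get(generation - 1, set())
            if stale:
                # Some rank has not torn down the previous test's group yet.
                self.errors.append((rank, generation, sorted(stale)))
            self._holders.setdefault(generation, set()).add(rank)

    def destroy(self, rank, generation):
        with self._lock:
            self._holders[generation].discard(rank)


def run_rank(rank, rendezvous, barrier, sync_after_destroy):
    for test in range(NUM_TESTS):
        rendezvous.init(rank, test)
        barrier.wait()  # stands in for the collective work inside one test
        if rank == 0:
            time.sleep(0.01)  # rank 0 is deliberately slow to tear down
        rendezvous.destroy(rank, test)
        if sync_after_destroy:
            # No rank starts the next test until every rank has torn down.
            barrier.wait()


def simulate(sync_after_destroy):
    rendezvous = ToyRendezvous()
    barrier = threading.Barrier(WORLD_SIZE)  # reusable across iterations
    threads = [
        threading.Thread(
            target=run_rank,
            args=(rank, rendezvous, barrier, sync_after_destroy),
        )
        for rank in range(WORLD_SIZE)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rendezvous.errors


if __name__ == "__main__":
    print("without sync:", simulate(sync_after_destroy=False))
    print("with sync:", simulate(sync_after_destroy=True))
```

With sync_after_destroy=True the error list is always empty; without it, the fast ranks usually overlap with the slow rank's teardown. Note that real torch.distributed code cannot run a collective on a group after destroying it. A common way to avoid the overlap there is to create the default process group once per test session instead of once per test. This toy only illustrates why teardown and the next setup must not overlap.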
How about we don't add this to CI for now? We can benchmark locally and then work on fixing the CI problems.