Train loop takes exponentially longer as number of layers increases #8824
Comments
Thank you for filing this issue.
The issue does NOT happen if I use the PyTorch CUDA device. The code below demonstrates that the slowdown does not occur when XLA is not used:
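The comparison script itself is not preserved in this thread; the following is only a minimal sketch of a CUDA-only training loop of the kind described, assuming a plain stack of `nn.Linear` layers and random data (the `LAYERS` value, layer width, and batch size are illustrative, not taken from the original post):

```python
# Hypothetical CUDA-only counterpart of the reported loop (no torch_xla).
# With eager CUDA execution, per-step time grows roughly linearly with LAYERS.
import time
import torch
import torch.nn as nn

LAYERS = 1000  # try 500 vs. 1000 to compare per-step time

device = torch.device("cuda")
model = nn.Sequential(*[nn.Linear(128, 128) for _ in range(LAYERS)]).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 128, device=device)
y = torch.randn(64, 128, device=device)

for step in range(10):
    start = time.time()
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()  # ensure the timing reflects actual GPU execution
    print(f"step {step}: {time.time() - start:.2f}s")
```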
The behavior is slightly different when running on a TPU v6e. With the nightly build, I see compile times getting fairly large in the initial steps, but the execution time increases exponentially only in the stable build (2.6.0), not in the nightly build (2.7.0). That suggests this is fixed in a later build. Any idea what the fix could be?
If I understood correctly, you are saying the issue seems to be solved in the nightly build on a TPU v6e, correct? Is this also true for CUDA (the accelerator used in the original post)?
Unfortunately, I don't know.
🐛 Bug
Training time increases exponentially with the number of layers.
To Reproduce
The following code takes around 16 seconds per step at layers=1000, whereas it takes 0.38 seconds per step at layers=500.
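The original reproduction script is not included here; below is a minimal sketch of the kind of loop being timed, assuming a simple stack of `nn.Linear` layers trained on random data with torch_xla (the `LAYERS` constant, layer width, and batch size are assumptions for illustration, not the reporter's exact values):

```python
# Hypothetical XLA reproduction sketch: time each training step for a deep
# stack of Linear layers, varying LAYERS (e.g. 500 vs. 1000).
import time
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

LAYERS = 1000

device = xm.xla_device()
model = nn.Sequential(*[nn.Linear(128, 128) for _ in range(LAYERS)]).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 128, device=device)
y = torch.randn(64, 128, device=device)

for step in range(10):
    start = time.time()
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    xm.mark_step()         # cut the lazy graph and dispatch it for compilation/execution
    xm.wait_device_ops()   # block until the device finishes so the timing is meaningful
    print(f"step {step}: {time.time() - start:.2f}s")
```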
Expected behavior
I am wondering whether it is expected that the training step time increases exponentially (rather than linearly) with more layers.
Environment
Additional context