make torchft work for llama3_8b 8x #104

Draft: d4l3k wants to merge 1 commit into main from d4l3k/fast_checkpoint


@d4l3k (Member) commented Feb 8, 2025

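As titled. This speeds up CheckpointServer transfers by streaming tensors in parallel chunks. With 20 chunks, the test below runs about 12.5x faster than the baseline.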
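The PR's original title, "CheckpointServer: fast streaming parallel transfers", names the idea behind the speedup. Below is a minimal sketch of just the "split into chunks and fetch them in parallel" part. It assumes a checkpoint can be modelled as a mapping from tensor name to bytes. `split_into_chunks`, `fetch_parallel` and `fetch_chunk` are hypothetical names and are not torchft's CheckpointServer API. The sketch does not implement streaming itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical model of a checkpoint: tensor name -> serialized bytes.
# This is NOT torchft's CheckpointServer API; it only illustrates chunked parallel fetching.
StateDict = Dict[str, bytes]


def split_into_chunks(names: Sequence[str], num_chunks: int) -> List[List[str]]:
    """Split names into at most num_chunks contiguous groups whose sizes differ by at most one."""
    num_chunks = max(1, min(num_chunks, len(names)))
    base, extra = divmod(len(names), num_chunks)
    chunks: List[List[str]] = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks each take one additional name.
        size = base + (1 if i < extra else 0)
        chunks.append(list(names[start:start + size]))
        start += size
    return chunks


def fetch_parallel(
    names: Sequence[str],
    fetch_chunk: Callable[[List[str]], StateDict],
    num_chunks: int,
) -> StateDict:
    """Fetch every chunk on its own worker thread and merge the parts into one dict."""
    chunks = split_into_chunks(names, num_chunks)
    merged: StateDict = {}
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # map() yields results in submission order, so the merge is deterministic.
        for part in pool.map(fetch_chunk, chunks):
            merged.update(part)
    return merged


if __name__ == "__main__":
    # Stand-in for a remote checkpoint: 192 fake tensors
    # (12 GiB / 64 MiB, assuming the test plan uses binary units).
    remote = {f"layer{i}.weight": bytes([i % 256]) * 8 for i in range(192)}

    def fetch_chunk(chunk_names: List[str]) -> StateDict:
        # A real transfer would issue one streaming request per chunk here.
        return {name: remote[name] for name in chunk_names}

    restored = fetch_parallel(list(remote), fetch_chunk, num_chunks=20)
    assert restored == remote
    print(f"restored {len(restored)} tensors")
```

Threads are a reasonable fit for this sketch because each chunk fetch would mostly wait on the network. torchft may schedule and stream its transfers differently.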

Test plan:

Testing with 12 GB of 64 MB tensors:
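For scale, the test data works out as follows. This assumes "GB" and "MB" mean binary units (GiB and MiB) and that the data is spread evenly across chunks. The plan states neither.

```python
# Assumption: "GB" and "MB" in the test plan mean GiB and MiB.
GIB = 1024 ** 3
MIB = 1024 ** 2

total_bytes = 12 * GIB
tensor_bytes = 64 * MIB

num_tensors = total_bytes // tensor_bytes
print(f"{num_tensors} tensors")  # 192 tensors

for num_chunks in (10, 20):
    # Average payload per chunk, assuming the data is spread evenly across chunks.
    per_chunk_gib = total_bytes / num_chunks / GIB
    print(f"{num_chunks} chunks: ~{per_chunk_gib:.1f} GiB per chunk")
```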

Baseline: 30.49 s

With streaming transfer:
0 chunks: 8.78 s
10 chunks: 2.86 s
20 chunks: 2.43 s
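The snippet below derives speedup over the baseline and effective throughput from the reported timings. It assumes the "12 GB" payload is 12 GiB. The timing values are copied unchanged from the test plan.

```python
# Timings reported in the test plan, in seconds.
baseline_s = 30.493701454252005
streaming_s = {
    0: 8.783997897058725,
    10: 2.8615125976502895,
    20: 2.433052882552147,
}

# Assumption: the "12 GB" payload is 12 GiB.
total_gib = 12

print(f"baseline: {total_gib / baseline_s:.2f} GiB/s")
for chunks, seconds in streaming_s.items():
    speedup = baseline_s / seconds      # how many times faster than the baseline
    throughput = total_gib / seconds    # effective transfer rate
    print(f"{chunks:>2} chunks: {speedup:.2f}x faster, {throughput:.2f} GiB/s")
```

In these numbers, going from 0 to 10 chunks saves about 5.9 s. Going from 10 to 20 chunks saves only about 0.43 s more.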

@facebook-github-bot added the CLA Signed label Feb 8, 2025
@d4l3k force-pushed the d4l3k/fast_checkpoint branch 5 times, most recently from 4236d2f to 7a89ce9, February 8, 2025 05:10
@d4l3k force-pushed the d4l3k/fast_checkpoint branch from 7a89ce9 to 225b1a3, February 9, 2025 00:39
@d4l3k changed the title from "CheckpointServer: fast streaming parallel transfers" to "make torchft work for llama3_8b 8x" Feb 9, 2025