Support ZBVZeroBubbleSchedule #817
base: main
Conversation
elif is_last_stage:
    losses = []
    pp_schedule.step(target=labels, losses=losses)
targets = labels if has_last_stage else None
Kinda nitpicking, but I feel like if the stage object inside the schedule already knows whether it is first or last, we could avoid having this logic in the training loop too.
OTOH, it seems nice to be explicit at the train.py layer about whether we are asking to compute the loss or not.
Thoughts?
@tianyu-l
It feels nice to only explicitly pass in targets/losses when they are meaningful, rather than when we're not sure if they'll be properly accessed, so I'm OK with these if-else statements.
But how different is input_ids? Can we just unify everything into pp_schedule.step(input_ids, target=targets, losses=losses) and pass input_ids = None when not has_first_stage?
We can't do input_ids=None right now, since we have logic that automatically splits all *args into microbatches. For example, if the user wants to do step(tensors, None), that would be split up into microbatches of (tensor1, None), (tensor2, None), ... We could update the splitting logic, but I'm not sure it is worth it.
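To make the constraint concrete, here is a minimal sketch of the kind of *args splitting described above. This is not torchtitan/PyTorch code: the splitter name is hypothetical and plain lists stand in for tensors, but it shows why a None positional argument breaks naive per-arg chunking.

```python
def split_args_into_microbatches(args, n_microbatches):
    """Naive splitting: every positional arg is assumed to be chunkable
    along its first dimension (plain lists stand in for tensors here)."""
    def chunk(seq, n):
        size = len(seq) // n  # raises TypeError if seq is None
        return [seq[i * size:(i + 1) * size] for i in range(n)]
    per_arg_chunks = [chunk(a, n_microbatches) for a in args]
    # Zip the chunks back together so each microbatch receives one piece
    # of every positional argument.
    return list(zip(*per_arg_chunks))

batch = list(range(8))
assert split_args_into_microbatches((batch,), 2) == [
    ([0, 1, 2, 3],),
    ([4, 5, 6, 7],),
]

# Passing None positionally breaks the splitter, which is why
# step(tensors, None) is not supported without changing this logic.
try:
    split_args_into_microbatches((batch, None), 2)
except TypeError:
    pass
```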
GPU CI failed; I'm not sure if it's due to the issue I commented on above.
targets = labels if has_last_stage else None
losses = [] if has_last_stage else None
if has_first_stage:
    pp_schedule.step(input_ids, target=targets, losses=losses)
What if a schedule has has_last_stage = True and has_first_stage = False for the output layer -- will it miss the chance to feed in losses?
Oops yeah, that was the issue. Updated it and will let the CI run again
This is dependent on the changes in this pytorch stack: pytorch/pytorch#146217
Add support for running ZBVZeroBubbleSchedule and v-shaped CSV schedules in torchtitan.
Fixes #774