I was debugging on rank 1 using `torch.distributed.breakpoint(rank=1)`, but it always hangs. It turns out the hang is caused by `--local-ranks-filter 0` in `run_llama_train.sh`. Not sure if we want to remind people that these two things don't work well together.

I have to debug rank 1 (instead of rank 0) because dim-0 sharding can be uneven and only ranks 1+ have padding.
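A minimal sketch of the debugging setup, for context (the script name and launch command below are hypothetical; only `torch.distributed.breakpoint` and torchrun's `--local-ranks-filter` are from the report):

```python
# debug_rank1.py -- hypothetical minimal repro
import torch.distributed as dist

def main():
    dist.init_process_group(backend="gloo")  # or "nccl" for GPU runs

    # Every rank must reach this call; only rank 1 drops into pdb while the
    # others block. If the launcher filters rank 1's stdout/stderr
    # (e.g. torchrun --local-ranks-filter 0), the pdb prompt is never shown
    # and the job looks like it is hanging.
    dist.breakpoint(rank=1)

    # ... sharding / training logic under inspection ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launching with something like `torchrun --nproc-per-node 2 debug_rank1.py` (no rank filter, or a filter that includes rank 1) should make the pdb prompt on rank 1 visible again.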
repo: