Issues: pytorch/torchtitan
Issues list
[Config] Make the checkpoint step configurable.
help wanted
Extra attention is needed
#662
opened Oct 30, 2024 by
casper-hansen
Questions about FSDP2 support and memory usage.
question
Further information is requested
#658
opened Oct 29, 2024 by
tangjiasheng
meta device issue with float8 delayed scale
bug
Something isn't working
#654
opened Oct 25, 2024 by
weifengpy
torch.distributed.breakpoint(rank=1) hangs because of --local-ranks-filter 0
bug
Something isn't working
#652
opened Oct 25, 2024 by
weifengpy
FP8Linear saves new parameters in ckpt and I cannot load the saved ckpt
bug
Something isn't working
#651
opened Oct 24, 2024 by
goldhuang
[Multimodal] Adding OBELICS DataLoader
enhancement
New feature or request
#650
opened Oct 24, 2024 by
TJ-Solergibert
[Config] Make FSDP reshard_after_forward: bool configurable
enhancement
New feature or request
#644
opened Oct 22, 2024 by
awgu
What are the expected inference steps after I apply torchao in training?
question
Further information is requested
#638
opened Oct 21, 2024 by
goldhuang
add H100 in CI
better_engineering
Repo code quality improvements
integration test
Adding integration tests
#632
opened Oct 18, 2024 by
tianyu-l
create a note on torchtitan official release
documentation
Improvements or additions to documentation
release_blocking
Issues that are blocking the milestone / release completion
Non-DP runs default to float32 precision
enhancement
New feature or request
#630
opened Oct 18, 2024 by
carmocca
[Triton] Implement Liger Kernels
enhancement
New feature or request
#623
opened Oct 17, 2024 by
casper-hansen
Question about torch.compile has better throughput with 128-GPUs than 8-GPUs
question
Further information is requested
#619
opened Oct 15, 2024 by
dz1iang
Ability to train based on epoch
enhancement
New feature or request
good first issue
Good for newcomers
#613
opened Oct 13, 2024 by
abatilo
[Compile] Understand why FSDP2 saves both SDPA out and wo in for bwd
question
Further information is requested
#610
opened Oct 11, 2024 by
awgu
why is xformers not used for attention computation?
question
Further information is requested
#608
opened Oct 9, 2024 by
jason718
Granular layer selection during Pipeline Parallelism
question
Further information is requested
#598
opened Oct 3, 2024 by
bhuvan777
Gradient norm clipping with pipeline parallelism (PP)
bug
Something isn't working
release_blocking
Issues that are blocking the milestone / release completion
Support Gemma2 in torchtitan
enhancement
New feature or request
#594
opened Oct 1, 2024 by
pansershrek
reproducible numerics for loss, weights and gradients for single node (8 GPUs)
enhancement
New feature or request
#593
opened Oct 1, 2024 by
weifengpy
Inference with the checkpoint
enhancement
New feature or request
#586
opened Sep 23, 2024 by
mathmax12
Support INT8 mixed-precision training from torchao?
enhancement
New feature or request
#578
opened Sep 14, 2024 by
gau-nernst