ring-based reduce-scatter pipelining, ATen implementation #2950

Open
wants to merge 27 commits into
base: main

Commits on Sep 17, 2024

  1. ffc7b19
  2. d9c842f
  3. 4afcda1
  4. still hanging

    samnordmann committed Sep 17, 2024
    86a0967
  5. 0977cc5
  6. e34ff63
  7. hang with all backends (nccl hangs at posting... which therefore seems blocking) with 3 ranks which all first post send and then recv

    samnordmann committed Sep 17, 2024
    0743101
  8. c36b8cc
  9. 532955e
  10. error with ucc: symbol lookup error: /usr/local/ucx/lib/ucx/libuct_cuda_gdrcopy.so.0: undefined symbol: gdr_get_info_v2

    samnordmann committed Sep 17, 2024
    b828755
  11. working with S=1

    samnordmann committed Sep 17, 2024
    ae31616
  12. 017b36f
  13. 47c21f4
  14. 1512842
  15. number_of_repetition

    samnordmann committed Sep 17, 2024
    d8dbc80
  16. clean

    samnordmann committed Sep 17, 2024
    af789bb
  17. clean

    samnordmann committed Sep 17, 2024
    0487a80
  18. 94d683d
  19. lintrunner

    samnordmann committed Sep 17, 2024
    fd134a4
  20. 1fe3dfb
  21. clean

    samnordmann committed Sep 17, 2024
    ee8f59c
  22. fix stream round robin

    samnordmann committed Sep 17, 2024
    dff90d0

Commits on Sep 18, 2024

  1. f81d903
  2. lintrunner

    samnordmann committed Sep 18, 2024
    b1cf578
  3. fix number of streams

    samnordmann committed Sep 18, 2024
    b56c1b9
  4. b7e7d56
  5. do not test UCC

    samnordmann committed Sep 18, 2024
    582613a