Actions: deepspeedai/DeepSpeed

nv-torch-latest-v100

4,606 workflow runs


Correct the BACKWARD_PREFETCH_SUBMIT mismatch
nv-torch-latest-v100 #13661: Pull request #7120 synchronize by A-transformer
March 12, 2025 17:11 Action required A-transformer:summit_mismatch
Add conditional expression
nv-torch-latest-v100 #13660: Pull request #7119 synchronize by A-transformer
March 12, 2025 17:10 Action required A-transformer:add_conditional_expression
Enable torch.autocast with ZeRO
nv-torch-latest-v100 #13659: Pull request #6993 synchronize by tohtana
March 12, 2025 16:37 In progress tohtana/support_autocast
Enable torch.autocast with ZeRO
nv-torch-latest-v100 #13658: Pull request #6993 synchronize by tohtana
March 12, 2025 16:17 19m 56s tohtana/support_autocast
Enable ZeRO set/get APIs for NVMe offload
nv-torch-latest-v100 #13657: Pull request #7046 synchronize by tjruwase
March 12, 2025 13:48 36m 7s olruwase/update_nvme_offload_states
Avoid missing attr error
nv-torch-latest-v100 #13656: Pull request #7133 opened by tjruwase
March 12, 2025 11:14 1h 10m 26s olruwase/ds_7132
[bugfix] update results of state_dict loading, embedding resizing to secondary partitions (hpz)
nv-torch-latest-v100 #13655: Pull request #7130 synchronize by tjruwase
March 12, 2025 10:51 Action required cyr0930:bug/2nd_part
Variable batch size and LR scheduler
nv-torch-latest-v100 #13654: Pull request #7104 synchronize by tjruwase
March 12, 2025 03:19 40m 48s bm-synth:variable_batch_size_and_lr_2
nv-torch-latest-v100
nv-torch-latest-v100 #13653: Scheduled
March 12, 2025 00:21 1h 30m 59s master
Conditionally quote env vars
nv-torch-latest-v100 #13652: Pull request #7071 synchronize by saurabhkoshatwar
March 11, 2025 23:44 In progress saurabhkoshatwar:bugfix/env_export
Unpin once transformers latest is fixed
nv-torch-latest-v100 #13651: Pull request #7088 synchronize by loadams
March 11, 2025 23:26 58m 30s loadams/unpin-transformers-latest
Conditionally quote env vars
nv-torch-latest-v100 #13650: Pull request #7071 synchronize by loadams
March 11, 2025 23:26 Action required saurabhkoshatwar:bugfix/env_export
Improve overflow handling in ZeRO
nv-torch-latest-v100 #13649: Pull request #6976 synchronize by tjruwase
March 11, 2025 22:44 6h 0m 37s olruwase/ds_5241
Variable batch size and LR scheduler
nv-torch-latest-v100 #13648: Pull request #7104 synchronize by loadams
March 11, 2025 21:03 38m 11s bm-synth:variable_batch_size_and_lr_2
nv-torch-latest-v100
nv-torch-latest-v100 #13647: Manually run by loadams
March 11, 2025 21:02 6h 24m 37s loadams/get-logs-ci-failure
Enable ZeRO set/get APIs for NVMe offload
nv-torch-latest-v100 #13646: Pull request #7046 synchronize by loadams
March 11, 2025 21:02 54m 4s olruwase/update_nvme_offload_states
nv-torch-latest-v100
nv-torch-latest-v100 #13645: Merge group checks requested
March 11, 2025 20:59 1h 25m 44s
Improve overflow handling in ZeRO
nv-torch-latest-v100 #13644: Pull request #6976 synchronize by tjruwase
March 11, 2025 20:58 1h 46m 12s olruwase/ds_5241
Add pyproject.toml with legacy build backend to keep most logic in setup.py
nv-torch-latest-v100 #13643: Pull request #7033 synchronize by loadams
March 11, 2025 20:56 1h 18m 43s loadams/pyproject-toml
Enable ZeRO set/get APIs for NVMe offload
nv-torch-latest-v100 #13642: Pull request #7046 synchronize by loadams
March 11, 2025 19:24 1h 35m 34s olruwase/update_nvme_offload_states
nv-torch-latest-v100
nv-torch-latest-v100 #13641: Merge group checks requested
March 11, 2025 17:20 1h 1m 51s
Add sequential pytest mark to TestNVMeCheckpointing to resolve pytest forked hangs
nv-torch-latest-v100 #13640: Pull request #7131 synchronize by loadams
March 11, 2025 16:15 1h 4m 46s loadams/sequential
Variable batch size and LR scheduler
nv-torch-latest-v100 #13639: Pull request #7104 synchronize by tjruwase
March 11, 2025 16:15 5h 4m 59s bm-synth:variable_batch_size_and_lr_2
Add sequential pytest mark to TestNVMeCheckpointing to resolve pytest forked hangs
nv-torch-latest-v100 #13638: Pull request #7131 opened by loadams
March 11, 2025 16:09 6m 13s loadams/sequential
nv-torch-latest-v100
nv-torch-latest-v100 #13637: Manually run by loadams
March 11, 2025 16:03 1h 1m 54s loadams/sequential-2