
Fetch from nvidia Megatron-LM #5

Open

wants to merge 4,332 commits into base: load-iter
Conversation

RaymondLi0
No description provided.

ko3n1g and others added 30 commits February 3, 2025 02:44
test: Update nightly values

See merge request ADLR/megatron-lm!2618
Reuse global metadata for first saves

See merge request ADLR/megatron-lm!2517
Added CP support for partial DistOpt

See merge request ADLR/megatron-lm!2497
Bring in-job restart up-to-date with latest NVRx implementation

See merge request ADLR/megatron-lm!2560
Uneven Virtual Pipeline Parallelism

See merge request ADLR/megatron-lm!1961
Add aux loss free routing.

Closes #356

See merge request ADLR/megatron-lm!2026
Broadcast sharded objects during fully parallel load

See merge request ADLR/megatron-lm!2417
Support CP + EP with DP last rank ordering

See merge request ADLR/megatron-lm!2586
ci: update nightly values

See merge request ADLR/megatron-lm!2635
Ensure CPU tensors are cloned

See merge request ADLR/megatron-lm!2604
ci: Release results

See merge request ADLR/megatron-lm!2636
Cudagraphable RNG and cudagraph memory fixes

See merge request ADLR/megatron-lm!2503
Support MCore MambaModel quantization through TensorRT Model Optimizer

See merge request ADLR/megatron-lm!2527
Disable the FP8 transpose cache when using torch FSDP2

See merge request ADLR/megatron-lm!2508
Port multimodal inference to MCore API

See merge request ADLR/megatron-lm!2445
sanandaraj5597 and others added 30 commits March 8, 2025 04:37
Added option for parallel cross entropy

See merge request ADLR/megatron-lm!2707
Fix multi-rank inference

See merge request ADLR/megatron-lm!2776
Fix RNG tracker for inference

See merge request ADLR/megatron-lm!2781
NeMo SFT QAT fix: Make `all_gathered_input` contiguous in `prepare_input_tensors_for_wgrad_compute`

Co-authored-by: Jennifer Chen <[email protected]>

See merge request ADLR/megatron-lm!2822
Only materialize logits for the last token during inference

See merge request ADLR/megatron-lm!2624
barebones radio g support

See merge request ADLR/megatron-lm!2622
build: Better caching

See merge request ADLR/megatron-lm!2818
chore: Benchmark for PyTorch 24.10 (Mcore 0.11.0)

See merge request ADLR/megatron-lm!2788
build: Bisect dependencies

See merge request ADLR/megatron-lm!2779
Configurable FSDP modules

See merge request ADLR/megatron-lm!2765
Workload Inspector on-demand profiling feature

See merge request ADLR/megatron-lm!2714
Change Mamba textgen server to use MCore inference

See merge request ADLR/megatron-lm!2621
ci: Publish analytics

See merge request ADLR/megatron-lm!2839
ci: Small improvements to release tests

See merge request ADLR/megatron-lm!2835
ci: Upload statistics only for MRs

See merge request ADLR/megatron-lm!2842