
Eagerly Awaiting the Megatron Code Release #1

Closed
mactavish91 opened this issue Feb 23, 2025 · 6 comments

Comments

@mactavish91

I really appreciate the amazing work you've done on Moonshot AI's open-source MoE model. I'm excited about the upcoming release of the Megatron code and can't wait to explore it.
Thank you for your efforts!

@toothacher17
Collaborator

Thanks for your interest! We will publish it ASAP!

@toothacher17
Collaborator

Hi @mactavish91, please see the example proof-of-concept Megatron-LM PR here:

NVIDIA/Megatron-LM#1428

@huyiwen

huyiwen commented Feb 26, 2025

Are there any plans to open-source the MoE training code mentioned in the paper?

@toothacher17
Collaborator

@huyiwen The MoE training code is not related to the Muon optimizer itself, so we do not plan to release it.
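For context on why the two are decoupled: Muon updates each 2-D weight matrix independently by orthogonalizing its momentum, so the optimizer does not depend on architecture-specific code such as MoE routing. Below is a minimal PyTorch sketch of that core step using the commonly published quintic Newton-Schulz coefficients; the `muon_step` wrapper and its hyperparameters are illustrative assumptions, not the released Megatron integration.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize G via a quintic Newton-Schulz iteration.

    The coefficients are the commonly published quintic variant; the exact
    values in the released optimizer may differ.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)   # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T                 # work on the wide orientation (smaller Gram matrix)
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if transposed:
        X = X.T
    return X

def muon_step(param, momentum, grad, lr=0.02, beta=0.95):
    """One illustrative Muon update for a single 2-D weight matrix.

    Note: nothing here references the model architecture; MoE vs. dense
    only changes which weight matrices get fed through this update.
    """
    momentum.mul_(beta).add_(grad)        # momentum accumulation
    update = newton_schulz(momentum)      # orthogonalized descent direction
    param.data.add_(update, alpha=-lr)    # apply the step
    return param, momentum
```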

@huyiwen

huyiwen commented Feb 26, 2025

Are the released intermediate checkpoints for the MoE model or the dense model? If they are for the MoE model, how should we load and train them?

@toothacher17
Collaborator

toothacher17 commented Feb 26, 2025

> Are the released intermediate checkpoints for the MoE model or the dense model? If they are for the MoE model, how should we load and train them?

They are MoE models:
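(A minimal loading sketch, assuming the checkpoints are published in Hugging Face format with custom modeling code; the repository ID below is a placeholder, not the actual checkpoint location:)

```python
# Minimal sketch: loading an MoE checkpoint published in Hugging Face format.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "org/moe-intermediate-checkpoint"  # placeholder, not the real repo ID
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,  # MoE models often ship custom modeling code
    torch_dtype="auto",      # use the dtype stored in the checkpoint
)
```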
