Add SmolLM (smollm2) #9354
base: viable/strict
Conversation
@@ -0,0 +1,14 @@
{
    "dim": 576,
    "ffn_dim_multiplier": 1,
There's a size mismatch error during quantization:
size mismatch for layers.0.feed_forward.w1.weight: copying a param with shape torch.Size([1536, 576]) from checkpoint, the shape in current model is torch.Size([576, 576]).
size mismatch for layers.0.feed_forward.w2.weight: copying a param with shape torch.Size([576, 1536]) from checkpoint, the shape in current model is torch.Size([576, 576]).
size mismatch for layers.0.feed_forward.w3.weight: copying a param with shape torch.Size([1536, 576]) from checkpoint, the shape in current model is torch.Size([576, 576]).
I'm not sure about the definition of dim and ffn_dim_multiplier here; it looks like one of the values is wrong. Would you mind providing some pointers/context on this? Appreciate it! @jackzhxng
The model structure is below (a sketch of the config values these shapes imply follows the printout):
LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(49152, 576)
(layers): ModuleList(
(0-29): 30 x LlamaDecoderLayer(
(self_attn): LlamaSdpaAttention(
(q_proj): Linear(in_features=576, out_features=576, bias=False)
(k_proj): Linear(in_features=576, out_features=192, bias=False)
(v_proj): Linear(in_features=576, out_features=192, bias=False)
(o_proj): Linear(in_features=576, out_features=576, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear(in_features=576, out_features=1536, bias=False)
(up_proj): Linear(in_features=576, out_features=1536, bias=False)
(down_proj): Linear(in_features=1536, out_features=576, bias=False)
(act_fn): SiLU()
)
(input_layernorm): LlamaRMSNorm((576,), eps=1e-05)
(post_attention_layernorm): LlamaRMSNorm((576,), eps=1e-05)
)
)
(norm): LlamaRMSNorm((576,), eps=1e-05)
(rotary_emb): LlamaRotaryEmbedding()
)
(lm_head): Linear(in_features=576, out_features=49152, bias=False)
)
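For reference, here is a sketch of the params these shapes imply, written as a Python dict so the derivation can be annotated. The head_dim of 64 is an assumption taken from the SmolLM2-135M model card; everything else is read directly off the printout above.

# Hedged sketch: config values implied by the printed module shapes (not the final config).
smollm2_135m_params = {
    "dim": 576,           # embed_tokens / q_proj width
    "hidden_dim": 1536,   # gate_proj/up_proj out_features, i.e. the FFN intermediate size
    "n_layers": 30,       # ModuleList (0-29)
    "n_heads": 9,         # q_proj out_features 576 / head_dim 64 (head_dim assumed)
    "n_kv_heads": 3,      # k_proj/v_proj out_features 192 / head_dim 64
    "vocab_size": 49152,  # embed_tokens rows / lm_head out_features
    "norm_eps": 1e-05,    # LlamaRMSNorm eps
    "rope_theta": 10000.0,
    "use_scaled_rope": False,
}

With hidden_dim set to the FFN intermediate size (1536) rather than to dim (576), the w1/w2/w3 shapes line up with the checkpoint and the size mismatch above should go away.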
checkpoint_dir=args.input_dir,
checkpoint_files=["model.safetensors"],
output_dir=".",
model_type="MISTRAL",
Change to Llama
Thank you, updated!
converted_state_dict[new_key] = value

# Input and output embeddings are tied.
converted_state_dict["output.weight"] = converted_state_dict[
This might be the cause: input and output embeddings are not shared in the Llama architecture this model is based on.
Makes sense, removed this.
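A minimal sketch of what the conversion loop looks like after this change, assuming a torchtune-style key-mapping helper like the other examples.models.* convert_weights scripts use; the _FROM_HF entries shown are an illustrative subset, not the full table from this PR.

from torchtune.models.convert_weights import get_mapped_key  # assumed available, as in the other convert scripts

# Illustrative subset of the HF -> Meta key mapping (the real script has the full table).
_FROM_HF = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.layers.{}.self_attn.q_proj.weight": "layers.{}.attention.wq.weight",
    "model.norm.weight": "norm.weight",
    "lm_head.weight": "output.weight",
}

def smollm2_hf_to_meta(state_dict):
    converted_state_dict = {}
    for key, value in state_dict.items():
        converted_state_dict[get_mapped_key(key, _FROM_HF)] = value
    # Per the review above: no "output.weight" <- "tok_embeddings.weight" tying;
    # the output projection keeps its own weight for this Llama-style model.
    return converted_state_dict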
@@ -94,6 +94,7 @@
    "static_llama",
    "qwen2_5",
    "phi-4-mini",
    "smollm",
Rename this and the directory to smolllm2
Thanks! Should it be smollm2 or smolllm2?
Ah - it should be smollm2*
Updated from 52e68fc to 34c5dee.
{
    "dim": 576,
    "ffn_dim_multiplier": 1,
    "hidden_dim": 576,
Thank you! I mixed it up with hidden_size 😄
"rope_theta": 10000.0, | ||
"use_scaled_rope": false, | ||
"vocab_size": 49152, | ||
"use_hf_rope": true, |
this should be false
Thank you!! Updated
Summary
Add the SmolLM2 135M model (smollm2) for issue #9324. The exported model's output agrees with the eager model's output:

Test plan
1. Convert to meta format (a hedged command sketch follows the run command below)
2. Run export (likewise sketched below)
3. Run test:
python -m examples.models.llama.runner.native --model smollm2 \
  --pte smollm2.pte \
  --tokenizer /Users/danqingwang/tmp/snapshots/1d461723eec654e65efdc40cf49301c89c0c92f4/tokenizer.json \
  --tokenizer_config /Users/danqingwang/tmp/snapshots/1d461723eec654e65efdc40cf49301c89c0c92f4/tokenizer_config.json \
  --prompt "What ingredients are in a California roll?" \
  --params examples/models/smollm2/135M_config.json --max_len 64 \
  --temperature 0 -kv
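For completeness, hedged sketches of the commands for steps 1 and 2. The examples.models.smollm2.convert_weights module path and the exact export flags are assumptions modeled on how the neighboring models (e.g. qwen2_5) are wired up, not confirmed from this PR:

# Step 1 (assumed interface): convert the HF safetensors checkpoint to Meta format.
python -m examples.models.smollm2.convert_weights /path/to/hf/snapshot smollm2.pth

# Step 2 (flags assumed from the generic export_llama flow): export to a .pte file.
python -m examples.models.llama.export_llama --model smollm2 \
  -c smollm2.pth \
  -p examples/models/smollm2/135M_config.json \
  -kv -X -d fp32 \
  --output_name smollm2.pte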