feat: support phi3.5 moe #2479
This PR adds support for Phi 3.5 MoE and changes the chat endpoint to assume greedy generation unless the temperature is explicitly set by the user in the request (this helped align the output from Phi with that of the reference implementation).

Start phi3.5moe:

    text-generation-launcher \
        --model-id microsoft/Phi-3.5-MoE-instruct \
        --num-shard 4 \
        --cuda-graphs 1,2 \
        --trust-remote-code

Send a request:

    curl 127.0.0.1:3000/generate -X POST \
        -H 'Content-Type: application/json' \
        -d '{
            "inputs": "Hello who are you?",
            "parameters": {
                "max_new_tokens": 20
            }
        }'

Response:

    {
        "generated_text": " I'm an artificial intelligence developed by Microsoft to assist with a variety of tasks and provide information."
    }
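For scripted checks, the same request can be sent from Python. The following is a minimal sketch using only the standard library, not code from this PR: `build_payload` and `generate` are hypothetical helper names, the URL is copied from the curl example, and `temperature` is assumed to be the request field the description refers to as "temp". The field is left out of the body unless the caller sets it, so the server's default sampling behavior applies.

```python
import json
import urllib.request

# Assumed address: the host and port used in the curl example.
TGI_URL = "http://127.0.0.1:3000/generate"


def build_payload(prompt, max_new_tokens=20, temperature=None):
    """Build the JSON body for /generate (hypothetical helper, not part of TGI)."""
    parameters = {"max_new_tokens": max_new_tokens}
    # Only forward a temperature when the caller sets one explicitly.
    if temperature is not None:
        parameters["temperature"] = temperature
    return {"inputs": prompt, "parameters": parameters}


def generate(prompt, url=TGI_URL, **kwargs):
    """POST the prompt to a running server and return the generated text."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]
```

With the launcher running, `generate("Hello who are you?")` sends the same body as the curl command, and `generate("Hello who are you?", temperature=0.7)` adds an explicit temperature to the request.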
* feat: support phi3.5 moe model loading
* fix: prefer llama base model and improve rotary logic
* feat: return reasonable generation and add integration test
* fix: run lint and update docs
* fix: rerun lint for openapi docs
* fix: prefer do_sample false unless temp is set by user, and update chat tests
* fix: small typo adjustments
* fix: consolidate long rope paths
* fix: revert greedy by default and test changes
* Vendor configuration so that we don't have to `trust_remote_code`
* Use SparseMoELayer
* Add support for dense MoE
* Some type annotations
* Add the usual model tests
* Ruff.

---------

Co-authored-by: Daniël de Kok <[email protected]>
Co-authored-by: Nicolas Patry <[email protected]>
This is a work-in-progress PR to add support for microsoft/Phi-3.5-MoE-instruct.
TODO
ModelType