
feat: support phi3.5 moe #2479

Merged: 15 commits merged into main from impl-phi-3-5-moe on Sep 30, 2024

Conversation

drbh (Collaborator) commented Aug 30, 2024

This is a work-in-progress PR to add support for microsoft/Phi-3.5-MoE-instruct.

TODO

  • add phi 3.5 to ModelType
  • load weights into memory
  • prefer moe over mlp in layers
  • enable long/short rope scaling (see the sketch after this list)
  • validate scaling logic
  • ensure layer logic is correct
  • ensure no regressions on existing phi models
  • identify issue with allocating graphs
  • refactor/cleanup/add tests
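
For reference, the long/short rope scaling mentioned above chooses between two sets of per-band rotary scale factors based on sequence length. Below is a minimal Python sketch of that selection, assuming the field names from the upstream Phi-3 configuration (rope_scaling.short_factor / rope_scaling.long_factor, original_max_position_embeddings, rope_theta); this is illustrative only, not the code added in this PR:

import math

import torch

def phi3_longrope(config: dict, seq_len: int):
    # Use the long factors once the sequence exceeds the original
    # training context, otherwise the short factors.
    if seq_len > config["original_max_position_embeddings"]:
        factors = config["rope_scaling"]["long_factor"]
    else:
        factors = config["rope_scaling"]["short_factor"]
    # One factor per rotary frequency band (head_dim / 2 entries).
    factors = torch.tensor(factors, dtype=torch.float32)

    dim, base = config["head_dim"], config["rope_theta"]
    inv_freq = 1.0 / (
        factors * base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    )

    # Phi-3-style attention rescaling when the context is extended
    # beyond the original maximum (assumed formula, per the upstream
    # reference implementation).
    scale = (
        config["max_position_embeddings"]
        / config["original_max_position_embeddings"]
    )
    attn_factor = (
        math.sqrt(
            1 + math.log(scale) / math.log(config["original_max_position_embeddings"])
        )
        if scale > 1.0
        else 1.0
    )
    return inv_freq, attn_factor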

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@drbh drbh marked this pull request as ready for review September 2, 2024 21:14
@drbh drbh changed the title from "feat: support phi3.5 moe model loading" to "feat: support phi3.5 moe" on Sep 2, 2024
drbh (Collaborator, Author) commented Sep 3, 2024

This PR adds support for Phi 3.5 MoE, and improves the chat endpoint to assume greedy generation unless the temperature is explicitly set by the user in the request (this helped align the expected output from Phi with the reference implementation).
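
To illustrate the chat-endpoint change, here is a small Python sketch against TGI's OpenAI-compatible /v1/chat/completions route (the "model" value is a placeholder; the server answers with whatever model it was launched with, and the exact payloads are an assumption, not taken from this PR):

import requests

URL = "http://127.0.0.1:3000/v1/chat/completions"
messages = [{"role": "user", "content": "Hello who are you?"}]

# No temperature in the request: with this PR the server assumes
# greedy decoding, matching the reference implementation.
greedy = requests.post(URL, json={"model": "tgi", "messages": messages}).json()

# Temperature explicitly set: sampling behaves as before.
sampled = requests.post(
    URL, json={"model": "tgi", "messages": messages, "temperature": 0.7}
).json()

print(greedy["choices"][0]["message"]["content"])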

Start Phi-3.5 MoE:

text-generation-launcher \
  --model-id microsoft/Phi-3.5-MoE-instruct \
  --num-shard 4 \
  --cuda-graphs 1,2 \
  --trust-remote-code

Send a request:

curl 127.0.0.1:3000/generate -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "Hello who are you?",
    "parameters": {
      "max_new_tokens": 20
    }
  }'

Response:

{
    "generated_text": " I'm an artificial intelligence developed by Microsoft to assist with a variety of tasks and provide information."
}
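
The same request can also be made from Python via huggingface_hub's InferenceClient pointed at the local server; a minimal sketch, assuming the launcher above is still serving on port 3000:

from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:3000")
# Mirrors the curl call above: 20 new tokens, default decoding.
print(client.text_generation("Hello who are you?", max_new_tokens=20))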

@danieldk danieldk self-requested a review September 9, 2024 07:09
@danieldk danieldk force-pushed the impl-phi-3-5-moe branch 3 times, most recently from dc2c25b to b5fa8bd on September 25, 2024 08:48
Narsil previously approved these changes Sep 27, 2024
@danieldk danieldk merged commit 93a7042 into main Sep 30, 2024
12 of 13 checks passed
@danieldk danieldk deleted the impl-phi-3-5-moe branch September 30, 2024 09:15
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Oct 27, 2024
* feat: support phi3.5 moe model loading

* fix: prefer llama base model and improve rotary logic

* feat: return reasonable generation and add integration test

* fix: run lint and update docs

* fix: rerun lint for openapi docs

* fix: prefer do_sample false unless temp is set by user, and update chat tests

* fix: small typo adjustments

* fix: consolidate long rope paths

* fix: revert greedy by default and test changes

* Vendor configuration so that we don't have to `trust_remote_code`

* Use SparseMoELayer

* Add support for dense MoE

* Some type annotations

* Add the usual model tests

* Ruff.

---------

Co-authored-by: Daniël de Kok <[email protected]>
Co-authored-by: Nicolas Patry <[email protected]>