
feat: support phi3.5 moe #2479

Merged: 15 commits merged into main from impl-phi-3-5-moe on Sep 30, 2024

Conversation

drbh (Collaborator) commented Aug 30, 2024

This is a work-in-progress PR to add support for microsoft/Phi-3.5-MoE-instruct.

TODO

  • add phi 3.5 to ModelType
  • load weights into memory
  • prefer moe over mlp in layers
  • enable long/short rope scaling (see the sketch after this list)
  • validate scaling logic
  • ensure layer logic is correct
  • ensure no regressions on existing phi models
  • identify issue with allocating graphs
  • refactor/cleanup/add tests
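
For reference, the long/short rope scaling mentioned above chooses between two sets of per-band rotary scale factors based on sequence length. Below is a minimal Python sketch of that selection, assuming the field names from the upstream Phi-3 configuration (rope_scaling.short_factor / rope_scaling.long_factor, original_max_position_embeddings, rope_theta); this is illustrative only, not the code added in this PR:

import math

import torch

def phi3_longrope(config: dict, seq_len: int):
    # Use the long factors once the sequence exceeds the original
    # training context, otherwise the short factors.
    if seq_len > config["original_max_position_embeddings"]:
        factors = config["rope_scaling"]["long_factor"]
    else:
        factors = config["rope_scaling"]["short_factor"]
    # One factor per rotary frequency band (head_dim / 2 entries).
    factors = torch.tensor(factors, dtype=torch.float32)

    dim, base = config["head_dim"], config["rope_theta"]
    inv_freq = 1.0 / (
        factors * base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    )

    # Phi-3-style attention rescaling when the context is extended
    # beyond the original maximum (assumed formula, per the upstream
    # reference implementation).
    scale = (
        config["max_position_embeddings"]
        / config["original_max_position_embeddings"]
    )
    attn_factor = (
        math.sqrt(
            1 + math.log(scale) / math.log(config["original_max_position_embeddings"])
        )
        if scale > 1.0
        else 1.0
    )
    return inv_freq, attn_factor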

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@drbh drbh marked this pull request as ready for review September 2, 2024 21:14
@drbh drbh changed the title from "feat: support phi3.5 moe model loading" to "feat: support phi3.5 moe" on Sep 2, 2024
drbh (Collaborator, Author) commented Sep 3, 2024

This PR adds support for Phi 3.5 MoE, and improves the chat endpoint to assume greedy generation unless the temperature is explicitly set by the user in the request (this helped align the expected output from Phi with the reference implementation).
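
To illustrate the chat-endpoint change, here is a small Python sketch against TGI's OpenAI-compatible /v1/chat/completions route (the "model" value is a placeholder; the server answers with whatever model it was launched with, and the exact payloads are an assumption, not taken from this PR):

import requests

URL = "http://127.0.0.1:3000/v1/chat/completions"
messages = [{"role": "user", "content": "Hello who are you?"}]

# No temperature in the request: with this PR the server assumes
# greedy decoding, matching the reference implementation.
greedy = requests.post(URL, json={"model": "tgi", "messages": messages}).json()

# Temperature explicitly set: sampling behaves as before.
sampled = requests.post(
    URL, json={"model": "tgi", "messages": messages, "temperature": 0.7}
).json()

print(greedy["choices"][0]["message"]["content"])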

Start Phi-3.5 MoE:

text-generation-launcher \
  --model-id microsoft/Phi-3.5-MoE-instruct \
  --num-shard 4 \
  --cuda-graphs 1,2 \
  --trust-remote-code

Send a request:

curl 127.0.0.1:3000/generate -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "Hello who are you?",
    "parameters": {
      "max_new_tokens": 20
    }
  }'

Response:

{
    "generated_text": " I'm an artificial intelligence developed by Microsoft to assist with a variety of tasks and provide information."
}
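
The same request can also be made from Python via huggingface_hub's InferenceClient pointed at the local server; a minimal sketch, assuming the launcher above is still serving on port 3000:

from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:3000")
# Mirrors the curl call above: 20 new tokens, default decoding.
print(client.text_generation("Hello who are you?", max_new_tokens=20))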

@danieldk danieldk self-requested a review September 9, 2024 07:09
@danieldk danieldk force-pushed the impl-phi-3-5-moe branch 3 times, most recently from dc2c25b to b5fa8bd on September 25, 2024 08:48
Narsil previously approved these changes Sep 27, 2024
@danieldk danieldk merged commit 93a7042 into main Sep 30, 2024
12 of 13 checks passed
@danieldk danieldk deleted the impl-phi-3-5-moe branch September 30, 2024 09:15
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Oct 27, 2024
* feat: support phi3.5 moe model loading

* fix: prefer llama base model and improve rotary logic

* feat: return reasonable generation and add integration test

* fix: run lint and update docs

* fix: rerun lint for openapi docs

* fix: prefer do_sample false unless temp is set by user, and update chat tests

* fix: small typo adjustments

* fix: consolidate long rope paths

* fix: revert greedy by default and test changes

* Vendor configuration so that we don't have to `trust_remote_code`

* Use SparseMoELayer

* Add support for dense MoE

* Some type annotations

* Add the usual model tests

* Ruff.

---------

Co-authored-by: Daniël de Kok <[email protected]>
Co-authored-by: Nicolas Patry <[email protected]>