[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint #12909

NickLucche · 2025-02-07T16:02:41Z

Follow-up to the great work by @robertgshaw2-redhat and @varun-sundar-rabindranath in #12458.

This PR adds the /v1/audio/transcriptions OpenAI API endpoint.

Basic example (start server with vllm serve openai/whisper-large-v3):

from openai import OpenAI
from vllm.assets.audio import AudioAsset

mary_had_lamb = AudioAsset('mary_had_lamb').get_asset_path()

openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key="EMPTY",
    base_url=openai_api_base,
)
with open(str(mary_had_lamb), "rb") as f:
    transcription = client.audio.transcriptions.create(
        file=f,
        model="openai/whisper-large-v3",
        language="en",
        response_format="text",
        temperature=0.0)
    print("transcription:", transcription)

I am also adding a correctness test that computes the word error rate (WER) on a subset of a dataset from https://huggingface.co/datasets/open-asr-leaderboard/datasets and compares it against the transformers baseline.
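For readers unfamiliar with the metric, here is a minimal sketch of how WER is usually defined: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration; the test in this PR may use a different normalizer or an existing library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A correctness check along these lines would compute this value for the vLLM transcriptions and for the baseline transcriptions, then assert that the two stay within some tolerance of each other.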

Note that this is all currently somewhat fitted to Whisper, since it is the only model we support. This applies in particular to the way special tokens (<|startoftranscript|>, <|transcribe|>, ...) are sent as input, which would need a more general design. The same goes for the validation of supported languages and the audio duration limit.
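To make the Whisper coupling concrete, the sketch below shows how a decoder prompt can be assembled from Whisper's special tokens. The helper `build_whisper_prompt` and the small `SUPPORTED_LANGUAGES` set are hypothetical and not taken from this PR; only the token strings themselves (`<|startoftranscript|>`, language tokens such as `<|en|>`, `<|transcribe|>`, `<|translate|>`, `<|notimestamps|>`) come from Whisper's vocabulary.

```python
# Hypothetical subset for illustration; Whisper supports many more languages.
SUPPORTED_LANGUAGES = frozenset({"en", "it", "de", "fr", "es"})
SUPPORTED_TASKS = frozenset({"transcribe", "translate"})


def build_whisper_prompt(language: str = "en", task: str = "transcribe") -> str:
    """Return a Whisper-style decoder prompt made of special tokens.

    This is a sketch, not the code from this PR: a model-agnostic design
    would let each model define its own prompt format.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported task: {task!r}")
    return f"<|startoftranscript|><|{language}|><|{task}|><|notimestamps|>"


print(build_whisper_prompt("en"))
```

Because the prompt layout and the language check are both hard-coded to one model family, supporting another speech model would mean moving this logic behind a per-model interface.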

Let me know if I overlooked something in this PR.

TODOs (next PRs):

verbose_json response format
Group API validation (audio length limit above all) by model so we can extend to other models.
OpenAI has no "streaming" support; perhaps we could add it as a custom addition (it shouldn't be too different from what we have).
Translation API (more of the same with a token)
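One possible shape for the "group validation by model" item is a registry keyed by model name, sketched below. Everything here is hypothetical: `TranscriptionLimits`, `MODEL_LIMITS`, `validate_request`, the language subset, and the 30-second value (chosen because Whisper processes audio in 30-second windows) are illustrative assumptions, not the design or limits used in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionLimits:
    """Per-model request limits (hypothetical structure)."""
    max_audio_seconds: float
    languages: frozenset


# Hypothetical registry; values are assumptions for illustration only.
MODEL_LIMITS = {
    "openai/whisper-large-v3": TranscriptionLimits(
        max_audio_seconds=30.0,
        languages=frozenset({"en", "it", "de", "fr", "es"}),
    ),
}


def validate_request(model: str, language: str, audio_seconds: float) -> None:
    """Raise ValueError if the request violates the limits of `model`."""
    limits = MODEL_LIMITS.get(model)
    if limits is None:
        raise ValueError(f"No transcription limits registered for {model!r}")
    if language not in limits.languages:
        raise ValueError(f"{model!r} does not support language {language!r}")
    if audio_seconds > limits.max_audio_seconds:
        raise ValueError(
            f"Audio is {audio_seconds:.1f}s long; {model!r} accepts at most "
            f"{limits.max_audio_seconds:.1f}s")
```

With this layout, extending the endpoint to another model would only require a new registry entry rather than changes to the validation code path.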

github-actions · 2025-02-07T16:03:01Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

NickLucche · 2025-02-07T18:02:03Z

Had to rebase and force-push to make some very old commits compliant with DCO and pre-commit.

NickLucche requested review from DarkLight1337, robertgshaw2-redhat and simon-mo as code owners February 7, 2025 16:02

mergify bot added the documentation, ci/build, and frontend labels Feb 7, 2025

Varun Sundar Rabindranath and others added 10 commits February 7, 2025 17:37

whisper-async working poc

2118a19

Signed-off-by: NickLucche <[email protected]>

updated

1f3bae0

Signed-off-by: [email protected] <[email protected]>

language+prompt+validation and first tests

458dc79

Signed-off-by: NickLucche <[email protected]>

error msgs

453118e

Signed-off-by: NickLucche <[email protected]>

more tests

d6093a9

Signed-off-by: NickLucche <[email protected]>

docs

f1fa57e

Signed-off-by: NickLucche <[email protected]>

cleanup

190cf32

Signed-off-by: NickLucche <[email protected]>

CI correctness tests

ef49943

Signed-off-by: NickLucche <[email protected]>

clean up

100c362

Signed-off-by: NickLucche <[email protected]>

rebase leftovers

222c094

Signed-off-by: NickLucche <[email protected]>

NickLucche force-pushed the nicolo/whisper-api branch from d13f9f7 to 222c094 Compare February 7, 2025 18:00
