[VLM] Implement merged multimodal processor for Mllama #11427
Conversation
Isotr0py commented on Dec 23, 2024 (edited)
- Initialize merged multimodal processor implementation for encoder-decoder LMMs
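For context, the "merged" processor design means text-plus-image preprocessing is delegated to the model's HuggingFace processor in a single call, rather than handled by separate per-modality input mappers. A rough sketch of what that single HF call looks like for Mllama (model name and placeholder token follow the HF model card; this is illustrative and not the vLLM-internal processor class):

```python
# Illustrative only: the single HuggingFace processor call that a "merged"
# multimodal processor wraps for Mllama. Model name and placeholder token
# are taken from the HF model card, not from this PR.
from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct")
image = Image.open("example.jpg")

# Text and images are processed together in one pass, so the image
# placeholder token and the pixel inputs stay consistent.
inputs = processor(
    text="<|image|>Describe this image.",
    images=image,
    return_tensors="pt",
)
print(inputs.keys())  # e.g. input_ids, pixel_values, cross_attention_mask, ...
```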
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
To avoid scope creep, let's wait until #11396 has been merged first.
For that PR, I'm waiting for @ywang96 to perform benchmarks to confirm the effectiveness of the cache.
No problem, there is no rush for this PR. :)
Oh, it seems the text-only explicit encoder-decoder prompt is broken now...
Would it be easier for the processor to handle both explicit and implicit prompts internally instead of using the logic shared with text-based models?
Yes, I agree that we should handle both explicit and implicit prompts inside the processor. Passing separate encoder and decoder inputs to the processor is a bit awkward, to be honest, especially since we can't tell which part of the prompt each one comes from in this case. Let me try to do some refactoring for this.
Both explicit and implicit prompts with text/token_ids inputs (with and without multimodal data) should work now.
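A minimal sketch of the two prompt forms being discussed (not the PR's test code): model name, placeholder token, and decoder-prompt formatting here are assumptions based on the Mllama documentation, so adjust them to your setup.

```python
# Hedged sketch: exercising implicit vs. explicit encoder-decoder prompts
# with multimodal data. Placeholder token and prompt formatting are assumed.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="meta-llama/Llama-3.2-11B-Vision-Instruct",
          max_model_len=4096, max_num_seqs=2, enforce_eager=True)
image = Image.open("example.jpg")

# Implicit prompt: one prompt; vLLM splits it into encoder/decoder inputs.
implicit = {
    "prompt": "<|image|><|begin_of_text|>Describe the image.",
    "multi_modal_data": {"image": image},
}

# Explicit prompt: encoder and decoder parts are supplied separately; the
# multimodal data travels with the encoder prompt.
explicit = {
    "encoder_prompt": {
        "prompt": "<|image|>",
        "multi_modal_data": {"image": image},
    },
    "decoder_prompt": "<|begin_of_text|>Describe the image.",
}

for out in llm.generate([implicit, explicit], SamplingParams(max_tokens=64)):
    print(out.outputs[0].text)
```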
Can you add a test to
Thanks, LGTM overall. @ywang96, can you take a look at this and see whether the design seems reasonable to you?
LGTM - I think as long as we're not breaking the user interface this is good! Thanks for working on this!
Encoder-decoder models in general are not supported on V1 yet. This remains true even after this PR.