Multimodal Eval Enablement (Looking for Developer to Implement Design) #1334
Comments
Thanks Olivia for the RFC! I would like to propose a third option: creating our own. I think such an approach can have the following benefits:
Please let me know how that feels.
I would like to take this up and can try to help out with this.
I think this makes a lot of sense! Thanks for writing it up. Let's prefer this over Approach B above. I'd still prefer to get Approach A working first if possible, given how simple that can be.
@Vishnu-sai-teja That'd be awesome! Please go ahead and take it. Looking forward to it!
@Vishnu-sai-teja Once you have an ETA for a PR, please kindly let us know!
Hi @Olivia-liu, I plan to submit the initial PR in 3-4 days. Here's the brief timeline:
Day 1-2: Study the existing eval implementation and the HFMultimodalLM class
As a newbie to the codebase, this timeline ensures a quality implementation while I thoroughly understand it. Let me know if you need any adjustments. Thanks!
Sounds great, thanks for contributing! @Olivia-liu is OOO for a bit, so I can help with any questions/blockers you might run into.
You are also welcome to join our Slack channel to chat with us. Please see https://github.com/pytorch/torchchat?tab=readme-ov-file#community-contributions for more info.
@Vishnu-sai-teja, hey, checking in! Still able to take this one on?
Hey, I tried to implement it both ways, but I'm getting errors in the torchtune imports while running the evaluation for torchchat.
Easiest to discuss if you join the Slack channel, but we can do it here if that works best. Can you share a link to your branch, the commands you ran, and the full output you got?
Hi @Vishnu-sai-teja, just wanted to check if everything is OK here.
Looking for new assignee(s) of this Issue. Is anyone interested in taking it?
Hey @Olivia-liu, I'd like to take this up if it's still not done yet.
Would love for you to give it a shot; feel free to tag us with any questions/clarifications. If you haven't already, you should join the torchchat-contributer channel in the PyTorch Slack too: https://github.com/pytorch/torchchat?tab=readme-ov-file#community-contributions
Hey, here's a screenshot of the contents of the torchtune package inside my virtual env:
Hi @anirudhs001, that's a good question. I found out that we're not supposed to import from torchtune's recipes directory: https://github.com/pytorch/torchtune/blob/059cad9c1c0b684ec095634992468eca18bbd395/recipes/__init__.py#L20 Given that, let's take the copy-paste approach (Approach B).
Hey @Olivia-liu, I have made a WIP PR in my fork. I need your help on a few things:
Hi @anirudhs001, I'll be taking over support for Olivia. Also, if you haven't already, hop into our Slack channel: https://pytorch.slack.com/archives/C07UMLFQHT6 (torchchat-contributers)
Hey @Jack-Khuu, tried messing around with it. Will try to look deeper in a day or two.
@anirudhs001 Are you able to access the larger PyTorch Slack? https://github.com/pytorch/pytorch?tab=readme-ov-file#communication I can add you to the channel that way. (Or if I have your email, I'll send a direct invite.)
Interesting, this should be plenty powerful. Getting a tensor mismatch is where the fun part starts: it seems to suggest a disconnect between how things work in torchtune and what we're expecting in torchchat. Were you able to get the torchtune eval running?
@Jack-Khuu Nope, can't open that either. Can you please share the invite? Running everything on the CPU (instead of Apple's MPS) got rid of the OOM errors.
Yep, the torchtune evals work fine. I'll try to see what we do differently.
Can you try out this form?
Sweet, feel free to spin up an issue in torchtune and tag this issue/me if you run into anything funky.
Hey @Jack-Khuu, I’ve filled out that form a couple of times now but still can’t sign into the Slack workspace. That aside, it’s working now. I’ve created PR #1499.
That's amazing! I'll take a look on Monday. @anirudhs001, hmm, maybe something is going on in the background with the Slack setup.
Thanks! Sent you a connection request. |
🚀 The feature, motivation and pitch
Please note that since the actual implementation is going to be simple, and the design has already been reviewed, the purpose of this GitHub Issue is to look for a developer to implement this feature ASAP.
LLM eval refers to the process of assessing the perplexity, performance, and capabilities of LLMs, usually by having the model complete one or a series of tasks and assigning them scores. Torchchat already uses EleutherAI’s lm-evaluation-harness to run eval on text LLMs (code pointer). Recently, torchtune worked with EleutherAI to enable eval on text-image models in the harness and has integrated this feature into torchtune (code pointer). Torchchat wants to simply copy that solution from torchtune for text-image models.
Without the ability to do eval on multimodal LLMs, the enablement of multimodal LLMs on torchchat is incomplete. It’s critical to understand how well torchchat performs with image inputs.
Additional context
Assumptions
The Main Goal
A torchchat user can run eval on the Llama 3.2-11b model (which is image-text-in, text-out). Note that we don’t need to worry about the internals of how the eval happens, because we will only be calling EleutherAI’s eval libraries and reporting the metrics they return.
The user interface will be a command line:
python torchchat.py eval <model-name>
with additional arguments specifying detailed requirements for the eval tasks. The result will be printed to the terminal and will include the following metrics:
RFC (Optional)
Design
Overview
In this design, the multimodal eval in torchchat will borrow from the implementation of multimodal eval in torchtune, which utilizes EleutherAI’s lm-evaluation-harness. We can do this because torchchat uses the same Llama 3.2-11b model definition as torchtune.
Details
The Core Eval Implementation
[Preferred] Approach A: import the implementation of `HFMultimodalLM` from torchtune directly
The easiest implementation is to import the implementation of `HFMultimodalLM` directly from torchtune, then call `evaluate()` with this wrapper class passed in. Here’s torchtune’s implementation of `HFMultimodalLM`: code pointer.
Pseudocode:
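The original pseudocode block did not survive above; as a stand-in, here is a minimal sketch of Approach A. The torchtune import path, the wrapper name `_VLMEvalWrapper`, and the constructor arguments are assumptions based on torchtune's eleuther_eval recipe, not verified APIs:

```python
# Minimal sketch of Approach A (assumed names and paths, not a verified API).
# _VLMEvalWrapper is assumed to live in torchtune's recipes/eleuther_eval.py;
# as noted later in this issue, recipes may not be importable as a package.
from lm_eval import evaluator, tasks
from recipes.eleuther_eval import _VLMEvalWrapper  # hypothetical import

def run_multimodal_eval(model, transform, device, task_names, max_seq_length=4096):
    # Wrap torchchat's Llama 3.2-11b model so the harness can drive it.
    # Constructor arguments are illustrative.
    wrapper = _VLMEvalWrapper(
        model,
        transform=transform,      # tokenizer + image transform
        device=device,
        max_seq_length=max_seq_length,
    )
    # Hand the wrapper to EleutherAI's harness and report what it returns.
    task_dict = tasks.get_task_dict(task_names, tasks.TaskManager())
    return evaluator.evaluate(wrapper, task_dict)
```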
The pros and cons of this solution are discussed in the following “Alternatives Discussion” section. This solution should be the one to start with, given how quickly it can enable multimodal eval on torchchat. If for some unforeseen reason it doesn’t work, then take the following approach, which requires more work.
Approach B: copy the implementation of `HFMultimodalLM` from torchtune
Implement a wrapper class based on `HFMultimodalLM`, which is an abstract Hugging Face model class for multimodal models. The implementation of this class can be copied from torchtune (code pointer). Then call `evaluate()` with this wrapper class passed in.
Pseudocode:
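Again, the original pseudocode is missing; the skeleton below shows the shape Approach B could take, under the assumption that `HFMultimodalLM` is subclassed from the harness (lm_eval) and the wrapper body is copied from torchtune. Names and import paths are assumptions:

```python
# Minimal skeleton of Approach B (assumed names and paths, not a verified API).
from lm_eval import evaluator, tasks
from lm_eval.models.hf_vlms import HFMultimodalLM  # assumed harness location

class VLMEvalWrapper(HFMultimodalLM):
    """Torchchat-local copy of torchtune's multimodal eval wrapper."""

    def __init__(self, model, transform, device, max_seq_length=4096):
        # Body copied/adapted from torchtune's recipes/eleuther_eval.py:
        # stash the model, tokenizer/image transform, device, etc.
        ...

    # ...plus the methods the harness calls into (tok_encode,
    # generate_until, loglikelihood, ...), also copied from torchtune.

def run_multimodal_eval(model, transform, device, task_names):
    wrapper = VLMEvalWrapper(model, transform, device)
    task_dict = tasks.get_task_dict(task_names, tasks.TaskManager())
    return evaluator.evaluate(wrapper, task_dict)
```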
The Commandline Arguments
User command should be
python torchchat.py eval llama3.2-11b
+ some optional arguments.
In terms of implementation, reuse the same CLI entry point as the text eval: torchchat.py, eval.py. Then in `def eval()`, have an if-else to decide which eval wrapper (`GPTFastEvalWrapper` or the new `VLMEvalWrapper`) to use based on model type, as sketched below.
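For illustration, the if-else dispatch could look roughly like this; the `is_multimodal` flag and the wrapper constructor arguments are placeholders for whatever torchchat actually exposes, not confirmed internals:

```python
# Hypothetical dispatch inside torchchat's eval.py. GPTFastEvalWrapper is the
# existing text wrapper; VLMEvalWrapper is the new multimodal one sketched above.
from lm_eval import evaluator, tasks

def eval(model, tokenizer, task_names, device, is_multimodal: bool = False):
    if is_multimodal:
        # New path: image-text-in, text-out models (e.g. Llama 3.2-11b).
        wrapper = VLMEvalWrapper(model, transform=tokenizer, device=device)
    else:
        # Existing path: text-only models.
        wrapper = GPTFastEvalWrapper(model, tokenizer, device=device)
    task_dict = tasks.get_task_dict(task_names, tasks.TaskManager())
    return evaluator.evaluate(wrapper, task_dict)
```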
Alternatives Discussion
Discuss the pros and cons of importing torchtune’s implementation directly
Pros:
Cons:
Testing & Tooling Plan
Run the command
python torchchat.py eval llama3.2-11b
with different parameter combinations. The expected output is the tasks that have been run, their scores, and the time it took to run each task.
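For example (the optional flags below are assumptions mirroring the existing text-eval CLI, not a confirmed multimodal interface):
python torchchat.py eval llama3.2-11b --tasks mmmu_val --limit 10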