Multimodal Eval Enablement (Looking for Developer to Implement Design) #1334

Closed
Olivia-liu opened this issue Oct 29, 2024 · 26 comments

@Olivia-liu
Contributor

Olivia-liu commented Oct 29, 2024

🚀 The feature, motivation and pitch

Please note that since the actual implementation is going to be simple, and the design has already been reviewed, the purpose of this GitHub Issue is to look for a developer to implement this feature ASAP.

LLM eval refers to the process of assessing the perplexity, performance, and capabilities of LLMs, usually by having the model complete one or more tasks and assigning them scores. Torchchat already uses EleutherAI’s lm-evaluation-harness to run eval on text LLMs (code pointer). Recently, torchtune worked with EleutherAI to enable eval on text-image models in the harness and integrated this feature into torchtune (code pointer). Torchchat simply wants to copy that solution from torchtune for text-image models.

Without the ability to do eval on multimodal LLMs, the enablement of multimodal LLMs on torchchat is incomplete. It’s critical to understand how well torchchat performs with image inputs.

Additional context

Assumptions

  • The eval for text LLMs is already enabled on torchchat. Code pointer to the core eval function and the main function.
  • The Llama 3.2-11b multimodal model has been onboarded to torchchat, and in the future there will be more multimodal LLMs on torchchat.
  • EleutherAI’s lm-evaluation-harness has enabled eval on llama3.2-11b, so we don’t need to make code changes in the EleutherAI repo.

The Main Goal

A torchchat user can run eval on the llama 3.2-11b model (which takes image and text in and produces text out). Note that we don’t need to worry about the internals of how the eval happens, because we will only be calling EleutherAI’s eval libraries and reporting the metrics they return.

The user interface will be a command line, python torchchat.py eval <model-name>, with additional arguments specifying detailed requirements for the eval tasks.

The results will be printed to the terminal and will include the following metrics:

  • Tasks that have been run
  • The score for each task
  • The time it took to run each task

RFC (Optional)

Design

Overview

In this design, the multimodal eval in torchchat will borrow from the implementation of multimodal eval in torchtune, which utilizes EleutherAI’s lm-evaluation-harness. The reason we can do this is that torchchat uses the same Llama 3.2-11b model definition as torchtune.

Details

The Core Eval Implementation

[Preferred] Approach A: import the implementation of HFMultimodalLM from torchtune directly

The easiest implementation is to import the implementation of HFMultimodalLM directly from torchtune, then call evaluate() with this wrapper class passed in.

Here’s torchtune’s implementation of HFMultimodalLM: code pointer.

Pseudocode:

# In eval.py (pseudocode)
from torchtune.recipes.eleuther_eval import _VLMEvalWrapper
from lm_eval.evaluator import evaluate

if model_is_text_only:
    ...  # do the existing text-based model eval
elif model_is_text_image:
    eval_results = evaluate(_VLMEvalWrapper(...))

The pros and cons of this solution are discussed in the “Alternatives Discussion” section below. This solution should be the one to start with, given how quickly it can enable multimodal eval on torchchat. If, for some unforeseen reason, it doesn’t work, then take the following approach, which requires more work.

Approach B: copy the implementation of HFMultimodalLM from torchtune

  1. Create a wrapper class that subclasses HFMultimodalLM, an abstract Hugging Face model class for multimodal models. The implementation of this class can be copied from torchtune (code pointer).
  2. Then call evaluate() with this wrapper class passed in.

Pseudocode:

# In eval.py (pseudocode)
from lm_eval.models.hf_vlms import HFMultimodalLM
from lm_eval.evaluator import evaluate

class VLMEvalWrapper(HFMultimodalLM):
    ...  # implementation copied from torchtune

if model_is_text_only:
    ...  # do the existing text-based model eval
elif model_is_text_image:
    eval_results = evaluate(VLMEvalWrapper(...))
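To give a sense of the wrapper’s shape, here is a rough sketch of how Approach B could be fleshed out. This is only a sketch: apart from HFMultimodalLM and evaluate(), the constructor arguments and attribute names below are assumptions loosely modeled on torchtune’s wrapper, and the real method bodies would be copied from torchtune and adjusted to torchchat’s model definition.

# Sketch only -- names other than HFMultimodalLM are assumptions, not a spec
from lm_eval.models.hf_vlms import HFMultimodalLM


class VLMEvalWrapper(HFMultimodalLM):
    """Adapts torchchat's text-image model to the lm-evaluation-harness API."""

    def __init__(self, model, transform, device, max_seq_length, batch_size=1):
        # Note: depending on how HFMultimodalLM.__init__ behaves, we may need
        # to bypass or carefully call it so the harness does not try to load
        # its own Hugging Face checkpoint (torchtune's wrapper deals with
        # exactly this).
        self._model = model              # torchchat's Llama 3.2 vision model
        self._transform = transform      # text tokenizer + image transform
        self._device = device
        self._max_seq_length = max_seq_length
        self._batch_size = batch_size

    # The harness drives eval through hooks for tokenization, batching of
    # text+image prompts, and generation; their bodies would be copied from
    # torchtune's _VLMEvalWrapper.
    ...

# Then, in eval.py's multimodal branch:
# eval_results = evaluate(VLMEvalWrapper(model, transform, device, max_seq_length))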

The Commandline Arguments

The user command should be python torchchat.py eval llama3.2-11b plus some optional arguments.

In terms of implementation, reuse the same CLI entry points as the text eval: torchchat.py and eval.py. Then, in def eval(), have an if-else to decide which eval wrapper (GPTFastEvalWrapper or the new VLMEvalWrapper) to use based on the model type, as sketched below.
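A minimal sketch of that dispatch, assuming a modality field on the parsed arguments (the field name and constructor signatures here are placeholders, not the final API):

# In eval.py -- dispatch sketch only; argument names are placeholders
def eval(model, tokenizer, args):
    if getattr(args, "modality", "text") == "text-image":
        wrapper = VLMEvalWrapper(model, ...)                 # new multimodal path
    else:
        wrapper = GPTFastEvalWrapper(model, tokenizer, ...)  # existing text path
    # evaluate() returns the per-task results that get printed to the terminal
    return evaluate(wrapper, ...)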

Alternatives Discussion

Pros and cons of importing torchtune’s implementation directly

Pros:

  1. Easy to implement because it’s just an import
  2. Consistency between torchchat and torchtune
  3. Easy maintenance for us
  4. Torchtune has a better relationship with EleutherAI

Cons:

  1. Hard to customize the implementation for torchchat’s needs
  2. For some models, we use model definitions that are different from torchtune’s
  3. We rely on compatibility on their side
  4. We have more dependency on torchtune

Testing & Tooling Plan

Run the command python torchchat.py eval llama3.2-11b with different parameter combinations.

The expected output is the tasks that have been run, their scores, and the time it took to run each task.
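For example, a few combinations along these lines could be exercised (the flag and task names here are illustrative assumptions modeled on the existing text eval CLI; the final flag set may differ):

python torchchat.py eval llama3.2-11b --dtype bf16 --task mmmu_val_art --limit 1
python torchchat.py eval llama3.2-11b --dtype fp32 --task mmmu_val_art --limit 2 --max-seq-length 512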

@Olivia-liu Olivia-liu added enhancement New feature or request good first issue Good for newcomers actionable Items in the backlog waiting for an appropriate impl/fix Llama 3.2- Multimodal Issues related to Multimodal of Llama3.2 labels Oct 29, 2024
@Gasoonjia
Contributor

Thanks Olivia for the RFC!

I would like to propose a third option: creating our own VLMEvalWrapper, but instead of copying and pasting the implementation of HFMultimodalLM from torchtune, we can make it inherit from HFMultimodalLM.

I think this approach has the following benefits:

  1. Easy to implement. For the first version we can make VLMEvalWrapper a simple wrapper on top of HFMultimodalLM without any other add-ons, so that it is essentially just an import at the very beginning
  2. Easy for us to maintain, and it deduplicates code across different repos
  3. Keeps torchchat as lean as possible. The starting point of torchchat would always be a small codebase showcasing the ability to run large language models (LLMs) seamlessly.
  4. Keeps the ability to customize. We can always add new functions or even override existing functions for our own model definitions or other purposes
  5. Keeps eval.py lean and easy to maintain. Personally, I would like to eliminate as many if-else statements as possible during the validation process, to avoid bloated code and unclear logic (that's what we plan to do in generate.py and build.py). Creating our own VLMEvalWrapper can help us create structured validation logic (e.g., absorb the current text-only validation logic into the class).

Please let me know how that feels.

@Vishnu-sai-teja

I would like to take this up; I can try to help out with this.

@Olivia-liu
Contributor Author

(Replying to @Gasoonjia’s proposal above.)

I think this makes a lot of sense! Thanks for writing it up. Let's prefer this over Approach B above. I'd still prefer to get Approach A working first if possible, given how simple that can be.

@Olivia-liu
Contributor Author

I would like to take this up; I can try to help out with this.

@Vishnu-sai-teja That'd be awesome! Please go ahead and take it. Looking forward to it!

@Olivia-liu
Contributor Author

@Vishnu-sai-teja Once you have an ETA for a PR, please kindly let us know!

@Jack-Khuu Jack-Khuu added the RFC Request for Comment label Oct 30, 2024
@Vishnu-sai-teja

Hi @Olivia-liu,

I plan to submit the initial PR in 3-4 days. Here's a brief timeline:

Day 1-2: Study the existing eval implementation and the HFMultimodalLM class
Day 3-4: Implement VLMEvalWrapper and integrate it with eval.py, along with basic testing

Since I'm new to the codebase, this timeline gives me room to understand it thoroughly while ensuring a quality implementation. Let me know if you need any adjustments.

Thanks!

@Jack-Khuu
Contributor

Sounds great, thanks for contributing!

@Olivia-liu is OOO for a bit, so I can help with any questions/blockers you might run into

@Gasoonjia
Contributor

You are also welcome to join our slack channel to chat with us.

Please see https://github.com/pytorch/torchchat?tab=readme-ov-file#community-contributions for more info.

@byjlw
Contributor

byjlw commented Nov 5, 2024

@Vishnu-sai-teja, hey checking in! Still able to take this one on?

@Vishnu-sai-teja

@Vishnu-sai-teja, hey checking in! Still able to take this one on?

Hey, I tried to implement it both ways; I'm getting errors with the torchtune imports while running the evaluation for torchchat.

@byjlw
Contributor

byjlw commented Nov 6, 2024

@Vishnu-sai-teja, hey checking in! Still able to take this one on?

Hey, I tried to implement it both ways; I'm getting errors with the torchtune imports while running the evaluation for torchchat.

Easiest to discuss if you join the slack channel.

But we can do it here if that works best.

Can you share a link to your branch, the commands you ran and the full output you got?

@Gasoonjia
Contributor

Gasoonjia commented Nov 11, 2024

Hi @Vishnu-sai-teja, just wanted to check if everything is OK here.
If you need help, feel free to share your branch or any other info here, or use the Slack channel to share it with us.

@Olivia-liu Olivia-liu removed the RFC Request for Comment label Nov 12, 2024
@Olivia-liu Olivia-liu changed the title RFC: Multimodal Eval Enablement (Looking for Developer to Implement Design) Multimodal Eval Enablement (Looking for Developer to Implement Design) Nov 12, 2024
@Olivia-liu
Contributor Author

Olivia-liu commented Nov 12, 2024

Looking for new assignee(s) of this Issue. Is anyone interested in taking it?

@anirudhs001
Contributor

Hey @Olivia-liu, I'd like to take this up if it's still not done yet.

@Jack-Khuu
Contributor

Would love for you to give it a shot; feel free to tag us with any questions/clarifications.

If you haven't already, you should join the torchchat-contributer channel in the PyTorch Slack too: https://github.com/pytorch/torchchat?tab=readme-ov-file#community-contributions

@anirudhs001
Contributor

Hey,
This is likely a newbie question, but how do I import _VLMEvalWrapper from torchtune?
I installed torchtune via pip, and the installed package only has the contents from https://github.com/pytorch/torchtune/tree/main/torchtune. I can't see the recipes and other folders.

Here's a screenshot of the contents of the torchtune package inside my virtual env:
[screenshot omitted]

@Olivia-liu
Contributor Author

Hi @anirudhs001, that's a good question. I found out that we're not supposed to import from torchtune's recipes directory: https://github.com/pytorch/torchtune/blob/059cad9c1c0b684ec095634992468eca18bbd395/recipes/__init__.py#L20

Given that, let's take the copy-paste approach (Approach B).

@JacobSzwejbka JacobSzwejbka added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 7, 2025
@anirudhs001
Contributor

Hey @Olivia-liu

I have made a WIP PR in my fork. I need your help on a few things:

  1. Our requirements.txt right now has lm_eval==0.4.2. Support for multi-modal models was added in lm_eval v0.4.5 (the version in which hf_vlms.py, the file that defines HFMultimodalLM, was added to the repo). I have updated requirements.txt to use v0.4.5. Is this OK, or should we define HFMultimodalLM ourselves too?
  2. The Llama-3.2-Vision.json specifies using tiktoken as the tokenizer. AFAICT, tiktoken only supports tokenising text (I don't see a mention of images or anything related there). I have added Llama3VisionTransform from torchtune to create the tokenizer for now.
  3. The only multi-modal model supported right now is the 11b llama-3.2. I get OOM errors when I load a model this big with fp32, and generation with bf16 takes too long. Is there an easy way to add support for smaller multi-modal models, or is there any other place I can test the changes?

@Jack-Khuu
Contributor

Hi @anirudhs001, I'll be taking over support for Olivia

  1. Totally works to bump the requirements!
  2. Sgtm, we can work with torchtune on how we want to approach this once we get an e2e flow working
  3. May I ask how much memory your machine has? We were looking at quantizing 11B, but never got around to pushing it through (if you're interested in a fun little detour, we'd gladly take the help)
  • Also, how slowly did bf16 run? For the sake of starting off, we can do 2 samples and a reduced max_seq_len

Also if you haven't already, hop into our slack channel: https://pytorch.slack.com/archives/C07UMLFQHT6 (torchchat-contributers)

@anirudhs001
Contributor

anirudhs001 commented Feb 14, 2025

Hey @Jack-Khuu
I am using a 32 GB M2 Max MacBook Pro.
I have filled out the contribution form, but I can't open the Slack channel with any of my accounts. Do I need to do something else too?
Any pointers on what I can do to help with the quantization?

I tried messing around with max_seq_len. I left my laptop running overnight with 4096 but didn't get anything out. I got this error with 200:

(venv) (base) anirudhsingh@Anirudhs-MacBook-Pro-4 torchchat % python torchchat.py eval Llama-3.2-mm --dtype bf16 --task mmmu_val_art --limit 1 --modality text-image --max-seq-length 200
NumExpr defaulting to 12 threads.
PyTorch version 2.7.0.dev20250124 available.
Looking for libcustom_ops_aot_lib.so in /Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/executorch
Loading custom ops library: /Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/executorch/extension/llm/custom_ops/libcustom_ops_aot_lib.dylib
Unable to import torchao experimental quant_api with error:  [Errno 2] No such file or directory: '/Users/anirudhsingh/MISC/playground/torchchat/torchao-build/src/ao/torchao/experimental/quant_api.py'
Modality of model=text-image
Using device=mps
Loading model...
Time to load model: 70.82 seconds
-----------------------------------------------------------
Building contexts for mmmu_val_art on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5322.72it/s]
Running generate_until requests
Running generate_until requests with text+image input:   0%|                                                               | 0/1 [00:00<?, ?it/s]Time to run eval: 86.67s.
Traceback (most recent call last):
  File "/Users/anirudhsingh/MISC/playground/torchchat/torchchat.py", line 100, in <module>
    eval_main(args)
  File "/Users/anirudhsingh/MISC/playground/torchchat/torchchat/usages/eval.py", line 593, in main
    result = multi_model_eval(
  File "/Users/anirudhsingh/MISC/playground/torchchat/torchchat/usages/eval.py", line 523, in multi_model_eval
    eval_results = evaluate(
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/lm_eval/evaluator.py", line 500, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/lm_eval/models/hf_vlms.py", line 691, in generate_until
    cont = self._model_multimodal_generate(inputs, stop=until, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/torchchat/usages/eval.py", line 401, in _model_multimodal_generate
    logits = self.model(prompt, **batch)[:, -1]
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1760, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/torchchat/model.py", line 595, in forward
    return self.model(
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1760, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torchtune/modules/model_fusion/_deep_fusion.py", line 205, in forward
    output = self.decoder(
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1760, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torchtune/modules/transformer.py", line 635, in forward
    h = layer(
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1760, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torchtune/modules/transformer.py", line 122, in forward
    attn_out = self.attn(h, h, mask=mask, input_pos=input_pos)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1760, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torchtune/modules/attention.py", line 243, in forward
    q = self.pos_embeddings(q, input_pos=input_pos)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1760, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anirudhsingh/MISC/playground/torchchat/venv/lib/python3.10/site-packages/torchtune/models/llama3_1/_position_embeddings.py", line 171, in forward
    xshaped = x.float().reshape(*x.shape[:-1], -1, 2)
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, 32, -1, 2] because the unspecified dimension size -1 can be any value and is ambiguous
Running generate_until requests with text+image input:   0%|                                                               | 0/1 [01:13<?, ?it/s]
(venv) (base) anirudhsingh@Anirudhs-MacBook-Pro-4 torchchat % 

Will try to look deeper in a day or two.

@Jack-Khuu
Contributor

@anirudhs001 Are you able to access the larger PyTorch Slack? https://github.com/pytorch/pytorch?tab=readme-ov-file#communication

I can add you to the channel that way (or, if you can send me your email, I'll send a direct invite).

32 GB M2 Max

Interesting, this should be plenty powerful.

Getting a tensor mismatch is where the fun part starts: it seems to suggest a disconnect between how things work in torchtune and what we're expecting in torchchat.

Were you able to get the torchtune eval running?

@anirudhs001
Contributor

anirudhs001 commented Feb 19, 2025

@Jack-Khuu Nope, I can't open that either. Could you please share the invite?

Running everything on the CPU (instead of Apple's MPS) got rid of the OOM errors.

Were you able to get the torchtune eval running?

Yep, the torchtune eval works fine. I'll try to see what we do differently.

@Jack-Khuu
Contributor

Can you try out this form?
https://docs.google.com/forms/d/e/1FAIpQLSeADnUNW36fjKjYzyHDOzEB_abKQE9b6gqqW9NXse6O0MWh0A/viewform

I'll try to see what we do differently.

Sweet, feel free to spin up an issue in torchtune with this issue/me if you run into anything funky

@anirudhs001
Contributor

Hey @Jack-Khuu,

I’ve filled out that form a couple of times now but still can’t sign into the Slack workspace.

That aside, it’s working now. I’ve created PR #1499.

@Jack-Khuu
Contributor

Jack-Khuu commented Feb 23, 2025

That's amazing! I'll take a look on Monday

@anirudhs001 Hmm, maybe something is going on in the background with the Slack setup.
If you don't mind, can you add me on LinkedIn? We'll try to add you that way.

@anirudhs001
Contributor

Thanks! Sent you a connection request.

@Jack-Khuu Jack-Khuu moved this from In progress to Done in [torchchat] Looking for Contributors Mar 25, 2025