Question about whether Yi-VL-6B can be fine-tuned using its own dataset #348

a2382625920 · 2024-01-24T07:30:18Z

Reminder

I have searched the Github Discussion and issues and have not found anything similar to this.

Motivation

The low footprint of Yi-VL's video memory and the high speed of its inference allows room for more utility. If the Yi-VL series of multimodal macromodels can be fine-tuned using its own dataset, it many projects will be a great leap forward!

Solution

No response

Alternatives

No response

Anything Else?

No response

Are you willing to submit a PR?

I'm willing to submit a PR!

Jintao-Huang · 2024-01-27T04:55:25Z

hello~

yi-vl-6b is an excellent performing model, and the ms-swift LLM training framework has incorporated sft for yi-vl. It provides example scripts and supports custom datasets. You can check it out here~
https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat

The script utilizes the COCO dataset for fine-tuning. After training, the generated samples are as follows:

"""
[PROMPT]This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像，并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。

### Human: [-200 * 1]
please describe the image.
### Assistant:
[OUTPUT]A large airplane is on display in a museum. 
###

[LABELS]People walking in a museum with a airplane hanging from the celing.
[IMAGES]['https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/coco/2014/val2014/COCO_val2014_000000492132.jpg']
--------------------------------------------------------------------
[PROMPT]This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像，并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。

### Human: [-200 * 1]
please describe the image.
### Assistant:
[OUTPUT]A bowl of fruit and cake next to a cup of coffee. 
###

[LABELS]a bowl of fruit and pastry on a table
[IMAGES]['https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/coco/2014/val2014/COCO_val2014_000000558642.jpg']
"""

minlik · 2024-01-31T07:43:17Z

I added the finetuning scripts. See #368

a2382625920 · 2024-01-31T07:52:18Z

hello~

yi-vl-6b is an excellent performing model, and the ms-swift LLM training framework has incorporated sft for yi-vl. It provides example scripts and supports custom datasets. You can check it out here~ https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat

The script utilizes the COCO dataset for fine-tuning. After training, the generated samples are as follows:

"""
[PROMPT]This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像，并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。

### Human: [-200 * 1]
please describe the image.
### Assistant:
[OUTPUT]A large airplane is on display in a museum. 
###

[LABELS]People walking in a museum with a airplane hanging from the celing.
[IMAGES]['https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/coco/2014/val2014/COCO_val2014_000000492132.jpg']
--------------------------------------------------------------------
[PROMPT]This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像，并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。

### Human: [-200 * 1]
please describe the image.
### Assistant:
[OUTPUT]A bowl of fruit and cake next to a cup of coffee. 
###

[LABELS]a bowl of fruit and pastry on a table
[IMAGES]['https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/coco/2014/val2014/COCO_val2014_000000558642.jpg']
"""

Thanks, I'm swift trying to register the dataset and train with the already downloaded model!

a2382625920 · 2024-01-31T07:52:48Z

I added the finetuning scripts. See #368

Thank you very much for your work, I will try to use it!

a2382625920 · 2024-02-01T05:27:02Z

I added the finetuning scripts. See #368

Hi, I tried the method you provided and it comes up with the following warning, which may have an effect on the final fine-tuned result:

WARNING: tokenization mismatch: 208 vs. 210. (ignored)
WARNING: tokenization mismatch: 220 vs. 222. (ignored)
WARNING: tokenization mismatch: 174 vs. 176. (ignored)
WARNING: tokenization mismatch: 219 vs. 221. (ignored)
WARNING: tokenization mismatch: 197 vs. 199. (ignored)
WARNING: tokenization mismatch: 216 vs. 218. (ignored)
WARNING: tokenization mismatch: 196 vs. 198. (ignored)
WARNING: tokenization mismatch: 222 vs. 224. (ignored)
WARNING: tokenization mismatch: 180 vs. 182. (ignored)
WARNING: tokenization mismatch: 219 vs. 221. (ignored)
WARNING: tokenization mismatch: 233 vs. 235. (ignored)
WARNING: tokenization mismatch: 177 vs. 179. (ignored)
WARNING: tokenization mismatch: 195 vs. 197. (ignored)
WARNING: tokenization mismatch: 227 vs. 229. (ignored)
WARNING: tokenization mismatch: 226 vs. 228. (ignored)
WARNING: tokenization mismatch: 221 vs. 223. (ignored)
WARNING: tokenization mismatch: 178 vs. 180. (ignored)
WARNING: tokenization mismatch: 237 vs. 239. (ignored)
WARNING: tokenization mismatch: 178 vs. 180. (ignored)
WARNING: tokenization mismatch: 227 vs. 229. (ignored)
WARNING: tokenization mismatch: 175 vs. 177. (ignored)
WARNING: tokenization mismatch: 222 vs. 224. (ignored)
WARNING: tokenization mismatch: 215 vs. 217. (ignored)
WARNING: tokenization mismatch: 217 vs. 219. (ignored)
WARNING: tokenization mismatch: 220 vs. 222. (ignored)
WARNING: tokenization mismatch: 215 vs. 217. (ignored)
WARNING: tokenization mismatch: 227 vs. 229. (ignored)
WARNING: tokenization mismatch: 178 vs. 180. (ignored)
WARNING: tokenization mismatch: 235 vs. 237. (ignored)
WARNING: tokenization mismatch: 177 vs. 179. (ignored)
WARNING: tokenization mismatch: 221 vs. 223. (ignored)
WARNING: tokenization mismatch: 197 vs. 199. (ignored)
WARNING: tokenization mismatch: 220 vs. 222. (ignored)
WARNING: tokenization mismatch: 176 vs. 178. (ignored)
WARNING: tokenization mismatch: 228 vs. 230. (ignored)
WARNING: tokenization mismatch: 221 vs. 223. (ignored)
WARNING: tokenization mismatch: 175 vs. 177. (ignored)
WARNING: tokenization mismatch: 220 vs. 222. (ignored)
WARNING: tokenization mismatch: 177 vs. 179. (ignored)
WARNING: tokenization mismatch: 176 vs. 178. (ignored)
WARNING: tokenization mismatch: 176 vs. 178. (ignored)
WARNING: tokenization mismatch: 229 vs. 231. (ignored)
WARNING: tokenization mismatch: 221 vs. 223. (ignored)
WARNING: tokenization mismatch: 177 vs. 179. (ignored)
WARNING: tokenization mismatch: 220 vs. 222. (ignored)
WARNING: tokenization mismatch: 227 vs. 229. (ignored)
WARNING: tokenization mismatch: 178 vs. 180. (ignored)
WARNING: tokenization mismatch: 233 vs. 235. (ignored)
WARNING: tokenization mismatch: 176 vs. 178. (ignored)
WARNING: tokenization mismatch: 175 vs. 177. (ignored)
WARNING: tokenization mismatch: 230 vs. 232. (ignored)

a2382625920 · 2024-02-01T05:30:00Z

hello~

yi-vl-6b is an excellent performing model, and the ms-swift LLM training framework has incorporated sft for yi-vl. It provides example scripts and supports custom datasets. You can check it out here~ https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat

The script utilizes the COCO dataset for fine-tuning. After training, the generated samples are as follows:

"""
[PROMPT]This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像，并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。

### Human: [-200 * 1]
please describe the image.
### Assistant:
[OUTPUT]A large airplane is on display in a museum. 
###

[LABELS]People walking in a museum with a airplane hanging from the celing.
[IMAGES]['https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/coco/2014/val2014/COCO_val2014_000000492132.jpg']
--------------------------------------------------------------------
[PROMPT]This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像，并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。

### Human: [-200 * 1]
please describe the image.
### Assistant:
[OUTPUT]A bowl of fruit and cake next to a cup of coffee. 
###

[LABELS]a bowl of fruit and pastry on a table
[IMAGES]['https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/coco/2014/val2014/COCO_val2014_000000558642.jpg']
"""

I don't know how I can register my own dataset. And this doesn't allow me to enter my own local model path, after I enter the local model path, swift still uses the network to download the model. How can I solve this?

minlik · 2024-02-01T06:07:05Z

WARNING: tokenization mismatch

I didn't encounter the same issue. Could you please share your training scripts?

After reviewing the code, I noticed that the WARNING might be caused by the commented code here. Could you please checke your local code?

minlik · 2024-02-01T06:10:08Z

@a2382625920 I thought of another possibility. The training code is modified from LLaVA. If you have installed llava locally, you can uninstall it and try again.

a2382625920 · 2024-02-01T07:29:33Z

我想到了另一种可能性。训练代码是从 LLaVA 修改而来的。如果您已在本地安装了 llava，则可以将其卸载并重试。

I did use llava's virtual environment to run the code, and after I uninstalled it and installed Yi's installer environment, the following error was reported:

Traceback (most recent call last):
File "/root/siton-glusterfs-eaxtsxdfs/hzt/projects/Yi/VL/llava/train/train_mem.py", line 6, in
from llava.train import llama_flash_attn_monkey_patch
ModuleNotFoundError: No module named 'llava'

Do you have to use the llava environment?

a2382625920 · 2024-02-01T07:33:00Z

WARNING: tokenization mismatch

I didn't encounter the same issue. Could you please share your training scripts?

After reviewing the code, I noticed that the WARNING might be caused by the commented code here. Could you please checke your local code?

#!/bin/bash

deepspeed --include localhost:0 --master_port 1234 llava/train/train_mem.py
--deepspeed ./scripts/zero2.json
--lora_enable True
--model_name_or_path /root/siton-glusterfs-eaxtsxdfs/hzt/model/Yi-VL-6B
--data_path /root/siton-glusterfs-eaxtsxdfs/hzt/data/LLaVa_data/filter_cap_1226.json
--image_folder /root/siton-glusterfs-eaxtsxdfs/xts/data/s_mix/image
--vision_tower /root/siton-glusterfs-eaxtsxdfs/hzt/model/Yi-VL-6B/vit/clip-vit-H-14-laion2B-s32B-b79K-yi-vl-6B-448
--output_dir ./checkpoint/Yi-VL-6B
--mm_vision_select_layer -2
--mm_use_im_start_end False
--mm_use_im_patch_token False
--bf16 True
--num_train_epochs 10
--per_device_train_batch_size 4
--per_device_eval_batch_size 4
--gradient_accumulation_steps 8
--evaluation_strategy "no"
--save_strategy "steps"
--save_steps 200
--save_total_limit 3
--learning_rate 2e-5
--weight_decay 0.
--warmup_ratio 0.03
--lr_scheduler_type "cosine"
--logging_steps 1
--tf32 True
--model_max_length 2048
--gradient_checkpointing True
--lazy_preprocess True
--dataloader_num_workers 4
--report_to wandb

minlik · 2024-02-01T09:12:16Z

我想到了另一种可能性。训练代码是从 LLaVA 修改而来的。如果您已在本地安装了 llava，则可以将其卸载并重试。

I did use llava's virtual environment to run the code, and after I uninstalled it and installed Yi's installer environment, the following error was reported:

Traceback (most recent call last): File "/root/siton-glusterfs-eaxtsxdfs/hzt/projects/Yi/VL/llava/train/train_mem.py", line 6, in from llava.train import llama_flash_attn_monkey_patch ModuleNotFoundError: No module named 'llava'

Do you have to use the llava environment?

Could you please run the command export PYTHONPATH=$PWD:$PYTHONPATH under the VL folder and then try again? Thank you.

lucasjinreal · 2024-02-27T07:40:15Z

Can u also share pretraining script? Which tuning projector and vision encoder with stage 1 and stage 2?
This not same as llava.

Jintao-Huang · 2024-03-14T13:08:45Z

Hello! 😊

Now Swift is enhancing its multimodal capabilities through fine-tuning. It has already supported custom datasets and full parameter fine-tuning. For best practices, you can refer to this link: https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/yi-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md#%E5%BE%AE%E8%B0%83

If interested, you are welcome to use it ～

Jintao-Huang · 2024-03-14T13:13:00Z

Looking forward to the official fine-tuning script provided!

Jintao-Huang · 2024-03-14T13:17:05Z

令人忧伤的故事，没有过来查看issue的进展

Iven2132 · 2024-03-30T16:04:28Z

@Jintao-Huang Hi, How can I fine-tune Yi-VL on my dataset? any docs link?

Jintao-Huang · 2024-03-30T16:41:40Z

@Jintao-Huang Hi, How can I fine-tune Yi-VL on my dataset? any docs link?

ms-swift offers fine-tuning for custom datasets on YI-VL, including LoRA and full-parameter options, following best practices, haha~ 😊

https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/yi-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md#%E5%BE%AE%E8%B0%83

Iven2132 · 2024-03-30T16:50:57Z

@Jintao-Huang is there any notebook I can use to do that? also, is yi-vl-6b-chat better than neva?

Iven2132 · 2024-03-31T08:01:10Z

@Jintao-Huang How can I fine-tune the model with my custom dataset which is a JSON file? I saw the docs is using coco-mini-en-2

Jintao-Huang · 2024-03-31T15:07:20Z

Here ~

https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#-%E6%8E%A8%E8%8D%90%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0%E7%9A%84%E5%BD%A2%E5%BC%8F

    --custom_train_dataset_path xxx.json \
    --custom_val_dataset_path yyy.json \

[{"query": "55555", "response": "66666", "images": ["image_path"]},
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]},
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}]

@Jintao-Huang How can I fine-tune the model with my custom dataset which is a JSON file? I saw the docs is using coco-mini-en-2

Iven2132 · 2024-03-31T15:59:00Z

Here ~

https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#-%E6%8E%A8%E8%8D%90%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0%E7%9A%84%E5%BD%A2%E5%BC%8F
    --custom_train_dataset_path xxx.jsonl \
    --custom_val_dataset_path yyy.jsonl \
[{"query": "55555", "response": "66666", "images": ["image_path"]},
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]},
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}]
@Jintao-Huang How can I fine-tune the model with my custom dataset which is a JSON file? I saw the docs is using coco-mini-en-2

I tried this but was getting errors, Do you have any notebook I can use? are you on 01-ai discord server, I'd love to chat!

Yimi81 mentioned this issue Feb 26, 2024

VL training code release? #409

Closed

2 tasks

vidddddaaaaa added the doc-required Your PR changes impact docs and you will update later. label Feb 29, 2024

Yimi81 mentioned this issue Mar 9, 2024

Will opensource llava trainer? #428

Closed

2 tasks

Yimi81 closed this as completed Apr 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about whether Yi-VL-6B can be fine-tuned using its own dataset #348

Question about whether Yi-VL-6B can be fine-tuned using its own dataset #348

a2382625920 commented Jan 24, 2024 •

edited

Loading

Jintao-Huang commented Jan 27, 2024 •

edited

Loading

minlik commented Jan 31, 2024

a2382625920 commented Jan 31, 2024

a2382625920 commented Jan 31, 2024

a2382625920 commented Feb 1, 2024

a2382625920 commented Feb 1, 2024

minlik commented Feb 1, 2024

minlik commented Feb 1, 2024

a2382625920 commented Feb 1, 2024

a2382625920 commented Feb 1, 2024

minlik commented Feb 1, 2024

lucasjinreal commented Feb 27, 2024

Jintao-Huang commented Mar 14, 2024

Jintao-Huang commented Mar 14, 2024

Jintao-Huang commented Mar 14, 2024

Iven2132 commented Mar 30, 2024

Jintao-Huang commented Mar 30, 2024

Iven2132 commented Mar 30, 2024

Iven2132 commented Mar 31, 2024

Jintao-Huang commented Mar 31, 2024 •

edited

Loading

Iven2132 commented Mar 31, 2024

Question about whether Yi-VL-6B can be fine-tuned using its own dataset #348

Question about whether Yi-VL-6B can be fine-tuned using its own dataset #348

Comments

a2382625920 commented Jan 24, 2024 • edited Loading

Reminder

Motivation

Solution

Alternatives

Anything Else?

Are you willing to submit a PR?

Jintao-Huang commented Jan 27, 2024 • edited Loading

minlik commented Jan 31, 2024

a2382625920 commented Jan 31, 2024

a2382625920 commented Jan 31, 2024

a2382625920 commented Feb 1, 2024

a2382625920 commented Feb 1, 2024

minlik commented Feb 1, 2024

minlik commented Feb 1, 2024

a2382625920 commented Feb 1, 2024

a2382625920 commented Feb 1, 2024

minlik commented Feb 1, 2024

lucasjinreal commented Feb 27, 2024

Jintao-Huang commented Mar 14, 2024

Jintao-Huang commented Mar 14, 2024

Jintao-Huang commented Mar 14, 2024

Iven2132 commented Mar 30, 2024

Jintao-Huang commented Mar 30, 2024

Iven2132 commented Mar 30, 2024

Iven2132 commented Mar 31, 2024

Jintao-Huang commented Mar 31, 2024 • edited Loading

Iven2132 commented Mar 31, 2024

a2382625920 commented Jan 24, 2024 •

edited

Loading

Jintao-Huang commented Jan 27, 2024 •

edited

Loading

Jintao-Huang commented Mar 31, 2024 •

edited

Loading