Question about whether Yi-VL-6B can be fine-tuned on a custom dataset #348
Hello~ Yi-VL-6B is an excellent model, and the ms-swift LLM training framework has added SFT support for Yi-VL. It provides example scripts and supports custom datasets; you can check it out here~ The example script fine-tunes on the COCO dataset. After training, the generated samples look like this:

```text
[PROMPT]This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像,并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。
### Human: [-200 * 1]
please describe the image.
### Assistant:
[OUTPUT]A large airplane is on display in a museum.
###
[LABELS]People walking in a museum with a airplane hanging from the celing.
[IMAGES]['https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/coco/2014/val2014/COCO_val2014_000000492132.jpg']
--------------------------------------------------------------------
[PROMPT]This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像,并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。
### Human: [-200 * 1]
please describe the image.
### Assistant:
[OUTPUT]A bowl of fruit and cake next to a cup of coffee.
###
[LABELS]a bowl of fruit and pastry on a table
[IMAGES]['https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/coco/2014/val2014/COCO_val2014_000000558642.jpg']
""" |
I added the fine-tuning scripts. See #368
Thanks! I'm trying to use swift to register the dataset and train with the already-downloaded model!
Thank you very much for your work; I will try to use it!
Hi, I tried the method you provided and it produces the following warning, which may affect the final fine-tuned result: WARNING: tokenization mismatch: 208 vs. 210. (ignored)
I don't know how to register my own dataset. Also, it doesn't let me use my own local model path: after I enter the local model path, swift still downloads the model from the network. How can I solve this?
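For the local-path part of this question, a minimal sketch, assuming the installed swift version exposes `--model_id_or_path` for a local checkpoint directory (check `swift sft --help` for the exact flag name in your version):

```bash
# Sketch: point swift at an already-downloaded Yi-VL checkpoint instead of
# letting it download. --model_id_or_path is an assumed flag name; confirm
# with `swift sft --help`.
swift sft \
    --model_type yi-vl-6b-chat \
    --model_id_or_path /path/to/local/Yi-VL-6B \
    --dataset coco-mini-en-2
```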
I didn't encounter the same issue. Could you please share your training scripts? After reviewing the code, I noticed that the WARNING might be caused by the commented code here. Could you please check your local code?
@a2382625920 I thought of another possibility. The training code is modified from LLaVA. If you have installed llava locally, you can uninstall it and try again. |
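Concretely, that suggestion amounts to something like the following; the `requirements.txt` path is an assumption about the Yi repo layout:

```bash
# Remove the locally installed llava package so the Yi-modified training code
# is picked up, then reinstall this repo's own dependencies from its root.
pip uninstall -y llava
pip install -r requirements.txt
```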
I did use llava's virtual environment to run the code. After I uninstalled it and installed Yi's environment instead, the following error was reported: Traceback (most recent call last):
Do you have to use the llava environment?
```bash
#!/bin/bash
deepspeed --include localhost:0 --master_port 1234 llava/train/train_mem.py
```
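The pasted script is cut off; for orientation only, a LLaVA-style launch (which this training code is modified from) typically continues with flags like the sketch below. Every value here is a placeholder, not the user's actual arguments:

```bash
#!/bin/bash
# Illustrative LLaVA-style launch, not the user's full script.
# Flag names follow upstream LLaVA's train.py; all values are placeholders.
deepspeed --include localhost:0 --master_port 1234 llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path /path/to/Yi-VL-6B \
    --data_path /path/to/train.json \
    --image_folder /path/to/images \
    --bf16 True \
    --output_dir ./checkpoints/yi-vl-6b-ft \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --learning_rate 2e-5 \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```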
Could you please run the command
Can you also share the pretraining script? Which projector and vision encoder are tuned in stage 1 and stage 2?
Hello! 😊 Swift is now enhancing its multimodal capabilities: it already supports custom datasets and full-parameter fine-tuning. For best practices, you can refer to this link: https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/yi-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md#%E5%BE%AE%E8%B0%83 If interested, you are welcome to use it~
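Per that best-practices doc, full-parameter tuning is selected through the SFT type switch; a minimal sketch, assuming `--sft_type full` (confirm against the linked doc):

```bash
# Sketch of full-parameter fine-tuning with ms-swift; --sft_type full is the
# assumed switch per the linked best-practices doc.
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type yi-vl-6b-chat \
    --dataset coco-mini-en-2 \
    --sft_type full
```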
Looking forward to the official fine-tuning script!
A sad story: no one has come back to check on the progress of this issue.
@Jintao-Huang Hi, how can I fine-tune Yi-VL on my dataset? Any docs link?
ms-swift offers fine-tuning on custom datasets for Yi-VL, including LoRA and full-parameter options, following best practices, haha~ 😊
@Jintao-Huang Is there any notebook I can use to do that? Also, is yi-vl-6b-chat better than neva?
@Jintao-Huang How can I fine-tune the model with my custom dataset, which is a JSON file? I saw the docs are using coco-mini-en-2.
Here ~

```bash
--custom_train_dataset_path xxx.json \
--custom_val_dataset_path yyy.json \
```

The JSON files look like:

```json
[{"query": "55555", "response": "66666", "images": ["image_path"]},
 {"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]},
 {"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}]
```
I tried this but was getting errors. Do you have any notebook I can use? Are you on the 01-ai Discord server? I'd love to chat!
Reminder
Motivation
Yi-VL's low video-memory footprint and fast inference leave room for more utility. If the Yi-VL series of multimodal large models could be fine-tuned on custom datasets, many projects would take a great leap forward!
Solution
No response
Alternatives
No response
Anything Else?
No response
Are you willing to submit a PR?