deepseek_r1_distill cannot use liger #3555

Open
xiezhipeng-git opened this issue Mar 18, 2025 · 1 comment

Comments

xiezhipeng-git commented Mar 18, 2025

Describe the bug
Training any DeepSeek distilled model with liger enabled fails.

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here

Additional context

Add any other context about the problem here
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.12/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.debugpy-2025.4.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.debugpy-2025.4.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.debugpy-2025.4.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.debugpy-2025.4.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.vscode-server/extensions/ms-python.debugpy-2025.4.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/root/.vscode-server/extensions/ms-python.debugpy-2025.4.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "/mnt/d/my/work/LLM/ai_train/ms-swift/swift/cli/run_sh.py", line 96, in <module>
    main()
  File "/mnt/d/my/work/LLM/ai_train/ms-swift/swift/cli/run_sh.py", line 92, in main
    subcommand_main(command[2:])  # pass the remaining arguments
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/d/my/work/LLM/ai_train/ms-swift/swift/llm/train/sft.py", line 265, in sft_main
    return SwiftSft(args).main()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/d/my/work/LLM/ai_train/ms-swift/swift/llm/base.py", line 47, in main
    result = self.run()
             ^^^^^^^^^^
  File "/mnt/d/my/work/LLM/ai_train/ms-swift/swift/llm/train/sft.py", line 125, in run
    self.model = self.prepare_model(self.args, self.model, template=self.template, train_dataset=train_dataset)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/d/my/work/LLM/ai_train/ms-swift/swift/llm/train/tuner.py", line 340, in prepare_model
    apply_liger(args.model_type)
  File "/mnt/d/my/work/LLM/ai_train/ms-swift/swift/llm/train/tuner.py", line 42, in apply_liger
    raise ValueError(f'Unsupported liger model_type: {model_type}')
ValueError: Unsupported liger model_type: deepseek_r1_distill
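
The traceback suggests that `apply_liger` dispatches on `model_type` and raises for any name it does not recognize. Below is a minimal sketch of that kind of dispatch with a fallback to the underlying architecture name. It is not the actual ms-swift code: `LIGER_PATCHES`, the placeholder patch functions, `apply_liger_sketch`, and the `hf_model_type` parameter are all hypothetical names used for illustration. The fallback relies on the fact that the DeepSeek-R1-Distill checkpoints are built on Qwen2 and Llama architectures.

```python
from typing import Callable, Dict, Optional


def _patch_llama() -> None:
    """Placeholder: a real implementation would patch Llama modules here."""


def _patch_qwen2() -> None:
    """Placeholder: a real implementation would patch Qwen2 modules here."""


# Hypothetical lookup table from architecture name to patch function.
LIGER_PATCHES: Dict[str, Callable[[], None]] = {
    "llama": _patch_llama,
    "qwen2": _patch_qwen2,
}


def apply_liger_sketch(model_type: str, hf_model_type: Optional[str] = None) -> str:
    """Apply a patch for model_type, falling back to hf_model_type.

    Returns the name that was actually matched. Raises ValueError with the
    same message as in the traceback when neither name is known.
    """
    for name in (model_type, hf_model_type):
        if name in LIGER_PATCHES:
            LIGER_PATCHES[name]()
            return name
    raise ValueError(f'Unsupported liger model_type: {model_type}')
```

With this sketch, `apply_liger_sketch("deepseek_r1_distill", "qwen2")` resolves to the `qwen2` entry, while calling it with `"deepseek_r1_distill"` alone reproduces the error above.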

xiezhipeng-git commented Mar 18, 2025

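@Jintao-Huang
deepseek_r1_distill cannot use liger. Also, liger speeds things up by replacing the model's normalization with other functions, right? If we use
https://github.com/jiachenzhu/DyT
could that speed things up further? And how would the replacement be done? After all, both aim to bring the outputs into the range (-1, 1) or (0, 1). Judging by DyT's experimental results, the effect on model capability is small, but the effect on compute speed is large.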
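
For reference, DyT (Dynamic Tanh) replaces a normalization layer with the elementwise operation `gamma * tanh(alpha * x) + beta`, where `alpha` is a learnable scalar. The following is a minimal pure-Python sketch that compares it with RMSNorm on a single vector. The input values and parameter settings are illustrative assumptions, and it says nothing about how either would be wired into ms-swift or liger.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: divide by the root mean square of the vector, then scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def dyt(x: List[float], alpha: float, gamma: List[float], beta: List[float]) -> List[float]:
    """DyT: elementwise gamma * tanh(alpha * x) + beta; no reduction over x."""
    return [g * math.tanh(alpha * v) + b for v, g, b in zip(x, gamma, beta)]


# Illustrative values only.
x = [0.5, -2.0, 8.0]
ones = [1.0, 1.0, 1.0]
zeros = [0.0, 0.0, 0.0]

rms_out = rms_norm(x, ones)
dyt_out = dyt(x, alpha=0.5, gamma=ones, beta=zeros)
print(rms_out)
print(dyt_out)
```

In this sketch the DyT output stays inside (-1, 1) because of the tanh, whereas the RMSNorm output is rescaled but not confined to that range. Swapping one for the other in an already-trained checkpoint would change the model's outputs, so it would presumably need retraining or fine-tuning.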

@xiezhipeng-git xiezhipeng-git changed the title deepseek_r1_distill cannot use liger lora deepseek_r1_distill cannot use liger Mar 18, 2025