ENV_NAME="finbot"
PYTHON_VERSION="3.10"
## Create and activate the virtual environment
conda create -n $ENV_NAME python=$PYTHON_VERSION notebook -y
conda activate $ENV_NAME
## Use a domestic (China) pip mirror
export PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
## Install base dependencies
pip install \
addict simplejson sortedcontainers openpyxl matplotlib \
vllm peft FlagEmbedding bitsandbytes \
catboost xgboost polars_ta lightgbm \
modelscope hf_transfer
## Install evaluation dependencies
pip install segeval backtrader deepeval
## Install the LLM fine-tuning framework dependency -- LLaMA-Factory
cd tools
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
pip install deepspeed==0.15.4 # This exact version matters! Newer versions have known issues
llamafactory-cli version
cd ..
## Install the RAG framework dependency -- MiniRAG; in the future it should be installable directly via pip install minirag-hku
cd tools
git clone https://github.com/HKUDS/MiniRAG.git
cd MiniRAG && pip install -e . && cd ..
cd ..
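After installation, a quick way to confirm that the key packages resolved correctly is to probe for them with `importlib` before launching anything heavy. The package list below is taken from the install commands above and is an assumption about the import names (pip package names can differ from import names); adjust it for your environment:

```python
import importlib.util

def find_missing(packages):
    """Return the subset of packages that cannot be imported in this environment."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# Import names for the dependencies installed above (note: FlagEmbedding
# keeps its capitalized name as an import).
deps = ["vllm", "peft", "FlagEmbedding", "bitsandbytes",
        "catboost", "xgboost", "lightgbm", "modelscope",
        "backtrader", "deepeval", "llamafactory"]

missing = find_missing(deps)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies resolved.")
```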
Start the LLM with a vLLM server. In a terminal, run:
CUDA_VISIBLE_DEVICES=0 \
vllm serve resources/open_models/Qwen2.5-3B-Instruct --trust-remote-code \
--served-model-name base \
--enable-lora --lora-modules lora=resources/ckpts/Qwen2.5-3B-Instruct/lora_adapter \
--max-model-len 5000 --max-num-seqs 16 --quantization fp8 --gpu-memory-utilization 0.25 \
--port 12239
CUDA_VISIBLE_DEVICES=0,1 \
vllm serve resources/open_models/Qwen2.5-14B-Instruct --trust-remote-code \
--served-model-name judger \
--max-model-len 5000 --max-num-seqs 30 --quantization fp8 --kv-cache-dtype fp8 \
--gpu-memory-utilization 0.4 --tensor-parallel-size 2 \
--port 12235
# For larger models, tensor and pipeline parallelism can be combined, e.g. --tensor-parallel-size 2 --pipeline-parallel-size 2
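Once a server is up, it exposes an OpenAI-compatible API. Below is a minimal smoke-test sketch against the first server above; the port (12239) and served model names ("base", "lora") follow the commands, while the prompt text is purely illustrative:

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt):
    """Build an OpenAI-style chat-completions request for a vLLM server."""
    payload = {
        # "base" or "lora", as registered via --served-model-name / --lora-modules
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("http://localhost:12239", "lora",
                         "Summarize today's market in one sentence.")
# resp = urllib.request.urlopen(req)  # uncomment once the server is running
# print(json.load(resp)["choices"][0]["message"]["content"])
```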
- modelscope
Terminal download example:
MODEL=Qwen/Qwen2.5-14B-Instruct
LOCAL_PATH=resources/open_models/Qwen2.5-14B-Instruct
# Note: the last component of the local path is used directly as the model folder name
modelscope download --model $MODEL --local_dir $LOCAL_PATH
- huggingface
Terminal download example:
MODEL=ProsusAI/finbert
LOCAL_PATH=resources/open_models/ProsusAI-finbert
# Note: the last component of the local path is used directly as the model folder name
huggingface-cli download --resume-download --local-dir-use-symlinks False $MODEL --local-dir $LOCAL_PATH
This section is largely adapted from the Qwen2.5 training documentation.
- Data preparation
  Prepare and register datasets in either the alpaca or the sharegpt format:
  - alpaca format:
    - Single-record shape:
      [
        {
          "instruction": "user instruction (required)",
          "input": "user input (optional)",
          "output": "model response (required)",
          "system": "system prompt (optional)",
          "history": [
            ["user instruction in the first round (optional)", "model response in the first round (optional)"],
            ["user instruction in the second round (optional)", "model response in the second round (optional)"]
          ]
        }
      ]
    - Register it in $resources/dataset_info.json:
      "dataset_name": {
        "file_name": "path/to/dataset",
        "columns": {
          "prompt": "instruction",
          "query": "input",
          "response": "output",
          "system": "system",
          "history": "history"
        }
      }
  - sharegpt format:
    - Single-record shape:
      [
        {
          "messages": [
            {"role": "user", "content": "user instruction"},
            {"role": "assistant", "content": "model response"}
          ],
          "system": "system prompt (optional)",
          "tools": "tool description (optional)"
        }
      ]
    - Add it to $resources/dataset_info.json:
      "dataset_name": {
        "file_name": "path/to/dataset",
        "formatting": "sharegpt",
        "columns": {
          "messages": "messages"
        },
        "tags": {
          "role_tag": "role",
          "content_tag": "content",
          "user_tag": "user",
          "assistant_tag": "assistant",
          "system_tag": "system"
        }
      }
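The two record shapes above are mechanically related. As a hedged sketch (field names follow the examples above; the helper itself is not part of the project), an alpaca record can be converted into the sharegpt messages layout like this:

```python
import json

def alpaca_to_sharegpt(record):
    """Convert one alpaca-format record into a sharegpt-style messages record."""
    messages = []
    # Replay multi-turn history first, alternating user/assistant.
    for user_turn, assistant_turn in record.get("history", []):
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    # Current turn: instruction (+ optional input), then the response.
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n" + record["input"]
    messages.append({"role": "user", "content": prompt})
    messages.append({"role": "assistant", "content": record["output"]})
    out = {"messages": messages}
    if record.get("system"):
        out["system"] = record["system"]
    return out

sample = {"instruction": "Classify the sentiment.", "input": "Stocks rallied.",
          "output": "positive", "system": "You are a financial assistant."}
print(json.dumps(alpaca_to_sharegpt(sample), indent=2))
```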
- Start training
  Training scripts are stored as $project-root/dev/sft_*.sh.
  After adjusting the parameters inside, launch from the project root with:
  source train/sft_qwen2_5_3B_for_FINNA.sh
  Note: make sure the template parameter in DATA_SETTINGS matches the chosen model; see https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#supported-models