huakyouin/finbot

An intelligent assistant for portfolio investment, combining multiple language models

Base Environment Setup

ENV_NAME="finbot"
PYTHON_VERSION="3.10"

## Create and activate the virtual environment
conda create -n $ENV_NAME python=$PYTHON_VERSION notebook -y
conda activate $ENV_NAME

## Use a domestic (China) pip mirror
export PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple

## Install base dependencies
pip install \
  addict simplejson sortedcontainers openpyxl matplotlib \
  vllm peft FlagEmbedding bitsandbytes \
  catboost xgboost polars_ta lightgbm \
  modelscope hf_transfer

## Install evaluation dependencies
pip install segeval backtrader deepeval
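A quick way to confirm the installs succeeded is to check that each package can be located in the environment. A minimal sketch (the helper name and package list are ours; note that some pip package names differ from their import names):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of module names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Import names for the packages installed above (import names can differ from pip names)
required = ["addict", "simplejson", "sortedcontainers", "openpyxl", "matplotlib",
            "vllm", "peft", "catboost", "xgboost", "lightgbm", "modelscope"]
print(missing_packages(required))  # an empty list means every package resolved
```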

## Install the LLM fine-tuning framework -- LLaMA-Factory
cd tools
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
pip install deepspeed==0.15.4 # The version pin matters! Newer versions have issues
llamafactory-cli version
cd ../..

## Install the RAG framework dependency -- MiniRAG (in the future this library should be installable directly via pip install minirag-hku)
cd tools
git clone https://github.com/HKUDS/MiniRAG.git
cd MiniRAG && pip install -e . && cd ..
cd ..

Deploying the LLM Service

Start the models with the vLLM server; run in a terminal:

CUDA_VISIBLE_DEVICES=0 \
vllm serve resources/open_models/Qwen2.5-3B-Instruct --trust-remote-code \
--served-model-name base   \
--enable-lora --lora-modules lora=resources/ckpts/Qwen2.5-3B-Instruct/lora_adapter \
--max-model-len 5000 --max-num-seqs 16 --quantization fp8 --gpu-memory-utilization 0.25 \
--port 12239
CUDA_VISIBLE_DEVICES=0,1 \
vllm serve resources/open_models/Qwen2.5-14B-Instruct --trust-remote-code \
--served-model-name judger  \
--max-model-len 5000 --max-num-seqs 30 --quantization fp8 --kv-cache-dtype fp8 \
--gpu-memory-utilization 0.4  --tensor-parallel-size 2 \
--port 12235 

For larger models, tensor and pipeline parallelism can be combined, e.g. --tensor-parallel-size 2 --pipeline-parallel-size 2.
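Both servers expose vLLM's OpenAI-compatible HTTP API. A minimal client sketch (the helper names are ours; the port and served model name follow the first command above):

```python
import json
from urllib import request

def build_chat_request(model, messages, port, host="localhost"):
    """Build the URL and JSON payload for vLLM's OpenAI-compatible chat endpoint."""
    url = f"http://{host}:{port}/v1/chat/completions"
    payload = {"model": model, "messages": messages, "max_tokens": 256}
    return url, payload

def ask(model, prompt, port):
    """Send one user message and return the assistant's reply text."""
    url, payload = build_chat_request(model, [{"role": "user", "content": prompt}], port)
    req = request.Request(url, data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires the server above to be running):
# print(ask("base", "什么是市盈率?", 12239))
```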

Base Models

Terminal download example (ModelScope):

MODEL=Qwen/Qwen2.5-14B-Instruct
LOCAL_PATH=resources/open_models/Qwen2.5-14B-Instruct
# Note: the last path component is used directly as the model folder name
modelscope download --model $MODEL --local_dir $LOCAL_PATH

Terminal download example (Hugging Face):

MODEL=ProsusAI/finbert
LOCAL_PATH=resources/open_models/ProsusAI-finbert
# Note: the last path component is used directly as the model folder name
huggingface-cli download --resume-download --local-dir-use-symlinks False $MODEL --local-dir $LOCAL_PATH
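After downloading, it can help to confirm the model folder is complete before pointing vLLM or the training scripts at it. A heuristic sketch (the helper name and checks are ours):

```python
from pathlib import Path

def model_looks_complete(local_path):
    """Heuristic check that an HF-style model folder has a config and weight files."""
    p = Path(local_path)
    has_config = (p / "config.json").exists()
    has_weights = any(p.glob("*.safetensors")) or any(p.glob("*.bin"))
    return has_config and has_weights

# Example:
# print(model_looks_complete("resources/open_models/Qwen2.5-14B-Instruct"))
```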

Data Sources

LLM Fine-tuning

This section is adapted mainly from the Qwen2.5 training documentation.

  1. Data preparation

Prepare and register datasets in one of the alpaca or sharegpt formats:

  • alpaca format:

    • Single-record form

      [
        {
          "instruction": "user instruction (required)",
          "input": "user input (optional)",
          "output": "model response (required)",
          "system": "system prompt (optional)",
          "history": [
            ["user instruction in the first round (optional)", "model response in the first round (optional)"],
            ["user instruction in the second round (optional)", "model response in the second round (optional)"]
          ]
        }
      ]
    • Register in $resources/dataset_info.json

      "dataset_name": {
        "file_name": "path/to/dataset",
        "columns": {
          "prompt": "instruction",
          "query": "input",
          "response": "output",
          "system": "system",
          "history": "history"
        }
      }
  • sharegpt format:

    • Single-record form

      [
        {
          "messages": [
            {"role": "user", "content": "user instruction"},
            {"role": "assistant", "content": "model response"}
          ],
          "system": "system prompt (optional)",
          "tools": "tool description (optional)"
        }
      ]
    • Add to $resources/dataset_info.json

      "dataset_name": {
            "file_name": "path/to/dataset",
            "formatting": "sharegpt",
            "columns": {
              "messages": "messages"
            },
            "tags": {
              "role_tag": "role",
              "content_tag": "content",
              "user_tag": "user",
              "assistant_tag": "assistant",
              "system_tag": "system"
            }
        }
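The layouts above can be sanity-checked and registered programmatically. A minimal sketch, assuming dataset_info.json is a plain JSON object keyed by dataset name as shown above (the helper names and validation rules are ours):

```python
import json
from pathlib import Path

def validate_record(record, fmt="alpaca"):
    """Return a list of problems with one record in the given format ('alpaca' or 'sharegpt')."""
    errors = []
    if fmt == "alpaca":
        for key in ("instruction", "output"):
            if not record.get(key):
                errors.append(f"missing required field: {key}")
        if not all(isinstance(t, list) and len(t) == 2 for t in record.get("history", [])):
            errors.append("history must be a list of [instruction, response] pairs")
    else:  # sharegpt, with the role/content keys from the tags mapping above
        messages = record.get("messages", [])
        if not messages:
            errors.append("missing required field: messages")
        if not all({"role", "content"} <= set(m) for m in messages):
            errors.append("each message needs 'role' and 'content' keys")
    return errors

def register_dataset(info_path, name, file_name, fmt="alpaca"):
    """Append an entry to dataset_info.json, mirroring the two layouts shown above."""
    path = Path(info_path)
    info = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    if fmt == "alpaca":
        entry = {"file_name": file_name,
                 "columns": {"prompt": "instruction", "query": "input",
                             "response": "output", "system": "system",
                             "history": "history"}}
    else:
        entry = {"file_name": file_name, "formatting": "sharegpt",
                 "columns": {"messages": "messages"},
                 "tags": {"role_tag": "role", "content_tag": "content",
                          "user_tag": "user", "assistant_tag": "assistant",
                          "system_tag": "system"}}
    info[name] = entry
    path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
```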
  2. Launch training

Training scripts are stored as $project-root/dev/sft_*.sh

After adjusting their parameters, launch from the project root with:

source train/sft_qwen2_5_3B_for_FINNA.sh

Note: make sure the template parameter in DATA_SETTINGS matches the chosen model; see https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#supported-models
