Unable to reproduce Yi-1.5-9B metrics #21

Open
Firefly-001 opened this issue May 23, 2024 · 1 comment
Labels
question Further information is requested

Comments

@Firefly-001

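I used opencompass to evaluate Yi-1.5-9B on the test sets of MATH (4-shot), HumanEval/HumanEval Plus (0-shot), and MBPP (3-shot). The results differ noticeably from the officially reported metrics. Could you provide the official evaluation script or the detailed parameters so that I can reproduce the metrics?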

Below are my evaluation script and results.

  • Script:
cd opencompass
python run.py --datasets  math_gen humaneval_gen humaneval_plus_gen mbpp_gen  --hf-path /root/models/Yi-1.5-9B --model-kwargs device_map='auto' --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False --max-out-len 512 --max-seq-len 4096 --batch-size 8 --no-batch-padding --num-gpus 1
  • Results:
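dataset           version    metric                 mode      opencompass.models.huggingface.HuggingFace_models_Yi-1.5-9B
----------------  ---------  ---------------------  --------  -----------------------------------------------------------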
math              5f997e     accuracy               gen                                                             28.3
openai_humaneval  8e312c     humaneval_pass@1       gen                                                             25.61
humaneval_plus    8e312c     humaneval_plus_pass@1  gen                                                             21.34
mbpp              3ede66     score                  gen                                                             58.6
mbpp              3ede66     pass                   gen                                                            293
mbpp              3ede66     timeout                gen                                                              4
mbpp              3ede66     failed                 gen                                                             24
mbpp              3ede66     wrong_answer           gen                                                            179
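
As a sanity check on the table above, the short sketch below relates the reported percentages to raw problem counts. It assumes that the mbpp `score` is the share of `pass` outcomes among all four outcome rows (this matches the numbers shown, but is not confirmed by the thread), and it uses the fact that HumanEval contains 164 problems. The variable names are illustrative only.

```python
# Outcome counts copied from the mbpp rows of the results table.
mbpp_counts = {"pass": 293, "timeout": 4, "failed": 24, "wrong_answer": 179}

# Assumption: score = pass / (all outcomes), expressed as a percentage.
total = sum(mbpp_counts.values())
mbpp_score = 100 * mbpp_counts["pass"] / total

# HumanEval has 164 problems; recover the solved counts implied by pass@1.
HUMANEVAL_PROBLEMS = 164
humaneval_solved = round(25.61 / 100 * HUMANEVAL_PROBLEMS)
humaneval_plus_solved = round(21.34 / 100 * HUMANEVAL_PROBLEMS)

print(total, round(mbpp_score, 1))
print(humaneval_solved, humaneval_plus_solved)
```

Under that assumption the four mbpp rows are consistent with the reported score, so any gap from the official numbers would come from how the generations were produced or post-processed rather than from how the score was tallied.
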
@Yimi81
Contributor

Yimi81 commented May 27, 2024

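Sorry about that. The evaluation script is part of an internal framework and cannot be shared, and I don't know the detailed parameters either. As for the results not being reproducible with opencompass, I will test it myself and follow up here once I have results.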

@Haijian06 Haijian06 added the question Further information is requested label Aug 2, 2024
3 participants