What's new
Tasks
- LiveCodeBench by @plaguss in #548, #587, #518
- GPQA diamond by @lewtun in #534
- Humanity's last exam by @clefourrier in #520
- Olympiad Bench by @NathanHB in #521
- aime24, 25 and math500 by @NathanHB in #586
- French models evals by @mdiazmel in #505
Metrics
- Pass@k by @clefourrier in #519
- Extractive Match metric by @hynky1999 in #495, #503, #522, #535
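The Pass@k metric (#519) is commonly computed with the standard unbiased estimator: given n sampled generations of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). A minimal sketch of that estimator (the function name is illustrative, not lighteval's actual API):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n generations (c correct)
    is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 4 generations of which 2 are correct, pass@1 is 0.5.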
Features
Better logging
- log model config by @NathanHB in #627
- Support custom results/details push to hub by @albertvillanova in #457
- Push details without converting fields to str by @NathanHB in #572
Inference providers
- adds inference providers support by @NathanHB in #616
Load details to be evaluated
- Load predictions from details files and continue evaluating from there by @JoelNiklaus in #488
sglang support
- Let lighteval support sglang by @Jayon02 in #552
Bug fixes and refactoring
- Tiny improvements to `endpoint_model.py`, `base_model.py`, ... by @sadra-barikbin in #219
- Update README.md by @NathanHB in #486
- Fix issue with encodings for together models. by @JoelNiklaus in #483
- Made litellm judge backend more robust. by @JoelNiklaus in #485
- Fix `T_co` import bug by @gucci-j in #484
- fix README link by @vxw3t8fhjsdkghvbdifuk in #500
- Fixed issue with o1 in litellm. by @JoelNiklaus in #493
- Hotfix for litellm judge by @JoelNiklaus in #490
- Made judge response processing more robust. by @JoelNiklaus in #491
- VLLM: Allows for max tokens to be set in model config file by @NathanHB in #547
- Bump up the latex2sympy2_extended version + more tests by @hynky1999 in #510
- Fixed bug importing url_to_fs from fsspec by @LoserCheems in #507
- Fix Ukrainian indices and confirmation word by @ayukh in #516
- Fix VLLM data-parallel by @hynky1999 in #541
- relax spacy import to relax dep by @clefourrier in #622
- vllm fix sampling params by @NathanHB in #625
- relax deps for tgi by @NathanHB in #626
- Bug fix extractive match by @hynky1999 in #540
- Fix loading of vllm model from files by @NathanHB in #533
- fix: broken URLs by @deep-diver in #550
- Fix `gpu_memory_utilisation` typo in vLLM by @tpoisonooo in #553
- allows better flexibility for litellm endpoints by @NathanHB in #549
- Translate task template to Catalan and Galician and fix typos by @mariagrandury in #506
- Relax upper bound on torch by @lewtun in #508
- Fix vLLM generation with sampling params by @lewtun in #578
- Make BLEURT lazy by @hynky1999 in #536
- Fixing backend error in main_sglang. by @TankNee in #597
- VLLM + Math-Verify fixes by @hynky1999 in #603
- raise exception when generation size is more than model length by @NathanHB in #571
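The "Make BLEURT lazy" fix (#536) follows a common pattern: defer loading a heavy scoring model until the metric is first used, so that merely importing the library stays fast. A generic sketch of that pattern (class and loader names are illustrative, not lighteval's implementation):

```python
class LazyMetric:
    """Defer an expensive model load until the first score() call."""

    def __init__(self, loader):
        self._loader = loader      # zero-arg callable that builds the model
        self._model = None         # nothing loaded at construction time

    @property
    def model(self):
        if self._model is None:    # first access triggers the load
            self._model = self._loader()
        return self._model

    def score(self, prediction: str, reference: str) -> float:
        return self.model(prediction, reference)
```

Constructing the metric is free; the loader runs exactly once, on the first `score()` call.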
Thanks
Huge thanks to Hyneck, Lewis, Ben, Agustín, Elie and everyone helping and giving feedback 💙
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @hynky1999
- Extractive Match metric (#495)
- Fix math extraction (#503)
- Bump up the latex2sympy2_extended version + more tests (#510)
- Math extraction - allow only trying the first match, more customizable latex extraction + bump deps (#522)
- add missing inits (#524)
- Sync Math-verify (#535)
- Make BLEURT lazy (#536)
- Bug fix extractive match (#540)
- Fix VLLM data-parallel (#541)
- VLLM + Math-Verify fixes (#603)
- @plaguss
- LiveCodeBench (#548, #587, #518)
- @Jayon02
- Let lighteval support sglang (#552)
- @NathanHB
- adds olympiad bench (#521)
- Fix loading of vllm model from files (#533)
- [VLLM] Allows for max tokens to be set in model config file (#547)
- allows better flexibility for litellm endpoints (#549)
- raise exception when generation size is more than model length (#571)
- Push details without converting fields to str (#572)
- adds aime24, 25 and math500 (#586)
- adds inference providers support (#616)
- vllm fix sampling params (#625)
- relax deps for tgi (#626)
- log model config (#627)