Pull requests: EleutherAI/lm-evaluation-harness

Pull requests list

Adding the Evalita-LLM benchmark
#2671 opened Feb 5, 2025 by m-resta
Convert gen tasks to multiple_choice
#2670 opened Feb 4, 2025 by baberabb Draft
handle chat kwargs for construct_requests
#2668 opened Jan 31, 2025 by baberabb
[hf-multimodal] pass kwargs to self.processor
#2667 opened Jan 31, 2025 by baberabb
Add from dataframe
#2655 opened Jan 25, 2025 by AMindToThink
humaneval instruct
#2650 opened Jan 22, 2025 by baberabb
Easily evaluate models steered by SAEs
#2641 opened Jan 21, 2025 by AMindToThink
Include all test files in sdist
#2634 opened Jan 19, 2025 by booxter
Add loncxt tasks
#2629 opened Jan 17, 2025 by baberabb Draft
Added EU20 task suite
#2620 opened Jan 10, 2025 by KlaudiaTH
change to single process for bootstrap_stderr
#2593 opened Dec 23, 2024 by zhuyuhua-v
Added caseHOLD task
#2570 opened Dec 16, 2024 by zolastro
add llama3 tasks
#2556 opened Dec 10, 2024 by baberabb
[MM] Chartqa
#2544 opened Dec 5, 2024 by baberabb Draft
[MM] Ai2d
#2542 opened Dec 5, 2024 by baberabb Draft
max_length not used
#2515 opened Nov 25, 2024 by lintangsutawika
Added regex filter for bbh fewshot
#2502 opened Nov 18, 2024 by RawthiL
Add GigaChat API
#2495 opened Nov 15, 2024 by seldereyy
Yaml crowspairs tasks
#2488 opened Nov 14, 2024 by NAM00