EleutherAI/lm-evaluation-harness (public repository)

Pull requests: EleutherAI/lm-evaluation-harness

99 open, 1,322 closed

Pull requests list

- Adding the Evalita-LLM benchmark (#2671, opened Feb 5, 2025 by m-resta; 1 comment)
- Convert gen tasks to multiple_choice (#2670, draft, opened Feb 4, 2025 by baberabb)
- handle chat kwargs for construct_requests (#2668, opened Jan 31, 2025 by baberabb)
- [hf-multimodal] pass kwargs to self.processor (#2667, opened Jan 31, 2025 by baberabb; 3 comments)
- Add from dataframe (#2655, opened Jan 25, 2025 by AMindToThink; 1 comment)
- humaneval instruct (#2650, opened Jan 22, 2025 by baberabb)
- Easily evaluate models steered by SAEs (#2641, opened Jan 21, 2025 by AMindToThink; 1 comment)
- Include all test files in sdist (#2634, opened Jan 19, 2025 by booxter; 3 comments)
- Add loncxt tasks (#2629, draft, opened Jan 17, 2025 by baberabb; 1 comment)
- fix: nemo eval in containers with TransformerEngine > 1.10 won't error (#2621, draft, opened Jan 10, 2025 by terrykong; 1 comment)
- Added EU20 task suite (#2620, opened Jan 10, 2025 by KlaudiaTH; 2 comments)
- Add support for generative answering of multiple_choice tasks (#2601, opened Dec 29, 2024 by pasky; 9 comments)
- change to single process for bootstrap_stderr (#2593, opened Dec 23, 2024 by zhuyuhua-v; 4 comments)
- Update dependencies in pyproject.toml: remove duplicated 'evaluate', add 'anthropic' and 'openai' to optional dependencies (#2582, opened Dec 19, 2024 by Technolog796; 5 comments)
- Added caseHOLD task (#2570, opened Dec 16, 2024 by zolastro; 2 comments)
- add llama3 tasks (#2556, opened Dec 10, 2024 by baberabb)
- [MM] Chartqa (#2544, draft, opened Dec 5, 2024 by baberabb)
- [MM] Ai2d (#2542, draft, opened Dec 5, 2024 by baberabb)
- add Darija (Moroccan dialects) tasks including darijammlu. darijahellaswag and darija_bench (#2521, opened Nov 28, 2024 by hadi-abdine)
- Add --examples Argument for Fine-Grained Task Evaluation in lm-evaluation-harness. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520, opened Nov 26, 2024 by felipemaiapolo; 6 comments)
- max_length not used (#2515, opened Nov 25, 2024 by lintangsutawika; 3 comments)
- Added small fix to split by eos_token_id before decoding (#2512, opened Nov 24, 2024 by EtashGuha; 1 comment)
- Added regex filter for bbh fewshot (#2502, opened Nov 18, 2024 by RawthiL; 1 comment)
- Add GigaChat API (#2495, opened Nov 15, 2024 by seldereyy)
- Yaml crowspairs tasks (#2488, opened Nov 14, 2024 by NAM00; 1 comment)

© 2025 GitHub, Inc.
