benchflow-ai/awesome-evals

Evaluation & Testing

A curated, non-BS library of the best resources for building and evaluating AI agents — papers, blogs, talks, tools, benchmarks. Maintained by BenchFlow.

Overall score: 4.7/10

GitHub Metrics

Stars: 558
Forks: 41
Open Issues:
Watchers: 1
Contributors: 4
Weekly Commits: 2
Language: —
License: NOASSERTION
Last Commit: Jun 28, 2026
Created: Jun 24, 2026
Latest Release: —
Release Date: —

Synced: Jun 29, 2026

Quality Scores

Documentation Quality (weight: 20%)

3.8

No dedicated docs site. Description: 153 chars. Stars signal: 558. Contributors: 4. Score: 3.8/10

Community Health (weight: 20%)

3.8

Stars: 558. Contributors: 4. Watchers: 1. Forks: 41. Issue ratio: 0.2%. Score: 3.8/10

Maintenance Velocity (weight: 15%)

6.6

Last commit: 1d ago. Weekly commits: 2. No releases published. Score: 6.6/10

API Design & DX (weight: 20%)

6.4

Stars/issues ratio: 558. No dedicated API docs. License: NOASSERTION. Popularity signal: 558 stars. Score: 6.4/10

Production Readiness (weight: 15%)

3.9

Battle-tested: 558 stars. Peer review: 4 contributors. No versioned releases. Licensed: NOASSERTION. Age: 0.0 years. Maintenance: last commit 1d ago. Score: 3.9/10

Ecosystem Integration (weight: 10%)

3.2

Fork interest: 41. License: NOASSERTION. Adoption: 558 stars. Score: 3.2/10
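
The page does not say how the top-level 4.7 is derived from the six category scores. One plausible reading, offered here only as an assumption, is a weighted average using the listed weights. The sketch below checks that reading against the numbers shown above; the names `CATEGORIES` and `overall_score` are hypothetical and not part of any published scoring code.

```python
# Category scores and weights copied from the listing above.
# Assumption: the overall score is the weighted average of these values.
CATEGORIES = {
    "Documentation Quality": (3.8, 0.20),
    "Community Health": (3.8, 0.20),
    "Maintenance Velocity": (6.6, 0.15),
    "API Design & DX": (6.4, 0.20),
    "Production Readiness": (3.9, 0.15),
    "Ecosystem Integration": (3.2, 0.10),
}


def overall_score(categories):
    """Return the weighted average of (score, weight) pairs."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    # Dividing by the total weight keeps the result valid even if
    # the weights do not sum exactly to 1.
    return weighted_sum / total_weight


overall = overall_score(CATEGORIES)
print(f"Overall: {overall:.1f}/10")
```

Under this assumption the weights sum to 100% and the result rounds to the 4.7 shown at the top of the page, which suggests (but does not confirm) that the site uses a weighted average.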