STACKQUADRANT

benchflow-ai/awesome-evals

Evaluation & Testing

A curated, non-BS library of the best resources for building and evaluating AI agents — papers, blogs, talks, tools, benchmarks. Maintained by BenchFlow.

Overall Score: 4.7/10
GitHub Metrics
Stars: 558
Forks: 41
Open Issues: 1
Watchers: 1
Contributors: 4
Weekly Commits: 2
Language: (none listed)
License: NOASSERTION
Last Commit: Jun 28, 2026
Created: Jun 24, 2026
Latest Release: (none listed)
Release Date: (none listed)
Synced: Jun 29, 2026
Quality Scores
Documentation Quality (weight: 20%)
3.8

No dedicated docs site. Description: 153 chars. Stars signal: 558. Contributors: 4. Score: 3.8/10

Community Health (weight: 20%)
3.8

Stars: 558. Contributors: 4. Watchers: 1. Forks: 41. Issue ratio: 0.2%. Score: 3.8/10

Maintenance Velocity (weight: 15%)
6.6

Last commit: 1d ago. Weekly commits: 2. No releases published. Score: 6.6/10

API Design & DX (weight: 20%)
6.4

Stars/issues ratio: 558. No dedicated API docs. License: NOASSERTION. Popularity signal: 558 stars. Score: 6.4/10

Production Readiness (weight: 15%)
3.9

Battle-tested: 558 stars. Peer review: 4 contributors. No versioned releases. Licensed: NOASSERTION. Age: 0.0 years. Maintenance: last commit 1d ago. Score: 3.9/10

Ecosystem Integration (weight: 10%)
3.2

Fork interest: 41. License: NOASSERTION. Adoption: 558 stars. Score: 3.2/10
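
The six category scores and weights above appear consistent with the overall 4.7 shown at the top of the page if that figure is a weighted average. The page does not state its formula, so the sketch below is an assumption rather than StackQuadrant's documented method; the names CATEGORY_SCORES and weighted_overall are hypothetical and exist only for illustration.

```python
# Category scores and percentage weights copied from the page above.
# Assumption: the overall score is a plain weighted average of these values.
CATEGORY_SCORES = {
    "Documentation Quality": (3.8, 20),
    "Community Health": (3.8, 20),
    "Maintenance Velocity": (6.6, 15),
    "API Design & DX": (6.4, 20),
    "Production Readiness": (3.9, 15),
    "Ecosystem Integration": (3.2, 10),
}


def weighted_overall(scores):
    """Return the weighted average of (score, weight) pairs."""
    total_weight = sum(weight for _, weight in scores.values())
    weighted_sum = sum(score * weight for score, weight in scores.values())
    return weighted_sum / total_weight


overall = weighted_overall(CATEGORY_SCORES)
print(round(overall, 1))  # rounds to the 4.7 shown on the page
```

The weights sum to 100%, and the weighted sum works out to about 4.695, which rounds to the displayed 4.7.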

Tags
agent-evaluation, ai-agents, awesome, awesome-list, benchmarks, evals, llm, llm-evaluation, rl-environments
Radar chart (axes: Documentation Quality, Community Health, Maintenance Velocity, API Design & DX, Production Readiness, Ecosystem Integration)