STACKQUADRANT

onejune2018/Awesome-LLM-Eval

Evaluation & Testing

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs. It focuses on evaluating foundation LLMs and aims to explore the technical boundaries of generative AI.

Overall score: 4.7
GitHub Metrics
Stars: 647
Forks: 76
Open Issues: 35
Watchers: 8
Contributors: 5
Weekly Commits: 0
Language: (not listed)
License: MIT
Last Commit: Nov 24, 2025
Created: Apr 26, 2023
Latest Release: (none listed)
Release Date: (none listed)
Synced: Jun 29, 2026
Quality Scores
Documentation Quality (weight: 20%)
5.7

Has docs site (https://arxiv.org/abs/2508.18646). Description: 198 chars. Stars signal: 647. Contributors: 5. Score: 5.7/10

Community Health (weight: 20%)
3.7

Stars: 647. Contributors: 5. Watchers: 8. Forks: 76. Issue ratio: 5.4%. Score: 3.7/10
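The issue ratio above appears to be open issues expressed as a percentage of stars. The page does not state the formula, so the definition below is an assumption that happens to reproduce the listed 5.4%; the function name is hypothetical.

```python
def issue_ratio_percent(open_issues: int, stars: int) -> float:
    """Open issues as a percentage of stars (assumed definition, not confirmed by the page)."""
    if stars <= 0:
        raise ValueError("stars must be positive")
    return 100.0 * open_issues / stars


# Values taken from the GitHub Metrics listed on this page.
ratio = issue_ratio_percent(open_issues=35, stars=647)
print(f"Issue ratio: {ratio:.1f}%")
```

Under the same assumption, the inverse quantity (stars divided by open issues) comes out at roughly 18.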

Maintenance Velocity (weight: 15%)
3.0

Last commit: 217d ago. Weekly commits: 0. No releases published. Maturity bonus: 3.2y old. Score: 3/10
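The "217d ago" and "3.2y old" figures are consistent with measuring from the sync date rather than from today. A minimal sketch, assuming the sync date is the reference point and a 365.25-day year for the age (both assumptions; the helper names are hypothetical):

```python
from datetime import date


def days_since(earlier: date, reference: date) -> int:
    """Whole days elapsed between two calendar dates."""
    return (reference - earlier).days


def age_in_years(created: date, reference: date) -> float:
    """Approximate age in years, assuming an average year of 365.25 days."""
    return (reference - created).days / 365.25


# Dates taken from the GitHub Metrics listed on this page.
synced = date(2026, 6, 29)
last_commit = date(2025, 11, 24)
created = date(2023, 4, 26)

print(days_since(last_commit, synced))            # days since the last commit
print(round(age_in_years(created, synced), 1))    # repository age in years
```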

API Design & DX (weight: 20%)
6.4

Stars/issues ratio: 18. Has documentation site. Permissive license: MIT. Popularity signal: 647 stars. Score: 6.4/10

Production Readiness (weight: 15%)
3.5

Battle-tested: 647 stars. Peer review: 5 contributors. No versioned releases. Licensed: MIT. Age: 3.2 years. Maintenance: last commit 217d ago. Score: 3.5/10

Ecosystem Integration (weight: 10%)
5.6

Fork interest: 76. Integration-friendly: MIT. Adoption: 647 stars. Has web presence. Score: 5.6/10
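The six category weights sum to 100%, and a weighted average of the category scores reproduces the 4.7 shown at the top of the page. The sketch below assumes that this is how the overall score is derived; the page does not say so explicitly, and the names are hypothetical.

```python
# (weight in percent, score out of 10) for each category, as listed on this page.
CATEGORY_SCORES = {
    "Documentation Quality": (20, 5.7),
    "Community Health": (20, 3.7),
    "Maintenance Velocity": (15, 3.0),
    "API Design & DX": (20, 6.4),
    "Production Readiness": (15, 3.5),
    "Ecosystem Integration": (10, 5.6),
}


def weighted_overall(scores: dict[str, tuple[float, float]]) -> float:
    """Weighted average of category scores (assumed aggregation rule)."""
    total_weight = sum(weight for weight, _ in scores.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(weight * score for weight, score in scores.values())
    return weighted_sum / total_weight


overall = weighted_overall(CATEGORY_SCORES)
print(f"Overall: {overall:.1f}")
```

Dividing by the total weight rather than by a fixed 100 keeps the function correct if a category is dropped or the weights are rescaled.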

Tags
awsome-list, awsome-lists, benchmark, bert, chatglm, chatgpt, dataset, evaluation, gpt3, large-language-model
Radar chart of the six quality scores (axes: Documentation Quality, Community Health, Maintenance Velocity, API Design & DX, Production Readiness, Ecosystem Integration).