deepeval
confident-ai/deepeval
8.0
Evaluation & Testing
★ 16.5k · ◇ 1.6k · Python · Apache-2.0 · 2d ago
Ragas
explodinggradients/ragas
7.4
Evaluation & Testing
★ 14.6k · ◇ 1.5k · Python · Apache-2.0 · 4mo ago
garak
NVIDIA/garak
7.5
Evaluation & Testing
★ 8.2k · ◇ 1.1k · Python · Apache-2.0 · 2d ago
chinese-llm-benchmark
jeinlee1991/chinese-llm-benchmark
6.3
Evaluation & Testing
★ 6.2k · ◇ 254 · 1d ago
LLM-Engineers-Handbook
PacktPublishing/LLM-Engineers-Handbook
6.6
Evaluation & Testing
★ 5.1k · ◇ 1.2k · Python · MIT · 2mo ago
lmms-eval
EvolvingLMMs-Lab/lmms-eval
7.5
Evaluation & Testing
★ 4.3k · ◇ 608 · Python · NOASSERTION · 4d ago
agenta
Agenta-AI/agenta
7.4
Evaluation & Testing
★ 4.2k · ◇ 555 · TypeScript · NOASSERTION · today
AI-Infra-Guard
Tencent/AI-Infra-Guard
7.4
Evaluation & Testing
★ 4.0k · ◇ 385 · Python · Apache-2.0 · 2d ago
trulens
truera/trulens
7.3
Evaluation & Testing
★ 3.4k · ◇ 306 · Python · MIT · 6d ago
lmnr
lmnr-ai/lmnr
7.0
Evaluation & Testing
★ 3.0k · ◇ 212 · TypeScript · Apache-2.0 · today
Observal
BlazeUp-AI/Observal
6.0
Evaluation & Testing
★ 2.1k · ◇ 459 · Python · NOASSERTION · today
aisheets
huggingface/aisheets
6.1
Evaluation & Testing
★ 1.6k · ◇ 141 · TypeScript · Apache-2.0 · 1mo ago
FuzzyAI
cyberark/FuzzyAI
5.4
Evaluation & Testing
★ 1.5k · ◇ 207 · Jupyter Notebook · Apache-2.0 · 4mo ago
prompty
microsoft/prompty
6.8
Evaluation & Testing
★ 1.2k · ◇ 118 · TypeScript · MIT · today
uqlm
cvs-health/uqlm
6.7
Evaluation & Testing
★ 1.2k · ◇ 127 · Python · Apache-2.0 · 20d ago
FinSight-AI
juanjuandog/FinSight-AI
5.1
Evaluation & Testing
★ 1.1k · ◇ 59 · Java · MIT · 1mo ago
passmark
bug0inc/passmark
5.9
Evaluation & Testing
★ 1.1k · ◇ 170 · TypeScript · NOASSERTION · 12d ago
judgeval
JudgmentLabs/judgeval
6.7
Evaluation & Testing
★ 1.0k · ◇ 93 · Python · Apache-2.0 · 3d ago
WHartTest
MGdaasLab/WHartTest
6.2
Evaluation & Testing
★ 938 · ◇ 131 · Python · MIT · 2d ago
scenario
langwatch/scenario
5.9
Evaluation & Testing
★ 906 · ◇ 67 · Python · MIT · 2d ago
Awesome-LLM-Eval
onejune2018/Awesome-LLM-Eval
4.7
Evaluation & Testing
★ 647 · ◇ 76 · MIT · 7mo ago
aimock
CopilotKit/aimock
6.3
Evaluation & Testing
★ 637 · ◇ 44 · TypeScript · MIT · today
Awesome-LLM-in-Social-Science
ValueByte-AI/Awesome-LLM-in-Social-Science
5.1
Evaluation & Testing
★ 633 · ◇ 49 · MIT · 20d ago
agent-skills-eval
darkrishabh/agent-skills-eval
5.3
Evaluation & Testing
★ 603 · ◇ 30 · TypeScript · MIT · 4d ago
iFixAi
ifixai-ai/iFixAi
6.2
Evaluation & Testing
★ 570 · ◇ 112 · Python · Apache-2.0 · 1d ago
langtest
Pacific-AI-Corp/langtest
5.8
Evaluation & Testing
★ 562 · ◇ 50 · Python · Apache-2.0 · 2mo ago
langtest
PacificAI/langtest
5.8
Evaluation & Testing
★ 562 · ◇ 50 · Python · Apache-2.0 · 2mo ago
awesome-evals
benchflow-ai/awesome-evals
4.7
Evaluation & Testing
★ 552 · ◇ 40 · NOASSERTION · today
continuous-eval
relari-ai/continuous-eval
4.7
Evaluation & Testing
★ 517 · ◇ 38 · Python · Apache-2.0 · 1y ago
fakecloud
faiscadev/fakecloud
5.7
Evaluation & Testing
★ 453 · ◇ 29 · Rust · AGPL-3.0 · today
rhesis
rhesis-ai/rhesis
5.5
Evaluation & Testing
★ 373 · ◇ 26 · Python · NOASSERTION · 1d ago
llm-leaderboard
JonathanChavezTamales/llm-leaderboard
4.7
Evaluation & Testing
★ 361 · ◇ 40 · JavaScript · NOASSERTION · 8mo ago
palico-ai
palico-ai/palico-ai
4.5
Evaluation & Testing
★ 343 · ◇ 28 · TypeScript · MIT · 1y ago
llms-tools
PetroIvaniuk/llms-tools
4.8
Evaluation & Testing
★ 319 · ◇ 45 · Apache-2.0 · 27d ago
flutter-skill
ai-dashboad/flutter-skill
5.4
Evaluation & Testing
★ 315 · ◇ 44 · Dart · MIT · 16d ago
athina-evals
athina-ai/athina-evals
4.1
Evaluation & Testing
★ 301 · ◇ 22 · Python · 1y ago
testdriverai
testdriverai/testdriverai
4.6
Evaluation & Testing
★ 222 · ◇ 33 · JavaScript · 1mo ago
qaskills
PramodDutta/qaskills
4.2
Evaluation & Testing
★ 160 · ◇ 16 · TypeScript · today
agent-qa
vostride/agent-qa
4.8
Evaluation & Testing
★ 150 · ◇ 7 · TypeScript · NOASSERTION · 14d ago