LLM Evaluation
4 packages
LLM Evaluation · Active
Ragas
Metrics and tooling for evaluating RAG systems — faithfulness, answer relevance, context precision/recall, and dataset utilities.
Hero Score 58
LLM Evaluation · Recently updated
Giskard
Open-source testing and evaluation platform for ML models and LLM applications: test suites, bias/safety checks, and regression testing with a web UI.
Hero Score 51
LLM Evaluation · Recently updated
DeepEval
Pytest-style framework for evaluating LLM outputs with built-in metrics — hallucination, answer relevancy, faithfulness, G-Eval, and more.
Hero Score 71
LLM Evaluation · Inactive
Inspect AI
UK AI Safety Institute's framework for large-scale LLM evaluations — solvers, scorers, and rich logging for benchmarks and safety evals.
Hero Score 60