LLM Evaluation

4 packages

Ragas
Metrics and tooling for evaluating RAG systems — faithfulness, answer relevance, context precision/recall, and dataset utilities.
Active
Hero Score 58
Giskard
Open-source testing and evaluation platform for ML models and LLM applications: test suites, bias/safety checks, and regression testing with a web UI.
Recently updated
Hero Score 51
DeepEval
Pytest-style framework for evaluating LLM outputs with built-in metrics — hallucination, answer relevancy, faithfulness, G-Eval, and more.
Recently updated
Hero Score 71
Inspect AI
UK AI Safety Institute's framework for large-scale LLM evaluations — solvers, scorers, and rich logging for benchmarks and safety evals.
Inactive
Hero Score 60