LLM Evaluation

4 packages

Ragas

Metrics and tooling for evaluating RAG systems — faithfulness, answer relevance, context precision/recall, and dataset utilities.

Inactive

Hero Score 57

Giskard

Open-source testing and evaluation platform for ML and LLM systems: test suites, bias and safety checks, and regression testing with a web UI.

Recently updated

Hero Score 51

DeepEval

Pytest-style framework for evaluating LLM outputs with built-in metrics — hallucination, answer relevancy, faithfulness, G-Eval, and more.

Recently updated

Hero Score 71

Inspect AI

UK AI Safety Institute's framework for large-scale LLM evaluations — solvers, scorers, and rich logging for benchmarks and safety evals.

Inactive

Hero Score 60