Ragas
Metrics and tooling for evaluating RAG systems — faithfulness, answer relevance, context precision/recall, and dataset utilities.
llm-evaluation-frameworks
New to PyRadar
Hero Score: 58
Popularity: 52
Performance: 70
Ecosystem: 50
Maturity: 61
Dev Experience: 57
⭐ 14,176 stars
⬇ 315.0K downloads/wk
First release: May 2023
Last release: Jan 2026
Async Support: Yes
Plugin Extensions: Medium
Speed: Medium
Doc Quality: High
Learning Curve: Medium
Pros
- Purpose-built metrics for RAG pipelines (faithfulness, grounding)
- Integrations with LangChain/LlamaIndex and dataset generation
- Clear recipes for offline eval and leaderboard-style reports
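
To make the faithfulness metric mentioned above concrete, here is a minimal sketch of the usual idea: the score is the share of claims in an answer that the retrieved context supports. This is an illustration only, not Ragas's implementation. The names `faithfulness_score` and `naive_verifier` are hypothetical, and the substring check is a toy stand-in for the LLM judge a real metric would use.

```python
from typing import Callable, Sequence


def faithfulness_score(
    claims: Sequence[str],
    context: str,
    is_supported: Callable[[str, str], bool],
) -> float:
    """Return the fraction of claims that the verifier marks as supported."""
    if not claims:
        raise ValueError("need at least one claim to score")
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)


def naive_verifier(claim: str, context: str) -> bool:
    # Hypothetical stand-in for an LLM judge: case-insensitive substring match.
    return claim.lower() in context.lower()


context = "Paris is the capital of France. The Seine flows through Paris."
claims = [
    "Paris is the capital of France",   # present in the context
    "Paris has ten million residents",  # not present in the context
]
score = faithfulness_score(claims, context, naive_verifier)
print(score)
```

In practice both steps (splitting an answer into claims and checking each claim) are typically delegated to a model, which is where the latency and cost noted under Cons come from.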
Cons
- Many metrics rely on LLM-as-judge (latency/cost considerations)
- Best practices still evolving; metric assumptions matter
- Large-scale evals can be slow without caching/parallelism
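
The caching and parallelism point can be sketched generically. The example below assumes nothing about Ragas's own API: `CachedJudge`, `_call_judge`, and `evaluate_all` are hypothetical names, and the judge call is a placeholder that sleeps instead of contacting a model. It shows two ideas: identical (question, answer) pairs share one judge call, and a semaphore caps how many calls run at once.

```python
import asyncio
from typing import Dict, List, Tuple


class CachedJudge:
    """Hypothetical wrapper: deduplicates and rate-limits judge calls."""

    def __init__(self, max_concurrency: int = 4) -> None:
        self._tasks: Dict[Tuple[str, str], "asyncio.Task[float]"] = {}
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self.calls = 0  # number of judge calls actually issued

    async def _call_judge(self, question: str, answer: str) -> float:
        # Placeholder for a real LLM request (assumption: returns a score).
        async with self._semaphore:  # cap the number of in-flight calls
            self.calls += 1
            await asyncio.sleep(0.01)  # simulate network latency
            return 1.0 if answer else 0.0

    async def score(self, question: str, answer: str) -> float:
        key = (question, answer)
        task = self._tasks.get(key)
        if task is None:
            # First request for this pair: start one call and remember it,
            # so concurrent and later duplicates await the same task.
            task = asyncio.create_task(self._call_judge(question, answer))
            self._tasks[key] = task
        return await task


async def evaluate_all(
    pairs: List[Tuple[str, str]], max_concurrency: int = 4
) -> Tuple[List[float], int]:
    judge = CachedJudge(max_concurrency)
    scores = await asyncio.gather(*(judge.score(q, a) for q, a in pairs))
    return list(scores), judge.calls


pairs = [
    ("What is RAG?", "Retrieval-augmented generation."),
    ("What is RAG?", "Retrieval-augmented generation."),  # duplicate pair
    ("What is RAG?", ""),
]
scores, calls = asyncio.run(evaluate_all(pairs))
print(scores, calls)
```

The cache here lives in memory for a single run; a persistent cache keyed on the prompt and model version would be the natural next step for repeated offline evals.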