Docling
Open-source toolkit by IBM Research for parsing diverse document formats (PDF, DOCX, PPTX, HTML, XLSX, images, audio) into a unified structured representation suited to GenAI and RAG workflows. Hosted by the LF AI & Data Foundation.
document-parsing-frameworks
Recently released
Hero Score: 79
Popularity: 76
Performance: 70
Ecosystem: 100
Maturity: 76
Dev Experience: 72
⭐ 60,752 stars | ⬇ 1.8M downloads/wk | First release: Jul 2024 | Last release: Jun 2026
Async Support: Yes | Plugin Extensions: Very high | Speed: Medium | Doc Quality: Very high | Learning Curve: Medium
Pros
- Parses many formats (PDF, DOCX, PPTX, HTML, XLSX, images, audio) into a unified DoclingDocument model for AI/LLM use cases
- Advanced PDF understanding, including layout, reading order, table structure, OCR, and VLM support (GraniteDocling)
- Native integrations with LangChain, LlamaIndex, CrewAI, and Haystack, plus an MCP server for agentic applications
Cons
- Some features require heavyweight optional dependencies (OCR backends, VLM models, ASR models)
- Complex documents with unusual layouts may still need custom post-processing
- Initial model download and setup can be time-consuming for the full feature set
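
Example
A minimal sketch of the "many formats into one DoclingDocument" workflow listed under Pros, assuming the `docling` package is installed. `DocumentConverter`, `convert`, and `export_to_markdown` follow Docling's published quick start, so check them against the version you install. `SUPPORTED_SUFFIXES` and `select_sources` are hypothetical helpers added for illustration and are not part of Docling. The import is deferred because, as noted under Cons, the first run may download models.

```python
from pathlib import Path

# Illustrative subset of the formats named above (assumption: this set is
# for the example only and is not Docling's own format registry).
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".pptx", ".html", ".xlsx", ".png", ".jpg"}


def select_sources(paths):
    """Hypothetical helper: keep paths whose suffix is in SUPPORTED_SUFFIXES."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_SUFFIXES]


def convert_to_markdown(source):
    """Convert one document with Docling and return it as Markdown text."""
    # Imported lazily so the helper above works without docling installed,
    # and so any model download only happens when a conversion is requested.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(str(source))
    # result.document is the unified DoclingDocument representation.
    return result.document.export_to_markdown()


if __name__ == "__main__":
    import sys

    # Usage: python convert_docs.py report.pdf slides.pptx notes.txt
    for src in select_sources(sys.argv[1:]):
        print(convert_to_markdown(src))
```

Filtering by suffix before conversion keeps unsupported files from reaching the converter. Documents with unusual layouts may still need post-processing of the exported Markdown.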