BentoML
Open-source Python framework for packaging, serving, and deploying ML/LLM models as production APIs — model-agnostic with built-in adaptive batching.
model-serving-frameworks | Recently released
Hero Score: 73
Popularity: 46
Performance: 85
Ecosystem: 75
Maturity: 92
Dev Experience: 68
⭐ 8,661 stars | ⬇ 49.7K downloads/wk | First release: Jan 2019 | Last release: May 2026
Async Support: Yes | Plugin Extensions: High | Speed: Fast | Doc Quality: Very high | Learning Curve: Medium
Pros
- Framework-agnostic: wraps PyTorch, TensorFlow, scikit-learn, Hugging Face, and LLM models behind one API
- Built-in adaptive batching and parallel workers for production throughput
- Produces self-contained deployable bundles ("bentos") with Docker, Kubernetes, and cloud targets
Cons
- Adds a packaging concept (Service / bento) on top of plain Python code
- Less specialized for raw LLM throughput than dedicated engines like vLLM
- A full deployment story depends on BentoCloud or additional infrastructure
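
Example: the idea behind adaptive batching

The batching listed under Pros groups requests that arrive close together into a single model call, trading a small wait for higher throughput. The sketch below illustrates that general technique in plain Python with asyncio. It is not BentoML's API or implementation: `MicroBatcher`, `double_all`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for illustration, and the batch size and wait window are fixed here, whereas an adaptive scheme would tune them at runtime.

```python
import asyncio


class MicroBatcher:
    """Hypothetical helper: collects concurrent requests into one batched call."""

    def __init__(self, batch_fn, max_batch_size=4, max_wait_s=0.05):
        self.batch_fn = batch_fn            # takes a list of inputs, returns a list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()
        self.batch_sizes = []               # recorded only so the demo can inspect it

    async def submit(self, item):
        # Each caller gets a future that resolves once its batch has been processed.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives, then open a wait window.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # One call for the whole batch, then fan results back out in order.
            outputs = self.batch_fn([item for item, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


def double_all(xs):
    # Stand-in for a model's batched inference function.
    return [2 * x for x in xs]


async def demo():
    batcher = MicroBatcher(double_all, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(6)))
    worker.cancel()
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(demo())
print(results, batch_sizes)
```

Six concurrent requests are served by fewer than six calls to `double_all`, and each caller still receives only its own result. A production framework layers worker processes, error handling, and dynamic tuning on top of this basic pattern.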