Model Serving

4 packages

vLLM
High-throughput, memory-efficient LLM inference and serving engine with PagedAttention and continuous batching, built for production GPU workloads.
Recently updated
Hero Score 74

Ollama
Official Python client for the Ollama local LLM runtime: pull, run, and chat with open-weight models on your machine.
Active
Hero Score 71

BentoML
Open-source Python framework for packaging, serving, and deploying ML/LLM models as production APIs. Model-agnostic, with built-in adaptive batching.
Recently updated
Hero Score 73

OpenLLM
BentoML-built CLI and Python framework for self-hosting open-weight LLMs as OpenAI-compatible API endpoints with one command.
Inactive
Hero Score 58