Model Serving
4 packages
Model Serving · Recently updated
vLLM
High-throughput, memory-efficient LLM inference and serving engine with PagedAttention and continuous batching — built for production GPU workloads.
Hero Score 74
Model Serving · Active
Ollama
Official Python client for the Ollama local LLM runtime — pull, run, and chat with open-weight models on your machine.
Hero Score 71
Model Serving · Recently updated
BentoML
Open-source Python framework for packaging, serving, and deploying ML/LLM models as production APIs — model-agnostic with built-in adaptive batching.
Hero Score 73
Model Serving · Inactive
OpenLLM
BentoML-built CLI and Python framework for self-hosting open-weight LLMs as OpenAI-compatible API endpoints with one command.
Hero Score 58