Model Serving

4 packages

vLLM

High-throughput, memory-efficient LLM inference and serving engine with PagedAttention and continuous batching — built for production GPU workloads.

Recently updated

Hero Score 75

Ollama

Official Python client for the Ollama local LLM runtime — pull, run, and chat with open-weight models on your machine.

Active

Hero Score 71

BentoML

Open-source Python framework for packaging, serving, and deploying ML/LLM models as production APIs — model-agnostic with built-in adaptive batching.

Active

Hero Score 73

OpenLLM

BentoML-built CLI and Python framework for self-hosting open-weight LLMs as OpenAI-compatible API endpoints with one command.

Inactive

Hero Score 58