FastEmbed

Lightweight ONNX-based embedding library by Qdrant — runs on CPU with no PyTorch dependency, designed for fast batch inference in production RAG pipelines.

embedding-frameworks · New to PyRadar · npm: fastembed

Hero Score axes: Popularity, Performance (100), Ecosystem, Maturity, Dev Experience

⭐ 3,086 stars · ⬇ 2.7M downloads/wk · First release: Jul 2023 · Last release: Mar 2026

Async Support: Yes · Plugin Extensions: Medium · Speed: Very fast · Doc Quality: High · Learning Curve: Easy

Pros

• ONNX runtime delivers fast CPU inference without PyTorch — small Docker images and quick cold starts
• Ships with curated quantized models (dense, sparse, ColBERT-style) ready out of the box
• Tight integration with Qdrant for end-to-end RAG pipelines
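
To make the first two points concrete, here is a minimal sketch of dense embedding plus cosine ranking on CPU. It assumes `pip install fastembed` and uses `BAAI/bge-small-en-v1.5` purely as an example model name, which would be downloaded on first use. The helper names (`load_text_embedder`, `embed_all`, `cosine`, `rank`) are hypothetical and not part of the library; only `TextEmbedding` and its `embed` method come from FastEmbed.

```python
import math
from typing import List, Sequence, Tuple


def load_text_embedder(model_name: str = "BAAI/bge-small-en-v1.5"):
    """Create a FastEmbed dense text model (example model name, an assumption)."""
    # Imported lazily so the pure-Python helpers below work without fastembed.
    from fastembed import TextEmbedding

    return TextEmbedding(model_name=model_name)


def embed_all(embedder, texts: Sequence[str], batch_size: int = 32) -> List[List[float]]:
    """Consume the lazy iterator returned by `embed` into plain float lists."""
    return [
        [float(x) for x in vec]
        for vec in embedder.embed(texts, batch_size=batch_size)
    ]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(embedder, query: str, docs: Sequence[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Embed the query and documents together, then sort documents by similarity."""
    vectors = embed_all(embedder, [query] + list(docs))
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Example usage (requires fastembed and a one-time model download):
# embedder = load_text_embedder()
# for score, doc in rank(embedder, "fast CPU embeddings",
#                        ["ONNX runs on CPU", "GPU training tips"]):
#     print(f"{score:.3f}  {doc}")
```

The sketch embeds the query with the same `embed` call as the documents for brevity. In a real pipeline the vectors would more likely be written to a vector store such as Qdrant than ranked in memory.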

Cons

• Smaller model catalog than sentence-transformers — limited to curated ONNX-converted set
• Less flexible for custom fine-tuning workflows
• A first-party JavaScript sister package exists, but Python remains the canonical implementation
