Datafast
Synthetic text dataset generator for LLM projects supporting classification, instruction, MCQ, and preference datasets via multiple LLM providers.
synthetic-data-generation-frameworks
New to PyRadar
Hero Score: 41
Popularity: 34
Performance: 30
Ecosystem: 25
Maturity: 46
Dev Experience: 68
⭐ 58 stars
⬇ 316 downloads/wk
First release: Jan 2025
Last release: Mar 2026
Async Support: No
Plugin Extensions: Growing
Speed: Medium
Doc Quality: Medium
Learning Curve: Easy
Pros
- Multi-provider support (OpenAI, Anthropic, Gemini, Ollama, Mistral) with combinatorial prompt expansion for diverse outputs
- Simple config-driven API for generating classification, instruction, MCQ, and preference datasets
- Built-in Hugging Face Hub integration for direct dataset publishing
Cons
- Very early stage (v0.0.x) with unstable APIs that may change
- Small community and limited ecosystem compared to Faker or SDV
- Requires LLM API keys or local Ollama setup; generation cost depends on provider
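
The "combinatorial prompt expansion" listed under Pros can be illustrated with a generic sketch. This is not Datafast's actual API: the `expand_prompts` helper, the template, and the variable names are all hypothetical, and the sketch assumes expansion simply means filling one prompt template with every combination of a few variable values.

```python
from itertools import product


def expand_prompts(template: str, variables: dict[str, list[str]]) -> list[str]:
    """Fill `template` once for every combination of the variable values.

    Hypothetical helper for illustration only; not part of Datafast.
    """
    names = list(variables)
    combos = product(*(variables[name] for name in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


# Hypothetical template and values, chosen only to show the idea.
template = "Write a {tone} customer review about a {product} in {language}."
variables = {
    "tone": ["positive", "negative"],
    "product": ["laptop", "coffee maker", "backpack"],
    "language": ["English", "French"],
}

prompts = expand_prompts(template, variables)
print(len(prompts))  # 2 * 3 * 2 = 12 distinct prompts
print(prompts[0])
```

Because each prompt triggers at least one LLM call, the number of combinations grows multiplicatively with every added variable, which ties directly to the provider-dependent generation cost noted under Cons.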