Datafast

Synthetic text dataset generator for LLM projects. It supports classification, instruction, MCQ, and preference datasets and works with multiple LLM providers.

Category: synthetic-data-generation-frameworks
New to PyRadar
Hero Score: 41
Popularity: 34
Performance: 30
Ecosystem: 25
Maturity: 46
Dev Experience: 68
Stars: 58
Downloads: 316/wk
First release: Jan 2025
Last release: Mar 2026

Async Support: No
Plugin Extensions: Growing
Speed: Medium
Doc Quality: Medium
Learning Curve: Easy

Pros

  • Multi-provider support (OpenAI, Anthropic, Gemini, Ollama, Mistral) with combinatorial prompt expansion for diverse outputs
  • Simple config-driven API for generating classification, instruction, MCQ, and preference datasets
  • Built-in Hugging Face Hub integration for direct dataset publishing
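
The "combinatorial prompt expansion" mentioned above can be illustrated with a minimal, library-independent sketch. It does not use Datafast's API: the `expand_prompts` helper, the template, and the placeholder values are all hypothetical, chosen only to show how a small set of options multiplies into many distinct prompts.

```python
from itertools import product


def expand_prompts(template, **axes):
    """Return one prompt per combination of the values given for each placeholder.

    Hypothetical helper for illustration; not part of Datafast.
    """
    names = list(axes)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(axes[name] for name in names))
    ]


# Assumed example values: 2 styles x 2 topics x 2 languages.
prompts = expand_prompts(
    "Write a {style} customer review about {topic} in {language}.",
    style=["formal", "casual"],
    topic=["battery life", "shipping"],
    language=["English", "French"],
)

print(len(prompts))  # one prompt per combination
print(prompts[0])
```

Each resulting prompt would then be sent to an LLM provider, so adding one more value to any axis multiplies the number of generation calls, and with it the cost.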

Cons

  • Very early stage (v0.0.x) with unstable APIs that may change
  • Small community and limited ecosystem compared to Faker or SDV
  • Requires LLM API keys or local Ollama setup; generation cost depends on provider
