
ML Data Generator

A FastAPI microservice that programmatically synthesizes large NLU training datasets from base utterance templates.

Combinatorial Expansion Engine

Built a Python script that ingests raw CSV/Excel templates and expands them with Cartesian product logic. Each placeholder in a base phrase is mapped to its system or custom entity dictionary and filled with every listed value, generating thousands of unique, valid training variations.
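
The expansion step can be sketched with `itertools.product`. This is a minimal illustration, not the service's actual code: the function name `expand_template`, the slot regex, and the `action` slot are assumptions made for the example; only the `[beneficiary_type]` bracket syntax comes from the sample request further down.

```python
import itertools
import re

# Matches placeholders such as [beneficiary_type]. The bracket syntax follows
# the sample template; the regex itself is an assumption of this sketch.
SLOT_PATTERN = re.compile(r"\[(\w+)\]")


def expand_template(template: str, entities: dict[str, list[str]]) -> list[str]:
    """Return every variation of `template` with each slot filled from `entities`."""
    # Unique slot names, in order of first appearance.
    slots = list(dict.fromkeys(SLOT_PATTERN.findall(template)))
    missing = [slot for slot in slots if slot not in entities]
    if missing:
        raise KeyError(f"no values for slots: {missing}")

    value_pools = [entities[slot] for slot in slots]
    variations = []
    # One output line per element of the Cartesian product of all value pools.
    for combo in itertools.product(*value_pools):
        filled = dict(zip(slots, combo))
        variations.append(SLOT_PATTERN.sub(lambda m: filled[m.group(1)], template))
    return variations


if __name__ == "__main__":
    # Hypothetical entity dictionaries; "action" is invented for illustration.
    entities = {
        "action": ["update", "change"],
        "beneficiary_type": ["primary beneficiary", "contingent beneficiary"],
    }
    for line in expand_template("I want to [action] my [beneficiary_type]", entities):
        print(line)
```

The output size is the product of the pool sizes, so two slots with 2 values each give 4 lines, and adding a third slot with 10 values would give 40.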

FastAPI REST Architecture

Wrapped the generation logic in a scalable FastAPI REST microservice. This decouples the engine from local execution, so external bots, CI/CD pipelines, or web interfaces can request data manipulation and synthesis on demand over HTTP.

Configurable Entity Mapping

Designed an entity mapping schema that the service reads directly from YAML and Excel index files. Conversation Designers can define synonym pools, regex patterns, and entity relationships without touching the underlying Python source code.
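
One way such a mapping could be validated once loaded is sketched below. The key names (`entities`, `synonyms`, `regex`), the `EntitySpec` class, and the `policy_number` entity are hypothetical; the real schema is not shown here. The `RAW_MAP` dictionary stands in for what a YAML loader such as PyYAML's `yaml.safe_load` would return, so the sketch needs only the standard library.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntitySpec:
    """One entity as a designer might declare it (hypothetical shape)."""
    name: str
    synonyms: list[str] = field(default_factory=list)
    pattern: Optional[re.Pattern] = None


def parse_entity_map(raw: dict) -> dict[str, EntitySpec]:
    """Validate a loaded mapping and compile any regex patterns up front."""
    specs = {}
    for name, body in raw.get("entities", {}).items():
        synonyms = [str(s) for s in body.get("synonyms", [])]
        regex = body.get("regex")
        if not synonyms and regex is None:
            raise ValueError(f"entity '{name}' needs synonyms or a regex")
        # re.compile raises re.error on a malformed pattern, so a bad
        # designer-supplied regex fails at load time, not at generation time.
        compiled = re.compile(regex) if regex is not None else None
        specs[name] = EntitySpec(name=name, synonyms=synonyms, pattern=compiled)
    return specs


# Stand-in for parsed YAML; values are invented for illustration.
RAW_MAP = {
    "entities": {
        "beneficiary_type": {
            "synonyms": ["primary beneficiary", "contingent beneficiary"],
        },
        "policy_number": {"regex": r"[A-Z]{2}\d{6}"},
    }
}

if __name__ == "__main__":
    for spec in parse_entity_map(RAW_MAP).values():
        print(spec)
```

Validating at load time keeps configuration mistakes visible to the designer who made them, before any dataset is generated.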

api_response.json
// POST /api/v1/synthesize
{
  "template": "I want to update my [beneficiary_type]",
  "entity_map": "change_beneficiary.xlsx",
  "permutations": "max"
}
// RESPONSE 200 OK - 1,240 generated variations
{
  "status": "success",
  "dataset": [
    "I want to update my primary beneficiary",
    "I need to change my primary beneficiary",
    "I want to update my contingent beneficiary",
    "Could you update my life insurance beneficiary",
    "Please change my 401k beneficiary details",
    ... 1,235 more items ...
  ]
}