#24 WebOS Completion Co-authored-by: Sayan Datta <sayan@Sayans-MacBook-Air.local> Reviewed-on: sagnik/Project_Velocity#25
Kimi Synthetic Data Downstream Plan
Goal
Use the Oracle template taxonomy as the control surface for generating structured synthetic examples that can be replayed into analytics, training, QA, and demo environments without coupling generation logic to any one UI surface.
Inputs
- `backend/oracle/oracle_template_seed_db.json` for chapters, subchapters, and exemplar prompts
- `schema_extension_v2.sql` tables for templates, synthetic jobs, and auditability
- Admin surface actions for publish, archive, trigger, and cancel workflows
Downstream stages
- Template selection
  - Admin or operator selects a published template chapter and revision.
  - The request binds tenant, locale, target channel, and generation volume.
- Prompt expansion
  - Seed examples are expanded into structured prompt packs.
  - Each pack should preserve chapter lineage and example provenance.
- Synthetic generation
  - Queue work into `oracle_synthetic_generation_jobs`.
  - Persist idempotency keys so reruns can be traced without duplicate publication.
- Validation and scoring
  - Run schema validation on every generated artifact.
  - Score for completeness, realism, and chapter coverage.
- Distribution
  - Publish accepted outputs to analytics sandboxes, QA fixtures, or demonstration bundles.
  - Keep rejected artifacts attached to the job for review rather than dropping them silently.
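The middle stages above can be sketched in Python. This is a minimal, hypothetical illustration, not the project's implementation: the dataclass names, the SHA-256 idempotency-key recipe, the required-field list, and the completeness threshold are all assumptions. Only completeness scoring is shown; realism and chapter-coverage scores would sit alongside it.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical: fields a generated record must carry to pass schema validation.
REQUIRED_FIELDS = ("chapter_key", "subchapter_key", "example_id", "body")


@dataclass(frozen=True)
class PromptPack:
    chapter_key: str
    subchapter_key: str
    example_id: str  # provenance: the seed example this pack was expanded from
    prompt: str


@dataclass
class ValidationOutcome:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # retained for review, not dropped


def expand_prompt_packs(chapter_key, seeds):
    """Turn seed examples into prompt packs that keep chapter lineage."""
    return [
        PromptPack(chapter_key, s["subchapter_key"], s["example_id"], s["prompt"])
        for s in seeds
    ]


def idempotency_key(template_id, revision, tenant_id, pack):
    """Deterministic key: rerunning the same inputs reproduces the same key."""
    raw = json.dumps(
        [template_id, revision, tenant_id,
         pack.chapter_key, pack.subchapter_key, pack.example_id],
        separators=(",", ":"),
    )
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def completeness(record):
    """Toy completeness score: share of required fields with a non-empty value."""
    return sum(1 for f in REQUIRED_FIELDS if record.get(f)) / len(REQUIRED_FIELDS)


def validate_and_partition(records, min_score=1.0):
    """Schema-check and score each record; rejects keep a reason for reviewers."""
    outcome = ValidationOutcome()
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            outcome.rejected.append({"record": rec, "reason": f"missing fields: {missing}"})
        elif completeness(rec) < min_score:
            outcome.rejected.append({"record": rec, "reason": "below completeness threshold"})
        else:
            outcome.accepted.append(rec)
    return outcome


def publish_once(published_keys, key, record, sink):
    """Publish a record only if its idempotency key has not been seen before."""
    if key in published_keys:
        return False
    published_keys.add(key)
    sink.append(record)
    return True
```

Deriving the key from the template, revision, tenant, and lineage fields (rather than a random UUID) is what lets a rerun be recognized as the same work and skipped at publication time.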
Contract shape
- Request: `template_id`, `template_revision`, `chapter_key`, `subchapter_key`, `tenant_id`, `locale`, `record_count`, `target_surface`
- Result: `job_id`, `status`, `accepted_records`, `rejected_records`, `output_manifest`, `lineage`
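The contract can be pinned down as typed records. In this sketch the field names come from the plan, but the field types, the status values, the positive `record_count` check, and the `lineage_for` helper are hypothetical choices.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


class JobStatus(str, Enum):
    # Hypothetical status set; the plan only names a `status` field.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    FAILED = "failed"


@dataclass(frozen=True)
class SyntheticJobRequest:
    template_id: str
    template_revision: int  # assumed integer revision number
    chapter_key: str
    subchapter_key: str
    tenant_id: str
    locale: str
    record_count: int
    target_surface: str  # e.g. analytics sandbox, QA fixtures, demo bundle

    def __post_init__(self):
        # Reject requests that would generate nothing.
        if self.record_count <= 0:
            raise ValueError("record_count must be positive")


@dataclass
class SyntheticJobResult:
    job_id: str
    status: JobStatus
    accepted_records: int = 0
    rejected_records: int = 0
    output_manifest: list = field(default_factory=list)  # paths/URIs of outputs
    lineage: dict = field(default_factory=dict)


def lineage_for(request):
    """Hypothetical helper: the lineage block copied onto results and records."""
    return {
        "template_id": request.template_id,
        "template_revision": request.template_revision,
        "chapter_key": request.chapter_key,
        "subchapter_key": request.subchapter_key,
    }
```

Subclassing `str` for the status enum keeps it JSON-friendly when results are returned from the admin surface endpoints.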
Guardrails
- Only published templates can be used for production synthetic jobs.
- Every output record must retain template lineage metadata.
- Cancelled jobs remain queryable from the admin surface.
- Generated content should never overwrite operator-authored production data.
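The guardrails translate into small checks that run before a job starts and before each write. In this hypothetical sketch the template status strings, the `lineage` dict on each record, and the `origin` marker that separates synthetic from operator-authored rows are all assumptions; a real deployment would more likely enforce the overwrite rule with database constraints than with an in-memory dict.

```python
from dataclasses import dataclass

# Hypothetical minimum lineage every output record must carry.
LINEAGE_KEYS = ("template_id", "template_revision", "chapter_key")


class GuardrailViolation(Exception):
    pass


@dataclass(frozen=True)
class TemplateRevision:
    template_id: str
    revision: int
    status: str  # hypothetical values: "draft", "published", "archived"


def check_job_allowed(template, production):
    """Only published templates may back production synthetic jobs."""
    if production and template.status != "published":
        raise GuardrailViolation(
            f"template {template.template_id} r{template.revision} is {template.status}"
        )


def check_record_lineage(record):
    """Every output record must retain template lineage metadata."""
    missing = [k for k in LINEAGE_KEYS if k not in record.get("lineage", {})]
    if missing:
        raise GuardrailViolation(f"record missing lineage: {missing}")


def safe_write(store, key, record):
    """Write a synthetic record; rows without origin='synthetic' count as operator data."""
    existing = store.get(key)
    if existing is not None and existing.get("origin") != "synthetic":
        raise GuardrailViolation(f"refusing to overwrite operator-authored {key!r}")
    store[key] = {**record, "origin": "synthetic"}


def cancel_job(jobs, job_id):
    """Cancel by status change only, so the job stays queryable."""
    jobs[job_id]["status"] = "cancelled"
    return jobs[job_id]
```

Treating any row without an explicit synthetic marker as operator-authored errs on the side of never overwriting production data.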
Immediate next step
Implement a background worker that consumes pending rows from oracle_synthetic_generation_jobs, writes structured manifests, and exposes completion state through the admin surface queue endpoints.
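One possible shape for this worker is a polling loop. The sketch uses Python's standard `sqlite3` module only so it runs standalone; the column names (`id`, `status`, `request_json`, `manifest_json`), the status strings, and the `generate` callable are hypothetical stand-ins for whatever `schema_extension_v2.sql` actually defines. Exposing state through the admin queue endpoints is not shown.

```python
import json
import sqlite3

# Hypothetical minimal schema; the real columns live in schema_extension_v2.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS oracle_synthetic_generation_jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',
    request_json TEXT NOT NULL,
    manifest_json TEXT
)
"""


def claim_next_pending(conn):
    """Claim one pending job; the status guard stops two workers taking the same row."""
    while True:
        row = conn.execute(
            "SELECT id, request_json FROM oracle_synthetic_generation_jobs "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE oracle_synthetic_generation_jobs SET status = 'running' "
            "WHERE id = ? AND status = 'pending'",
            (row[0],),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0], json.loads(row[1])
        # Another worker won the race for this row; look for the next one.


def run_once(conn, generate):
    """Process one job; `generate` is a hypothetical callable returning a manifest dict."""
    claimed = claim_next_pending(conn)
    if claimed is None:
        return False
    job_id, request = claimed
    try:
        manifest, status = generate(request), "completed"
    except Exception as exc:  # keep the failure visible in the queue
        manifest, status = {"error": str(exc)}, "failed"
    # Only finish rows still marked running, so a mid-flight cancel is preserved.
    conn.execute(
        "UPDATE oracle_synthetic_generation_jobs SET status = ?, manifest_json = ? "
        "WHERE id = ? AND status = 'running'",
        (status, json.dumps(manifest), job_id),
    )
    conn.commit()
    return True


def drain(conn, generate):
    """Process jobs until none are pending; a long-lived worker would sleep and repeat."""
    processed = 0
    while run_once(conn, generate):
        processed += 1
    return processed
```

The conditional `UPDATE ... AND status = 'pending'` claim is a portable way to avoid double processing; on PostgreSQL, `SELECT ... FOR UPDATE SKIP LOCKED` is a common alternative for the same job-queue pattern.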