
sayan 84e439712c feat/#24 WebOS Completion (#25 )

#24 WebOS Completion

Co-authored-by: Sayan Datta <sayan@Sayans-MacBook-Air.local>
Reviewed-on: #25

2026-04-18 18:59:04 +05:30

2.2 KiB


Kimi Synthetic Data Downstream Plan

Goal

Use the Oracle template taxonomy as the control surface for generating structured synthetic examples that can be replayed into analytics, training, QA, and demo environments without coupling generation logic to any one UI surface.

Inputs

- backend/oracle/oracle_template_seed_db.json for chapters, subchapters, and exemplar prompts
- schema_extension_v2.sql tables for templates, synthetic jobs, and auditability
- Admin surface actions for publish, archive, trigger, and cancel workflows

Downstream stages

Template selection
- Admin or operator selects a published template chapter and revision.
- The request binds tenant, locale, target channel, and generation volume.
Prompt expansion
- Seed examples are expanded into structured prompt packs.
- Each pack should preserve chapter lineage and example provenance.
Synthetic generation
- Queue work into oracle_synthetic_generation_jobs.
- Persist idempotency keys so reruns can be traced without duplicate publication.
Validation and scoring
- Run schema validation on every generated artifact.
- Score for completeness, realism, and chapter coverage.
Distribution
- Publish accepted outputs to analytics sandboxes, QA fixtures, or demonstration bundles.
- Keep rejected artifacts attached to the job for review rather than dropping them silently.
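
The idempotency requirement in the synthetic generation stage can be illustrated with a minimal sketch. It assumes the key is a SHA-256 digest of the canonicalised request fields, and it uses an in-memory `JobQueue` class as a stand-in for oracle_synthetic_generation_jobs. Both `idempotency_key` and `JobQueue` are hypothetical names introduced for illustration; the real table layout lives in schema_extension_v2.sql and may differ.

```python
import hashlib
import json


def idempotency_key(request: dict) -> str:
    """Derive a stable key from the fields that define a generation request."""
    # Canonical JSON (sorted keys, fixed separators) so that the order of
    # fields in the incoming dict does not change the digest.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class JobQueue:
    """Hypothetical in-memory stand-in for oracle_synthetic_generation_jobs."""

    def __init__(self):
        self._jobs = {}  # idempotency key -> job dict

    def enqueue(self, request: dict):
        """Return (job, created). A rerun of the same request reuses the job."""
        key = idempotency_key(request)
        if key in self._jobs:
            return self._jobs[key], False
        job = {
            "job_id": len(self._jobs) + 1,
            "idempotency_key": key,
            "status": "pending",
            "request": dict(request),
        }
        self._jobs[key] = job
        return job, True
```

Submitting the same request twice returns the original job with `created` set to `False`, so a rerun can be traced to its first submission without producing a second publication.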

Contract shape

Request
- template_id
- template_revision
- chapter_key
- subchapter_key
- tenant_id
- locale
- record_count
- target_surface
Result
- job_id
- status
- accepted_records
- rejected_records
- output_manifest
- lineage
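
One way to pin down the contract is a pair of typed records. The sketch below uses the field names listed above, but every type is an assumption: the plan does not say whether `template_revision` is an integer, whether `accepted_records` and `rejected_records` are counts or collections, or what `output_manifest` and `lineage` contain. The `lineage_for` helper is likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class GenerationRequest:
    template_id: str
    template_revision: int      # assumed integer revision number
    chapter_key: str
    subchapter_key: str
    tenant_id: str
    locale: str
    record_count: int
    target_surface: str

    def __post_init__(self):
        # Reject empty or negative generation volumes up front.
        if self.record_count <= 0:
            raise ValueError("record_count must be positive")


@dataclass
class GenerationResult:
    job_id: str
    status: str
    accepted_records: int = 0   # assumed to be counts, not record lists
    rejected_records: int = 0
    output_manifest: List[str] = field(default_factory=list)
    lineage: Dict[str, Any] = field(default_factory=dict)


def lineage_for(request: GenerationRequest) -> Dict[str, Any]:
    """Hypothetical helper: copy the template lineage fields from a request."""
    return {
        "template_id": request.template_id,
        "template_revision": request.template_revision,
        "chapter_key": request.chapter_key,
        "subchapter_key": request.subchapter_key,
    }
```

Freezing the request makes it safe to hash or log after submission, while the result stays mutable so a worker can update counts as records are accepted or rejected.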

Guardrails

- Only published templates can be used for production synthetic jobs.
- Every output record must retain template lineage metadata.
- Cancelled jobs remain queryable from the admin surface.
- Generated content should never overwrite operator-authored production data.
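
The first two guardrails lend themselves to a pre-publication check. The following is a sketch only: the status string `"published"`, the `lineage` key on each record, and the set of required lineage fields are assumptions, and `guardrail_violations` is a hypothetical function name.

```python
# Assumed minimum lineage fields; the plan does not enumerate them.
REQUIRED_LINEAGE_KEYS = ("template_id", "template_revision", "chapter_key")


def guardrail_violations(template_status, records, production=True):
    """Return a list of human-readable violations; empty means the job passes."""
    violations = []

    # Guardrail: only published templates may back production jobs.
    if production and template_status != "published":
        violations.append(
            f"template status is {template_status!r}, expected 'published'"
        )

    # Guardrail: every output record must retain template lineage metadata.
    for index, record in enumerate(records):
        lineage = record.get("lineage") or {}
        missing = [key for key in REQUIRED_LINEAGE_KEYS if key not in lineage]
        if missing:
            violations.append(
                f"record {index} is missing lineage fields: {', '.join(missing)}"
            )

    return violations
```

Returning a list rather than raising on the first failure keeps all problems visible at once, which fits the plan's preference for keeping rejected artifacts reviewable instead of dropping them silently.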

Immediate next step

Implement a background worker that consumes pending rows from oracle_synthetic_generation_jobs, writes structured manifests, and exposes completion state through the admin surface queue endpoints.
