feat/#24 WebOS Completion (#25)

#24 WebOS Completion

Co-authored-by: Sayan Datta <sayan@Sayans-MacBook-Air.local>
Reviewed-on: #25
This commit was merged in pull request #25.
2026-04-18 18:59:04 +05:30
parent 857e0b88e6
commit 84e439712c
459 changed files with 11713 additions and 3853 deletions


@@ -0,0 +1,59 @@
# Kimi Synthetic Data Downstream Plan
## Goal
Use the Oracle template taxonomy as the control surface for generating structured synthetic examples that can be replayed into analytics, training, QA, and demo environments without coupling generation logic to any one UI surface.
## Inputs
- `backend/oracle/oracle_template_seed_db.json` for chapters, subchapters, and exemplar prompts
- `schema_extension_v2.sql` tables for templates, synthetic jobs, and auditability
- Admin surface actions for publish, archive, trigger, and cancel workflows
## Downstream stages
1. Template selection
   - Admin or operator selects a published template chapter and revision.
   - The request binds tenant, locale, target channel, and generation volume.
2. Prompt expansion
   - Seed examples are expanded into structured prompt packs.
   - Each pack should preserve chapter lineage and example provenance.
3. Synthetic generation
   - Queue work into `oracle_synthetic_generation_jobs`.
   - Persist idempotency keys so reruns can be traced without duplicate publication.
4. Validation and scoring
   - Run schema validation on every generated artifact.
   - Score for completeness, realism, and chapter coverage.
5. Distribution
   - Publish accepted outputs to analytics sandboxes, QA fixtures, or demonstration bundles.
   - Keep rejected artifacts attached to the job for review rather than dropping them silently.
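
Stages 3 to 5 could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the helper names `idempotency_key` and `partition_artifacts` and the `REQUIRED_FIELDS` tuple are hypothetical, and the real artifact schema is not defined in this plan. The sketch derives a stable key from the request so a rerun maps to the same job, and keeps rejected artifacts alongside a reason instead of discarding them.

```python
import hashlib
import json
from typing import Any

# Hypothetical required fields; the actual artifact schema is not specified here.
REQUIRED_FIELDS = ("template_id", "chapter_key", "payload")


def idempotency_key(request: dict[str, Any]) -> str:
    """Hash a canonical JSON form of the request so key order does not matter."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def partition_artifacts(
    artifacts: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split artifacts into accepted and rejected, keeping a reason per rejection."""
    accepted: list[dict[str, Any]] = []
    rejected: list[dict[str, Any]] = []
    for artifact in artifacts:
        missing = [name for name in REQUIRED_FIELDS if name not in artifact]
        if missing:
            # Rejected artifacts stay attached for review rather than being dropped.
            rejected.append(
                {"artifact": artifact, "reason": "missing fields: " + ", ".join(missing)}
            )
        else:
            accepted.append(artifact)
    return accepted, rejected
```

Hashing a sorted-key serialization is one common way to get a deterministic key; a production version would likely also scope the key by tenant and template revision.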
## Contract shape
- Request
  - `template_id`
  - `template_revision`
  - `chapter_key`
  - `subchapter_key`
  - `tenant_id`
  - `locale`
  - `record_count`
  - `target_surface`
- Result
  - `job_id`
  - `status`
  - `accepted_records`
  - `rejected_records`
  - `output_manifest`
  - `lineage`
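
One possible typed rendering of this contract is shown below. The field names come from the lists above, but every type is an assumption: the plan does not say whether `template_revision` is an integer, whether `accepted_records` and `rejected_records` are counts or collections, or what shape `output_manifest` and `lineage` take. The class names and the `lineage_from` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class GenerationRequest:
    template_id: str
    template_revision: int  # assumed to be an integer revision number
    chapter_key: str
    subchapter_key: str
    tenant_id: str
    locale: str
    record_count: int
    target_surface: str

    def __post_init__(self) -> None:
        # Reject empty or negative volumes before any work is queued.
        if self.record_count <= 0:
            raise ValueError("record_count must be positive")


@dataclass
class GenerationResult:
    job_id: str
    status: str
    accepted_records: int = 0  # assumed to be counts, not record lists
    rejected_records: int = 0
    output_manifest: list[str] = field(default_factory=list)
    lineage: dict[str, Any] = field(default_factory=dict)


def lineage_from(request: GenerationRequest) -> dict[str, Any]:
    """Copy the template-identifying fields of a request into a lineage mapping."""
    return {
        "template_id": request.template_id,
        "template_revision": request.template_revision,
        "chapter_key": request.chapter_key,
        "subchapter_key": request.subchapter_key,
    }
```

Freezing the request keeps the bound tenant, locale, and volume from changing after the job is queued, which makes the request safe to hash or log.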
## Guardrails
- Only published templates can be used for production synthetic jobs.
- Every output record must retain template lineage metadata.
- Cancelled jobs remain queryable from the admin surface.
- Generated content should never overwrite operator-authored production data.
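
The first two guardrails could be enforced with checks along these lines. This is a sketch under assumptions: the status string `"published"`, the environment label `"production"`, the `LINEAGE_KEYS` tuple, and all function and exception names are hypothetical and not taken from the schema.

```python
from typing import Any

# Hypothetical minimum lineage metadata every output record must carry.
LINEAGE_KEYS = ("template_id", "template_revision", "chapter_key")


class GuardrailError(Exception):
    """Raised when a synthetic job would violate a guardrail."""


def check_template_usable(template_status: str, environment: str) -> None:
    """Allow production jobs only for published templates."""
    if environment == "production" and template_status != "published":
        raise GuardrailError(
            f"template status {template_status!r} is not allowed in production"
        )


def attach_lineage(record: dict[str, Any], lineage: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with lineage attached, refusing partial lineage."""
    missing = [key for key in LINEAGE_KEYS if key not in lineage]
    if missing:
        raise GuardrailError("lineage is missing: " + ", ".join(missing))
    # Build a new dict so the input record is never mutated in place.
    return {**record, "lineage": dict(lineage)}
```

Returning a new record rather than mutating the input is a small step toward the last guardrail, since generated data never writes into an existing object.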
## Immediate next step
Implement a background worker that consumes pending rows from `oracle_synthetic_generation_jobs`, writes structured manifests, and exposes completion state through the admin surface queue endpoints.
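
A single worker iteration might look like the sketch below, which uses an in-memory SQLite table for illustration only. The table name comes from this plan, but the columns (`job_id`, `status`, `record_count`, `output_manifest`), the status values, and the placeholder generation step are all assumptions; the real definitions live in `schema_extension_v2.sql`.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical columns for illustration; not the actual schema.
SCHEMA = """
CREATE TABLE oracle_synthetic_generation_jobs (
    job_id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',
    record_count INTEGER NOT NULL,
    output_manifest TEXT
)
"""


def process_next_job(conn: sqlite3.Connection) -> Optional[int]:
    """Claim the oldest pending job, write its manifest, and mark it completed."""
    row = conn.execute(
        "SELECT job_id, record_count FROM oracle_synthetic_generation_jobs "
        "WHERE status = 'pending' ORDER BY job_id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing to do
    job_id, record_count = row

    # Claim the row conditionally so two workers cannot both take the same job.
    claimed = conn.execute(
        "UPDATE oracle_synthetic_generation_jobs SET status = 'running' "
        "WHERE job_id = ? AND status = 'pending'",
        (job_id,),
    )
    if claimed.rowcount == 0:
        return None  # another worker claimed it first

    # Placeholder for real generation: emit one record name per requested record.
    manifest = {
        "job_id": job_id,
        "records": [f"record-{i}" for i in range(record_count)],
    }
    conn.execute(
        "UPDATE oracle_synthetic_generation_jobs "
        "SET status = 'completed', output_manifest = ? WHERE job_id = ?",
        (json.dumps(manifest), job_id),
    )
    conn.commit()
    return job_id
```

A long-running worker would call `process_next_job` in a loop with a sleep or notification between empty polls, and the admin queue endpoints could read `status` and `output_manifest` directly from the same table.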