feat/#24 WebOS Completion (#25)
#24 WebOS Completion

Co-authored-by: Sayan Datta <sayan@Sayans-MacBook-Air.local>
Reviewed-on: #25
This commit was merged in pull request #25.
docs/KIMI_SYNTHETIC_DATA_DOWNSTREAM_PLAN.md
# Kimi Synthetic Data Downstream Plan

## Goal

Use the Oracle template taxonomy as the control surface for generating structured synthetic examples that can be replayed into analytics, training, QA, and demo environments without coupling generation logic to any one UI surface.

## Inputs

- `backend/oracle/oracle_template_seed_db.json` for chapters, subchapters, and exemplar prompts
- `schema_extension_v2.sql` tables for templates, synthetic jobs, and auditability
- Admin surface actions for publish, archive, trigger, and cancel workflows
## Downstream stages

1. Template selection
   - An admin or operator selects a published template chapter and revision.
   - The request binds tenant, locale, target channel, and generation volume.
2. Prompt expansion
   - Seed examples are expanded into structured prompt packs.
   - Each pack should preserve chapter lineage and example provenance.
3. Synthetic generation
   - Queue work into `oracle_synthetic_generation_jobs`.
   - Persist idempotency keys so reruns can be traced without duplicate publication.
4. Validation and scoring
   - Run schema validation on every generated artifact.
   - Score for completeness, realism, and chapter coverage.
5. Distribution
   - Publish accepted outputs to analytics sandboxes, QA fixtures, or demonstration bundles.
   - Keep rejected artifacts attached to the job for review rather than dropping them silently.
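Stages 3 through 5 hinge on two mechanics: a deterministic idempotency key, so a rerun of the same request is recognisable instead of being republished, and a validation pass that routes each artifact to accepted or rejected without discarding anything. The Python sketch below illustrates both under stated assumptions: the key is a SHA-256 hash of the canonicalised request, and `GeneratedRecord`, `REQUIRED_FIELDS`, `validate`, and `partition` are hypothetical names, with a deliberately minimal required-field check standing in for real schema validation.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical minimal schema: every artifact must carry these keys.
REQUIRED_FIELDS = {"chapter_key", "body"}


@dataclass
class GeneratedRecord:
    """One synthetic artifact plus any validation errors found for it."""
    payload: dict
    errors: list = field(default_factory=list)


def idempotency_key(request: dict) -> str:
    """Hash the canonical JSON form of a request.

    Identical requests (in any key order) map to the same key, so a rerun
    can be traced back to the original job instead of publishing twice.
    """
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(record: GeneratedRecord) -> GeneratedRecord:
    """Record a schema error for every required field the payload lacks."""
    missing = REQUIRED_FIELDS - set(record.payload)
    record.errors.extend(f"missing field: {name}" for name in sorted(missing))
    return record


def partition(records):
    """Split records into (accepted, rejected); rejected ones are kept for review."""
    accepted, rejected = [], []
    for record in records:
        validate(record)
        if record.errors:
            rejected.append(record)
        else:
            accepted.append(record)
    return accepted, rejected
```

Because the key is computed from sorted, canonical JSON, field order in the incoming request does not change it. Scoring for completeness, realism, and chapter coverage would slot in after `validate`; it is left out here because the plan does not define the scoring functions.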
## Contract shape

- Request
  - `template_id`
  - `template_revision`
  - `chapter_key`
  - `subchapter_key`
  - `tenant_id`
  - `locale`
  - `record_count`
  - `target_surface`
- Result
  - `job_id`
  - `status`
  - `accepted_records`
  - `rejected_records`
  - `output_manifest`
  - `lineage`
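One way to pin the contract down in code is a pair of typed records. This is a sketch, not the project's API: the field names come from the list above, but the Python types, the `JobStatus` values, treating `accepted_records` and `rejected_records` as counts, and the `lineage_from` helper are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(str, Enum):
    # Assumed status values; the plan only mentions pending and cancelled jobs.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class SyntheticJobRequest:
    template_id: str
    template_revision: int
    chapter_key: str
    subchapter_key: str
    tenant_id: str
    locale: str
    record_count: int
    target_surface: str  # e.g. an analytics sandbox, QA fixtures, or a demo bundle

    def __post_init__(self) -> None:
        if self.record_count <= 0:
            raise ValueError("record_count must be positive")


@dataclass
class SyntheticJobResult:
    job_id: str
    status: JobStatus
    accepted_records: int = 0  # assumed to be counts, not the records themselves
    rejected_records: int = 0
    output_manifest: list = field(default_factory=list)
    lineage: dict = field(default_factory=dict)


def lineage_from(request: SyntheticJobRequest) -> dict:
    """Copy the template coordinates that every output must carry."""
    return {
        "template_id": request.template_id,
        "template_revision": request.template_revision,
        "chapter_key": request.chapter_key,
        "subchapter_key": request.subchapter_key,
    }
```

Making the request frozen keeps it safe to hash or log as the job's input of record, while the result stays mutable so a worker can fill in counts and the manifest as the job progresses.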
## Guardrails

- Only published templates can be used for production synthetic jobs.
- Every output record must retain template lineage metadata.
- Cancelled jobs remain queryable from the admin surface.
- Generated content should never overwrite operator-authored production data.
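The guardrails translate naturally into pre-flight and pre-write checks. The sketch below assumes hypothetical template status values (`draft`, `published`, `archived`), a `lineage` dict on each output record, and an `origin` field that distinguishes operator-authored rows from synthetic ones; none of these are defined in the plan.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed minimum lineage every output record must carry.
LINEAGE_KEYS = ("template_id", "template_revision", "chapter_key")


class GuardrailViolation(Exception):
    """Raised when a job or write would break one of the plan's guardrails."""


@dataclass(frozen=True)
class Template:
    template_id: str
    revision: int
    status: str  # assumed values: "draft", "published", "archived"


def check_template(template: Template, production: bool) -> None:
    # Only published templates may back production synthetic jobs.
    if production and template.status != "published":
        raise GuardrailViolation(
            f"template {template.template_id} r{template.revision} is {template.status}"
        )


def check_lineage(record: dict) -> None:
    # Every output record must retain template lineage metadata.
    lineage = record.get("lineage", {})
    missing = [key for key in LINEAGE_KEYS if key not in lineage]
    if missing:
        raise GuardrailViolation(f"record missing lineage keys: {missing}")


def check_write(existing_row: Optional[dict]) -> None:
    # Generated content must never overwrite operator-authored production data.
    # Assumes each stored row has an "origin" of "operator" or "synthetic".
    if existing_row is not None and existing_row.get("origin") == "operator":
        raise GuardrailViolation("refusing to overwrite an operator-authored row")
```

Raising instead of returning a boolean makes a skipped check fail loudly; the cancelled-job guardrail is a property of how job rows are updated, so it is not modelled here.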
## Immediate next step

Implement a background worker that consumes pending rows from `oracle_synthetic_generation_jobs`, writes structured manifests, and exposes completion state through the admin surface queue endpoints.
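A minimal sketch of such a worker, using SQLite from the Python standard library so it runs self-contained. The column names (`id`, `status`, `request_json`, `manifest_json`), the status strings, and `generate_records` are assumptions; the real table is defined in `schema_extension_v2.sql`, and generation would call the Kimi model instead of the stub shown.

```python
import json
import sqlite3

# Assumed columns; the real table lives in schema_extension_v2.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS oracle_synthetic_generation_jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',
    request_json TEXT NOT NULL,
    manifest_json TEXT
)
"""
LINEAGE_KEYS = ("template_id", "template_revision", "chapter_key")


def claim_next_job(conn: sqlite3.Connection):
    """Move the oldest pending job to 'running' and return (id, request_json)."""
    with conn:
        row = conn.execute(
            "SELECT id, request_json FROM oracle_synthetic_generation_jobs "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard means only one worker can flip a given row.
        claimed = conn.execute(
            "UPDATE oracle_synthetic_generation_jobs SET status = 'running' "
            "WHERE id = ? AND status = 'pending'",
            (row[0],),
        ).rowcount
    return row if claimed == 1 else None


def generate_records(request: dict) -> list:
    """Stub standing in for the actual Kimi generation call."""
    return [{"index": i, "chapter_key": request["chapter_key"]}
            for i in range(request["record_count"])]


def run_once(conn: sqlite3.Connection) -> bool:
    """Process one pending job; return False when none could be claimed."""
    claimed = claim_next_job(conn)
    if claimed is None:
        return False
    job_id, request_json = claimed
    request = json.loads(request_json)
    records = generate_records(request)
    manifest = {
        "job_id": job_id,
        "record_count": len(records),
        "lineage": {key: request.get(key) for key in LINEAGE_KEYS},
    }
    with conn:
        # Only complete jobs still marked running, so a cancellation wins.
        conn.execute(
            "UPDATE oracle_synthetic_generation_jobs "
            "SET status = 'completed', manifest_json = ? "
            "WHERE id = ? AND status = 'running'",
            (json.dumps(manifest), job_id),
        )
    return True


def drain(conn: sqlite3.Connection) -> int:
    """Process jobs until no pending row can be claimed; return how many ran."""
    processed = 0
    while run_once(conn):
        processed += 1
    return processed
```

Guarding completion on `status = 'running'` means a job cancelled mid-run keeps its `cancelled` state, and cancelled rows are never selected or deleted, so they stay queryable from the admin surface. A production worker would poll on an interval instead of exiting once the queue is empty.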