Project_Velocity/.Agent Context/Oracle Canvas Codebook Production Truth.md

# Oracle Canvas Codebook Production Truth

Date: 2026-04-19
Repo: `Project_Velocity`

## Purpose

This document freezes the current production truth for the Oracle Canvas template/codebook system, the expanded GPT and Claude corpora, the runtime merge policy, and the current rendering limits that matter for delivery.

This is not a concept note. It is the implementation-facing truth for the Oracle template layer as it exists now.

## Current Source Of Truth

The Oracle template book is split across four layers:

1. Structural database schema
   - `backend/oracle/schema_extension_v2.sql`
   - Defines:
     - `oracle_template_chapters`
     - `oracle_template_subchapters`
     - `oracle_template_seed_examples`
     - chapter/subchapter linkage on `oracle_component_templates`
     - `oracle_synthetic_generation_jobs`

2. Runtime seed DB
   - `backend/oracle/oracle_template_seed_db.json`
   - This is the lightweight fallback DB shipped with the runtime.
   - It is structurally correct but incomplete relative to the intended corpus.

3. Expanded authoring corpora
   - GPT pack:
     - `Project_Velocity/.Agent Context/Sprint 1/Sayan Multi-Surface and Oracle Delivery Pack/Sample JSON Schema/GPT 5.4/oracle_canvas_json_expansion_pack/db/oracle_template_seed_db_expanded_v1.pretty.json`
   - Claude pack:
     - `Project_Velocity/.Agent Context/Sprint 1/Sayan Multi-Surface and Oracle Delivery Pack/Sample JSON Schema/Claude Sonnet 4.6/oracle_template_expansion/oracle_template_seed_db_expanded.json`

4. Frozen runtime merge artifact
   - `backend/oracle/oracle_runtime_codebook_merged.json`
   - This is the deploy-safe merged corpus generated from the GPT and Claude packs.
   - Production should prefer this file over the authoring packs whenever it is present.
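
The "prefer the merged artifact whenever it is present" rule can be sketched as a small path resolver. This is a minimal sketch, not the actual loader: `resolve_codebook_path` is a hypothetical helper, it only considers the merged artifact and the runtime seed DB, and it leaves the authoring packs out for brevity.

```python
from pathlib import Path
from typing import Optional

# Relative paths are taken from this document; the resolver is hypothetical.
MERGED_CODEBOOK = Path("backend/oracle/oracle_runtime_codebook_merged.json")
FALLBACK_SEED_DB = Path("backend/oracle/oracle_template_seed_db.json")


def resolve_codebook_path(repo_root: Path) -> Optional[Path]:
    """Return the first codebook file that exists, preferring the merged artifact."""
    for relative in (MERGED_CODEBOOK, FALLBACK_SEED_DB):
        candidate = repo_root / relative
        if candidate.is_file():
            return candidate
    # Neither file is present; the caller decides how to degrade.
    return None
```

Returning `None` instead of raising keeps the choice of degraded behavior with the caller, which is one possible design and not necessarily what `codebook_service.py` does.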

## Corpus Status

The expanded corpora are materially useful and production-relevant.

### GPT 5.4 pack

- Chapters: `6`
- Subchapters: `24`
- Seed examples: `1200`
- Shape: already close to runtime needs
- Key field for examples: `seed_examples`

### Claude Sonnet 4.6 pack

- Chapters: `6`
- Subchapters: `24`
- Examples: `1200`
- Key field for examples: `examples`
- Shape: close, but requires normalization into runtime form

### Runtime fallback pack

- Chapters: `6`
- Subchapters: `24`
- Seed examples declared in metadata: `36`
- Seed examples physically present: fewer than the `36` declared in metadata
- Useful only as a fallback, not as the primary production corpus
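
Because the GPT pack stores its records under `seed_examples` and the Claude pack under `examples`, any loader has to read both keys. A minimal sketch follows, assuming the key sits at the top level of each pack's JSON object (an assumption, since the pack layout is not spelled out here); `extract_examples` is a hypothetical name.

```python
from typing import Any, Dict, List


def extract_examples(pack: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return a pack's example records regardless of which key it uses."""
    # `seed_examples` is the GPT pack key, `examples` the Claude pack key.
    # Assumption: both keys live at the top level of the pack.
    for key in ("seed_examples", "examples"):
        value = pack.get(key)
        if isinstance(value, list):
            return value
    # Neither key is present, so the pack contributes no examples.
    return []
```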

## Super Codebook Policy

The runtime now treats the codebook as a merged corpus rather than a single-file static DB.

The merge policy is:

1. Load GPT pack first.
2. Load Claude pack second.
3. Load runtime fallback pack last.
4. Normalize all example records to one runtime contract.
5. Deduplicate by:
   - `subchapter_id`
   - `template_name`
   - `title`
6. Prefer in this order:
   - GPT 5.4 examples
   - canonical examples
   - fallback records only when no richer example exists

This behavior is implemented in:

- `backend/oracle/codebook_service.py`
- `backend/scripts/build_oracle_runtime_codebook.py`

Of the two, `backend/oracle/codebook_service.py` is now the effective runtime “super codebook” layer.

The generated runtime artifact currently contains the merged deployable corpus and is suitable for Linux-box deployment without requiring `.Agent Context` lookups at request time.
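
The deduplication and preference steps of the merge policy can be sketched as follows. This is an illustration under stated assumptions, not the code in `codebook_service.py`: each normalized record is assumed to carry a hypothetical `source` field, the labels `gpt`, `canonical`, and `fallback` are placeholders for the three preference tiers, and key matching is exact and case-sensitive.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Lower rank wins. The labels are hypothetical; only the order comes from the policy.
SOURCE_RANK = {"gpt": 0, "canonical": 1, "fallback": 2}


def dedup_key(example: Dict[str, Any]) -> Tuple[str, str, str]:
    """Build the (subchapter_id, template_name, title) key used for deduplication."""
    return (
        str(example.get("subchapter_id", "")),
        str(example.get("template_name", "")),
        str(example.get("title", "")),
    )


def source_rank(example: Dict[str, Any]) -> int:
    """Rank a record by its source; unknown sources sort last."""
    return SOURCE_RANK.get(str(example.get("source", "")), len(SOURCE_RANK))


def merge_examples(examples: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one record per dedup key, preferring the best-ranked source."""
    best: Dict[Tuple[str, str, str], Dict[str, Any]] = {}
    for example in examples:
        key = dedup_key(example)
        current = best.get(key)
        if current is None or source_rank(example) < source_rank(current):
            best[key] = example
    return list(best.values())
```

Ranking by source instead of relying on load order means the result does not change if the packs are read in a different sequence.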

## What The Runtime Actually Uses

The runtime no longer needs to rely on hardcoded template lists in the Oracle v1 router.

The codebook service now provides:

- merged corpus loading
- search over both corpora
- normalized template listing
- best-match template synthesis from a user prompt

Primary runtime functions:

- `codebook_service.stats()`
- `codebook_service.list_templates(...)`
- `codebook_service.search_examples(prompt, limit=...)`
- `codebook_service.synthesize_template(prompt, data_shapes=...)`
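
To make the `search_examples(prompt, limit=...)` contract concrete, here is a minimal ranking sketch based on token overlap. The real scoring method is not described in this document, so the overlap heuristic, the function name `search_examples_sketch`, and the choice of `title` and `template_name` as searchable fields are all assumptions.

```python
import re
from typing import Any, Dict, List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_examples_sketch(
    examples: List[Dict[str, Any]], prompt: str, limit: int = 5
) -> List[Dict[str, Any]]:
    """Rank examples by how many prompt tokens appear in their title or template name."""
    prompt_tokens = _tokens(prompt)
    scored = []
    for index, example in enumerate(examples):
        haystack = " ".join(
            str(example.get(field, "")) for field in ("title", "template_name")
        )
        score = len(prompt_tokens & _tokens(haystack))
        if score > 0:
            # Negative score sorts best-first; index keeps ties in corpus order.
            scored.append((-score, index, example))
    scored.sort(key=lambda item: item[:2])
    return [example for _, _, example in scored[:limit]]
```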

## Current Supported Runtime Output Families

The expanded corpora include more component types than the current frontend renderer supports directly.

The current production-safe strategy is:

1. keep the full codebook corpus
2. map high-variety codebook component families into a smaller supported runtime renderer set
3. let Oracle render reliably today instead of failing on unsupported component types

### Supported runtime renderers today

- `textCanvas`
- `kpiTile`
- `barChart`
- `lineChart`
- `geoMap`
- `table`
- `pipelineBoard`
- `timeline`
- `activityStream`
- `errorNotice`

### Codebook-to-runtime normalization policy

Examples:

- `summary_card`, `summary_strip`, `metric_card_group`, `gauge_stack`
  - mapped to `kpiTile`
- `lead_profile_card`, `property_card`, `data_table`, `leaderboard_table`, `matrix_grid`
  - mapped to `table`
- `interaction_timeline`, `message_thread_summary`
  - mapped to `activityStream`
- `heatmap`
  - mapped to `geoMap`

This is deliberate. It keeps the UI stable while preserving the larger design vocabulary inside the template book.
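
The projection above amounts to a lookup table. The sketch below encodes only the mappings and renderer names listed in this document; `to_runtime_renderer` is a hypothetical helper, and falling back to `textCanvas` for unmapped component types is an assumption, not documented behavior.

```python
SUPPORTED_RENDERERS = frozenset({
    "textCanvas", "kpiTile", "barChart", "lineChart", "geoMap",
    "table", "pipelineBoard", "timeline", "activityStream", "errorNotice",
})

# Codebook component family -> supported runtime renderer (from the examples above).
COMPONENT_TO_RENDERER = {
    "summary_card": "kpiTile",
    "summary_strip": "kpiTile",
    "metric_card_group": "kpiTile",
    "gauge_stack": "kpiTile",
    "lead_profile_card": "table",
    "property_card": "table",
    "data_table": "table",
    "leaderboard_table": "table",
    "matrix_grid": "table",
    "interaction_timeline": "activityStream",
    "message_thread_summary": "activityStream",
    "heatmap": "geoMap",
}


def to_runtime_renderer(component_type: str, default: str = "textCanvas") -> str:
    """Project a codebook component type onto a supported runtime renderer."""
    if component_type in SUPPORTED_RENDERERS:
        return component_type
    # Assumption: unmapped families degrade to `default` instead of failing.
    return COMPONENT_TO_RENDERER.get(component_type, default)
```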

## What Is Production-Ready Now

- Oracle template DB schema exists.
- Oracle template taxonomy APIs exist.
- Expanded GPT and Claude corpora are available locally in the repo.
- Runtime codebook merge and retrieval are implemented in `codebook_service.py`.
- A frozen merged runtime codebook now exists at `backend/oracle/oracle_runtime_codebook_merged.json`.
- Oracle v1 template listing/synthesis is being moved to the codebook-backed path.
- Oracle backend can now emit `textCanvas` planning blocks and the frontend has a renderer for them.

## What Is Still Constrained

- The runtime is not yet rendering all 47+ component families natively.
- The current system uses safe projection into supported runtime renderers.
- The template taxonomy routes existed but incorrectly used `user.role` as `tenant_id`; that has been corrected in favor of a fixed Oracle tenant policy.
- The lightweight fallback JSON DB remains incomplete and should not be treated as the main corpus.

## What Nemoclaw / Oracle Should Use For Retrieval

The correct order for Oracle prompt handling is:

1. Parse prompt.
2. Retrieve matching codebook examples from the merged corpus.
3. Build a safe retrieval plan against allowed DB datasets.
4. Query live CRM/intelligence/inventory datasets.
5. Build Oracle Canvas JSON with supported runtime component types.
6. Append to the existing canvas.

The codebook is not the final UI payload by itself.

It is the reference layer that guides:

- component family selection
- chapter/subchapter intent
- layout direction
- data-shape expectations
- policy hints
- backend contract hints
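
The six-step prompt-handling order can be sketched as one orchestration function with each stage injected as a callable. Everything here is hypothetical scaffolding: the function name, the callable signatures, and the `dataset` key on the plan are assumptions used to show the ordering and the allow-list check, not the actual Oracle router.

```python
from typing import Any, Callable, Dict, List, Set

Component = Dict[str, Any]
Example = Dict[str, Any]
Plan = Dict[str, Any]
Row = Dict[str, Any]


def handle_prompt(
    prompt: str,
    canvas: List[Component],
    *,
    allowed_datasets: Set[str],
    retrieve_examples: Callable[[str], List[Example]],
    plan_retrieval: Callable[[str, List[Example]], Plan],
    query_datasets: Callable[[Plan], List[Row]],
    build_components: Callable[[List[Example], List[Row]], List[Component]],
) -> List[Component]:
    """Run the prompt through the documented order and return the extended canvas."""
    parsed = prompt.strip()                       # 1. parse prompt
    examples = retrieve_examples(parsed)          # 2. codebook retrieval
    plan = plan_retrieval(parsed, examples)       # 3. retrieval plan
    if plan.get("dataset") not in allowed_datasets:
        raise ValueError("retrieval plan targets a dataset that is not allowed")
    rows = query_datasets(plan)                   # 4. live data query
    components = build_components(examples, rows)  # 5. supported components only
    return [*canvas, *components]                 # 6. append to existing canvas
```

The codebook examples feed the plan and the component builder but never reach the canvas directly, which matches the point that the codebook is a reference layer and not the UI payload.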

## Recommended Near-Term Hardening

1. Keep the generated runtime codebook file (`backend/oracle/oracle_runtime_codebook_merged.json`) regenerated from the authoring packs so Linux deployment does not depend on `.Agent Context`.
2. Add explicit metadata versioning to the merged corpus.
3. Add a small admin endpoint for codebook stats and source summary.
4. Expand renderer coverage incrementally rather than trying to support all component families at once.
5. Add a batch offline export path if the team wants a frozen deploy artifact.
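
For item 2 above, one possible shape for explicit metadata versioning is a small block that pairs a schema version with a content hash. This is a sketch only: the field names `schema_version`, `content_sha256`, and `example_count`, and the top-level `examples` key on the merged corpus, are assumptions and not part of the current artifact.

```python
import hashlib
import json
from typing import Any, Dict


def build_codebook_metadata(corpus: Dict[str, Any], schema_version: str = "1") -> Dict[str, Any]:
    """Describe a merged corpus with a schema version and a stable content hash."""
    # Sorted keys and fixed separators make the hash independent of key order.
    canonical = json.dumps(corpus, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "schema_version": schema_version,
        "content_sha256": hashlib.sha256(canonical).hexdigest(),
        # Assumption: merged examples live under a top-level `examples` key.
        "example_count": len(corpus.get("examples", [])),
    }
```

A content hash lets a deploy step or an admin stats endpoint confirm which corpus build is live without diffing the full file.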

## Operator Bottom Line

The Oracle “book with chapters and JSON schema examples” is real and already useful.

The correct production interpretation is:

- DB schema and APIs are already present
- GPT and Claude expansion packs are the real high-value corpus
- `backend/oracle/codebook_service.py` is the runtime super-codebook layer
- Oracle should retrieve from this merged corpus first, then query live DB data, then render supported JSON Canvas components