# Oracle Canvas Runtime and Ollama Batch Architecture

Date: 2026-04-19
Repo: `Project_Velocity`

## Purpose

This document defines the current production Oracle Canvas runtime path, the intended Ollama/Nemoclaw model-routing strategy, and the target batch-processing API shape the team can use if Velocity exposes Oracle or coding-agent capabilities through the local model stack.

This is the reference artifact for operators and engineers. It exists to remove ambiguity.

## Runtime Topology

### Linux origin box

Role:

- hosts Velocity frontend
- hosts FastAPI backend
- hosts PostgreSQL and application services
- terminates app-origin requests under the public site path

Primary concerns:

- application routing
- auth/session enforcement
- Oracle API execution
- CRM/intelligence/inventory data access

### GPU box

Role:

- hosts ComfyUI
- hosts heavy model runtime
- hosts Ollama / Nemoclaw execution plane
- stores runtime/model payloads on NVMe only

Primary concerns:

- inference
- media generation
- model serving
- agent runtime workloads

### Ingress

Role:

- stable public entry for GPU-backed services
- hides raw GPU host details from application code

Non-negotiable rule:

- never wire Oracle or frontend code to a raw GPU public IP
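
One way to enforce this rule at configuration load time is to reject any endpoint whose host is a literal IP address. The sketch below is an assumption about how such a guard could look; `is_raw_ip_url` and `assert_ingress_url` are hypothetical helper names, not existing Velocity functions, and the hostnames in the usage note are placeholders.

```python
import ipaddress
from urllib.parse import urlparse


def is_raw_ip_url(url: str) -> bool:
    """Return True when the URL's host is a literal IPv4 or IPv6 address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name, which is what the ingress rule expects
    return True


def assert_ingress_url(url: str) -> str:
    """Hypothetical config guard: refuse endpoints that bypass the ingress."""
    if is_raw_ip_url(url):
        raise ValueError(f"raw IP endpoint not allowed: {url}")
    return url
```

For example, `assert_ingress_url("https://gpu.example.internal/ollama")` returns the URL unchanged, while `assert_ingress_url("http://203.0.113.7:11434")` raises `ValueError`.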

## Oracle Canvas Current Execution Path

The production-safe Oracle path is now:

1. User submits prompt from Oracle Canvas frontend.
2. Frontend calls:
   - `/api/oracle/v1/canvas-pages/{page_id}/prompts`
3. FastAPI Oracle orchestrator:
   - loads user context
   - retrieves best codebook matches
   - builds a safe retrieval plan
   - queries approved datasets from PostgreSQL
   - produces JSON Canvas components
   - commits a page revision
4. Frontend reloads/reconciles the canvas state and renders the new blocks.

## Current Oracle Backend Families

### Live today

- `/api/oracle/v1/me`
- `/api/oracle/v1/canvas-pages/{page_id}`
- `/api/oracle/v1/canvas-pages/{page_id}/prompts`
- `/api/oracle/v1/canvas-pages/{page_id}/forks`
- `/api/oracle/v1/canvas-pages/{page_id}/rollback`
- `/api/oracle/v1/canvas-pages/{page_id}/revisions`
- `/api/oracle/v1/component-templates`
- `/api/oracle/v1/component-templates/synthesize`
- `/api/oracle/v1/merge-requests`
- `/api/oracle/v1/merge-requests/{mr_id}/review`
- `/ws/oracle/canvas/{page_id}`

### Template taxonomy routes

- `/api/oracle/template-chapters`
- `/api/oracle/template-subchapters`
- `/api/oracle/component-templates`
- `/api/oracle/component-templates/{id}`
- `/api/oracle/component-templates/{id}/seed`
- `/api/oracle/component-templates/synthetic-jobs`

## Prompt Analysis Path

Oracle should not rely on one monolithic LLM call.

The correct production split is:

1. codebook retrieval
2. safe dataset selection
3. optional LLM planning
4. live DB fetch
5. JSON Canvas synthesis
6. revision commit

### Why this split is correct

- It reduces hallucination in UI structure.
- It keeps DB access whitelisted and auditable.
- It allows Oracle to keep working even when the LLM runtime is degraded.
- It keeps the Oracle Canvas deterministic enough for operational use.
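
The planning half of this split (steps 1 to 3) can be sketched as a small function with injected stages. This is a minimal illustration, not the orchestrator's actual code: `plan_prompt`, `PlanResult`, the callable signatures, and the `llm_planned` label are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PlanResult:
    mode: str                                   # which path produced the plan
    datasets: list[str] = field(default_factory=list)


def plan_prompt(
    prompt: str,
    retrieve_codebook: Callable[[str], list[str]],
    llm_plan: Optional[Callable[[str, list[str]], list[str]]],
    allowed_datasets: set[str],
) -> PlanResult:
    # Step 1: codebook retrieval always runs and needs no LLM.
    candidates = retrieve_codebook(prompt)
    mode = "codebook_retrieval"

    # Step 3: LLM planning is optional; a failure degrades instead of aborting.
    if llm_plan is not None:
        try:
            candidates = llm_plan(prompt, candidates)
            mode = "llm_planned"  # hypothetical label for this sketch
        except Exception:
            mode = "deterministic_fallback"

    # Step 2: the allowlist is applied last so it also covers LLM suggestions.
    datasets = [name for name in candidates if name in allowed_datasets]
    return PlanResult(mode=mode, datasets=datasets)
```

Applying the allowlist after the optional planner is a design choice in this sketch: whatever the model proposes, only approved datasets reach the live DB fetch.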

## Current Model Routing Truth

### Present reality

The current Oracle backend has these runtime modes:

- `codebook_retrieval`
  - preferred when the prompt clearly matches the Oracle template corpus
- `nemoclaw_hosted`
  - used when `NEMOCLAW_API_URL` and `NEMOCLAW_API_KEY` are configured and reachable
- `deterministic_fallback`
  - used when the LLM planner is unavailable
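
The mode selection described above could look roughly like the following. The function name, the `codebook_score` input, and the `match_threshold` cutoff are assumptions for illustration; only the mode names and the two environment variables come from this document.

```python
import os
from typing import Mapping


def select_runtime_mode(
    codebook_score: float,
    nemoclaw_reachable: bool,
    env: Mapping[str, str] = os.environ,
    match_threshold: float = 0.8,  # hypothetical cutoff, not taken from the codebase
) -> str:
    """Pick one of the three runtime modes for a single prompt."""
    # A clear codebook match wins before any LLM is considered.
    if codebook_score >= match_threshold:
        return "codebook_retrieval"
    # Nemoclaw needs both settings present and a successful reachability check.
    configured = bool(env.get("NEMOCLAW_API_URL")) and bool(env.get("NEMOCLAW_API_KEY"))
    if configured and nemoclaw_reachable:
        return "nemoclaw_hosted"
    return "deterministic_fallback"
```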

### What Nemoclaw currently means in code

Current dispatch abstraction:

- `backend/services/nemoclaw_runtime.py`

This file is still a light dispatch envelope, not a fully featured provider router.

### Recommended production provider stack

Provider order:

1. codebook retrieval layer
2. Nemoclaw planner endpoint
3. local Ollama fallback
4. deterministic fallback
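
A provider order like this is commonly implemented as a fallback chain. The sketch below assumes each provider is a plain callable that returns a plan dict, returns `None` when it has nothing to offer, or raises when it is unavailable; `run_provider_chain` and that calling convention are hypothetical.

```python
from typing import Callable, Optional, Sequence

# (provider name, callable taking the prompt and returning a plan or None)
Provider = tuple[str, Callable[[str], Optional[dict]]]


def run_provider_chain(prompt: str, providers: Sequence[Provider]) -> tuple[str, dict]:
    """Try each provider in order and return the first usable plan."""
    for name, call in providers:
        try:
            plan = call(prompt)
        except Exception:
            continue  # provider is down or errored; fall through to the next one
        if plan is not None:
            return name, plan
    raise RuntimeError("no provider produced a plan")
```

Because the deterministic fallback sits last and should always return a plan, the final `RuntimeError` is only reachable if the chain is misconfigured.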

## Recommended Ollama Model Policy

### Default planning / Oracle analysis model

Use a local reasoning-capable model behind Ollama when Nemoclaw is not available or when the team wants private, locally controlled execution.

Recommended candidate:

- `qwen3.6:35b-a3b`

Reason:

- strong agentic coding and structured reasoning profile
- local execution path through Ollama
- realistic fit for GPU-box-hosted inference

### Deployment command

Example:

```bash
# `ollama run` downloads the model if it is missing, then opens an interactive session.
ollama run qwen3.6:35b-a3b
```

### Routing rule

- Oracle prompt planning:
  - small to medium prompts: local Ollama `qwen3.6:35b-a3b`
  - larger multi-step analytical plans: Nemoclaw planner if available
- Coding-agent batch workloads:
  - Ollama first for local/private jobs
  - Nemoclaw for heavier orchestration when the runtime is healthy
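
As a rough sketch, the routing rule reduces to a small decision function. The workload labels and the `large` and `private` flags are assumptions: this document does not define where "small to medium" ends, so the caller is expected to make that judgment.

```python
def route_provider(
    workload: str,
    *,
    large: bool = False,             # larger multi-step plan or heavier orchestration
    private: bool = False,           # job must stay on the local stack
    nemoclaw_healthy: bool = False,  # result of a provider health check
) -> str:
    """Return "ollama" or "nemoclaw" for one request (hypothetical helper)."""
    if workload not in ("oracle_planning", "coding_batch"):
        raise ValueError(f"unknown workload: {workload}")
    # Local/private jobs and any request made while Nemoclaw is unhealthy stay on Ollama.
    if private or not nemoclaw_healthy:
        return "ollama"
    # Only large or heavy work is escalated to the Nemoclaw planner.
    return "nemoclaw" if large else "ollama"
```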

## Runtime LLM API

The backend now exposes a first-class runtime LLM family:

- `GET /api/runtime/llm/providers`
- `POST /api/runtime/llm/chat`
- `POST /api/runtime/llm/batch`
- `GET /api/runtime/llm/jobs/{job_id}`
- `GET /api/runtime/llm/jobs/{job_id}/results`

This router is mounted in:

- `backend/api/routes_runtime_llm.py`

The current persistence path uses the existing canonical table:

- `workflow_agent_runs`

That means batch jobs are now persisted against the live Velocity schema without requiring a new table family before the first production rollout.

## Implemented Batch Processing API

This is no longer only a proposal. The following contract family exists now and can be used by Oracle or future coding-agent surfaces.

### Single request inference

- `POST /api/runtime/llm/chat`

Payload:

```json
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "system_prompt": "You are Oracle Planner.",
  "messages": [
    { "role": "user", "content": "Build a CRM pipeline view for high-intent NRI buyers." }
  ],
  "temperature": 0.2,
  "response_format": "json"
}
```
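
A client could assemble this request with the Python standard library as sketched below. `build_chat_request` and the `base_url` argument are hypothetical, and authentication or session headers are omitted because this document does not specify them.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str,
    prompt: str,
    model: str = "qwen3.6:35b-a3b",
    provider: str = "ollama",
) -> urllib.request.Request:
    """Build (but do not send) a POST to the single-request chat endpoint."""
    payload = {
        "provider": provider,
        "model": model,
        "system_prompt": "You are Oracle Planner.",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "response_format": "json",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/runtime/llm/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response shape is not documented here, so the sketch stops at request construction.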

### Batch submission

- `POST /api/runtime/llm/batch`

Payload:

```json
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "job_type": "oracle_canvas_planning",
  "items": [
    {
      "request_id": "req_001",
      "messages": [
        { "role": "user", "content": "Show overdue high-QD follow-ups." }
      ],
      "response_format": "json"
    },
    {
      "request_id": "req_002",
      "messages": [
        { "role": "user", "content": "Build a Kolkata luxury inventory comparison block." }
      ],
      "response_format": "json"
    }
  ]
}
```
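
The batch payload can be generated from a list of prompts. In this sketch the sequential `req_NNN` numbering mirrors the example above; the helper name and the choice to generate IDs on the client side are assumptions.

```python
def build_batch_payload(
    prompts: list[str],
    job_type: str = "oracle_canvas_planning",
    provider: str = "ollama",
    model: str = "qwen3.6:35b-a3b",
) -> dict:
    """Wrap each prompt as one batch item with a unique request_id."""
    items = [
        {
            "request_id": f"req_{index:03d}",  # req_001, req_002, ...
            "messages": [{"role": "user", "content": prompt}],
            "response_format": "json",
        }
        for index, prompt in enumerate(prompts, start=1)
    ]
    return {"provider": provider, "model": model, "job_type": job_type, "items": items}
```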

### Batch status

- `GET /api/runtime/llm/jobs/{job_id}`

Response:

```json
{
  "job_id": "job_123",
  "status": "running",
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "submitted_count": 2,
  "completed_count": 1,
  "failed_count": 0
}
```
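
A caller can poll this endpoint until every item is accounted for. Since only the `running` status value appears in this document, the sketch decides completion from the three counters rather than from a status string. `wait_for_job` and the injected `fetch_status` callable are hypothetical.

```python
import time
from typing import Callable


def is_finished(status: dict) -> bool:
    """A job is done once completed + failed items cover everything submitted."""
    done = status["completed_count"] + status["failed_count"]
    return done >= status["submitted_count"]


def wait_for_job(
    fetch_status: Callable[[str], dict],  # e.g. wraps GET /api/runtime/llm/jobs/{job_id}
    job_id: str,
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job finishes or the poll budget is exhausted."""
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if is_finished(status):
            return status
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```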

### Batch results

- `GET /api/runtime/llm/jobs/{job_id}/results`

### Providers inventory

- `GET /api/runtime/llm/providers`

Example response:

```json
{
  "providers": [
    {
      "id": "nemoclaw",
      "status": "online",
      "models": ["nemotron", "remote_default"]
    },
    {
      "id": "ollama",
      "status": "online",
      "models": ["qwen3.6:35b-a3b"]
    }
  ]
}
```
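
A consumer of this inventory might pick the first online provider that serves a given model. The helper below is a hypothetical sketch that relies only on the `id`, `status`, and `models` fields shown in the example response.

```python
from typing import Optional


def first_online_provider(inventory: dict, model: str) -> Optional[str]:
    """Return the id of the first online provider listing the model, else None."""
    for provider in inventory.get("providers", []):
        if provider.get("status") == "online" and model in provider.get("models", []):
            return provider["id"]
    return None
```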

## Batch Processing Design Rules

1. Batch jobs must be persisted.
2. Batch items must be individually addressable by `request_id`.
3. Every batch job must record:
   - provider
   - model
   - submitted payload hash
   - start/end timestamps
   - failure reason
4. Oracle must not block the main request thread for large batches.
5. Any DB writeback generated from a batch must go through approval tables, not direct execution.
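
Rules 2 and 3 can be illustrated with a small record type. This is a sketch under stated assumptions: the field names, the SHA-256 choice, and the canonical-JSON hashing are illustrative and do not describe the actual `workflow_agent_runs` columns.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def payload_hash(payload: dict) -> str:
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class BatchJobRecord:
    provider: str
    model: str
    payload_sha256: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    failure_reason: Optional[str] = None


def new_job_record(payload: dict) -> BatchJobRecord:
    """Validate request_id uniqueness (rule 2) and capture the rule 3 fields."""
    ids = [item["request_id"] for item in payload["items"]]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate request_id in batch")
    return BatchJobRecord(
        provider=payload["provider"],
        model=payload["model"],
        payload_sha256=payload_hash(payload),
        started_at=datetime.now(timezone.utc),
    )
```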

## Oracle-Specific Runtime Policy

For Oracle Canvas, the LLM is not the source of truth for data.

The source-of-truth order is:

1. canonical DB tables
2. approved dataset projections
3. codebook template corpus
4. model planner

The model is only allowed to:

- classify intent
- choose likely component families
- propose layout direction
- summarize findings

The model is not allowed to:

- invent database facts
- bypass dataset allowlists
- emit arbitrary executable code into production rendering paths
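
These limits can be enforced by validating planner output before it reaches the renderer or the DB gateway. The key names in `ALLOWED_PLAN_KEYS` and the `validate_planner_output` helper are assumptions chosen to mirror the allowed activities listed above, not the real planner schema.

```python
# Hypothetical plan schema: one key per permitted planner activity, plus the
# dataset names the planner would like the gateway to fetch.
ALLOWED_PLAN_KEYS = {"intent", "component_families", "layout", "summary", "datasets"}


def validate_planner_output(plan: dict, allowed_datasets: set[str]) -> dict:
    """Reject plans that carry unknown fields or name non-allowlisted datasets."""
    unknown = set(plan) - ALLOWED_PLAN_KEYS
    if unknown:
        # Unknown fields are where invented facts or code payloads would hide.
        raise ValueError(f"unexpected planner fields: {sorted(unknown)}")
    blocked = [name for name in plan.get("datasets", []) if name not in allowed_datasets]
    if blocked:
        raise ValueError(f"datasets outside the allowlist: {blocked}")
    return plan
```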

## Current Production Readiness Assessment

### Ready now

- Oracle Canvas frontend-to-backend v1 route family
- codebook-backed template retrieval path
- safe DB execution gateway
- merge/fork/revision path
- deterministic fallback path
- runtime LLM provider inventory
- runtime single-chat execution
- runtime persisted batch execution through `workflow_agent_runs`
- Oracle planner fallback through the shared runtime LLM service

### Still needs explicit implementation if the team approves

- per-model selection UI in Catalyst or Oracle controls
- dedicated `runtime_llm_jobs` / `runtime_llm_job_items` tables if the team wants stronger audit/query ergonomics than `workflow_agent_runs`
- explicit Nemoclaw vs Ollama operator switch in a production admin surface
- richer provider health telemetry beyond simple reachability

## Recommended Next Build Steps

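1. Continue building on the dedicated runtime router, which is already mounted:
   - `backend/api/routes_runtime_llm.py`
2. Add DB tables, if the team approves moving off `workflow_agent_runs`: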
   - `runtime_llm_jobs`
   - `runtime_llm_job_items`
   - `runtime_llm_job_results`
3. Implement provider adapters:
   - Nemoclaw adapter
   - Ollama adapter
4. Expose provider status to Catalyst/Oracle settings surfaces.
5. Keep Oracle Canvas on the current codebook-first path even after LLM batching exists.

## Bottom Line

Oracle Canvas should be treated as a codebook-guided analytical surface with optional LLM planning, not as a raw chat-to-SQL toy.

The production-safe architecture is:

- Linux origin runs the application and DB access
- GPU box runs ComfyUI and model inference
- Oracle retrieves from the merged codebook first
- DB access stays whitelisted
- Nemoclaw and Ollama sit behind a documented provider interface
- batch processing is a separate runtime service contract, not an implicit side effect of the canvas endpoint