# Oracle Canvas Runtime and Ollama Batch Architecture
Date: 2026-04-19
Repo: `Project_Velocity`
## Purpose
This document defines the current production Oracle Canvas runtime path, the intended Ollama/Nemoclaw model-routing strategy, and the target batch-processing API shape the team can use if Velocity exposes Oracle or coding-agent capabilities through the local model stack.
This is the operator and engineering artifact. It exists to remove ambiguity.
## Runtime Topology
### Linux origin box
Role:
- hosts Velocity frontend
- hosts FastAPI backend
- hosts PostgreSQL and application services
- terminates app-origin requests under the public site path
Primary concerns:
- application routing
- auth/session enforcement
- Oracle API execution
- CRM/intelligence/inventory data access
### GPU box
Role:
- hosts ComfyUI
- hosts heavy model runtime
- hosts Ollama / Nemoclaw execution plane
- stores runtime/model payloads on NVMe only
Primary concerns:
- inference
- media generation
- model serving
- agent runtime workloads
### Ingress
Role:
- stable public entry for GPU-backed services
- hides raw GPU host details from application code
Non-negotiable rule:
- never wire Oracle or frontend code to a raw GPU public IP
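The ingress rule can be enforced mechanically at configuration load time. The following is a minimal sketch, not code from the repository: the function names `is_raw_ip_endpoint` and `assert_ingress_only` and the example hostnames are hypothetical. It rejects any endpoint whose host is a literal IP address, which is stricter than the rule as stated because it also rejects private and loopback addresses.

```python
import ipaddress
from urllib.parse import urlparse


def is_raw_ip_endpoint(url: str) -> bool:
    """Return True when the URL's host is a literal IP address, not a hostname."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal, so it is treated as a DNS name
    return True


def assert_ingress_only(url: str) -> str:
    """Reject endpoints that point at a raw IP; return the URL unchanged otherwise."""
    if is_raw_ip_endpoint(url):
        raise ValueError(f"endpoint must use the ingress hostname, not a raw IP: {url}")
    return url
```

A settings loader could pass every GPU-backed service URL through `assert_ingress_only` so that a misconfigured environment fails at startup rather than at request time.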
## Oracle Canvas Current Execution Path
The production-safe Oracle path is now:
1. User submits prompt from Oracle Canvas frontend.
2. Frontend calls:
   - `/api/oracle/v1/canvas-pages/{page_id}/prompts`
3. FastAPI Oracle orchestrator:
   - loads user context
   - retrieves best codebook matches
   - builds a safe retrieval plan
   - queries approved datasets from PostgreSQL
   - produces JSON Canvas components
   - commits a page revision
4. Frontend reloads/reconciles the canvas state and renders the new blocks.
## Current Oracle Backend Families
### Live today
- `/api/oracle/v1/me`
- `/api/oracle/v1/canvas-pages/{page_id}`
- `/api/oracle/v1/canvas-pages/{page_id}/prompts`
- `/api/oracle/v1/canvas-pages/{page_id}/forks`
- `/api/oracle/v1/canvas-pages/{page_id}/rollback`
- `/api/oracle/v1/canvas-pages/{page_id}/revisions`
- `/api/oracle/v1/component-templates`
- `/api/oracle/v1/component-templates/synthesize`
- `/api/oracle/v1/merge-requests`
- `/api/oracle/v1/merge-requests/{mr_id}/review`
- `/ws/oracle/canvas/{page_id}`
### Template taxonomy routes
- `/api/oracle/template-chapters`
- `/api/oracle/template-subchapters`
- `/api/oracle/component-templates`
- `/api/oracle/component-templates/{id}`
- `/api/oracle/component-templates/{id}/seed`
- `/api/oracle/component-templates/synthetic-jobs`
## Prompt Analysis Path
Oracle should not rely on one monolithic LLM call.
The correct production split is:
1. codebook retrieval
2. safe dataset selection
3. optional LLM planning
4. live DB fetch
5. JSON Canvas synthesis
6. revision commit
### Why this split is correct
- It reduces hallucination in UI structure.
- It keeps DB access whitelisted and auditable.
- It allows Oracle to keep working even when the LLM runtime is degraded.
- It keeps the Oracle Canvas deterministic enough for operational use.
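The six-stage split can be sketched as a single function in which the LLM planner is an optional, replaceable step. This is an illustrative sketch only: `run_prompt_pipeline`, the codebook shape, and the component dictionaries are assumptions made for the example and are not taken from the Velocity codebase.

```python
from typing import Callable, Optional

# Hypothetical planner signature: (prompt, matched template names) -> layout hints.
Planner = Callable[[str, list], dict]


def run_prompt_pipeline(
    prompt: str,
    codebook: dict,  # assumed shape: template name -> (keywords, dataset name)
    allowed_datasets: set,
    fetch_rows: Callable[[str], list],
    planner: Optional[Planner] = None,
) -> dict:
    words = set(prompt.lower().split())
    # 1. codebook retrieval: keep templates whose keywords overlap the prompt
    matches = [name for name, (keywords, _) in codebook.items() if words & set(keywords)]
    # 2. safe dataset selection: only allowlisted datasets survive
    datasets = [codebook[m][1] for m in matches if codebook[m][1] in allowed_datasets]
    # 3. optional LLM planning: any failure keeps the deterministic default
    plan = {"layout": "stack", "planner": "deterministic_fallback"}
    if planner is not None:
        try:
            plan = {**planner(prompt, matches), "planner": "llm"}
        except Exception:
            pass  # LLM runtime degraded; continue with the deterministic plan
    # 4. live DB fetch, restricted to the datasets selected above
    rows = {name: fetch_rows(name) for name in datasets}
    # 5. JSON Canvas synthesis
    components = [{"type": "table", "dataset": n, "rows": r} for n, r in rows.items()]
    # 6. revision commit is represented here by the returned payload
    return {"plan": plan, "components": components}
```

Because the planner only contributes layout hints, a planner that raises still yields a usable revision, and a dataset that is not allowlisted never reaches the fetch step.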
## Current Model Routing Truth
### Present reality
The current Oracle backend has these runtime modes:
- `codebook_retrieval`
  - preferred when the prompt clearly matches the Oracle template corpus
- `nemoclaw_hosted`
  - used when `NEMOCLAW_API_URL` and `NEMOCLAW_API_KEY` are configured and reachable
- `deterministic_fallback`
  - used when the LLM planner is unavailable
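The three modes above imply a simple selection order. A minimal sketch, assuming a numeric codebook match score and a cutoff of `0.8` for "clearly matches"; both the score and the threshold are hypothetical, as is the function name `select_runtime_mode`. Only the mode names and environment variable names come from this document.

```python
from typing import Mapping


def select_runtime_mode(
    codebook_score: float,
    env: Mapping[str, str],
    nemoclaw_reachable: bool,
    match_threshold: float = 0.8,  # assumed cutoff for a "clear" codebook match
) -> str:
    """Pick one of the three runtime modes described above."""
    if codebook_score >= match_threshold:
        return "codebook_retrieval"
    configured = bool(env.get("NEMOCLAW_API_URL")) and bool(env.get("NEMOCLAW_API_KEY"))
    if configured and nemoclaw_reachable:
        return "nemoclaw_hosted"
    return "deterministic_fallback"
```

In a real service `env` would typically be `os.environ`, and `nemoclaw_reachable` would come from a health probe rather than a literal.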
### What Nemoclaw currently means in code
Current dispatch abstraction:
- `backend/services/nemoclaw_runtime.py`
This file is still a light dispatch envelope, not a fully featured provider router.
### Recommended production provider stack
Provider order:
1. codebook retrieval layer
2. Nemoclaw planner endpoint
3. local Ollama fallback
4. deterministic fallback
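The provider order reads naturally as a fallback chain: try each layer in turn and stop at the first one that produces a plan. The sketch below assumes each provider is a plain callable that returns a dict, returns `None` to decline, or raises on failure; that calling convention and the name `plan_with_fallback` are assumptions for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical provider shape: (name, callable taking the prompt).
Provider = Tuple[str, Callable[[str], Optional[dict]]]


def plan_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, dict]:
    """Walk the providers in order and return the first successful plan."""
    errors = []
    for name, call in providers:
        try:
            result = call(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")  # record the failure and keep going
            continue
        if result is not None:
            return name, result
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the plan lets the caller record which layer actually served the request.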
## Recommended Ollama Model Policy
### Default planning / Oracle analysis model
Use a local reasoning-capable model behind Ollama when Nemoclaw is not available or when the team wants private, locally controlled execution.
Recommended candidate:
- `qwen3.6:35b-a3b`
Reason:
- strong agentic coding and structured reasoning profile
- local execution path through Ollama
- realistic fit for GPU-box-hosted inference
### Deployment command
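Example (`ollama run` downloads the model on first use and then opens an interactive session; the Ollama server must already be running):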
```bash
ollama run qwen3.6:35b-a3b
```
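Backend code would more likely talk to Ollama's HTTP chat endpoint (`/api/chat`) than to the interactive CLI. The following sketch assumes an adapter that receives a dict with `model`, `messages`, and optional `system_prompt`, `temperature`, and `response_format` keys; the function names and the example URL are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def to_ollama_chat_body(payload: dict) -> dict:
    """Map a runtime-style chat payload (assumed shape) to an Ollama /api/chat body."""
    messages = list(payload["messages"])
    if payload.get("system_prompt"):
        # Ollama takes the system prompt as a leading system-role message
        messages.insert(0, {"role": "system", "content": payload["system_prompt"]})
    body = {"model": payload["model"], "messages": messages, "stream": False}
    if payload.get("response_format") == "json":
        body["format"] = "json"
    if "temperature" in payload:
        body["options"] = {"temperature": payload["temperature"]}
    return body


def build_ollama_request(payload: dict, url: str) -> urllib.request.Request:
    """Wrap the translated body in a POST request; the caller decides when to send it."""
    data = json.dumps(to_ollama_chat_body(payload)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The `url` argument is deliberately required so that the caller supplies the ingress hostname for the GPU box instead of relying on a baked-in address.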
### Routing rule
- Oracle prompt planning:
  - small to medium prompts: local Ollama `qwen3.6:35b-a3b`
  - larger multi-step analytical plans: Nemoclaw planner if available
- Coding-agent batch workloads:
  - Ollama first for local/private jobs
  - Nemoclaw for heavier orchestration when the runtime is healthy
## Runtime LLM API
The backend now exposes a first-class runtime LLM family:
- `GET /api/runtime/llm/providers`
- `POST /api/runtime/llm/chat`
- `POST /api/runtime/llm/batch`
- `GET /api/runtime/llm/jobs/{job_id}`
- `GET /api/runtime/llm/jobs/{job_id}/results`
This router is mounted in:
- `backend/api/routes_runtime_llm.py`
The current persistence path uses the existing canonical table:
- `workflow_agent_runs`
That means batch jobs are now persisted against the live Velocity schema without requiring a new table family before the first production rollout.
## Implemented Batch Processing API
This is no longer only a proposal. The following contract family exists now and can be used by Oracle or future coding-agent surfaces.
### Single request inference
- `POST /api/runtime/llm/chat`
Payload:
```json
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "system_prompt": "You are Oracle Planner.",
  "messages": [
    { "role": "user", "content": "Build a CRM pipeline view for high-intent NRI buyers." }
  ],
  "temperature": 0.2,
  "response_format": "json"
}
```
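A client for this endpoint might validate the payload before building the HTTP request. This is a sketch under stated assumptions: the set of required fields (`provider`, `model`, `messages`) is inferred from the example payload rather than from a published schema, `build_chat_request` is a hypothetical helper, and authentication headers are omitted because this document does not specify them.

```python
import json
import urllib.request

CHAT_PATH = "/api/runtime/llm/chat"


def build_chat_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Validate a chat payload and wrap it in a POST request (not sent here)."""
    # Assumed required fields; adjust to the real schema.
    missing = [key for key in ("provider", "model", "messages") if not payload.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + CHAT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned request can be passed to `urllib.request.urlopen` once session or token headers have been attached.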
### Batch submission
- `POST /api/runtime/llm/batch`
Payload:
```json
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "job_type": "oracle_canvas_planning",
  "items": [
    {
      "request_id": "req_001",
      "messages": [
        { "role": "user", "content": "Show overdue high-QD follow-ups." }
      ],
      "response_format": "json"
    },
    {
      "request_id": "req_002",
      "messages": [
        { "role": "user", "content": "Build a Kolkata luxury inventory comparison block." }
      ],
      "response_format": "json"
    }
  ]
}
```
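Batch payloads of this shape are easy to generate from a list of prompts. The sketch below uses two hypothetical helpers: `build_batch` numbers items as `req_001`, `req_002`, and so on to mirror the example, and `duplicate_request_ids` checks that hand-built batches keep `request_id` unique. The numbering scheme is an assumption; any unique string would serve.

```python
from collections import Counter


def build_batch(provider: str, model: str, job_type: str, prompts: list) -> dict:
    """Build a batch payload with one single-message item per prompt."""
    items = [
        {
            "request_id": f"req_{index:03d}",
            "messages": [{"role": "user", "content": prompt}],
            "response_format": "json",
        }
        for index, prompt in enumerate(prompts, start=1)
    ]
    return {"provider": provider, "model": model, "job_type": job_type, "items": items}


def duplicate_request_ids(batch: dict) -> list:
    """Return request_ids that occur more than once, sorted for stable output."""
    counts = Counter(item["request_id"] for item in batch["items"])
    return sorted(rid for rid, seen in counts.items() if seen > 1)
```

Running `duplicate_request_ids` before submission catches collisions that would make individual results ambiguous.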
### Batch status
- `GET /api/runtime/llm/jobs/{job_id}`
Response:
```json
{
  "job_id": "job_123",
  "status": "running",
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "submitted_count": 2,
  "completed_count": 1,
  "failed_count": 0
}
```
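A caller would typically poll this endpoint until the job stops running. The sketch below is illustrative: only `running` appears in this document, so the terminal status names in `TERMINAL_STATUSES` are assumptions, and `fetch_status` stands in for whatever HTTP call retrieves the status JSON.

```python
import time
from typing import Callable

# Assumed terminal states; the document only shows "running".
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def progress(status: dict) -> float:
    """Fraction of submitted items that have finished, successfully or not."""
    total = status["submitted_count"]
    done = status["completed_count"] + status["failed_count"]
    return done / total if total else 0.0


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status["status"] in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without real delays; for the example response above, `progress` evaluates to `0.5`.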
### Batch results
- `GET /api/runtime/llm/jobs/{job_id}/results`
### Providers inventory
- `GET /api/runtime/llm/providers`
Example response:
```json
{
  "providers": [
    {
      "id": "nemoclaw",
      "status": "online",
      "models": ["nemotron", "remote_default"]
    },
    {
      "id": "ollama",
      "status": "online",
      "models": ["qwen3.6:35b-a3b"]
    }
  ]
}
```
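A consumer can use the inventory response to choose a provider before submitting work. A minimal sketch, assuming `online` is the only status that should receive traffic and that the default preference is Nemoclaw before Ollama; the helper names `pick_provider` and `supports_model` are hypothetical.

```python
from typing import Optional, Sequence


def pick_provider(inventory: dict, preference: Sequence[str] = ("nemoclaw", "ollama")) -> Optional[str]:
    """Return the first preferred provider that the inventory reports as online."""
    online = {p["id"] for p in inventory.get("providers", []) if p.get("status") == "online"}
    for provider_id in preference:
        if provider_id in online:
            return provider_id
    return None


def supports_model(inventory: dict, provider_id: str, model: str) -> bool:
    """Check whether the named provider lists the model in its inventory entry."""
    for provider in inventory.get("providers", []):
        if provider["id"] == provider_id:
            return model in provider.get("models", [])
    return False
```

Returning `None` when nothing is online leaves the caller free to fall back to the deterministic path.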
## Batch Processing Design Rules
1. Batch jobs must be persisted.
2. Batch items must be individually addressable by `request_id`.
3. Every batch job must record:
   - provider
   - model
   - submitted payload hash
   - start/end timestamps
   - failure reason
4. Oracle must not block the main request thread for large batches.
5. Any DB writeback generated from a batch must go through approval tables, not direct execution.
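Rule 3 can be illustrated with a small record type. This is a sketch, not the `workflow_agent_runs` schema: the field names, the `BatchJobRecord` class, and the choice of SHA-256 over canonical JSON for the payload hash are all assumptions.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def payload_hash(payload: dict) -> str:
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class BatchJobRecord:
    provider: str
    model: str
    submitted_payload_hash: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    failure_reason: Optional[str] = None


def open_job_record(payload: dict) -> BatchJobRecord:
    """Create the audit record at submission time, before any inference runs."""
    return BatchJobRecord(
        provider=payload["provider"],
        model=payload["model"],
        submitted_payload_hash=payload_hash(payload),
        started_at=datetime.now(timezone.utc),
    )
```

Hashing a canonical form means a resubmitted payload with reordered keys maps to the same digest, which helps when auditing retries.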
## Oracle-Specific Runtime Policy
For Oracle Canvas, the LLM is not the source of truth for data.
The source of truth order is:
1. canonical DB tables
2. approved dataset projections
3. codebook template corpus
4. model planner
The model is only allowed to:
- classify intent
- choose likely component families
- propose layout direction
- summarize findings
The model is not allowed to:
- invent database facts
- bypass dataset allowlists
- emit arbitrary executable code into production rendering paths
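These limits can be enforced by validating planner output before it reaches the renderer. The sketch below assumes a hypothetical planner output shape with exactly four fields, one per permitted capability; the field names and the function `validate_planner_output` are illustrative and not taken from the codebase.

```python
# Assumed output fields, one per capability the model is allowed to exercise.
ALLOWED_KEYS = {"intent", "component_families", "layout", "summary"}


def validate_planner_output(plan: dict, known_families: set) -> dict:
    """Reject planner output that carries anything beyond the permitted fields."""
    extra = set(plan) - ALLOWED_KEYS
    if extra:
        # e.g. raw SQL, inline rows, or script fragments smuggled into the plan
        raise ValueError(f"planner emitted disallowed fields: {sorted(extra)}")
    unknown = [f for f in plan.get("component_families", []) if f not in known_families]
    if unknown:
        raise ValueError(f"planner chose unknown component families: {unknown}")
    return plan
```

An allowlist of keys is a deliberately blunt check: fields such as `sql`, `rows`, or `script` are rejected by construction, so facts and executable code can only enter through the DB and template layers.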
## Current Production Readiness Assessment
### Ready now
- Oracle Canvas frontend-to-backend v1 route family
- codebook-backed template retrieval path
- safe DB execution gateway
- merge/fork/revision path
- deterministic fallback path
- runtime LLM provider inventory
- runtime single-chat execution
- runtime persisted batch execution through `workflow_agent_runs`
- Oracle planner fallback through the shared runtime LLM service
### Still needs explicit implementation if the team approves
- per-model selection UI in Catalyst or Oracle controls
- dedicated `runtime_llm_jobs` / `runtime_llm_job_items` tables if the team wants stronger audit/query ergonomics than `workflow_agent_runs`
- explicit Nemoclaw vs Ollama operator switch in a production admin surface
- richer provider health telemetry beyond simple reachability
## Recommended Next Build Steps
1. Add a dedicated runtime router (already in place, per the Runtime LLM API section):
   - `backend/api/routes_runtime_llm.py`
2. Add DB tables, if the team approves moving beyond `workflow_agent_runs`:
   - `runtime_llm_jobs`
   - `runtime_llm_job_items`
   - `runtime_llm_job_results`
3. Implement provider adapters:
   - Nemoclaw adapter
   - Ollama adapter
4. Expose provider status to Catalyst/Oracle settings surfaces.
5. Keep Oracle Canvas on the current codebook-first path even after LLM batching exists.
## Bottom Line
Oracle Canvas should be treated as a codebook-guided analytical surface with optional LLM planning, not as a raw chat-to-SQL toy.
The production-safe architecture is:
- Linux origin runs the application and DB access
- GPU box runs ComfyUI and model inference
- Oracle retrieves from the merged codebook first
- DB access stays whitelisted
- Nemoclaw and Ollama sit behind a documented provider interface
- batch processing is a separate runtime service contract, not an implicit side effect of the canvas endpoint