# Oracle Canvas Runtime and Ollama Batch Architecture

Date: 2026-04-19

Repo: `Project_Velocity`

## Purpose

This document defines the current production Oracle Canvas runtime path, the intended Ollama/Nemoclaw model-routing strategy, and the target batch-processing API shape the team can use if Velocity exposes Oracle or coding-agent capabilities through the local model stack.

This is the operator and engineering artifact. It exists to remove ambiguity.

## Runtime Topology

### Linux origin box

Role:
- hosts Velocity frontend
- hosts FastAPI backend
- hosts PostgreSQL and application services
- terminates app-origin requests under the public site path

Primary concern:
- application routing
- auth/session enforcement
- Oracle API execution
- CRM/intelligence/inventory data access

### GPU box

Role:
- hosts ComfyUI
- hosts heavy model runtime
- hosts Ollama / Nemoclaw execution plane
- stores runtime/model payloads on NVMe only

Primary concern:
- inference
- media generation
- model serving
- agent runtime workloads

### Ingress

Role:
- stable public entry for GPU-backed services
- hides raw GPU host details from application code

Non-negotiable rule:
- never wire Oracle or frontend code to a raw GPU public IP

## Oracle Canvas Current Execution Path

The production-safe Oracle path is now:

1. User submits prompt from Oracle Canvas frontend.
2. Frontend calls:
   - `/api/oracle/v1/canvas-pages/{page_id}/prompts`
3. FastAPI Oracle orchestrator:
   - loads user context
   - retrieves best codebook matches
   - builds a safe retrieval plan
   - queries approved datasets from PostgreSQL
   - produces JSON Canvas components
   - commits a page revision
4. Frontend reloads/reconciles the canvas state and renders the new blocks.
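The orchestrator steps in the execution path can be sketched as a small in-memory pipeline. This is an illustrative sketch only: every name here (`APPROVED_DATASETS`, `CODEBOOK`, `FAKE_DB`, `handle_prompt`, and the data shapes) is a hypothetical stand-in, not the actual Velocity orchestrator code. User-context loading is omitted, and a dictionary stands in for PostgreSQL.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist of approved datasets (illustrative names only).
APPROVED_DATASETS = {"crm_leads", "inventory_units"}

# Hypothetical codebook: keyword -> (component type, dataset it reads from).
CODEBOOK = {
    "pipeline": ("kanban_board", "crm_leads"),
    "inventory": ("comparison_table", "inventory_units"),
}

# Stand-in for PostgreSQL; a real orchestrator would run whitelisted queries.
FAKE_DB = {
    "crm_leads": [{"name": "Lead A", "stage": "new"}],
    "inventory_units": [{"unit": "T1-1204", "status": "available"}],
}


@dataclass
class CanvasPage:
    page_id: str
    revisions: list = field(default_factory=list)


def retrieve_codebook_matches(prompt: str) -> list:
    """Return codebook entries whose keyword appears in the prompt."""
    text = prompt.lower()
    return [entry for keyword, entry in CODEBOOK.items() if keyword in text]


def build_retrieval_plan(matches: list) -> list:
    """Keep only matches whose dataset is on the allowlist."""
    return [(comp, ds) for comp, ds in matches if ds in APPROVED_DATASETS]


def handle_prompt(page: CanvasPage, prompt: str) -> dict:
    """Run the steps in order and commit the result as a new revision."""
    plan = build_retrieval_plan(retrieve_codebook_matches(prompt))
    components = [
        {"type": comp, "dataset": ds, "rows": FAKE_DB[ds]} for comp, ds in plan
    ]
    revision = {"number": len(page.revisions) + 1, "components": components}
    page.revisions.append(revision)
    return revision


page = CanvasPage(page_id="demo-page")
revision = handle_prompt(page, "Build a CRM pipeline view")
print(revision["number"], [c["type"] for c in revision["components"]])
```

The point of the shape is that the allowlist check sits between retrieval and data access, so a codebook match alone never grants access to a dataset.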

## Current Oracle Backend Families

### Live today

- `/api/oracle/v1/me`
- `/api/oracle/v1/canvas-pages/{page_id}`
- `/api/oracle/v1/canvas-pages/{page_id}/prompts`
- `/api/oracle/v1/canvas-pages/{page_id}/forks`
- `/api/oracle/v1/canvas-pages/{page_id}/rollback`
- `/api/oracle/v1/canvas-pages/{page_id}/revisions`
- `/api/oracle/v1/component-templates`
- `/api/oracle/v1/component-templates/synthesize`
- `/api/oracle/v1/merge-requests`
- `/api/oracle/v1/merge-requests/{mr_id}/review`
- `/ws/oracle/canvas/{page_id}`

### Template taxonomy routes

- `/api/oracle/template-chapters`
- `/api/oracle/template-subchapters`
- `/api/oracle/component-templates`
- `/api/oracle/component-templates/{id}`
- `/api/oracle/component-templates/{id}/seed`
- `/api/oracle/component-templates/synthetic-jobs`

## Prompt Analysis Path

Oracle should not rely on one monolithic LLM call.

The correct production split is:

1. codebook retrieval
2. safe dataset selection
3. optional LLM planning
4. live DB fetch
5. JSON Canvas synthesis
6. revision commit

### Why this split is correct

- It reduces hallucination in UI structure.
- It keeps DB access whitelisted and auditable.
- It allows Oracle to keep working even when the LLM runtime is degraded.
- It keeps the Oracle Canvas deterministic enough for operational use.

## Current Model Routing Truth

### Present reality

The current Oracle backend has these runtime modes:

- `codebook_retrieval`
  - preferred when the prompt clearly matches the Oracle template corpus
- `nemoclaw_hosted`
  - used when `NEMOCLAW_API_URL` and `NEMOCLAW_API_KEY` are configured and reachable
- `deterministic_fallback`
  - used when the LLM planner is unavailable

### What Nemoclaw currently means in code

Current dispatch abstraction:

- `backend/services/nemoclaw_runtime.py`

This file is still a light dispatch envelope, not a fully featured provider router.

### Recommended production provider stack

Provider order:

1. codebook retrieval layer
2. Nemoclaw planner endpoint
3. local Ollama fallback
4. deterministic fallback

## Recommended Ollama Model Policy

### Default planning / Oracle analysis model

Use a local reasoning-capable model behind Ollama when Nemoclaw is not available or when the team wants deterministic private execution.

Recommended candidate:

- `qwen3.6:35b-a3b`

Reason:

- strong agentic coding and structured reasoning profile
- local execution path through Ollama
- realistic fit for GPU-box-hosted inference

### Deployment command

Example:

```bash
ollama run qwen3.6:35b-a3b
```

### Routing rule

- Oracle prompt planning:
  - small to medium prompts: local Ollama `qwen3.6:35b-a3b`
  - larger multi-step analytical plans: Nemoclaw planner if available
- Coding-agent batch workloads:
  - Ollama first for local/private jobs
  - Nemoclaw for heavier orchestration when the runtime is healthy

## Runtime LLM API

The backend now exposes a first-class runtime LLM family:

- `GET /api/runtime/llm/providers`
- `POST /api/runtime/llm/chat`
- `POST /api/runtime/llm/batch`
- `GET /api/runtime/llm/jobs/{job_id}`
- `GET /api/runtime/llm/jobs/{job_id}/results`

This router is mounted in:

- `backend/api/routes_runtime_llm.py`

The current persistence path uses the existing canonical table:

- `workflow_agent_runs`

That means batch jobs are now persisted against the live Velocity schema without requiring a new table family before the first production rollout.

## Implemented Batch Processing API

This is no longer only a proposal. The following contract family exists now and can be used by Oracle or future coding-agent surfaces.

### Single request inference

- `POST /api/runtime/llm/chat`

Payload:

```json
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "system_prompt": "You are Oracle Planner.",
  "messages": [
    {
      "role": "user",
      "content": "Build a CRM pipeline view for high-intent NRI buyers."
    }
  ],
  "temperature": 0.2,
  "response_format": "json"
}
```

### Batch submission

- `POST /api/runtime/llm/batch`

Payload:

```json
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "job_type": "oracle_canvas_planning",
  "items": [
    {
      "request_id": "req_001",
      "messages": [
        {
          "role": "user",
          "content": "Show overdue high-QD follow-ups."
        }
      ],
      "response_format": "json"
    },
    {
      "request_id": "req_002",
      "messages": [
        {
          "role": "user",
          "content": "Build a Kolkata luxury inventory comparison block."
        }
      ],
      "response_format": "json"
    }
  ]
}
```

### Batch status

- `GET /api/runtime/llm/jobs/{job_id}`

Response:

```json
{
  "job_id": "job_123",
  "status": "running",
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "submitted_count": 2,
  "completed_count": 1,
  "failed_count": 0
}
```

### Batch results

- `GET /api/runtime/llm/jobs/{job_id}/results`

### Providers inventory

- `GET /api/runtime/llm/providers`

Example response:

```json
{
  "providers": [
    {
      "id": "nemoclaw",
      "status": "online",
      "models": ["nemotron", "remote_default"]
    },
    {
      "id": "ollama",
      "status": "online",
      "models": ["qwen3.6:35b-a3b"]
    }
  ]
}
```

## Batch Processing Design Rules

1. Batch jobs must be persisted.
2. Batch items must be individually addressable by `request_id`.
3. Every batch job must record:
   - provider
   - model
   - submitted payload hash
   - start/end timestamps
   - failure reason
4. Oracle must not block the main request thread for large batches.
5. Any DB writeback generated from a batch must go through approval tables, not direct execution.

## Oracle-Specific Runtime Policy

For Oracle Canvas, the LLM is not the source of truth for data.

The source of truth order is:

1. canonical DB tables
2. approved dataset projections
3. codebook template corpus
4. model planner

The model is only allowed to:

- classify intent
- choose likely component families
- propose layout direction
- summarize findings

The model is not allowed to:

- invent database facts
- bypass dataset allowlists
- emit arbitrary executable code into production rendering paths

## Current Production Readiness Assessment

### Ready now

- Oracle Canvas frontend-to-backend v1 route family
- codebook-backed template retrieval path
- safe DB execution gateway
- merge/fork/revision path
- deterministic fallback path
- runtime LLM provider inventory
- runtime single-chat execution
- runtime persisted batch execution through `workflow_agent_runs`
- Oracle planner fallback through the shared runtime LLM service

### Still needs explicit implementation if the team approves

- per-model selection UI in Catalyst or Oracle controls
- dedicated `runtime_llm_jobs` / `runtime_llm_job_items` tables if the team wants stronger audit/query ergonomics than `workflow_agent_runs`
- explicit Nemoclaw vs Ollama operator switch in a production admin surface
- richer provider health telemetry beyond simple reachability

## Recommended Next Build Steps

1. Add a dedicated runtime router:
   - `backend/api/routes_runtime_llm.py`
2. Add DB tables:
   - `runtime_llm_jobs`
   - `runtime_llm_job_items`
   - `runtime_llm_job_results`
3. Implement provider adapters:
   - Nemoclaw adapter
   - Ollama adapter
4. Expose provider status to Catalyst/Oracle settings surfaces.
5. Keep Oracle Canvas on the current codebook-first path even after LLM batching exists.

## Bottom Line

Oracle Canvas should be treated as a codebook-guided analytical surface with optional LLM planning, not as a raw chat-to-SQL toy.

The production-safe architecture is:

- Linux origin runs the application and DB access
- GPU box runs ComfyUI and model inference
- Oracle retrieves from the merged codebook first
- DB access stays whitelisted
- Nemoclaw and Ollama sit behind a documented provider interface
- batch processing is a separate runtime service contract, not an implicit side effect of the canvas endpoint
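
As one way to picture the provider interface, the recommended provider order (codebook retrieval, Nemoclaw planner, local Ollama, deterministic fallback) can be sketched as a plain fallback chain. This is a minimal sketch under stated assumptions, not the contents of `backend/services/nemoclaw_runtime.py`: the four provider functions are hypothetical stubs, the `ollama_local` mode name is invented for illustration, and the Nemoclaw stub simply simulates an unreachable planner.

```python
from typing import Callable, Optional

# A provider takes a prompt and returns a plan dict, or None if it cannot serve.
Provider = Callable[[str], Optional[dict]]


def codebook_retrieval(prompt: str) -> Optional[dict]:
    # Hypothetical: only answers when the prompt matches a known template keyword.
    if "pipeline" in prompt.lower():
        return {"components": ["crm_pipeline"]}
    return None


def nemoclaw_planner(prompt: str) -> Optional[dict]:
    # Hypothetical: simulates a hosted planner that cannot be reached.
    raise ConnectionError("nemoclaw unreachable")


def ollama_planner(prompt: str) -> Optional[dict]:
    # Hypothetical: stands in for a call to the local Ollama runtime.
    return {"components": ["generic_summary"]}


def deterministic_fallback(prompt: str) -> Optional[dict]:
    # Always succeeds with a fixed, safe layout.
    return {"components": ["default_table"]}


PROVIDER_ORDER = [
    ("codebook_retrieval", codebook_retrieval),
    ("nemoclaw_hosted", nemoclaw_planner),
    ("ollama_local", ollama_planner),  # mode name assumed for this sketch
    ("deterministic_fallback", deterministic_fallback),
]


def plan_with_fallback(prompt: str, providers=PROVIDER_ORDER):
    """Try each provider in order; return (mode, plan) from the first that answers."""
    for mode, provider in providers:
        try:
            plan = provider(prompt)
        except Exception:
            continue  # treat any provider error as "unavailable" and move on
        if plan is not None:
            return mode, plan
    raise RuntimeError("no provider produced a plan")


print(plan_with_fallback("Build a CRM pipeline view"))
print(plan_with_fallback("Summarise last week's site visits"))
```

Because the deterministic fallback always returns a plan, the chain keeps Oracle usable even when both LLM runtimes are down, which matches the degraded-runtime requirement stated under the prompt analysis split.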