Co-authored-by: Sagnik <sagnik7896@gmail.com> Reviewed-on: sagnik/Project_Velocity#31
Oracle Canvas Runtime and Ollama Batch Architecture
Date: 2026-04-19
Repo: Project_Velocity
Purpose
This document defines the current production Oracle Canvas runtime path, the intended Ollama/Nemoclaw model-routing strategy, and the target batch-processing API shape the team can use if Velocity exposes Oracle or coding-agent capabilities through the local model stack.
This is the operator and engineering artifact. It exists to remove ambiguity.
Runtime Topology
Linux origin box
Role:
- hosts Velocity frontend
- hosts FastAPI backend
- hosts PostgreSQL and application services
- terminates app-origin requests under the public site path
Primary concern:
- application routing
- auth/session enforcement
- Oracle API execution
- CRM/intelligence/inventory data access
GPU box
Role:
- hosts ComfyUI
- hosts heavy model runtime
- hosts Ollama / Nemoclaw execution plane
- stores runtime/model payloads on NVMe only
Primary concern:
- inference
- media generation
- model serving
- agent runtime workloads
Ingress
Role:
- stable public entry for GPU-backed services
- hides raw GPU host details from application code
Non-negotiable rule:
- never wire Oracle or frontend code to a raw GPU public IP
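The ingress rule can be enforced at configuration load time. The following is a minimal sketch, not the project's actual validator: the function names and the example hostnames are assumptions introduced for illustration. It rejects any service URL whose host is a literal IP address, so application code can only be pointed at a stable ingress name.

```python
import ipaddress
from urllib.parse import urlsplit


def is_raw_ip_endpoint(url: str) -> bool:
    """Return True when the URL's host is a literal IPv4/IPv6 address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal, so it is a DNS name
    return True


def require_ingress_url(url: str) -> str:
    """Hypothetical config guard: accept only URLs that use a hostname."""
    if urlsplit(url).hostname is None:
        raise ValueError(f"URL must include a scheme and host: {url}")
    if is_raw_ip_endpoint(url):
        raise ValueError(f"raw IP endpoint not allowed: {url}")
    return url
```

Calling `require_ingress_url` once where GPU-backed service URLs are read from settings would make a misconfigured raw IP fail at startup rather than at request time.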
Oracle Canvas Current Execution Path
The production-safe Oracle path is now:
- User submits prompt from Oracle Canvas frontend.
- Frontend calls:
/api/oracle/v1/canvas-pages/{page_id}/prompts
- FastAPI Oracle orchestrator:
- loads user context
- retrieves best codebook matches
- builds a safe retrieval plan
- queries approved datasets from PostgreSQL
- produces JSON Canvas components
- commits a page revision
- Frontend reloads/reconciles the canvas state and renders the new blocks.
Current Oracle Backend Families
Live today
- /api/oracle/v1/me
- /api/oracle/v1/canvas-pages/{page_id}
- /api/oracle/v1/canvas-pages/{page_id}/prompts
- /api/oracle/v1/canvas-pages/{page_id}/forks
- /api/oracle/v1/canvas-pages/{page_id}/rollback
- /api/oracle/v1/canvas-pages/{page_id}/revisions
- /api/oracle/v1/component-templates
- /api/oracle/v1/component-templates/synthesize
- /api/oracle/v1/merge-requests
- /api/oracle/v1/merge-requests/{mr_id}/review
- /ws/oracle/canvas/{page_id}
Template taxonomy routes
- /api/oracle/template-chapters
- /api/oracle/template-subchapters
- /api/oracle/component-templates
- /api/oracle/component-templates/{id}
- /api/oracle/component-templates/{id}/seed
- /api/oracle/component-templates/synthetic-jobs
Prompt Analysis Path
Oracle should not rely on one monolithic LLM call.
The correct production split is:
- codebook retrieval
- safe dataset selection
- optional LLM planning
- live DB fetch
- JSON Canvas synthesis
- revision commit
Why this split is correct
- It reduces hallucination in UI structure.
- It keeps DB access whitelisted and auditable.
- It allows Oracle to keep working even when the LLM runtime is degraded.
- It keeps the Oracle Canvas deterministic enough for operational use.
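The split above can be sketched as a staged function. This is an illustrative outline under stated assumptions, not the orchestrator's real code: `run_oracle_pipeline`, the callables it accepts, the `ALLOWED_DATASETS` contents, and the component shape are all hypothetical. It shows the two properties the list argues for: the planner can narrow but never widen the allowlisted dataset set, and a failing planner does not stop the pipeline.

```python
from typing import Callable, Optional

# Hypothetical allowlist; the real approved datasets are not listed in this document.
ALLOWED_DATASETS = frozenset({"crm_leads", "inventory_units"})


def run_oracle_pipeline(
    prompt: str,
    retrieve_codebook: Callable[[str], list[str]],
    fetch_rows: Callable[[str], list[dict]],
    llm_plan: Optional[Callable[[str, list[str]], list[str]]] = None,
) -> dict:
    # 1. Codebook retrieval: dataset names suggested by matching templates.
    suggested = retrieve_codebook(prompt)
    # 2. Safe dataset selection: allowlisted names only, in order, deduplicated.
    safe = [d for d in dict.fromkeys(suggested) if d in ALLOWED_DATASETS]
    # 3. Optional LLM planning: may narrow or reorder the safe set, never widen it.
    chosen, planner_used = safe, False
    if llm_plan is not None:
        try:
            planned = [d for d in llm_plan(prompt, safe) if d in safe]
            if planned:
                chosen, planner_used = planned, True
        except Exception:
            pass  # degraded LLM runtime: continue with the codebook-only plan
    # 4. Live DB fetch and 5. JSON Canvas synthesis (one table block per dataset).
    components = [{"type": "table", "dataset": d, "rows": fetch_rows(d)} for d in chosen]
    # 6. Revision commit is represented here by the returned payload.
    return {"prompt": prompt, "components": components, "planner_used": planner_used}
```

Because the planner only ever sees and returns names from the already-filtered set, a hallucinated dataset name is dropped before any database call is made.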
Current Model Routing Truth
Present reality
The current Oracle backend has these runtime modes:
- codebook_retrieval: preferred when the prompt clearly matches the Oracle template corpus
- nemoclaw_hosted: used when NEMOCLAW_API_URL and NEMOCLAW_API_KEY are configured and reachable
- deterministic_fallback: used when the LLM planner is unavailable
What Nemoclaw currently means in code
Current dispatch abstraction:
backend/services/nemoclaw_runtime.py
This file is still a light dispatch envelope, not a fully featured provider router.
Recommended production provider stack
Provider order:
- codebook retrieval layer
- Nemoclaw planner endpoint
- local Ollama fallback
- deterministic fallback
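The provider order can be expressed as an ordered chain with a guaranteed last resort. This is a sketch only: `pick_provider` and the `is_available` callback are hypothetical names, and how each layer reports availability (for example, whether the codebook has a confident match) is left to the caller.

```python
from typing import Callable

# Order taken from the list above; the final entry never needs a health check.
PROVIDER_ORDER = ["codebook_retrieval", "nemoclaw", "ollama", "deterministic_fallback"]


def pick_provider(is_available: Callable[[str], bool]) -> str:
    """Return the first available provider, walking the chain in order."""
    for name in PROVIDER_ORDER[:-1]:
        try:
            if is_available(name):
                return name
        except Exception:
            continue  # a failing health check counts as unavailable
    return PROVIDER_ORDER[-1]
```

Keeping the deterministic fallback outside the availability check means the function always returns a usable answer, even when every health probe raises.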
Recommended Ollama Model Policy
Default planning / Oracle analysis model
Use a local reasoning-capable model behind Ollama when Nemoclaw is not available or when the team wants deterministic private execution.
Recommended candidate:
qwen3.6:35b-a3b
Reason:
- strong agentic coding and structured reasoning profile
- local execution path through Ollama
- realistic fit for GPU-box-hosted inference
Deployment command
Example:
ollama run qwen3.6:35b-a3b
Routing rule
- Oracle prompt planning:
- small to medium prompts: local Ollama qwen3.6:35b-a3b
- larger multi-step analytical plans: Nemoclaw planner if available
- Coding-agent batch workloads:
- Ollama first for local/private jobs
- Nemoclaw for heavier orchestration when the runtime is healthy
Runtime LLM API
The backend now exposes a first-class runtime LLM family:
- GET /api/runtime/llm/providers
- POST /api/runtime/llm/chat
- POST /api/runtime/llm/batch
- GET /api/runtime/llm/jobs/{job_id}
- GET /api/runtime/llm/jobs/{job_id}/results
This router is mounted in:
backend/api/routes_runtime_llm.py
The current persistence path uses the existing canonical table:
workflow_agent_runs
That means batch jobs are now persisted against the live Velocity schema without requiring a new table family before the first production rollout.
Implemented Batch Processing API
This is no longer only a proposal. The following contract family exists now and can be used by Oracle or future coding-agent surfaces.
Single request inference
POST /api/runtime/llm/chat
Payload:
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "system_prompt": "You are Oracle Planner.",
  "messages": [
    { "role": "user", "content": "Build a CRM pipeline view for high-intent NRI buyers." }
  ],
  "temperature": 0.2,
  "response_format": "json"
}
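A caller could assemble this request with the standard library alone. The sketch below is an assumption-laden example: `build_chat_request` and the `base_url` parameter are hypothetical, authentication and session headers are omitted because this document does not specify them, and the response body shape is not documented here, so the example stops at building the request.

```python
import json
from urllib.request import Request


def build_chat_request(base_url: str, user_content: str) -> Request:
    """Build (but do not send) a POST to the single-request chat endpoint."""
    payload = {
        "provider": "ollama",
        "model": "qwen3.6:35b-a3b",
        "system_prompt": "You are Oracle Planner.",
        "messages": [{"role": "user", "content": user_content}],
        "temperature": 0.2,
        "response_format": "json",
    }
    return Request(
        url=base_url.rstrip("/") + "/api/runtime/llm/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once the deployment's auth requirements are known.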
Batch submission
POST /api/runtime/llm/batch
Payload:
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "job_type": "oracle_canvas_planning",
  "items": [
    {
      "request_id": "req_001",
      "messages": [
        { "role": "user", "content": "Show overdue high-QD follow-ups." }
      ],
      "response_format": "json"
    },
    {
      "request_id": "req_002",
      "messages": [
        { "role": "user", "content": "Build a Kolkata luxury inventory comparison block." }
      ],
      "response_format": "json"
    }
  ]
}
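A batch payload of this shape can be generated from a list of prompts. This is a minimal sketch: `build_batch_payload` is a hypothetical helper, and the `req_NNN` numbering simply mirrors the sample above rather than any documented ID scheme. Generating the IDs in one place keeps every item individually addressable.

```python
def build_batch_payload(
    prompts: list[str],
    job_type: str = "oracle_canvas_planning",
    provider: str = "ollama",
    model: str = "qwen3.6:35b-a3b",
) -> dict:
    """Build a batch submission body with one uniquely identified item per prompt."""
    if not prompts:
        raise ValueError("a batch needs at least one prompt")
    items = [
        {
            "request_id": f"req_{index:03d}",  # unique within this batch
            "messages": [{"role": "user", "content": prompt}],
            "response_format": "json",
        }
        for index, prompt in enumerate(prompts, start=1)
    ]
    return {"provider": provider, "model": model, "job_type": job_type, "items": items}
```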
Batch status
GET /api/runtime/llm/jobs/{job_id}
Response:
{
  "job_id": "job_123",
  "status": "running",
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "submitted_count": 2,
  "completed_count": 1,
  "failed_count": 0
}
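The three counters are enough to derive progress without relying on the `status` string, whose full set of values is not listed in this document. The helper below is a hypothetical sketch (`job_progress` is not part of the API) that computes how many items are still pending and whether every item has settled.

```python
def job_progress(status: dict) -> dict:
    """Derive pending count and completion fraction from a job status response."""
    submitted = status["submitted_count"]
    settled = status["completed_count"] + status["failed_count"]
    if settled > submitted:
        raise ValueError("settled items exceed submitted items")
    return {
        "pending": submitted - settled,
        "all_items_settled": settled == submitted,
        "fraction_done": settled / submitted if submitted else 1.0,
    }
```

For the sample response above, one of two items has settled, so one item is still pending and the job is half done.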
Batch results
GET /api/runtime/llm/jobs/{job_id}/results
Providers inventory
GET /api/runtime/llm/providers
Example response:
{
  "providers": [
    {
      "id": "nemoclaw",
      "status": "online",
      "models": ["nemotron", "remote_default"]
    },
    {
      "id": "ollama",
      "status": "online",
      "models": ["qwen3.6:35b-a3b"]
    }
  ]
}
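A settings surface or caller might flatten this inventory into selectable provider/model pairs. The sketch below assumes that `"online"` is the only status that should be treated as usable; other status values are not documented here, and `usable_models` is a hypothetical helper.

```python
def usable_models(inventory: dict) -> list[tuple[str, str]]:
    """List (provider_id, model) pairs for providers reported as online."""
    pairs = []
    for provider in inventory.get("providers", []):
        if provider.get("status") != "online":
            continue  # assumption: any non-"online" status means not usable
        for model in provider.get("models", []):
            pairs.append((provider["id"], model))
    return pairs
```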
Batch Processing Design Rules
- Batch jobs must be persisted.
- Batch items must be individually addressable by request_id.
- Every batch job must record:
- provider
- model
- submitted payload hash
- start/end timestamps
- failure reason
- Oracle must not block the main request thread for large batches.
- Any DB writeback generated from a batch must go through approval tables, not direct execution.
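The required audit fields can be captured in a small record type. This is an illustrative sketch, not the `workflow_agent_runs` schema: `BatchJobRecord` and `payload_hash` are hypothetical names, and SHA-256 over canonical JSON is one reasonable choice of hash that the document itself does not prescribe.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def payload_hash(payload: dict) -> str:
    """SHA-256 of the payload serialized with sorted keys, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class BatchJobRecord:
    provider: str
    model: str
    submitted_payload_hash: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    failure_reason: Optional[str] = None

    def finish(self, failure_reason: Optional[str] = None) -> None:
        """Stamp the end time; pass a reason only when the job failed."""
        self.ended_at = datetime.now(timezone.utc)
        self.failure_reason = failure_reason
```

Hashing a canonical serialization lets two submissions with the same content be recognized as identical even if the client ordered the JSON keys differently.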
Oracle-Specific Runtime Policy
For Oracle Canvas, the LLM is not the source of truth for data.
The source of truth order is:
- canonical DB tables
- approved dataset projections
- codebook template corpus
- model planner
The model is only allowed to:
- classify intent
- choose likely component families
- propose layout direction
- summarize findings
The model is not allowed to:
- invent database facts
- bypass dataset allowlists
- emit arbitrary executable code into production rendering paths
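One way to hold the planner to this policy is to validate its output against a closed schema before anything is rendered. The sketch below is hypothetical throughout: the key names map one-to-one onto the four permitted actions, but the real planner output format and the component family names are not defined in this document.

```python
# Hypothetical output keys, one per permitted action:
# classify intent, choose component families, propose layout, summarize findings.
ALLOWED_PLAN_KEYS = frozenset({"intent", "component_families", "layout", "summary"})


def validate_planner_output(plan: dict, known_families: set[str]) -> list[str]:
    """Return a list of policy violations; an empty list means the plan is acceptable."""
    errors = []
    extra = sorted(set(plan) - ALLOWED_PLAN_KEYS)
    if extra:
        # Anything outside the closed schema (e.g. SQL or code fields) is rejected.
        errors.append(f"unexpected keys: {extra}")
    unknown = [f for f in plan.get("component_families", []) if f not in known_families]
    if unknown:
        errors.append(f"unknown component families: {unknown}")
    return errors
```

A closed schema rejects by default, so a planner that starts emitting a new field such as raw SQL is blocked until someone explicitly allows it.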
Current Production Readiness Assessment
Ready now
- Oracle Canvas frontend-to-backend v1 route family
- codebook-backed template retrieval path
- safe DB execution gateway
- merge/fork/revision path
- deterministic fallback path
- runtime LLM provider inventory
- runtime single-chat execution
- runtime persisted batch execution through workflow_agent_runs
- Oracle planner fallback through the shared runtime LLM service
Still needs explicit implementation if the team approves
- per-model selection UI in Catalyst or Oracle controls
- dedicated runtime_llm_jobs / runtime_llm_job_items tables if the team wants stronger audit/query ergonomics than workflow_agent_runs
- explicit Nemoclaw vs Ollama operator switch in a production admin surface
- richer provider health telemetry beyond simple reachability
Recommended Next Build Steps
- Add a dedicated runtime router:
backend/api/routes_runtime_llm.py
- Add DB tables:
- runtime_llm_jobs
- runtime_llm_job_items
- runtime_llm_job_results
- Implement provider adapters:
- Nemoclaw adapter
- Ollama adapter
- Expose provider status to Catalyst/Oracle settings surfaces.
- Keep Oracle Canvas on the current codebook-first path even after LLM batching exists.
Bottom Line
Oracle Canvas should be treated as a codebook-guided analytical surface with optional LLM planning, not as a raw chat-to-SQL toy.
The production-safe architecture is:
- Linux origin runs the application and DB access
- GPU box runs ComfyUI and model inference
- Oracle retrieves from the merged codebook first
- DB access stays whitelisted
- Nemoclaw and Ollama sit behind a documented provider interface
- batch processing is a separate runtime service contract, not an implicit side effect of the canvas endpoint