# Oracle Canvas Runtime and Ollama Batch Architecture

Date: 2026-04-19
Repo: `Project_Velocity`

## Purpose

This document defines the current production Oracle Canvas runtime path, the intended Ollama/Nemoclaw model-routing strategy, and the target batch-processing API shape the team can use if Velocity exposes Oracle or coding-agent capabilities through the local model stack.

This is the operator and engineering artifact. It exists to remove ambiguity.

## Runtime Topology

### Linux origin box

Role:

- hosts Velocity frontend
- hosts FastAPI backend
- hosts PostgreSQL and application services
- terminates app-origin requests under the public site path

Primary concern:

- application routing
- auth/session enforcement
- Oracle API execution
- CRM/intelligence/inventory data access

### GPU box

Role:

- hosts ComfyUI
- hosts heavy model runtime
- hosts Ollama / Nemoclaw execution plane
- stores runtime/model payloads on NVMe only

Primary concern:

- inference
- media generation
- model serving
- agent runtime workloads

### Ingress

Role:

- stable public entry for GPU-backed services
- hides raw GPU host details from application code

Non-negotiable rule:

- never wire Oracle or frontend code to a raw GPU public IP

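One way to enforce the rule above when configuration is loaded is to reject any GPU service URL whose host is a literal IP address. This is a minimal sketch, not code from the repo: the function name `assert_ingress_hostname` and the example hostname are hypothetical, and the guard is deliberately stricter than the rule because it rejects every IP literal, not only public ones.

```python
import ipaddress
from urllib.parse import urlparse


def assert_ingress_hostname(url: str) -> str:
    """Return the URL unchanged if its host is a DNS name, else raise.

    Hypothetical guard: literal IPv4/IPv6 hosts are treated as raw GPU
    addresses and rejected, so callers must go through the ingress.
    """
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return url  # not an IP literal, so assume it is an ingress hostname
    raise ValueError(f"raw IP hosts are not allowed: {host}")
```

Called once at startup on each configured GPU-facing URL, a check like this turns an accidental raw-IP setting into an immediate failure instead of a silent coupling.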
## Oracle Canvas Current Execution Path

The production-safe Oracle path is now:

1. User submits prompt from Oracle Canvas frontend.
2. Frontend calls:
   - `/api/oracle/v1/canvas-pages/{page_id}/prompts`
3. FastAPI Oracle orchestrator:
   - loads user context
   - retrieves best codebook matches
   - builds a safe retrieval plan
   - queries approved datasets from PostgreSQL
   - produces JSON Canvas components
   - commits a page revision
4. Frontend reloads/reconciles the canvas state and renders the new blocks.

## Current Oracle Backend Families

### Live today

- `/api/oracle/v1/me`
- `/api/oracle/v1/canvas-pages/{page_id}`
- `/api/oracle/v1/canvas-pages/{page_id}/prompts`
- `/api/oracle/v1/canvas-pages/{page_id}/forks`
- `/api/oracle/v1/canvas-pages/{page_id}/rollback`
- `/api/oracle/v1/canvas-pages/{page_id}/revisions`
- `/api/oracle/v1/component-templates`
- `/api/oracle/v1/component-templates/synthesize`
- `/api/oracle/v1/merge-requests`
- `/api/oracle/v1/merge-requests/{mr_id}/review`
- `/ws/oracle/canvas/{page_id}`

### Template taxonomy routes

- `/api/oracle/template-chapters`
- `/api/oracle/template-subchapters`
- `/api/oracle/component-templates`
- `/api/oracle/component-templates/{id}`
- `/api/oracle/component-templates/{id}/seed`
- `/api/oracle/component-templates/synthetic-jobs`

## Prompt Analysis Path

Oracle should not rely on one monolithic LLM call.

The correct production split is:

1. codebook retrieval
2. safe dataset selection
3. optional LLM planning
4. live DB fetch
5. JSON Canvas synthesis
6. revision commit

### Why this split is correct

- It reduces hallucination in UI structure.
- It keeps DB access whitelisted and auditable.
- It allows Oracle to keep working even when the LLM runtime is degraded.
- It keeps the Oracle Canvas deterministic enough for operational use.

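The six-stage split above can be sketched as a pipeline in which LLM planning is optional and its failure never aborts the request. This is an illustration under stated assumptions, not the orchestrator's actual code: `run_prompt_pipeline`, `PlanResult`, and the callable signatures are hypothetical, and the last three stages are only recorded as labels rather than implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PlanResult:
    datasets: list[str]
    planner_used: bool
    stages: list[str] = field(default_factory=list)


def run_prompt_pipeline(
    prompt: str,
    retrieve_codebook: Callable[[str], list[str]],
    allowed_datasets: set[str],
    llm_plan: Optional[Callable[[str, list[str]], list[str]]] = None,
) -> PlanResult:
    stages = ["codebook_retrieval"]
    candidates = retrieve_codebook(prompt)

    # Safe dataset selection: only allowlisted datasets survive.
    stages.append("safe_dataset_selection")
    datasets = [d for d in candidates if d in allowed_datasets]

    planner_used = False
    if llm_plan is not None:
        try:
            proposed = llm_plan(prompt, datasets)
            # The planner may reorder or narrow the selection, never widen it.
            datasets = [d for d in proposed if d in datasets]
            planner_used = True
            stages.append("llm_planning")
        except Exception:
            pass  # degraded LLM runtime: continue with the codebook selection

    # Remaining stages are placeholders in this sketch.
    stages += ["live_db_fetch", "json_canvas_synthesis", "revision_commit"]
    return PlanResult(datasets, planner_used, stages)
```

Because the allowlist filter runs before and after the planner, a planner that proposes an unapproved dataset cannot widen DB access, which is the auditability property the list above describes.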
## Current Model Routing Truth

### Present reality

The current Oracle backend has these runtime modes:

- `codebook_retrieval`
  - preferred when the prompt clearly matches the Oracle template corpus
- `nemoclaw_hosted`
  - used when `NEMOCLAW_API_URL` and `NEMOCLAW_API_KEY` are configured and reachable
- `deterministic_fallback`
  - used when the LLM planner is unavailable

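The mode selection described above can be read as a small decision function. The sketch below is an assumption about how the three modes relate, not the backend's implementation: `select_runtime_mode` is a hypothetical name, and both `codebook_match` and `nemoclaw_reachable` are supplied by the caller because the document does not specify how a "clear match" or reachability is measured.

```python
from typing import Mapping


def select_runtime_mode(
    env: Mapping[str, str],
    codebook_match: bool,
    nemoclaw_reachable: bool,
) -> str:
    """Pick one of the three runtime modes listed above."""
    if codebook_match:
        return "codebook_retrieval"
    configured = bool(env.get("NEMOCLAW_API_URL")) and bool(env.get("NEMOCLAW_API_KEY"))
    if configured and nemoclaw_reachable:
        return "nemoclaw_hosted"
    return "deterministic_fallback"
```

Passing `os.environ` as `env` in production and a plain dict in tests keeps the function free of global state.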
### What Nemoclaw currently means in code

Current dispatch abstraction:

- `backend/services/nemoclaw_runtime.py`

This file is still a light dispatch envelope, not a fully featured provider router.

### Recommended production provider stack

Provider order:

1. codebook retrieval layer
2. Nemoclaw planner endpoint
3. local Ollama fallback
4. deterministic fallback

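The provider order above amounts to a first-success fallback chain. A minimal sketch, assuming each provider is wrapped as a callable that returns a plan dict, returns `None` when it has nothing to offer, or raises when it is unavailable; `first_successful` and the `Provider` alias are hypothetical names.

```python
from typing import Callable, Optional, Sequence

# (provider name, callable that returns a plan or None)
Provider = tuple[str, Callable[[str], Optional[dict]]]


def first_successful(prompt: str, providers: Sequence[Provider]) -> tuple[str, dict]:
    """Try providers in order; return (name, plan) from the first that answers."""
    for name, call in providers:
        try:
            plan = call(prompt)
        except Exception:
            continue  # provider errored: fall through to the next one
        if plan is not None:
            return name, plan
    raise RuntimeError("no provider produced a plan")
```

If the deterministic fallback is always last in the list and never raises, the final `RuntimeError` is unreachable in practice, which matches the intent that Oracle keeps working when the LLM runtime is degraded.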
## Recommended Ollama Model Policy

### Default planning / Oracle analysis model

Use a local reasoning-capable model behind Ollama when Nemoclaw is not available or when the team wants private, locally controlled execution.

Recommended candidate:

- `qwen3.6:35b-a3b`

Reason:

- strong agentic coding and structured reasoning profile
- local execution path through Ollama
- realistic fit for GPU-box-hosted inference

### Deployment command

Example:

```bash
# Downloads the model on first use, then opens an interactive session.
ollama run qwen3.6:35b-a3b
```

### Routing rule

- Oracle prompt planning:
  - small to medium prompts: local Ollama `qwen3.6:35b-a3b`
  - larger multi-step analytical plans: Nemoclaw planner if available
- Coding-agent batch workloads:
  - Ollama first for local/private jobs
  - Nemoclaw for heavier orchestration when the runtime is healthy

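Under one reading, both bullets above reduce to the same decision: stay on Ollama unless the job is heavy and Nemoclaw is healthy. The sketch below encodes that interpretation only. `route_provider` is a hypothetical name, and the caller decides what counts as `heavy` because the document sets no size threshold.

```python
def route_provider(heavy: bool, nemoclaw_healthy: bool, private: bool = False) -> str:
    """Return "ollama" or "nemoclaw" for a planning or coding-agent job."""
    if private:
        return "ollama"  # private jobs never leave the local stack
    if heavy and nemoclaw_healthy:
        return "nemoclaw"
    return "ollama"  # default for small/medium work and for degraded Nemoclaw
```

Keeping the rule in one pure function makes it easy to swap the boolean inputs for real health checks or size heuristics later without touching call sites.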
## Runtime LLM API

The backend now exposes a first-class runtime LLM route family:

- `GET /api/runtime/llm/providers`
- `POST /api/runtime/llm/chat`
- `POST /api/runtime/llm/batch`
- `GET /api/runtime/llm/jobs/{job_id}`
- `GET /api/runtime/llm/jobs/{job_id}/results`

This router is defined in:

- `backend/api/routes_runtime_llm.py`

The current persistence path uses the existing canonical table:

- `workflow_agent_runs`

That means batch jobs are now persisted against the live Velocity schema without requiring a new table family before the first production rollout.

## Implemented Batch Processing API

This is no longer only a proposal. The following contract family exists now and can be used by Oracle or future coding-agent surfaces.

### Single request inference

- `POST /api/runtime/llm/chat`

Payload:

```json
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "system_prompt": "You are Oracle Planner.",
  "messages": [
    { "role": "user", "content": "Build a CRM pipeline view for high-intent NRI buyers." }
  ],
  "temperature": 0.2,
  "response_format": "json"
}
```

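A client can assemble this request with the standard library alone. The sketch below only builds the request object and does not send it. `build_chat_request` and the `base_url` argument are hypothetical, and no auth header is set because the document does not describe how session enforcement applies to this route.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str,
    user_prompt: str,
    model: str = "qwen3.6:35b-a3b",
) -> urllib.request.Request:
    """Build (but do not send) a POST to the single-request chat endpoint."""
    payload = {
        "provider": "ollama",
        "model": model,
        "system_prompt": "You are Oracle Planner.",
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": 0.2,
        "response_format": "json",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/runtime/llm/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a single call such as `urllib.request.urlopen(req, timeout=30)`, with whatever authentication the deployment requires added to the headers first.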
### Batch submission

- `POST /api/runtime/llm/batch`

Payload:

```json
{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "job_type": "oracle_canvas_planning",
  "items": [
    {
      "request_id": "req_001",
      "messages": [
        { "role": "user", "content": "Show overdue high-QD follow-ups." }
      ],
      "response_format": "json"
    },
    {
      "request_id": "req_002",
      "messages": [
        { "role": "user", "content": "Build a Kolkata luxury inventory comparison block." }
      ],
      "response_format": "json"
    }
  ]
}
```

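A batch payload of this shape can be generated from a list of prompts, with sequential `request_id` values so that every item stays individually addressable. This is a sketch: `build_batch_payload` is a hypothetical helper, and the `req_NNN` numbering simply mirrors the example above rather than a documented requirement.

```python
from typing import Iterable


def build_batch_payload(
    prompts: Iterable[str],
    job_type: str,
    provider: str = "ollama",
    model: str = "qwen3.6:35b-a3b",
) -> dict:
    """Wrap each prompt as one batch item with a unique request_id."""
    items = [
        {
            "request_id": f"req_{i:03d}",
            "messages": [{"role": "user", "content": prompt}],
            "response_format": "json",
        }
        for i, prompt in enumerate(prompts, start=1)
    ]
    if not items:
        raise ValueError("a batch needs at least one item")
    return {"provider": provider, "model": model, "job_type": job_type, "items": items}
```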
### Batch status

- `GET /api/runtime/llm/jobs/{job_id}`

Response:

```json
{
  "job_id": "job_123",
  "status": "running",
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "submitted_count": 2,
  "completed_count": 1,
  "failed_count": 0
}
```

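A polling client can derive progress from the three counters in this response. The sketch relies on the counts instead of the `status` string because `"running"` is the only status value the document shows; `summarize_job` is a hypothetical helper.

```python
def summarize_job(status: dict) -> dict:
    """Derive progress figures from a batch status response."""
    submitted = status["submitted_count"]
    settled = status["completed_count"] + status["failed_count"]
    return {
        "job_id": status["job_id"],
        "pending": submitted - settled,
        "fraction_done": settled / submitted if submitted else 0.0,
        "all_items_settled": settled >= submitted,
    }
```

For the example response above this yields one pending item and a `fraction_done` of 0.5.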
### Batch results

- `GET /api/runtime/llm/jobs/{job_id}/results`

### Providers inventory

- `GET /api/runtime/llm/providers`

Example response:

```json
{
  "providers": [
    {
      "id": "nemoclaw",
      "status": "online",
      "models": ["nemotron", "remote_default"]
    },
    {
      "id": "ollama",
      "status": "online",
      "models": ["qwen3.6:35b-a3b"]
    }
  ]
}
```

## Batch Processing Design Rules

1. Batch jobs must be persisted.
2. Batch items must be individually addressable by `request_id`.
3. Every batch job must record:
   - provider
   - model
   - submitted payload hash
   - start/end timestamps
   - failure reason
4. Oracle must not block the main request thread for large batches.
5. Any DB writeback generated from a batch must go through approval tables, not direct execution.

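The fields required by rule 3 can be sketched as a small record type with a stable payload hash. This is illustrative only: `BatchJobRecord` and `payload_hash` are hypothetical names, SHA-256 over canonical JSON is an assumed hashing scheme, and the mapping onto `workflow_agent_runs` columns is not specified in this document.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def payload_hash(payload: dict) -> str:
    """SHA-256 of the payload serialized with sorted keys, so key order is irrelevant."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class BatchJobRecord:
    provider: str
    model: str
    submitted_payload_hash: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    failure_reason: Optional[str] = None

    def finish(self, failure_reason: Optional[str] = None) -> None:
        """Stamp the end time; a non-None reason marks the job as failed."""
        self.ended_at = datetime.now(timezone.utc)
        self.failure_reason = failure_reason
```

Hashing the canonical form means two submissions with identical content produce the same hash even if the client serialized keys in a different order, which helps when auditing duplicate submissions.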
## Oracle-Specific Runtime Policy

For Oracle Canvas, the LLM is not the source of truth for data.

The source of truth order is:

1. canonical DB tables
2. approved dataset projections
3. codebook template corpus
4. model planner

The model is only allowed to:

- classify intent
- choose likely component families
- propose layout direction
- summarize findings

The model is not allowed to:

- invent database facts
- bypass dataset allowlists
- emit arbitrary executable code into production rendering paths

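One way to hold planner output to the allowed and forbidden lists above is to validate it before anything downstream consumes it. The sketch assumes the planner returns a flat dict; the key names in `ALLOWED_KEYS` and the function `validate_planner_output` are hypothetical and would need to match the real planner schema.

```python
# Hypothetical planner fields, one per permitted capability plus dataset references.
ALLOWED_KEYS = {"intent", "component_families", "layout", "summary", "datasets"}


def validate_planner_output(plan: dict, dataset_allowlist: set[str]) -> dict:
    """Reject plans with unexpected fields or non-allowlisted datasets."""
    extra = set(plan) - ALLOWED_KEYS
    if extra:
        # Anything outside the schema (raw SQL, scripts, row data) is refused.
        raise ValueError(f"planner emitted disallowed fields: {sorted(extra)}")
    blocked = [d for d in plan.get("datasets", []) if d not in dataset_allowlist]
    if blocked:
        raise ValueError(f"planner referenced non-allowlisted datasets: {blocked}")
    return plan
```

A strict schema cannot detect an invented fact inside a free-text summary, so it complements, and does not replace, fetching all data from the canonical tables.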
## Current Production Readiness Assessment

### Ready now

- Oracle Canvas frontend-to-backend v1 route family
- codebook-backed template retrieval path
- safe DB execution gateway
- merge/fork/revision path
- deterministic fallback path
- runtime LLM provider inventory
- runtime single-chat execution
- runtime persisted batch execution through `workflow_agent_runs`
- Oracle planner fallback through the shared runtime LLM service

### Still needs explicit implementation if the team approves

- per-model selection UI in Catalyst or Oracle controls
- dedicated `runtime_llm_jobs` / `runtime_llm_job_items` tables if the team wants stronger audit/query ergonomics than `workflow_agent_runs`
- explicit Nemoclaw vs Ollama operator switch in a production admin surface
- richer provider health telemetry beyond simple reachability

## Recommended Next Build Steps

1. Maintain the dedicated runtime router (already in place, per the Runtime LLM API section):
   - `backend/api/routes_runtime_llm.py`
2. Add dedicated DB tables if the team wants to move batch persistence off `workflow_agent_runs`:
   - `runtime_llm_jobs`
   - `runtime_llm_job_items`
   - `runtime_llm_job_results`
3. Implement provider adapters:
   - Nemoclaw adapter
   - Ollama adapter
4. Expose provider status to Catalyst/Oracle settings surfaces.
5. Keep Oracle Canvas on the current codebook-first path even after LLM batching exists.

## Bottom Line

Oracle Canvas should be treated as a codebook-guided analytical surface with optional LLM planning, not as a raw chat-to-SQL toy.

The production-safe architecture is:

- Linux origin runs the application and DB access
- GPU box runs ComfyUI and model inference
- Oracle retrieves from the merged codebook first
- DB access stays whitelisted
- Nemoclaw and Ollama sit behind a documented provider interface
- batch processing is a separate runtime service contract, not an implicit side effect of the canvas endpoint