Oracle Canvas Runtime and Ollama Batch Architecture

Date: 2026-04-19
Repo: Project_Velocity

Purpose

This document defines the current production Oracle Canvas runtime path, the intended Ollama/Nemoclaw model-routing strategy, and the target batch-processing API shape the team can use if Velocity exposes Oracle or coding-agent capabilities through the local model stack.

This is the operator and engineering artifact. It exists to remove ambiguity.

Runtime Topology

Linux origin box

Role:

  • hosts Velocity frontend
  • hosts FastAPI backend
  • hosts PostgreSQL and application services
  • terminates app-origin requests under the public site path

Primary concern:

  • application routing
  • auth/session enforcement
  • Oracle API execution
  • CRM/intelligence/inventory data access

GPU box

Role:

  • hosts ComfyUI
  • hosts heavy model runtime
  • hosts Ollama / Nemoclaw execution plane
  • stores runtime/model payloads on NVMe only

Primary concern:

  • inference
  • media generation
  • model serving
  • agent runtime workloads

Ingress

Role:

  • stable public entry for GPU-backed services
  • hides raw GPU host details from application code

Non-negotiable rule:

  • never wire Oracle or frontend code to a raw GPU public IP
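
One way to enforce this rule is a startup check that refuses any GPU service base URL whose host is an IP literal. The following is a minimal sketch, not code from the repo: the helper name `assert_ingress_url` and the hostname in the usage note are hypothetical, and the check is deliberately stricter than the rule because it rejects every IP literal, public or private.

```python
import ipaddress
from urllib.parse import urlparse


def assert_ingress_url(url: str) -> str:
    """Return url unchanged if its host is a name; raise if it is an IP literal."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return url  # host did not parse as an IP address, so it is a hostname
    raise ValueError(f"raw IP hosts are not allowed for GPU services: {host}")
```

Calling `assert_ingress_url("https://gpu-ingress.example.internal/ollama")` returns the URL, while `assert_ingress_url("http://203.0.113.7:11434")` raises `ValueError`.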

Oracle Canvas Current Execution Path

The production-safe Oracle path is now:

  1. User submits prompt from Oracle Canvas frontend.
  2. Frontend calls:
    • /api/oracle/v1/canvas-pages/{page_id}/prompts
  3. FastAPI Oracle orchestrator:
    • loads user context
    • retrieves best codebook matches
    • builds a safe retrieval plan
    • queries approved datasets from PostgreSQL
    • produces JSON Canvas components
    • commits a page revision
  4. Frontend reloads/reconciles the canvas state and renders the new blocks.

Current Oracle Backend Families

Live today

  • /api/oracle/v1/me
  • /api/oracle/v1/canvas-pages/{page_id}
  • /api/oracle/v1/canvas-pages/{page_id}/prompts
  • /api/oracle/v1/canvas-pages/{page_id}/forks
  • /api/oracle/v1/canvas-pages/{page_id}/rollback
  • /api/oracle/v1/canvas-pages/{page_id}/revisions
  • /api/oracle/v1/component-templates
  • /api/oracle/v1/component-templates/synthesize
  • /api/oracle/v1/merge-requests
  • /api/oracle/v1/merge-requests/{mr_id}/review
  • /ws/oracle/canvas/{page_id}

Template taxonomy routes

  • /api/oracle/template-chapters
  • /api/oracle/template-subchapters
  • /api/oracle/component-templates
  • /api/oracle/component-templates/{id}
  • /api/oracle/component-templates/{id}/seed
  • /api/oracle/component-templates/synthetic-jobs

Prompt Analysis Path

Oracle should not rely on one monolithic LLM call.

The correct production split is:

  1. codebook retrieval
  2. safe dataset selection
  3. optional LLM planning
  4. live DB fetch
  5. JSON Canvas synthesis
  6. revision commit

Why this split is correct

  • It reduces hallucination in UI structure.
  • It keeps DB access whitelisted and auditable.
  • It allows Oracle to keep working even when the LLM runtime is degraded.
  • It keeps the Oracle Canvas deterministic enough for operational use.
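
The staged split can be sketched as a single function in which the planner is an optional, failure-tolerant step. This is an illustration under assumptions, not the orchestrator's actual code: `run_oracle_prompt`, the dataset names in `APPROVED_DATASETS`, the `llm_planned` mode label, and the placeholder component shape are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical allowlist; the real approved dataset list is not given in this document.
APPROVED_DATASETS = {"crm_pipeline", "inventory_units"}


def run_oracle_prompt(
    prompt: str,
    codebook_matches: list[dict],
    planner: Optional[Callable[[str], dict]] = None,
) -> dict:
    """Walk the staged path; the planner step is optional and may fail safely."""
    # Stages 1-2: keep only datasets the codebook suggests AND the allowlist permits.
    datasets = sorted({m["dataset"] for m in codebook_matches} & APPROVED_DATASETS)
    # Stage 3: optional LLM planning. Any planner failure degrades to no plan.
    plan, mode = None, "deterministic_fallback"
    if planner is not None:
        try:
            plan, mode = planner(prompt), "llm_planned"
        except Exception:
            plan = None
    # Stages 4-5 (live DB fetch, JSON Canvas synthesis) are stubbed as placeholders.
    components = [{"type": "table", "dataset": d} for d in datasets]
    # Stage 6 would commit a page revision; this sketch just returns the result.
    return {"mode": mode, "plan": plan, "components": components}
```

Because dataset selection happens before the planner runs and never reads its output, a degraded LLM runtime changes only the `mode` and `plan` fields, not which data the canvas is allowed to touch.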

Current Model Routing Truth

Present reality

The current Oracle backend has these runtime modes:

  • codebook_retrieval
    • preferred when the prompt clearly matches the Oracle template corpus
  • nemoclaw_hosted
    • used when NEMOCLAW_API_URL and NEMOCLAW_API_KEY are configured and reachable
  • deterministic_fallback
    • used when the LLM planner is unavailable

What Nemoclaw currently means in code

Current dispatch abstraction:

  • backend/services/nemoclaw_runtime.py

This file is still a light dispatch envelope, not a fully featured provider router.

Provider order:

  1. codebook retrieval layer
  2. Nemoclaw planner endpoint
  3. local Ollama fallback
  4. deterministic fallback
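
The provider order above amounts to a first-success fallback chain. The sketch below shows that pattern in isolation; it is not the contents of `nemoclaw_runtime.py`. The function name `plan_with_fallback`, the `ollama_local` label, and the stand-in provider callables are assumptions made for illustration.

```python
from typing import Callable, Optional

# Each provider takes a prompt and returns a plan dict, or None when it cannot serve.
Provider = Callable[[str], Optional[dict]]


def plan_with_fallback(prompt: str, providers: list[tuple[str, Provider]]) -> tuple[str, dict]:
    """Try providers in order; the first one that returns a plan wins."""
    for name, provider in providers:
        try:
            plan = provider(prompt)
        except Exception:
            continue  # treat any error as "unavailable" and move to the next provider
        if plan is not None:
            return name, plan
    raise RuntimeError("no provider produced a plan")


def nemoclaw_down(prompt: str) -> Optional[dict]:
    raise ConnectionError("planner unreachable")


chain = [
    ("codebook_retrieval", lambda p: None),                     # no confident template match
    ("nemoclaw_hosted", nemoclaw_down),                         # simulated outage
    ("ollama_local", lambda p: {"components": ["table"]}),      # stand-in local planner
    ("deterministic_fallback", lambda p: {"components": []}),   # always answers
]
used, plan = plan_with_fallback("Show overdue follow-ups.", chain)
```

Here `used` is `"ollama_local"`: the codebook declined, the Nemoclaw stand-in raised, and the local stand-in answered before the deterministic fallback was reached.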

Default planning / Oracle analysis model

Use a local reasoning-capable model behind Ollama when Nemoclaw is unavailable or when the team wants private execution on hardware it controls.

Recommended candidate:

  • qwen3.6:35b-a3b

Reason:

  • strong agentic coding and structured reasoning profile
  • local execution path through Ollama
  • realistic fit for GPU-box-hosted inference

Deployment command

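Example (downloads the model if it is not already present, then opens an interactive session; use ollama pull with the same tag to fetch the model without starting a session):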

ollama run qwen3.6:35b-a3b

Routing rule

  • Oracle prompt planning:
    • small to medium prompts: local Ollama qwen3.6:35b-a3b
    • larger multi-step analytical plans: Nemoclaw planner if available
  • Coding-agent batch workloads:
    • Ollama first for local/private jobs
    • Nemoclaw for heavier orchestration when the runtime is healthy
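
The Oracle planning half of this rule can be written as a small pure function. This is a sketch under stated assumptions: the document does not define where "small to medium" ends, so the 2000-character cutoff is a placeholder, and `choose_planner` is a hypothetical name.

```python
def choose_planner(prompt: str, nemoclaw_healthy: bool, max_local_chars: int = 2000) -> str:
    """Pick a provider id for Oracle prompt planning.

    max_local_chars is an assumed cutoff for "small to medium" prompts.
    """
    is_large = len(prompt) > max_local_chars
    if is_large and nemoclaw_healthy:
        return "nemoclaw"
    # Small/medium prompts, or large prompts while Nemoclaw is unhealthy, stay local.
    return "ollama"
```

Keeping the decision in one function with no I/O makes the rule easy to unit test and easy to change once the team agrees on a real size or complexity measure.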

Runtime LLM API

The backend now exposes a first-class runtime LLM family:

  • GET /api/runtime/llm/providers
  • POST /api/runtime/llm/chat
  • POST /api/runtime/llm/batch
  • GET /api/runtime/llm/jobs/{job_id}
  • GET /api/runtime/llm/jobs/{job_id}/results

This router is defined in:

  • backend/api/routes_runtime_llm.py

The current persistence path uses the existing canonical table:

  • workflow_agent_runs

That means batch jobs are now persisted against the live Velocity schema without requiring a new table family before the first production rollout.

Implemented Batch Processing API

This is no longer only a proposal. The following contract family exists now and can be used by Oracle or future coding-agent surfaces.

Single request inference

  • POST /api/runtime/llm/chat

Payload:

{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "system_prompt": "You are Oracle Planner.",
  "messages": [
    { "role": "user", "content": "Build a CRM pipeline view for high-intent NRI buyers." }
  ],
  "temperature": 0.2,
  "response_format": "json"
}
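
A client could wrap this payload in an HTTP request as follows. The sketch only builds the request object and does not send it; the base URL is a placeholder, `build_chat_request` is a hypothetical helper, and whatever session or auth headers the origin enforces are omitted because this document does not specify them.

```python
import json
import urllib.request


def build_chat_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the single-request inference endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/runtime/llm/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


chat_payload = {
    "provider": "ollama",
    "model": "qwen3.6:35b-a3b",
    "system_prompt": "You are Oracle Planner.",
    "messages": [
        {"role": "user", "content": "Build a CRM pipeline view for high-intent NRI buyers."}
    ],
    "temperature": 0.2,
    "response_format": "json",
}
# "https://velocity.example.com" is a placeholder origin, not a real deployment URL.
request = build_chat_request("https://velocity.example.com/", chat_payload)
```

Passing `request` to `urllib.request.urlopen` would perform the call once valid credentials are attached.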

Batch submission

  • POST /api/runtime/llm/batch

Payload:

{
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "job_type": "oracle_canvas_planning",
  "items": [
    {
      "request_id": "req_001",
      "messages": [
        { "role": "user", "content": "Show overdue high-QD follow-ups." }
      ],
      "response_format": "json"
    },
    {
      "request_id": "req_002",
      "messages": [
        { "role": "user", "content": "Build a Kolkata luxury inventory comparison block." }
      ],
      "response_format": "json"
    }
  ]
}
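
A batch body like the one above can be assembled from a mapping of request IDs to prompts, which makes the IDs unique by construction. This is a sketch: `build_batch_payload` is a hypothetical helper, and rejecting an empty batch on the client side is an assumption, not documented server behavior.

```python
import json


def build_batch_payload(
    prompts: dict[str, str],
    job_type: str,
    provider: str = "ollama",
    model: str = "qwen3.6:35b-a3b",
) -> dict:
    """Build a batch body; dict keys become request_id values, so they cannot collide."""
    if not prompts:
        raise ValueError("a batch needs at least one item")
    return {
        "provider": provider,
        "model": model,
        "job_type": job_type,
        "items": [
            {
                "request_id": request_id,
                "messages": [{"role": "user", "content": content}],
                "response_format": "json",
            }
            for request_id, content in prompts.items()
        ],
    }


payload = build_batch_payload(
    {
        "req_001": "Show overdue high-QD follow-ups.",
        "req_002": "Build a Kolkata luxury inventory comparison block.",
    },
    job_type="oracle_canvas_planning",
)
body = json.dumps(payload).encode("utf-8")  # bytes ready to POST to /api/runtime/llm/batch
```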

Batch status

  • GET /api/runtime/llm/jobs/{job_id}

Response:

{
  "job_id": "job_123",
  "status": "running",
  "provider": "ollama",
  "model": "qwen3.6:35b-a3b",
  "submitted_count": 2,
  "completed_count": 1,
  "failed_count": 0
}
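
A caller would typically poll this endpoint until the job finishes. The sketch below takes the fetch step as a callable so it stays independent of any HTTP client. The terminal status names `completed` and `failed` are assumptions, since this document only shows `running`, and `wait_for_job` is a hypothetical helper.

```python
import time
from typing import Callable

# Assumed terminal statuses; the document only shows "running".
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Poll until the job reports a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

In practice `fetch_status` would issue `GET /api/runtime/llm/jobs/{job_id}` and return the parsed JSON body; bounding the number of polls keeps a stuck job from hanging the caller indefinitely.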

Batch results

  • GET /api/runtime/llm/jobs/{job_id}/results

Providers inventory

  • GET /api/runtime/llm/providers

Example response:

{
  "providers": [
    {
      "id": "nemoclaw",
      "status": "online",
      "models": ["nemotron", "remote_default"]
    },
    {
      "id": "ollama",
      "status": "online",
      "models": ["qwen3.6:35b-a3b"]
    }
  ]
}
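
A consumer of this response might select a provider for a given model as shown below. This is a sketch with a hypothetical helper name, `pick_provider`; it assumes only the fields visible in the example response.

```python
from typing import Optional


def pick_provider(inventory: dict, model: str) -> Optional[str]:
    """Return the id of the first online provider that lists the model, else None."""
    for provider in inventory.get("providers", []):
        if provider.get("status") == "online" and model in provider.get("models", []):
            return provider["id"]
    return None


inventory = {
    "providers": [
        {"id": "nemoclaw", "status": "online", "models": ["nemotron", "remote_default"]},
        {"id": "ollama", "status": "online", "models": ["qwen3.6:35b-a3b"]},
    ]
}
chosen = pick_provider(inventory, "qwen3.6:35b-a3b")
```

With the example inventory, `chosen` is `"ollama"`, and a model that no online provider lists yields `None`.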

Batch Processing Design Rules

  1. Batch jobs must be persisted.
  2. Batch items must be individually addressable by request_id.
  3. Every batch job must record:
    • provider
    • model
    • submitted payload hash
    • start/end timestamps
    • failure reason
  4. Oracle must not block the main request thread for large batches.
  5. Any DB writeback generated from a batch must go through approval tables, not direct execution.
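
Rule 3 can be illustrated with a small record type and a stable payload hash. This is a sketch only: `BatchJobRecord`, `payload_hash`, and `open_job_record` are hypothetical names, and how these fields map onto columns of workflow_agent_runs is not specified in this document.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def payload_hash(payload: dict) -> str:
    """SHA-256 over canonical JSON so that key order does not change the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class BatchJobRecord:
    provider: str
    model: str
    submitted_payload_hash: str
    started_at: datetime
    ended_at: Optional[datetime] = None      # set when the job finishes
    failure_reason: Optional[str] = None     # set only if the job fails


def open_job_record(payload: dict) -> BatchJobRecord:
    """Create the audit record at submission time, before any inference runs."""
    return BatchJobRecord(
        provider=payload["provider"],
        model=payload["model"],
        submitted_payload_hash=payload_hash(payload),
        started_at=datetime.now(timezone.utc),
    )
```

Hashing a canonical serialization rather than the raw request bytes means two submissions with the same content but different key order produce the same audit hash.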

Oracle-Specific Runtime Policy

For Oracle Canvas, the LLM is not the source of truth for data.

The source of truth order is:

  1. canonical DB tables
  2. approved dataset projections
  3. codebook template corpus
  4. model planner

The model is only allowed to:

  • classify intent
  • choose likely component families
  • propose layout direction
  • summarize findings

The model is not allowed to:

  • invent database facts
  • bypass dataset allowlists
  • emit arbitrary executable code into production rendering paths
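
These limits can be enforced by validating planner output before anything downstream consumes it. The following is a sketch: the plan shape, the three allowlists, and the function name `validate_plan` are hypothetical, since this document does not list the real dataset or component allowlists.

```python
# Hypothetical allowlists; the real ones are not listed in this document.
ALLOWED_DATASETS = {"crm_pipeline", "inventory_units"}
ALLOWED_COMPONENTS = {"table", "kpi_card", "bar_chart"}
ALLOWED_KEYS = {"intent", "components", "datasets", "summary"}


def validate_plan(plan: dict) -> list[str]:
    """Return a list of policy violations; an empty list means the plan is acceptable."""
    problems = []
    extra = set(plan) - ALLOWED_KEYS
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    for dataset in plan.get("datasets", []):
        if dataset not in ALLOWED_DATASETS:
            problems.append(f"dataset not allowlisted: {dataset}")
    for component in plan.get("components", []):
        if component not in ALLOWED_COMPONENTS:
            problems.append(f"unknown component family: {component}")
    return problems
```

A plan that carries a raw `sql` key or names a dataset outside the allowlist is rejected with an explicit reason, which keeps the planner advisory rather than authoritative.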

Current Production Readiness Assessment

Ready now

  • Oracle Canvas frontend-to-backend v1 route family
  • codebook-backed template retrieval path
  • safe DB execution gateway
  • merge/fork/revision path
  • deterministic fallback path
  • runtime LLM provider inventory
  • runtime single-chat execution
  • runtime persisted batch execution through workflow_agent_runs
  • Oracle planner fallback through the shared runtime LLM service

Still needs explicit implementation if the team approves

  • per-model selection UI in Catalyst or Oracle controls
  • dedicated runtime_llm_jobs / runtime_llm_job_items tables if the team wants stronger audit/query ergonomics than workflow_agent_runs
  • explicit Nemoclaw vs Ollama operator switch in a production admin surface
  • richer provider health telemetry beyond simple reachability

Suggested implementation order

  1. Add a dedicated runtime router (already in place, per the Runtime LLM API section):
    • backend/api/routes_runtime_llm.py
  2. Add DB tables:
    • runtime_llm_jobs
    • runtime_llm_job_items
    • runtime_llm_job_results
  3. Implement provider adapters:
    • Nemoclaw adapter
    • Ollama adapter
  4. Expose provider status to Catalyst/Oracle settings surfaces.
  5. Keep Oracle Canvas on the current codebook-first path even after LLM batching exists.

Bottom Line

Oracle Canvas should be treated as a codebook-guided analytical surface with optional LLM planning, not as a raw chat-to-SQL toy.

The production-safe architecture is:

  • Linux origin runs the application and DB access
  • GPU box runs ComfyUI and model inference
  • Oracle retrieves from the merged codebook first
  • DB access stays whitelisted
  • Nemoclaw and Ollama sit behind a documented provider interface
  • batch processing is a separate runtime service contract, not an implicit side effect of the canvas endpoint