The complete code integration is done. Co-authored-by: Sagnik <sagnik7896@gmail.com> Reviewed-on: #18
12 KiB
NemoClaw Setup Truth
Updated: April 12, 2026
1. Purpose
This document records the actual NemoClaw-related deployment state for Project Velocity. It explains what exists, where it exists, why it exists, which ports are involved, and how the reasoning path works today.
This is not the original intended architecture. This is the current operational truth.
2. High-Level Summary
Project Velocity uses the term "NemoClaw" for the reasoning and prompt layer attached to the Sentinel QD Engine. In practice, this is now split into three different concerns:
- Prompted reasoning used by the FastAPI backend
- OpenShell / gateway infrastructure that remains installed on the AWS node
- Python-native append layers used by Oracle planning, MCP-style tool registration, and workflow dispatch preview
The active FastAPI inference path is NVIDIA-hosted OpenAI-compatible chat completions.
The OpenShell gateway and Ollama are still installed and running as adjacent infrastructure, but they are not the active primary scoring path used by backend/services/nemoclaw_client.py.
The root codebase now also includes Python-native compatibility layers inspired by Sourik's Go runtime:
backend/services/nemoclaw_runtime.pybackend/services/mcp_registry.pybackend/oracle/persona_service.py
These append the current root without replacing the active NVIDIA-hosted inference path.
3. Node and Network Truth
AWS region: us-east-1
Current public IP: 54.152.236.10
SSH user: ubuntu
Port Map
22
SSH access to the AWS node.
443
nginx TLS reverse proxy. Public entry point for the backend.
127.0.0.1:8001
FastAPI/Uvicorn backend. Not directly public.
127.0.0.1:5432
PostgreSQL. Local-only.
8080
OpenShell/NemoClaw gateway target. Internal service path for gateway bootstrap and sandbox-related flows.
11434
Local Ollama runtime. Installed and reachable on the node, but not the current primary backend scoring path.
/api/videos/marketing
Backend catalog endpoint for Sentinel live-session marketing videos.
4. File and Directory Layout
NVMe-backed runtime directories
/opt/dlami/nvme/velocity/current
Active backend code.
/opt/dlami/nvme/velocity/env
Environment file used by velocity-backend.service.
/opt/dlami/nvme/velocity/venv
Python virtual environment for the backend.
/opt/dlami/nvme/velocity/tls
TLS cert and key used by nginx.
/opt/dlami/nvme/nemoclaw/prompts
Prompt files used by the backend reasoning client.
/opt/dlami/nvme/assets/videos
Runtime marketing-video directory served by FastAPI static assets.
/opt/dlami/nvme/assets/videos/catalog.json
Optional checked catalog that controls video ordering, labels, and display metadata for the live-session picker.
/opt/dlami/nvme/pgdata/14/velocity
PostgreSQL 14 data directory.
Repo paths
backend/services/nemoclaw_client.py
Primary reasoning client used by the FastAPI backend.
backend/services/nemoclaw_runtime.py
Python-native append layer for workflow dispatch planning, webhook verification, and claim-style helper behavior.
backend/services/mcp_registry.py
Python-native MCP/search tool registry append layer used by Oracle helper surfaces.
backend/oracle/persona_service.py
Subordinate Oracle persona planning layer that recommends component templates, renders prompt assets, and augments Oracle v1.
backend/api/routes_crm.py
Root PostgreSQL-first CRM append layer for leads, chat_logs, kanban, and analytics routes.
backend/api/routes_oracle.py
Root Oracle helper append layer for workflow preview and MCP tool discovery.
backend/oracle/router_v1.py
Mounted Oracle v1 API surface for canvas, prompts, persona helpers, and collaboration.
backend/routers/videos.py
Marketing-video catalog endpoint for the Sentinel live-session picker.
backend/config/marketing_videos.catalog.json
Checked source catalog for the four current property walkthrough videos.
backend/nemoclaw_prompts/qd_calculator.md
QD scoring prompt.
backend/nemoclaw_prompts/lead_tagger.md
Lead enrichment prompt.
backend/nemoclaw_prompts/cctv_profiler.md
CCTV vehicle and plate profiling prompt.
backend/scripts/nemoclaw_deploy.sh
Historical deployment/bootstrap script for OpenShell/Ollama-style setup. Useful as reference, but no longer fully aligned with the active NVIDIA-primary truth.
5. Services
velocity-backend.service
Purpose: Runs FastAPI/Uvicorn from the NVMe release tree.
Why it exists: Provides the production API and WebSocket layer for Sentinel, Vault, Scenes, CCTV, and Auth.
Key behavior:
- Reads
/opt/dlami/nvme/velocity/env - Starts
uvicorn backend.main:app --host 127.0.0.1 --port 8001
nemoclaw-velocity.service
Purpose: Bootstraps the OpenShell/NemoClaw gateway state.
Why it exists: Keeps the local gateway selection and related tooling available on the node even though FastAPI currently scores against NVIDIA directly.
Current truth:
- Implemented as a non-blocking
oneshotsystemd unit - Leaves the service in
active (exited)when successful
nginx
Purpose: TLS reverse proxy for the backend.
Why it exists:
Exposes the backend on 443, terminates TLS, and forwards both HTTP and WebSocket traffic to Uvicorn.
postgresql@14-velocity.service
Purpose: Owns the NVMe-backed PostgreSQL cluster.
Why it exists: The Sentinel and Vault flows persist state in PostgreSQL, not Supabase.
6. Environment Variables
Active variables relevant to NemoClaw reasoning:
NVIDIA_API_KEY
Used by the backend to authenticate against NVIDIA hosted completions.
NVIDIA_BASE_URL
Set to https://integrate.api.nvidia.com/v1
NVIDIA_MODEL
Set to nvidia/nemotron-3-super-120b-a12b
NVIDIA_FALLBACK_MODEL
Set to nvidia/llama-3.3-nemotron-super-49b-v1
ALLOW_LOCAL_FALLBACK
Currently false
NEMOCLAW_PROMPT_DIR
Set to /opt/dlami/nvme/nemoclaw/prompts
Historical-but-not-primary variables:
OLLAMA_BASE_URL
Still relevant if local fallback is re-enabled.
NEMOCLAW_BASE_URL
No longer the primary path for backend scoring.
7. Inference Flow
Current backend inference flow
- Frontend emits biometric packet over
/api/sentinel/ws/perception backend/routers/sentinel.pyreceives the packet- Scene context is resolved from
video_scene_mapsifvideo_asset_idandvideo_ts_msare present backend/services/nemoclaw_client.pybuilds an OpenAI-compatible messages payload- The backend calls NVIDIA hosted completions using
nvidia/nemotron-3-super-120b-a12b - The result updates QD score state and is broadcast back over WebSocket
Current Oracle canvas planning append flow
- Frontend can call
/api/oracle/v1/canvas-pages/{pageId}/prompts backend/oracle/prompt_orchestrator.pybuilds a retrieval planbackend/oracle/persona_service.pyrecommends reusable component templates and emits a planning note blockbackend/services/nemoclaw_runtime.pyproduces a workflow dispatch preview for ComfyUI-backed executionbackend/oracle/data_access_gateway.pyruns only whitelisted PostgreSQL queries- Oracle commits the resulting components into the active canvas revision
Current CRM and analytics append flow
- Root FastAPI mounts
backend/api/routes_crm.py - Canonical root endpoints now exist for:
/api/leads/api/leads/demographics/api/chat-logs/api/kanban/board/api/kanban/move/api/analytics/sentiment-scatter
- These routes use the root asyncpg pool and PostgreSQL-first storage contract
- CRM WebSocket sync is still intentionally deferred
Current lead-tagging flow
- Broker or system calls
/api/sentinel/tag-lead tag_lead()uses the NVIDIA path- Lead tags are updated in
leads_intelligence LEAD_TAGGEDis broadcast to notifications
Current CCTV flow
- OCR/bridge posts to
/api/cctv/event profile_cctv_visitor()uses the NVIDIA pathcctv_eventsrow is written- Session evidence is updated
- Session can later be finalized through auto-mode matching
Current live-session video flow
- Frontend calls
GET /api/videos/marketing - Backend reads
/opt/dlami/nvme/assets/videos/catalog.jsonif present - Backend falls back to scanning
/opt/dlami/nvme/assets/videosrecursively for playable files if the catalog is missing or incomplete - FastAPI serves the MP4 files through
/assets/videos/... SentinelLiveSession.tsxrenders smaller preview cards that autoplay in 3-second bursts on hover and advance 10 seconds between burstsPerceptionPlayer.tsxplays the selected asset through the same/assets/videos/...path
8. OpenShell and Ollama Truth
OpenShell and Ollama still matter, but in a narrower way than originally planned.
Ollama
Location:
Runs locally on port 11434
Why it still exists:
- Historical deployment compatibility
- Potential local fallback if NVIDIA is disabled
- OpenShell-related infrastructure expectations
OpenShell gateway
Location:
Gateway target on port 8080
Why it still exists:
- NemoClaw sandbox bootstrap
- Local gateway control path
- Operational continuity for the previously onboarded sandbox
What it is not:
- It is not the current primary inference path for backend scoring
- It is not the root source of truth for Oracle or CRM orchestration
8.5 Python-Native Append Responsibilities
These are now part of root truth:
- Oracle persona prompt loading and render helpers live in Python, not Go
- MCP/search registration lives in Python, not Go
- Workflow dispatch planning for Oracle-to-Comfy orchestration lives in Python, not Go
- Claim-style helper behavior is appended in Python as a compatibility layer, not as a second backend center
What remains deferred:
- Full production webhook runtime parity with Sourik's Go stack
- Full external search provider execution inside the MCP layer
- Autonomous posting and non-root agent/webhook services
9. Prompts
Prompt source-of-truth in repo:
backend/nemoclaw_prompts/qd_calculator.mdbackend/nemoclaw_prompts/lead_tagger.mdbackend/nemoclaw_prompts/cctv_profiler.md
Prompt runtime location on node:
/opt/dlami/nvme/nemoclaw/prompts/qd_calculator.md/opt/dlami/nvme/nemoclaw/prompts/lead_tagger.md/opt/dlami/nvme/nemoclaw/prompts/cctv_profiler.md
Why copied to NVMe:
- Keeps runtime prompts off the root volume
- Aligns with the NVMe-first deployment strategy
- Prevents storage-eviction regressions
10. Known Operational Risks
JSON compliance risk
The NVIDIA model sometimes returns malformed or partially malformed JSON for the full QD prompt. The backend now includes partial-response recovery, but this is the biggest remaining correctness risk.
Dynamic IP risk
The public IP has changed during execution. A stable Elastic IP or DNS entry is still recommended.
Trust-chain risk
nginx TLS exists, but a production-trusted certificate should replace self-signed cert material.
External producer gap
The OCR bridge script exists, but a production ONVIF/RTSP/OCR producer still needs to be pointed at the ingestion endpoint.
Catalog drift risk
If new property videos are copied to NVMe without updating catalog.json, they will still be discoverable through directory scanning, but order, title, and display color may drift from the intended broker-facing presentation.
11. Validation Commands
Health:
curl -k https://54.152.236.10/health
curl -k https://54.152.236.10/api/videos/marketing
Backend service:
sudo systemctl status velocity-backend.service
Gateway bootstrap:
sudo systemctl status nemoclaw-velocity.service
PostgreSQL:
sudo systemctl status postgresql@14-velocity.service
sudo -u postgres psql -d velocity -c '\dt'
Local inference health from backend env:
source /opt/dlami/nvme/velocity/env
PYTHONPATH=/opt/dlami/nvme/velocity/current /opt/dlami/nvme/velocity/venv/bin/python - <<'PY'
import asyncio, json
from backend.services.nemoclaw_client import health_check
print(asyncio.run(health_check()))
PY
12. What to Update If the Truth Changes
Update this document whenever any of the following change:
- Public IP or DNS target
- Primary inference provider
- Primary model
- Prompt directory
- nginx port or TLS behavior
- OpenShell gateway port
- service unit names
- NVMe runtime paths