Files
Project_Velocity/.Agent Context/Sprint 1/nemoclaw_setup_truth.md
2026-04-12 02:02:58 +05:30

328 lines
9.3 KiB
Markdown

# NemoClaw Setup Truth
Updated: April 2, 2026
## 1. Purpose
This document records the actual NemoClaw-related deployment state for Project Velocity. It explains what exists, where it exists, why it exists, which ports are involved, and how the reasoning path works today.
This is not the original intended architecture. This is the current operational truth.
## 2. High-Level Summary
Project Velocity uses the term "NemoClaw" for the reasoning and prompt layer attached to the Sentinel QD Engine. In practice, this is now split into two different concerns:
1. Prompted reasoning used by the FastAPI backend
2. OpenShell / gateway infrastructure that remains installed on the AWS node
The active FastAPI inference path is NVIDIA-hosted OpenAI-compatible chat completions.
The OpenShell gateway and Ollama are still installed and running as adjacent infrastructure, but they are not the active primary scoring path used by `backend/services/nemoclaw_client.py`.
## 3. Node and Network Truth
AWS region: `us-east-1`
Current public IP: `54.152.236.10`
SSH user: `ubuntu`
### Port Map
`22`
SSH access to the AWS node.
`443`
nginx TLS reverse proxy. Public entry point for the backend.
`127.0.0.1:8001`
FastAPI/Uvicorn backend. Not directly public.
`127.0.0.1:5432`
PostgreSQL. Local-only.
`8080`
OpenShell/NemoClaw gateway target. Internal service path for gateway bootstrap and sandbox-related flows.
`11434`
Local Ollama runtime. Installed and reachable on the node, but not the current primary backend scoring path.
`/api/videos/marketing`
Backend catalog endpoint for Sentinel live-session marketing videos.
## 4. File and Directory Layout
### NVMe-backed runtime directories
`/opt/dlami/nvme/velocity/current`
Active backend code.
`/opt/dlami/nvme/velocity/env`
Environment file used by `velocity-backend.service`.
`/opt/dlami/nvme/velocity/venv`
Python virtual environment for the backend.
`/opt/dlami/nvme/velocity/tls`
TLS cert and key used by nginx.
`/opt/dlami/nvme/nemoclaw/prompts`
Prompt files used by the backend reasoning client.
`/opt/dlami/nvme/assets/videos`
Runtime marketing-video directory served by FastAPI static assets.
`/opt/dlami/nvme/assets/videos/catalog.json`
Optional checked catalog that controls video ordering, labels, and display metadata for the live-session picker.
`/opt/dlami/nvme/pgdata/14/velocity`
PostgreSQL 14 data directory.
### Repo paths
`backend/services/nemoclaw_client.py`
Primary reasoning client used by the FastAPI backend.
`backend/routers/videos.py`
Marketing-video catalog endpoint for the Sentinel live-session picker.
`backend/config/marketing_videos.catalog.json`
Checked source catalog for the four current property walkthrough videos.
`backend/nemoclaw_prompts/qd_calculator.md`
QD scoring prompt.
`backend/nemoclaw_prompts/lead_tagger.md`
Lead enrichment prompt.
`backend/nemoclaw_prompts/cctv_profiler.md`
CCTV vehicle and plate profiling prompt.
`backend/scripts/nemoclaw_deploy.sh`
Historical deployment/bootstrap script for OpenShell/Ollama-style setup. Useful as reference, but no longer fully aligned with the active NVIDIA-primary truth.
## 5. Services
### `velocity-backend.service`
Purpose:
Runs FastAPI/Uvicorn from the NVMe release tree.
Why it exists:
Provides the production API and WebSocket layer for Sentinel, Vault, Scenes, CCTV, and Auth.
Key behavior:
- Reads `/opt/dlami/nvme/velocity/env`
- Starts `uvicorn backend.main:app --host 127.0.0.1 --port 8001`
### `nemoclaw-velocity.service`
Purpose:
Bootstraps the OpenShell/NemoClaw gateway state.
Why it exists:
Keeps the local gateway selection and related tooling available on the node even though FastAPI currently scores against NVIDIA directly.
Current truth:
- Implemented as a non-blocking `oneshot` systemd unit
- Leaves the service in `active (exited)` when successful
### `nginx`
Purpose:
TLS reverse proxy for the backend.
Why it exists:
Exposes the backend on `443`, terminates TLS, and forwards both HTTP and WebSocket traffic to Uvicorn.
### `postgresql@14-velocity.service`
Purpose:
Owns the NVMe-backed PostgreSQL cluster.
Why it exists:
The Sentinel and Vault flows persist state in PostgreSQL, not Supabase.
## 6. Environment Variables
Active variables relevant to NemoClaw reasoning:
`NVIDIA_API_KEY`
Used by the backend to authenticate against NVIDIA hosted completions.
`NVIDIA_BASE_URL`
Set to `https://integrate.api.nvidia.com/v1`
`NVIDIA_MODEL`
Set to `nvidia/nemotron-3-super-120b-a12b`
`NVIDIA_FALLBACK_MODEL`
Set to `nvidia/llama-3.3-nemotron-super-49b-v1`
`ALLOW_LOCAL_FALLBACK`
Currently `false`
`NEMOCLAW_PROMPT_DIR`
Set to `/opt/dlami/nvme/nemoclaw/prompts`
Historical-but-not-primary variables:
`OLLAMA_BASE_URL`
Still relevant if local fallback is re-enabled.
`NEMOCLAW_BASE_URL`
No longer the primary path for backend scoring.
## 7. Inference Flow
### Current backend inference flow
1. Frontend emits biometric packet over `/api/sentinel/ws/perception`
2. `backend/routers/sentinel.py` receives the packet
3. Scene context is resolved from `video_scene_maps` if `video_asset_id` and `video_ts_ms` are present
4. `backend/services/nemoclaw_client.py` builds an OpenAI-compatible messages payload
5. The backend calls NVIDIA hosted completions using `nvidia/nemotron-3-super-120b-a12b`
6. The result updates QD score state and is broadcast back over WebSocket
### Current lead-tagging flow
1. Broker or system calls `/api/sentinel/tag-lead`
2. `tag_lead()` uses the NVIDIA path
3. Lead tags are updated in `leads_intelligence`
4. `LEAD_TAGGED` is broadcast to notifications
### Current CCTV flow
1. OCR/bridge posts to `/api/cctv/event`
2. `profile_cctv_visitor()` uses the NVIDIA path
3. `cctv_events` row is written
4. Session evidence is updated
5. Session can later be finalized through auto-mode matching
### Current live-session video flow
1. Frontend calls `GET /api/videos/marketing`
2. Backend reads `/opt/dlami/nvme/assets/videos/catalog.json` if present
3. Backend falls back to scanning `/opt/dlami/nvme/assets/videos` recursively for playable files if the catalog is missing or incomplete
4. FastAPI serves the MP4 files through `/assets/videos/...`
5. `SentinelLiveSession.tsx` renders smaller preview cards that autoplay in 3-second bursts on hover and advance 10 seconds between bursts
6. `PerceptionPlayer.tsx` plays the selected asset through the same `/assets/videos/...` path
## 8. OpenShell and Ollama Truth
OpenShell and Ollama still matter, but in a narrower way than originally planned.
### Ollama
Location:
Runs locally on port `11434`
Why it still exists:
- Historical deployment compatibility
- Potential local fallback if NVIDIA is disabled
- OpenShell-related infrastructure expectations
### OpenShell gateway
Location:
Gateway target on port `8080`
Why it still exists:
- NemoClaw sandbox bootstrap
- Local gateway control path
- Operational continuity for the previously onboarded sandbox
What it is not:
- It is not the current primary inference path for backend scoring
## 9. Prompts
Prompt source-of-truth in repo:
- `backend/nemoclaw_prompts/qd_calculator.md`
- `backend/nemoclaw_prompts/lead_tagger.md`
- `backend/nemoclaw_prompts/cctv_profiler.md`
Prompt runtime location on node:
- `/opt/dlami/nvme/nemoclaw/prompts/qd_calculator.md`
- `/opt/dlami/nvme/nemoclaw/prompts/lead_tagger.md`
- `/opt/dlami/nvme/nemoclaw/prompts/cctv_profiler.md`
Why copied to NVMe:
- Keeps runtime prompts off the root volume
- Aligns with the NVMe-first deployment strategy
- Prevents storage-eviction regressions
## 10. Known Operational Risks
### JSON compliance risk
The NVIDIA model sometimes returns malformed or partially malformed JSON for the full QD prompt. The backend now includes partial-response recovery, but this is the biggest remaining correctness risk.
### Dynamic IP risk
The public IP has changed during execution. A stable Elastic IP or DNS entry is still recommended.
### Trust-chain risk
nginx TLS exists, but a production-trusted certificate should replace self-signed cert material.
### External producer gap
The OCR bridge script exists, but a production ONVIF/RTSP/OCR producer still needs to be pointed at the ingestion endpoint.
### Catalog drift risk
If new property videos are copied to NVMe without updating `catalog.json`, they will still be discoverable through directory scanning, but order, title, and display color may drift from the intended broker-facing presentation.
## 11. Validation Commands
Health:
```bash
curl -k https://54.152.236.10/health
curl -k https://54.152.236.10/api/videos/marketing
```
Backend service:
```bash
sudo systemctl status velocity-backend.service
```
Gateway bootstrap:
```bash
sudo systemctl status nemoclaw-velocity.service
```
PostgreSQL:
```bash
sudo systemctl status postgresql@14-velocity.service
sudo -u postgres psql -d velocity -c '\dt'
```
Local inference health from backend env:
```bash
source /opt/dlami/nvme/velocity/env
PYTHONPATH=/opt/dlami/nvme/velocity/current /opt/dlami/nvme/velocity/venv/bin/python - <<'PY'
import asyncio, json
from backend.services.nemoclaw_client import health_check
print(asyncio.run(health_check()))
PY
```
## 12. What to Update If the Truth Changes
Update this document whenever any of the following change:
- Public IP or DNS target
- Primary inference provider
- Primary model
- Prompt directory
- nginx port or TLS behavior
- OpenShell gateway port
- service unit names
- NVMe runtime paths