feat: Oracle Canvas, Revision History and Canvas Sharing (#33)
Co-authored-by: Sagnik <sagnik7896@gmail.com> Reviewed-on: #33
This commit was merged in pull request #33.
494
.Agent Context/Desineuron AWS Coding Runtime Truth Book.md
Normal file
@@ -0,0 +1,494 @@
# Desineuron AWS Coding Runtime Truth Book

Date: 2026-04-22
Scope: Coding runtime, Roo Code access, NemoClaw runtime, ingress routing, GPU recovery, model staging

## 1. Current Runtime Truth

The Desineuron shared coding runtime has been cut over from Ollama to SGLang while preserving the public contracts already used by the team.

Locked production decisions:

- Public contract remains stable.
- GPU inference remains on the AWS GPU worker, not on the Linux-origin box.
- Linux-origin remains the control plane.
- Ingress remains the stable routed entrypoint.
- `Qwen 3.6 35B A3B` remains the production target model for the current `4 x L4` rollout.
- `NemoClaw` moves onto the same shared runtime.
- There is no production fallback to Ollama after cutover.

Current live public routes:

- `https://velocity.desineuron.in/llm`
- `https://llm.desineuron.in`

Current live API shape after cutover:

- `https://velocity.desineuron.in/llm/v1/models`
- `https://velocity.desineuron.in/llm/v1/chat/completions`
- `https://llm.desineuron.in/v1/models`
- `https://llm.desineuron.in/v1/chat/completions`
- GPU SGLang bind: `172.31.46.190:30100`
- Linux-origin LLM route-sync target port: `30100`
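
As a rough illustration of the API shape listed above, the sketch below builds (but does not send) an OpenAI-compatible chat completion request against the `llm.desineuron.in` route. The helper name `build_chat_request`, the prompt text, and the `max_tokens` value are assumptions made for this example; only the URL and the model ID `qwen3.6-35b-a3b` come from this document.

```python
import json
import urllib.request

BASE_URL = "https://llm.desineuron.in/v1"  # public route from this document
MODEL_ID = "qwen3.6-35b-a3b"               # served model ID from this document


def build_chat_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request without sending it."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative value, not a documented default
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Say hello in one word.")
print(request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. If the route requires a token (the `SGLANG_API_TOKEN` env listed later suggests it may), an `Authorization` header would presumably need to be added first.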

## 2. Infra Split

### Linux-origin

Responsibilities:

- owns route-sync logic
- owns operational orchestration
- updates ingress upstream target when GPU private IP changes
- does not host the heavy model runtime

### Ingress

Responsibilities:

- terminates public hostname
- renders stable reverse-proxy contracts
- forwards `/llm/*` and `llm.desineuron.in` to the current GPU target

### GPU worker

Responsibilities:

- hosts SGLang
- hosts model payloads on NVMe only
- serves Roo Code, Oracle runtime, runtime LLM, and NemoClaw inference

Non-negotiable rules:

- do not use the GPU public IP directly
- do not keep model state on root disk
- keep all large model/runtime caches on GPU NVMe

## 3. Live Hardware Target

Current worker class:

- `g6.12xlarge`
- `4 x NVIDIA L4`
- `96 GB VRAM total`

Serving profile for this hardware:

- tensor parallel size `4`
- prompt-prefix caching enabled
- async / continuous batching enabled through SGLang
- FlashInfer preferred where supported by the live CUDA stack

Measured validation on the live GPU worker:

- host class: `g6.12xlarge`
- GPU layout: `4 x NVIDIA L4`
- model path used for the validated runtime: `/opt/dlami/nvme/models/Qwen-Qwen3.6-35B-A3B-FP8`
- SGLang served model ID used for the test: `qwen3.6-35b-a3b`
- validated SGLang launch profile:
  - `--tp-size 4`
  - `--attention-backend flashinfer`
  - `--context-length 131072`
  - `--mem-fraction-static 0.88`
  - `--dist-init-addr 127.0.0.1:50000`
  - `--enable-metrics`
- required bind rule on this SGLang build:
  - public HTTP server must bind to the GPU private IP, not `0.0.0.0`
  - internal scheduler keeps a loopback listener on the API port
  - wildcard bind collides with that loopback listener on this build
- public validation after cutover:
  - `https://velocity.desineuron.in/llm/v1/models` returns `200`
  - `https://llm.desineuron.in/v1/models` returns `200`
  - streamed chat TTFT through public ingress measured at about `2.36 s`
  - one short non-stream completion measured about `33.86 completion tok/s`
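
The validated launch profile and bind rule above can be sketched as a small command builder. This is an illustration only: the real installer (`install_gpu_sglang_runtime.sh`) may assemble the command differently, and the function name `build_launch_argv` is hypothetical. The flags `--model-path`, `--served-model-name`, `--host`, and `--port` are not listed in this document; they are standard `sglang.launch_server` arguments added here on the assumption that the service uses them.

```python
import shlex

# Values below are taken from this document.
MODEL_PATH = "/opt/dlami/nvme/models/Qwen-Qwen3.6-35B-A3B-FP8"
SERVED_MODEL_ID = "qwen3.6-35b-a3b"
GPU_PRIVATE_IP = "172.31.46.190"
API_PORT = 30100


def build_launch_argv(host: str = GPU_PRIVATE_IP, port: int = API_PORT) -> list:
    """Assemble an SGLang launch command from the validated profile."""
    if host in ("0.0.0.0", "::"):
        # The document reports that a wildcard bind collides with the
        # scheduler's loopback listener on this build.
        raise ValueError("bind to the GPU private IP, not a wildcard address")
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", MODEL_PATH,             # assumed flag placement
        "--served-model-name", SERVED_MODEL_ID,  # assumed flag placement
        "--host", host,
        "--port", str(port),
        "--tp-size", "4",
        "--attention-backend", "flashinfer",
        "--context-length", "131072",
        "--mem-fraction-static", "0.88",
        "--dist-init-addr", "127.0.0.1:50000",
        "--enable-metrics",
    ]


print(shlex.join(build_launch_argv()))
```

Rejecting a wildcard host in the builder keeps the bind rule from being bypassed by a casual config change.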

## 4. Production Model Policy

### Primary production model

- user-facing family: `Qwen 3.6 35B A3B`
- exact SGLang served model ID: `qwen3.6-35b-a3b`

Why it remains live:

- fits the current `4 x L4` target
- already aligned with current team workflows
- suitable for coding/runtime use while the SGLang migration lands
- measured well enough for three concurrent coding users on the current hardware

### Staged future model on current L4 hardware

- `cyankiwi/Qwen3.5-122B-A10B-AWQ-4bit`

Status:

- acquisition/staging path is added
- not the live runtime on the current L4 cutover
- should be treated as a staged artifact for later runtime experimentation and hardware-fit validation

Why this is the right 122B staging path for the current worker:

- `4 x L4` is a better fit for an AWQ/int4 track than for an NVFP4 track
- this keeps the 122B experiment aligned with current hardware instead of assuming a Blackwell-oriented path

Why `txn545/Qwen3.5-122B-A10B-NVFP4` is not the active choice on L4:

- NVFP4 is not the safe default for the current L4 rollout
- if the team wants that track later, it should be treated as a separate hardware/runtime validation branch

Why no 122B model is the active live model in this round:

- the current migration is locked to preserving service continuity on the existing `4 x L4` worker
- the 122B track is a separate performance-fit and runtime-tuning exercise

## 5. Runtime Software Stack

Primary runtime after cutover:

- `SGLang`

Primary interface style:

- OpenAI-compatible `/v1/*`

Required runtime features:

- tensor parallel across all four GPUs
- prefix cache / prompt cache
- async scheduling
- continuous batching
- FlashInfer when supported by the live driver/runtime stack

Observed runtime note from the live bring-up:

- FlashInfer required `ninja-build` on the GPU box because it JIT-builds kernels on first run.
- The current GPU image needed:
  - `ninja-build`
  - `build-essential`
- After installing those packages, the FP8 runtime came up cleanly and served OpenAI-compatible traffic.

If stock SGLang underperforms:

- keep the same public routes
- tune CUDA/runtime behavior behind the same routed contract
- do not reintroduce Ollama fallback

## 6. Implemented Repo Changes

### Backend runtime service

File:

- `backend/services/runtime_llm_service.py`

Current state:

- provider catalog is standardized to `sglang`
- legacy provider names like `ollama` and `nemoclaw` are mapped into `sglang` to avoid immediate caller breakage
- model discovery uses `/v1/models`
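
The legacy-name mapping described above might look roughly like the following. This is a hedged sketch, not the contents of `runtime_llm_service.py`: the names `LEGACY_PROVIDER_ALIASES` and `normalize_provider` are hypothetical, and rejecting unknown providers is an assumption about the desired behavior.

```python
# Hypothetical alias table: legacy provider names resolve to the single
# standardized provider so that existing callers keep working.
LEGACY_PROVIDER_ALIASES = {
    "ollama": "sglang",
    "nemoclaw": "sglang",
    "sglang": "sglang",
}


def normalize_provider(name: str) -> str:
    """Map a caller-supplied provider name onto the standardized catalog."""
    key = name.strip().lower()
    if key not in LEGACY_PROVIDER_ALIASES:
        # Assumption: unknown names fail loudly rather than silently routing.
        raise ValueError(f"unknown LLM provider: {name!r}")
    return LEGACY_PROVIDER_ALIASES[key]


print(normalize_provider("Ollama"))
```

Keeping the aliases in one table makes it easy to delete the legacy names once callers have migrated.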

### NemoClaw client

File:

- `backend/services/nemoclaw_client.py`

Current state:

- production path now targets the shared SGLang/OpenAI-compatible endpoint
- NVIDIA and Ollama production fallback logic is removed from the runtime path
- legacy env names still seed config where needed

### Prompt expander

File:

- `comfy_engine/scripts/prompt_expander.py`

Current state:

- now uses the shared OpenAI-compatible runtime instead of Ollama `/api/generate`

### NemoClaw deploy helper

File:

- `backend/scripts/nemoclaw_deploy.sh`

Current state:

- rewritten around SGLang-compatible inference
- no Ollama-era deployment assumptions

## 7. Route Sync And Stable Hostnames

Route-sync files:

- `infrastructure/desineuron_ingress/sync_llm_route.py`
- `infrastructure/desineuron_ingress/run_llm_route_sync.sh`
- `infrastructure/desineuron_ingress/desineuron-llm-route-sync.service`
- `infrastructure/desineuron_ingress/desineuron-llm-route-sync.timer`
- `infrastructure/desineuron_ingress/install_linux_llm_route_sync.sh`

Important behavior:

- Linux-origin discovers the current GPU private IP
- Linux-origin updates ingress-managed route state
- ingress forwards `llm.desineuron.in` and `/llm/*` to the GPU worker

Current safe default route-sync port in the repo:

- `11434`

Reason:

- the repo now contains the SGLang installer and watchdog, but the public route should not auto-cut from Ollama to SGLang until the GPU runtime is actually installed and validated on-host
- when SGLang is installed on the GPU worker, operators should flip `LLM_ROUTE_PORT` to the live SGLang port and then run route-sync

Manual operator-safe route sync entrypoint:

- `/usr/local/bin/run_llm_route_sync.sh`

This avoids the prior failure mode where operators accidentally used a system Python without `boto3`.
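
The port-selection behavior described in this section can be illustrated with a minimal sketch. It assumes route-sync reads `LLM_ROUTE_PORT` from the environment and falls back to `11434`; the function name `resolve_route_target` and the `http://` upstream format are hypothetical, and the actual logic in `sync_llm_route.py` may differ.

```python
import os
from typing import Mapping, Optional

DEFAULT_ROUTE_PORT = 11434  # safe default named in this document


def resolve_route_target(gpu_private_ip: str,
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Return the upstream target ingress should forward LLM traffic to."""
    env = os.environ if env is None else env
    port = int(env.get("LLM_ROUTE_PORT", DEFAULT_ROUTE_PORT))
    if not 1 <= port <= 65535:
        raise ValueError(f"LLM_ROUTE_PORT out of range: {port}")
    return f"http://{gpu_private_ip}:{port}"


# Before the flip: no override set, so the default port is used.
print(resolve_route_target("172.31.46.190", env={}))
# After operators flip the port to the live SGLang bind.
print(resolve_route_target("172.31.46.190", env={"LLM_ROUTE_PORT": "30100"}))
```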

## 8. GPU Watchdog And Auto-Recovery

Added GPU-side scripts:

- `infrastructure/desineuron_ingress/install_gpu_sglang_runtime.sh`
- `infrastructure/desineuron_ingress/install_gpu_sglang_watchdog.sh`

Installed unit names expected on the GPU worker:

- `desineuron-sglang.service`
- `desineuron-sglang-watchdog.service`
- `desineuron-sglang-watchdog.timer`

Recovery policy:

- ensure the SGLang service is running
- verify `/v1/models` health locally
- if the configured model path is missing, rehydrate from the canonical source
- only report healthy after successful verification

Required recovery assertions for the SGLang watchdog:

- confirm the process is serving `/v1/models`
- confirm the returned model list contains `qwen3.6-35b-a3b`
- confirm all 4 GPUs are engaged during model load
- confirm FlashInfer dependencies are present before declaring runtime healthy
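
One of the assertions above, checking that the model list contains the expected ID, can be sketched as a pure function over an already-parsed `/v1/models` response. The sketch assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`); the function name `model_is_served` is hypothetical and the installed watchdog may perform this check differently.

```python
from typing import Any, Mapping

EXPECTED_MODEL_ID = "qwen3.6-35b-a3b"  # served model ID from this document


def model_is_served(models_payload: Mapping[str, Any],
                    expected_id: str = EXPECTED_MODEL_ID) -> bool:
    """Return True if the parsed /v1/models payload lists the expected model."""
    entries = models_payload.get("data", [])
    if not isinstance(entries, list):
        return False
    return any(
        isinstance(entry, dict) and entry.get("id") == expected_id
        for entry in entries
    )


sample = {"object": "list", "data": [{"id": "qwen3.6-35b-a3b", "object": "model"}]}
print(model_is_served(sample))
```

Keeping the check separate from the HTTP call makes it straightforward to unit-test without a running GPU worker.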

## 9. Model Rehydration And Staging

Added staging helper:

- `infrastructure/desineuron_ingress/acquire_qwen35_122b_nvfp4.sh`

Purpose:

- stages `cyankiwi/Qwen3.5-122B-A10B-AWQ-4bit` onto GPU NVMe by default
- does not automatically flip production traffic to that model

Expected current live model path style:

- `/opt/dlami/nvme/models/Qwen-Qwen3.6-35B-A3B-FP8`

Expected staged 122B path style:

- `/opt/dlami/nvme/models/cyankiwi-Qwen3.5-122B-A10B-AWQ-4bit`

## 10. Roo Code Team Setup

After SGLang cutover, team members should stop using the Ollama provider mode for Desineuron-hosted inference.

Canonical team profile:

- API Provider: OpenAI-compatible / custom OpenAI
- Base URL: `https://llm.desineuron.in/v1`
- Model: `qwen3.6-35b-a3b`
- Temperature: `0.1` to `0.2`
- Server context ceiling: `131072`
- Recommended Roo context: `131072`

Team decision for this wave:

- all three team members can target `128K` context through the same shared runtime
- if real concurrent repo-heavy usage causes OOM or latency regression, the first rollback knob is the client context setting, not the model family
- the current production-ready long-context path is pure VRAM on `4 x L4`, not host-RAM spill

## 11. Measured SGLang Performance

Benchmark date:

- `2026-04-22`

Benchmark topology:

- live AWS GPU worker
- `SGLang + Qwen 3.6 35B A3B FP8`
- tensor parallel `4`
- FlashInfer enabled
- async scheduler / SGLang default continuous batching path
- prompt-prefix caching available in runtime
- server context ceiling: `131072`

Measured results:

- time to first token: `0.12 s`
- streamed completion wall time for a short coding/planning answer: `1.31 s`
- test concurrency: `3`
- aggregate wall time for `3 x 256-token` responses: `3.61 s`
- aggregate completion tokens: `768`
- aggregate prompt tokens: `168`
- aggregate total tokens: `936`
- aggregate completion throughput: `212.76 tokens/s`

Per-request timing under `3` concurrent requests:

- request 1: `3.608 s` for `256` completion tokens
- request 2: `3.609 s` for `256` completion tokens
- request 3: `3.608 s` for `256` completion tokens

Long-context smoke validation:

- prompt size validated: `50010` prompt tokens
- completion size: `8` tokens
- total request size: `50018` tokens
- wall time: `8.345 s`

Operational interpretation:

- the runtime is fast enough for three simultaneous coding users
- TTFT is already in the sub-200 ms range on the warmed runtime
- aggregate decode throughput is materially better than the previous Ollama-backed path while holding a `128K` server context ceiling
- `Qwen 3.6 35B A3B` is the correct production choice for the current one-week delivery window
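
The aggregate figures in this section can be cross-checked with simple arithmetic. The snippet below recomputes them from the rounded values reported above; the variable names are illustrative. Because the wall time is rounded to `3.61 s`, the recomputed throughput lands slightly below the reported `212.76 tokens/s`, which was presumably derived from an unrounded timing.

```python
# Figures copied from the measured results in this section.
concurrent_requests = 3
completion_tokens_per_request = 256
prompt_tokens_total = 168
aggregate_wall_time_s = 3.61  # rounded value as reported

completion_tokens_total = concurrent_requests * completion_tokens_per_request
total_tokens = completion_tokens_total + prompt_tokens_total
throughput = completion_tokens_total / aggregate_wall_time_s

print(f"completion tokens: {completion_tokens_total}")
print(f"total tokens: {total_tokens}")
print(f"aggregate completion throughput: {throughput:.2f} tokens/s")
```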

## 12. Cutover Guidance

Use this model ID consistently across SGLang-facing clients:

- `qwen3.6-35b-a3b`

Do not use this older Ollama-style model ID against SGLang:

- `qwen3.6:35b-a3b`

Why:

- SGLang rejects colons in `served_model_name`
- the colon is reserved internally for adapter syntax
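
For clients or scripts that still carry the Ollama-style ID, a small translation helper is one way to apply the rule above. This is a sketch under the assumption that replacing the colon with a hyphen is the only difference between the two naming styles, which holds for the pair of IDs shown here; the function name `to_sglang_model_id` is hypothetical.

```python
def to_sglang_model_id(model_id: str) -> str:
    """Convert an Ollama-style 'family:tag' ID to the hyphenated SGLang form."""
    return model_id.strip().replace(":", "-")


legacy_id = "qwen3.6:35b-a3b"
print(to_sglang_model_id(legacy_id))
```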

Backend compatibility note:

- the Velocity backend can still map legacy provider naming internally
- external Roo Code and OpenAI-compatible clients should use the hyphenated SGLang model ID only

Canonical Roo configuration:

- API Provider: `OpenAI-compatible` or `Custom OpenAI`
- Base URL: `https://llm.desineuron.in/v1`
- Model: `qwen3.6-35b-a3b`
- Context window: `131072`
- Temperature: `0.1` to `0.2`

Recommended initial values:

- `Base URL`: `https://llm.desineuron.in/v1`
- `Model`: `qwen3.6-35b-a3b`
- `Context Window Size (num_ctx equivalent)`: `131072`

Do not use:

- Ollama provider mode pointing at the public Desineuron route after the cutover

Reason:

- the stable contract is moving to SGLang's OpenAI-compatible interface

## 13. Most Efficient Working Long-Context Strategy On Current Hardware

Strategies tested against the live `4 x L4` worker:

1. Pure-VRAM `131072` context on SGLang with tensor parallel `4`
Result:

- works
- preserves sub-200 ms TTFT on warm short prompts
- preserved about `212.76 tok/s` aggregate completion throughput in the 3-user benchmark

2. Hierarchical host-memory cache with `131072` context
Result:

- not production-safe on the current stack for this model
- first failed on a model-specific `page_size=1` requirement for the hybrid Mamba cache
- second attempt progressed further but one rank died with exit code `-9`
- current interpretation: this path is materially less stable than the pure-VRAM profile

Current decision:

- keep `131072` in VRAM as the production target
- do not use host-RAM hierarchical cache for this model in the current rollout
- if more headroom is needed later, tune kernels and scheduling first before re-opening host-memory spill

## 14. NemoClaw Runtime Policy

NemoClaw should use the same shared SGLang runtime as:

- Roo Code
- Oracle runtime
- backend runtime LLM jobs

This is a deliberate single-stack decision:

- one serving runtime
- one model family for the current wave
- one stable routed contract

If later profiles differ, express that with config, not with a second serving stack in this phase.

## 15. Endpoint Checklist

These should work after cutover:

- `https://velocity.desineuron.in/llm/v1/models`
- `https://velocity.desineuron.in/llm/v1/chat/completions`
- `https://llm.desineuron.in/v1/models`
- `https://llm.desineuron.in/v1/chat/completions`

Internal backend envs:

- `LLM_BASE_URL`
- `SGLANG_BASE_URL`
- `SGLANG_CHAT_URL`
- `SGLANG_MODELS_URL`
- `SGLANG_MODEL`
- `SGLANG_API_TOKEN`

## 16. What Is Left

Still required to complete the migration end to end:

1. Persist the `131072` launch profile into the GPU systemd runtime using the updated installer.
2. Reinstall or update the GPU watchdog so it validates the same `131072` service profile.
3. Repoint Linux-origin route-sync env from `11434` to the live SGLang port after GPU validation.
4. Validate both public routes against `/v1/models`.
5. Run one more public-route benchmark through ingress after cutover to capture real routed TTFT.
6. Generate tuned L4-specific runtime configs if we want to push further on throughput without lowering context.
7. Keep the 122B track separate; it is not part of the current production coding-runtime choice.

## 17. Team Hand-Off

For Roo Code today, once cutover is complete, the team only needs:

- Base URL: `https://llm.desineuron.in/v1`
- Model: `qwen3.6-35b-a3b`
- Context window: `131072`
- Provider type: OpenAI-compatible

For operators, the important truth is:

- Linux-origin controls routing
- ingress owns the stable hostname
- GPU box owns inference
- NVMe owns model state
- SGLang is the production runtime
@@ -0,0 +1,10 @@
# Deprecated Title

This document has been superseded by:

- [Desineuron AWS Coding Runtime Truth Book](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\.Agent Context\Desineuron AWS Coding Runtime Truth Book.md)

Reason:

- the coding runtime is no longer being tracked as an Ollama-only Qwen note
- the canonical truth now covers SGLang, Roo Code access, NemoClaw runtime, route-sync, watchdog recovery, and staged support for `txn545/Qwen3.5-122B-A10B-NVFP4`
891
.Agent Context/README.md
Normal file
@@ -0,0 +1,891 @@
# Project Velocity: Truthbook

> **What this is:** The single source of truth for Project Velocity. If it's written down here, it's how the system works, not how someone hoped it would work.

---

## Table of Contents

1. [What Is Project Velocity](#what-is-project-velocity)
2. [Quick Start](#quick-start)
3. [Architecture Overview](#architecture-overview)
4. [Runtime Truth](#runtime-truth)
5. [Team Setup](#team-setup)
6. [GPU & Model Runtime](#gpu--model-runtime)
7. [Infrastructure](#infrastructure)
8. [Runbooks](#runbooks)
9. [API Reference](#api-reference)
10. [Contributing](#contributing)

---

## What Is Project Velocity

Project Velocity is a multi-agent AI development platform. It orchestrates intelligent agents (powered by Qwen 3.6 35B A3B and other models) to collaborate on software engineering tasks (code generation, review, testing, deployment) as a coordinated team rather than isolated tools.

**Why it exists:** Single-agent coding tools hit a ceiling. They lack context persistence, cross-task coordination, and operational reliability. Velocity solves this by:

- **Multi-agent collaboration**: Agents communicate via WebSocket channels and shared memory
- **Persistent state**: PostgreSQL backs user data, CRM records, and agent memory
- **GPU-accelerated inference**: Local Ollama runtime on NVIDIA GPU hardware
- **Role-based access control**: Admin and standard user tiers with avatar support
- **Live event broadcasting**: Real-time campaign and catalyst events via WebSocket

**Core stack:**

| Layer | Technology |
|-------|-----------|
| Backend API | Python / FastAPI |
| Database | PostgreSQL (via `databases` library with connection pooling) |
| Frontend | React 19 + TypeScript + Vite + Tailwind CSS + Framer Motion |
| Inference | Ollama (Qwen 3.6 35B A3B primary model) |
| Real-time | WebSocket (Catalyst channel, CRM channel) |
| Deployment | systemd services on Linux with NVIDIA GPU |

---

## Quick Start

### Prerequisites

- **GPU Machine:** NVIDIA GPU with sufficient VRAM (≥16GB recommended for Qwen 3.6 35B A3B)
- **NVMe Storage:** For model weights and cache
- **Linux OS:** Ubuntu 22.04+ or equivalent
- **Python 3.11+:** Backend runtime
- **Node.js 18+:** Frontend build
- **Ollama:** Latest stable with Qwen 3.6 35B A3B model pulled
- **PostgreSQL 15+:** Database backend

### One-Line Bootstrap

```bash
bash bootstrap/setup.sh
```

This script handles:
1. GPU driver verification
2. Ollama installation and model pull
3. PostgreSQL setup
4. Backend dependency installation
5. Frontend dependency installation
6. systemd service creation

### Manual Setup

#### 1. GPU & Ollama

```bash
# Verify GPU
nvidia-smi

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the primary model
ollama pull qwen3.6:35b-a3b

# Verify model is loaded
curl http://localhost:11434/api/tags | jq '.models[] | select(.name == "qwen3.6:35b-a3b")'
```

#### 2. Database

```bash
# Start PostgreSQL
sudo systemctl start postgresql

# Create database and user
psql -U postgres -c "CREATE DATABASE velocity;"
psql -U postgres -c "CREATE USER velocity WITH PASSWORD 'secure_password';"
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE velocity TO velocity;"
```

#### 3. Backend

```bash
cd Project_Velocity/backend

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your database credentials and secrets

# Run migrations
python migrate.py

# Start server
uvicorn main:app --host 0.0.0.0 --port 8000
```

#### 4. Frontend

```bash
cd Project_Velocity/app

# Install dependencies
npm install

# Start dev server
npm run dev
```

Frontend is now available at `http://localhost:5173`.

#### 5. Verify Everything

```bash
# Backend health
curl http://localhost:8000/health

# Model availability
curl http://localhost:11434/api/tags

# Frontend
open http://localhost:5173
```

---

## Architecture Overview

### System Diagram

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  React UI   │────▶│   FastAPI    │────▶│ PostgreSQL  │
│ (Port 5173) │◀────│ (Port 8000)  │◀────│ (Port 5432) │
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │    Ollama    │
                    │ (Port 11434) │
                    │ Qwen 3.6 35B │
                    └──────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  NVIDIA GPU  │
                    └──────────────┘
```

### Component Breakdown

#### Backend (`backend/`)

[`main.py`](Project_Velocity/backend/main.py): FastAPI application with:

- **Auth system**: Login, profile lookup, user listing, avatar upload
- **WebSocket managers**: [`_CatalystManager()`](Project_Velocity/backend/main.py:296) and [`_CRMManager()`](Project_Velocity/backend/main.py:320) for real-time event broadcasting
- **Connection pooling**: PostgreSQL via `databases` library with async context management
- **Lifespan hooks**: [`lifespan()`](Project_Velocity/backend/main.py:83) initializes and cleans up resources

Key endpoints:

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/auth/login` | POST | Authenticate user |
| `/api/auth/me` | GET | Get current user profile |
| `/api/auth/users` | GET | List all users (admin) |
| `/api/auth/profile/avatar` | POST | Upload profile avatar |
| `/ws/catalyst` | WS | Catalyst event channel |
| `/ws/crm` | WS | CRM event channel |
| `/health` | GET | Health check |

#### Frontend (`app/`)

[`App.tsx`](Project_Velocity/app/src/App.tsx): React application with:

- **Protected routes**: [`ProtectedRoute()`](Project_Velocity/app/src/App.tsx:66) wraps authenticated paths
- **Route module sync**: [`RouteModuleSync()`](Project_Velocity/app/src/App.tsx:90) handles dynamic route loading
- **Main layout**: [`MainLayout()`](Project_Velocity/app/src/App.tsx:90) provides chrome (header, sidebar, content area)
- **Role rendering**: [`formatRoleLabel()`](Project_Velocity/app/src/App.tsx:379) converts role codes to display labels
- **Auth state management**: Dual `useEffect` hooks handle token persistence and user fetch

#### Agent Context (`.Agent Context/`)

Documents that define how agents operate within Velocity:

- [`Qwen 3.6 35B A3B Ollama Access, Recovery, and Team Setup.md`](Project_Velocity/.Agent%20Context/Qwen%203.6%2035B%20A3B%20Ollama%20Access,%20Recovery,%20and%20Team%20Setup.md): Model runtime, recovery policies, team onboarding
- `README.md`: This file

#### Infrastructure (`.Infrastructure/`)

Deployment and operational documentation:

- systemd unit files for backend, frontend, Ollama services
- Network configuration and ingress rules
- Monitoring and alerting setup

---

## Runtime Truth

### What "Works" Means in Velocity

Velocity has three runtime layers, each with different failure modes:

#### Layer A: Fast Runtime Recovery

If the API crashes or restarts:
- PostgreSQL connection pool rebuilds automatically via [`lifespan()`](Project_Velocity/backend/main.py:83)
- WebSocket managers reinitialize and accept new connections
- No data loss: all state is in PostgreSQL

#### Layer B: Model Rehydration Recovery

If Ollama loses the Qwen model:
- Watchdog systemd unit detects absence via `/api/tags`
- Auto-registers model from NVMe cache or S3 artifact storage
- **Production requirement:** Same-run auto-hydration logic must complete before any agent request

#### Layer C: Full System Recovery

If everything goes down:
1. PostgreSQL recovers WAL logs
2. Ollama watchdog restores model
3. Backend systemd unit restarts API
4. Frontend rebuilds if artifacts are corrupted

### Critical Contracts

**Auth contract:**
```
Client → POST /api/auth/login {email, password}
       → 200 OK {token, user}

Client → GET /api/auth/me (Authorization: Bearer <token>)
       → 200 OK {id, email, role, avatar_url}
       → 401 Unauthorized
```
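
The auth contract above can be sketched from the client side as two request builders. This is an illustration only: it assumes the login body is sent as JSON and that the backend is reachable at `http://localhost:8000` as in the Quick Start; the function names `build_login_request` and `build_me_request` are hypothetical. Nothing is sent over the network here.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # backend address used in the Quick Start


def build_login_request(email: str, password: str) -> urllib.request.Request:
    """Build the POST /api/auth/login request (JSON body is an assumption)."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_me_request(token: str) -> urllib.request.Request:
    """Build the GET /api/auth/me request with a bearer token."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/me",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


login = build_login_request("dev@example.com", "not-a-real-password")
print(login.get_method(), login.full_url)
```

A client would send the login request, read `token` from the `200 OK` body, and pass it to `build_me_request`; a `401 Unauthorized` on the second call would indicate a missing or rejected token.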

**WebSocket contract:**
```
Client → WS /ws/catalyst
       → Accepts live events: {event_type, campaign_name, value, timestamp}

Client → WS /ws/crm
       → Accepts CRM events: {type, payload, timestamp}
```

**Model contract:**
```
Ollama → GET /api/tags returns qwen3.6:35b-a3b
       → Context window: 131072 tokens
       → Provider: OpenAI-compatible interface at http://localhost:11434/v1
```

---

## Team Setup

### Developer Onboarding

#### 1. Clone & Bootstrap

```bash
git clone <repo-url>
cd Project_Velocity
bash bootstrap/setup.sh
```

#### 2. VS Code / Roo Code Configuration

Edit `.vscode/settings.json`:

```json
{
  "roo-cline.provider": "openai-compatible",
  "roo-cline.baseUrl": "http://localhost:11434/v1",
  "roo-cline.modelId": "qwen3.6:35b-a3b",
  "roo-cline.contextWindow": 131072,
  "roo-cline.temperature": 0.7
}
```

#### 3. Verify Team Access

```bash
# Backend health
curl http://localhost:8000/health
# Expected: {"status": "ok"}

# Model loaded
curl http://localhost:11434/api/tags | jq -r '.models[].name'
# Expected: qwen3.6:35b-a3b

# Frontend
open http://localhost:5173
# Expected: Login screen
```

### Role Definitions

| Role | Access Level | Can Do |
|------|-------------|--------|
| `admin` | Full | User management, system config, agent orchestration |
| `developer` | Standard | Code generation, review, testing |
| `viewer` | Read-only | Dashboard, campaign monitoring |

### Performance Expectations

| Scenario | Tokens/sec | Latency |
|----------|-----------|---------|
| Single-stream (local GPU) | ~80-120 tok/s | ~200ms first token |
| Two concurrent requests | ~60-90 tok/s each | ~300ms first token |
| Four-way batch | ~40-60 tok/s each | ~500ms first token |

*Numbers vary by GPU hardware. Measure your setup.*

---

## GPU & Model Runtime

### Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| GPU VRAM | 16GB | 24GB+ |
| GPU Compute | Turing architecture | Ada Lovelace / Hopper |
| NVMe Storage | 50GB free | 100GB+ NVMe Gen4 |
| RAM | 32GB | 64GB+ |

### Ollama Watchdog

The watchdog is a systemd-managed service that ensures the Qwen model stays loaded:

**Location:** `.Infrastructure/systemd/ollama-watchdog.service`

**Behavior:**
1. Every 60 seconds, queries `http://localhost:11434/api/tags`
2. If `qwen3.6:35b-a3b` is absent, triggers rehydration
3. Rehydration priority: NVMe cache → S3 artifact → remote pull
4. Logs all actions to journalctl
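
The rehydration priority in the behavior list above amounts to trying sources in order and stopping at the first success. The sketch below shows that control flow only; the function name `rehydrate` and the stub callables are hypothetical stand-ins, and the real watchdog's source checks are not described in this document.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each source is a (label, attempt) pair; attempt() returns True on success.
Source = Tuple[str, Callable[[], bool]]


def rehydrate(sources: Sequence[Source]) -> Optional[str]:
    """Try each rehydration source in priority order; return the one that worked."""
    for name, attempt in sources:
        try:
            if attempt():
                return name
        except Exception as exc:  # a failing source must not stop the fallback chain
            print(f"rehydration via {name} failed: {exc}")
    return None


# Stub callables standing in for the real NVMe, S3, and remote-pull steps.
priority = [
    ("NVMe cache", lambda: False),   # pretend the local cache is missing
    ("S3 artifact", lambda: True),   # pretend the S3 copy restores cleanly
    ("remote pull", lambda: True),
]
print(rehydrate(priority))
```

Returning the label of the winning source gives the watchdog something concrete to write to its log.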
|
||||
|
||||

**Manual watchdog check:**
```bash
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago"
```

### Model Hydration Strategies

| Strategy | Speed | Use Case |
|----------|-------|----------|
| NVMe local registration | ~2 seconds | Primary recovery path |
| Local manifest `ollama create` | ~5 seconds | Fresh hydration from extracted weights |
| S3 cold hydrate | ~60-300 seconds | No local cache available |

### Critical: What Watchdog Must NOT Do

- ❌ Delete model layers during recovery
- ❌ Modify GPU memory directly
- ❌ Block agent requests during hydration (graceful degradation only)
- ❌ Restart the Ollama process unless absolutely necessary

---

## Infrastructure

### Deployment Topology

```
┌─────────────────────────────────────────────────┐
│                 Production Host                 │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │
│  │ Backend  │  │ Frontend │  │    Ollama    │   │
│  │  :8000   │  │  :5173   │  │    :11434    │   │
│  │ systemd  │  │  nginx   │  │   systemd    │   │
│  └────┬─────┘  └────┬─────┘  └──────┬───────┘   │
│       │             │               │           │
│       └─────────────┴───────────────┘           │
│                     │                           │
│              ┌──────▼───────┐                   │
│              │  PostgreSQL  │                   │
│              │    :5432     │                   │
│              │   systemd    │                   │
│              └──────────────┘                   │
│                                                 │
│  ┌──────────────────────────────────────────┐   │
│  │       NVIDIA GPU (CUDA + TensorRT)       │   │
│  └──────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘
```


### systemd Services

| Service | File | Restart Policy |
|---------|------|---------------|
| Backend API | `velocity-backend.service` | always |
| Frontend (nginx) | `velocity-frontend.service` | always |
| Ollama | `ollama.service` | on-failure |
| Watchdog | `ollama-watchdog.service` | always |
| PostgreSQL | `postgresql.service` | on-failure |

### Network Rules

| Port | Protocol | Service | External Access |
|------|----------|---------|-----------------|
| 80 | HTTP | nginx → frontend | Yes (public) |
| 443 | HTTPS | nginx → frontend | Yes (public) |
| 8000 | TCP | FastAPI backend | No (internal only) |
| 5173 | TCP | Vite dev server | No (dev only) |
| 5432 | TCP | PostgreSQL | No (internal only) |
| 11434 | TCP | Ollama API | No (internal only) |

### Monitoring

```bash
# All service health
systemctl status velocity-backend ollama postgresql

# GPU utilization
nvidia-smi -l 1

# Model inference logs
journalctl -u ollama -f

# API health
curl -s http://localhost:8000/health | jq .
```

---


## Runbooks

### Runbook: Backend Crashes at 2 AM

**Symptom:** Frontend shows 500 errors on API calls.

**Steps:**

```bash
# 1. Check backend status
sudo systemctl status velocity-backend
# Expected: active (running)

# 2. If stopped, restart
sudo systemctl restart velocity-backend

# 3. Check logs for root cause
sudo journalctl -u velocity-backend --since "30 minutes ago" --no-pager

# 4. Verify recovery
curl http://localhost:8000/health
# Expected: {"status": "ok"}

# 5. If crash repeats, check database connectivity
psql -U velocity -d velocity -c "SELECT 1;"
# Expected: 1
```

**If still broken:**
1. Check disk space: `df -h /`
2. Check memory: `free -h`
3. Check PostgreSQL: `sudo systemctl status postgresql`
4. Escalate with logs from step 3

---


### Runbook: Ollama Model Disappeared

**Symptom:** Agents return empty responses or errors.

**Steps:**

```bash
# 1. Check if Ollama is running
sudo systemctl status ollama
# Expected: active (running)

# 2. Check loaded models
curl http://localhost:11434/api/tags | jq '.models[].name'
# Expected: qwen3.6:35b-a3b

# 3. If model is missing, check watchdog
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago" --no-pager

# 4. Manual recovery if watchdog failed
ollama pull qwen3.6:35b-a3b

# 5. Verify model is usable
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.6:35b-a3b",
  "prompt": "Hello",
  "stream": false
}' | jq .done
# Expected: true
```

---


### Runbook: Database Connection Failures

**Symptom:** Backend logs show `connection refused` or `pool exhausted`.

**Steps:**

```bash
# 1. Check PostgreSQL status
sudo systemctl status postgresql
# Expected: active (running)

# 2. Check connection count
psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"
# Should be < max_connections (default 100)

# 3. Check disk space for WAL files
df -h /var/lib/postgresql

# 4. Restart if hung
sudo systemctl restart postgresql

# 5. Verify backend reconnects
sudo journalctl -u velocity-backend --since "1 minute ago" | grep -i "connected\|error"
```

---


### Runbook: GPU Memory Exhaustion

**Symptom:** Ollama returns `out of memory` errors.

**Steps:**

```bash
# 1. Check current GPU usage
nvidia-smi
# Note: PID, memory usage, temperature

# 2. Kill non-essential GPU processes if needed
nvidia-smi --id=0 --query-compute-apps=pid,name,used_memory --format=csv
kill <PID>   # replace <PID> with a PID from the query above

# 3. Check Ollama memory allocation
ollama show qwen3.6:35b-a3b | grep -i "layer\|memory"

# 4. If still exhausted, switch to a more heavily quantized variant
ollama pull qwen3.6:35b-a3b-q4_0

# 5. Monitor recovery
watch -n 1 nvidia-smi
```

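
For step 2, it can help to rank GPU processes by memory before deciding what to kill. The sketch below parses the CSV text produced by the `--query-compute-apps` command. The sample header and `MiB` suffix reflect the output format as commonly observed and should be treated as an assumption; the function name and returned keys are hypothetical.

```python
import csv
import io
from typing import Dict, List


def parse_compute_apps(csv_text: str) -> List[Dict[str, object]]:
    """Parse `nvidia-smi --query-compute-apps=... --format=csv` text, largest consumer first."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]
    apps = []
    for row in rows[1:]:  # skip the header row
        if len(row) != 3:
            continue  # ignore lines that do not match pid, name, memory
        pid, name, used = row
        tokens = used.split()
        # Values such as "[N/A]" are recorded as 0 MiB rather than raising.
        used_mib = int(tokens[0]) if tokens and tokens[0].isdigit() else 0
        apps.append({"pid": int(pid), "name": name, "used_mib": used_mib})
    return sorted(apps, key=lambda app: app["used_mib"], reverse=True)


# Illustrative sample only; real output depends on the driver version.
SAMPLE = (
    "pid, process_name, used_gpu_memory [MiB]\n"
    "5678, python3, 512 MiB\n"
    "1234, /usr/local/bin/ollama, 20480 MiB\n"
)
```
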

---

## API Reference

### Auth Endpoints

#### `POST /api/auth/login`

Authenticate a user and receive a JWT token.

**Request:**
```json
{
  "email": "user@example.com",
  "password": "secure_password"
}
```

**Response (200 OK):**
```json
{
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "user": {
    "id": "uuid-here",
    "email": "user@example.com",
    "role": "developer",
    "avatar_url": null
  }
}
```

**Errors:**
| Status | Meaning |
|--------|---------|
| 401 | Invalid credentials |
| 422 | Malformed request body |

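
A client call to the login endpoint might be assembled as follows. This is a sketch using only the standard library; the helper names and the `base_url` parameter are hypothetical, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_login_request(base_url: str, email: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/auth/login request with a JSON body."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_login_status(status: int) -> str:
    """Map the documented status codes to their meaning."""
    meanings = {
        200: "authenticated",
        401: "invalid credentials",
        422: "malformed request body",
    }
    return meanings.get(status, "unexpected status")
```
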

---

#### `GET /api/auth/me`

Get the current authenticated user's profile.

**Headers:**
```
Authorization: Bearer <token>
```

**Response (200 OK):**
```json
{
  "id": "uuid-here",
  "email": "user@example.com",
  "role": "developer",
  "avatar_url": "https://cdn.example.com/avatars/user.png"
}
```

**Errors:**
| Status | Meaning |
|--------|---------|
| 401 | Token missing or invalid |
| 403 | Token expired |

---


#### `GET /api/auth/users`

List all users in the system. Admin only.

**Headers:**
```
Authorization: Bearer <admin_token>
```

**Response (200 OK):**
```json
[
  {
    "id": "uuid-1",
    "email": "admin@example.com",
    "role": "admin",
    "avatar_url": null
  },
  {
    "id": "uuid-2",
    "email": "dev@example.com",
    "role": "developer",
    "avatar_url": "https://cdn.example.com/avatars/dev.png"
  }
]
```

**Errors:**
| Status | Meaning |
|--------|---------|
| 403 | User is not admin |

---


#### `POST /api/auth/profile/avatar`

Upload a profile avatar image.

**Headers:**
```
Authorization: Bearer <token>
Content-Type: multipart/form-data
```

**Form Data:**
| Field | Type | Required |
|-------|------|----------|
| avatar | file (image/jpeg, image/png) | Yes |

**Response (200 OK):**
```json
{
  "avatar_url": "https://cdn.example.com/avatars/new-avatar.png"
}
```

**Errors:**
| Status | Meaning |
|--------|---------|
| 401 | Not authenticated |
| 422 | Invalid file type or size > 5MB |

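
The 422 conditions can be checked on the client before uploading. The sketch below mirrors the documented rules; the function name is hypothetical, and reading "5MB" as 5 × 1024 × 1024 bytes is an assumption, since the document does not say whether the limit is binary or decimal.

```python
from typing import Optional

ALLOWED_AVATAR_TYPES = {"image/jpeg", "image/png"}

# Assumption: "5MB" is interpreted as 5 MiB.
MAX_AVATAR_BYTES = 5 * 1024 * 1024


def validate_avatar(content_type: str, size_bytes: int) -> Optional[str]:
    """Return a reason the upload would be rejected with 422, or None if it looks acceptable."""
    if content_type.lower() not in ALLOWED_AVATAR_TYPES:
        return f"unsupported file type: {content_type}"
    if size_bytes > MAX_AVATAR_BYTES:
        return f"file too large: {size_bytes} bytes (limit {MAX_AVATAR_BYTES})"
    return None
```
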

---

### WebSocket Endpoints

#### `WS /ws/catalyst`

Real-time channel for Catalyst events (agent coordination, task updates).

**Connection:**
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/catalyst');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.event_type, data.campaign_name, data.value);
};
```

**Event Format:**
```json
{
  "event_type": "task_complete",
  "campaign_name": "codegen-sprint-42",
  "value": 0.97,
  "timestamp": "2026-04-21T16:00:00Z"
}
```

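
A Python consumer can turn the Catalyst event format above into a typed record. This is a sketch assuming every event carries exactly the four fields shown; the `CatalystEvent` class and `parse_catalyst_event` function are hypothetical names. The trailing `Z` is rewritten to `+00:00` because `datetime.fromisoformat` does not accept `Z` on older Python versions.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CatalystEvent:
    event_type: str
    campaign_name: str
    value: float
    timestamp: datetime


def parse_catalyst_event(raw: str) -> CatalystEvent:
    """Parse one JSON message from /ws/catalyst into a CatalystEvent."""
    data = json.loads(raw)
    # Normalize the UTC designator so fromisoformat accepts it everywhere.
    timestamp = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    return CatalystEvent(
        event_type=data["event_type"],
        campaign_name=data["campaign_name"],
        value=float(data["value"]),
        timestamp=timestamp,
    )
```
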

---

#### `WS /ws/crm`

Real-time channel for CRM events (customer interactions, lead updates).

**Connection:**
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/crm');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.type, data.payload);
};
```

**Event Format:**
```json
{
  "type": "lead_created",
  "payload": {
    "id": "crm-uuid",
    "name": "Acme Corp",
    "status": "new"
  },
  "timestamp": "2026-04-21T16:00:00Z"
}
```

---


### Health Check

#### `GET /health`

Verify system health.

**Response (200 OK):**
```json
{
  "status": "ok",
  "database": "connected",
  "ollama": "available",
  "gpu": "present"
}
```

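
A monitoring script can compare a `/health` payload against the healthy values shown above and report which components differ. The sketch assumes those four values are the only healthy ones, which the document does not state explicitly; the function name is hypothetical.

```python
from typing import Dict, List

# Healthy values taken from the documented 200 OK response.
HEALTHY_VALUES = {
    "status": "ok",
    "database": "connected",
    "ollama": "available",
    "gpu": "present",
}


def unhealthy_components(payload: Dict[str, str]) -> List[str]:
    """Return the sorted names of fields that are missing or not at their healthy value."""
    return sorted(
        name
        for name, expected in HEALTHY_VALUES.items()
        if payload.get(name) != expected
    )
```
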

---

## Contributing

### Code Structure

```
Project_Velocity/
├── .Agent Context/          # Agent documentation, model specs
├── .Infrastructure/         # Deployment configs, systemd units
├── backend/                 # FastAPI backend
│   ├── main.py              # Application entry point
│   ├── requirements.txt     # Python dependencies
│   └── migrate.py           # Database migrations
├── app/                     # React frontend
│   ├── src/
│   │   ├── App.tsx          # Root component
│   │   └── ...              # Components, routes, utils
│   ├── package.json         # Node dependencies
│   └── vite.config.ts       # Build config
├── bootstrap/               # Setup scripts
│   └── setup.sh             # One-line bootstrap
└── README.md                # This file
```


### Making a Contribution

1. **Fork and branch**
   ```bash
   git checkout -b feature/your-feature-name
   ```

2. **Make changes**
   - Backend: Follow FastAPI conventions, add type hints
   - Frontend: Follow React + TypeScript patterns, use existing components
   - Docs: Update this README if behavior changes

3. **Test locally**
   ```bash
   # Backend tests (subshell keeps the working directory at the repo root)
   (cd backend && pytest)

   # Frontend checks
   (cd app && npm run build)
   ```

4. **Submit PR**
   - Title: Clear, action-oriented
   - Description: What + Why + How to test
   - Link any related issues

### Documentation Standards

- **Every endpoint:** Document inputs, outputs, errors
- **Every component:** JSDoc for public APIs
- **Every runbook:** Write as if for on-call at 2 AM
- **Every decision:** Record in `DECISIONS.md` with rationale

---


## Appendix

### A. Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `SECRET_KEY` | Yes | JWT signing key |
| `OLLAMA_BASE_URL` | No | Ollama API URL (default: `http://localhost:11434`) |
| `GPU_ENABLED` | No | Enable GPU path (default: `true`) |
| `LOG_LEVEL` | No | Logging level (default: `INFO`) |

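
The table translates directly into a small settings loader that fails fast when a required variable is missing. This is a sketch, not the backend's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, and the set of strings accepted as "true" for `GPU_ENABLED` is an assumption. In practice it would be called as `load_settings(os.environ)`.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARIABLES = ("DATABASE_URL", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    ollama_base_url: str
    gpu_enabled: bool
    log_level: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise ValueError("missing required environment variables: " + ", ".join(missing))
    # Assumption: these spellings count as enabling the GPU path.
    gpu_flag = env.get("GPU_ENABLED", "true").strip().lower()
    return Settings(
        database_url=env["DATABASE_URL"],
        secret_key=env["SECRET_KEY"],
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        gpu_enabled=gpu_flag in {"1", "true", "yes"},
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```
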

### B. Troubleshooting Matrix

| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| Frontend blank screen | Backend down | `curl http://localhost:8000/health` |
| 401 on all calls | Token expired | Re-login |
| Agent returns empty | Model unloaded | `ollama pull qwen3.6:35b-a3b` |
| Slow responses | GPU not used | Check `nvidia-smi`, verify CUDA |
| Database errors | Pool exhausted | Check `max_connections`, restart backend |
| WebSocket disconnects | Network issue | Check firewall, reverse proxy config |

### C. Useful Commands Cheat Sheet

```bash
# Full system status
systemctl status velocity-backend ollama postgresql ollama-watchdog

# Real-time GPU monitoring
watch -n 1 nvidia-smi

# Model check
curl http://localhost:11434/api/tags | jq '.models[].name'

# API health
curl -s http://localhost:8000/health | jq .

# Database connection test
psql -U velocity -d velocity -c "SELECT version();"

# Frontend rebuild
cd app && npm run build && cp -r dist/* ../nginx/html/

# Restart everything (nuclear option)
sudo systemctl restart velocity-backend ollama postgresql
```

---


> **Last verified:** 2026-04-21
> **Maintained by:** Velocity Team
> **If this doc is wrong, the system is broken. Fix the doc first.**

324
.Agent/Context/Sprint 1/Sprint 1 Fact Table - 2026-04-21.md
Normal file
@@ -0,0 +1,324 @@

# Sprint 1 Fact Table — Updated 2026-04-21

> **Purpose**: Track what's done vs. what's left across all Project Velocity modules.
> **Last Audit Date**: 2026-04-21 (full codebase review)
> **Previous Version**: Sprint 1 Fact Table - 2026-04-12 (marked many items "Missing" that are now implemented)

---

## Executive Summary

| Metric | Value |
|--------|-------|
| **Total Backend Route Files** | 10 (`routes_crm.py`, `routes_crm_imports.py`, `routes_oracle.py`, `routes_oracle_templates.py`, `routes_catalyst.py`, `routes_inventory.py`, `routes_mobile_edge.py`, `routes_runtime_llm.py`, `routes_admin_surface.py`, `routes_weaver.py`) |
| **Total Backend Services** | 5 (aggregation_service, ingest_service, ad_network_service, nemoclaw_runtime, runtime_llm_service) |
| **Frontend Modules (React)** | 7 (Dashboard, Oracle, Sentinel, Inventory, Catalyst, CRM, Settings) + Admin page |
| **iOS Apps** | 2 (velocity iPad app, velocity-iphone Edge app) |
| **Infrastructure Layers** | 4 (aws_scale, blackbox_local, desineuron_ingress, ops_control_plane) |
| **Test Coverage** | 10 test files across backend |

### Status Legend
- ✅ **Done** — Fully implemented, functional code exists
- 🔶 **Partial** — Core logic exists but needs refinement/completion
- ❌ **Missing** — No implementation found in current codebase
- 📋 **Planned** — Documented in specs but not yet coded

---


## User Story Rollup

### US-01: FastAPI Neural Core (Unified Backend)
| Item | Status | Evidence |
|------|--------|----------|
| FastAPI app with auth middleware | ✅ Done | `backend/auth/` — `get_current_user`, `UserPrincipal` |
| PostgreSQL connection pooling | ✅ Done | All routes use `request.app.state.db_pool` |
| WebSocket support | 🔶 Partial | `useVelocitySocket` hook exists in frontend; backend WS layer not confirmed in current scan |
| Auth (login/logout/session) | ✅ Done | `getVelocityMe`, `clearVelocityToken`, token validation in `App.tsx` |
| Role-based access (admin/superadmin) | ✅ Done | `routes_admin_surface.py` enforces `ADMIN_ROLES`; `isAdminRole()` guard in frontend |

**Verdict**: ✅ **Done** — Core backend is production-ready.

---

### US-02: CRM — Canonical Layer
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| `POST/GET /crm/imports` (CSV upload + lifecycle) | ✅ Done | [`routes_crm_imports.py`](backend/api/routes_crm_imports.py:102) — 799 lines | Full import pipeline: upload → parse → infer mapping → proposals → review → commit |
|
||||
| `POST/GET /crm/contacts` | ✅ Done | [`routes_crm_imports.py`](backend/api/routes_crm_imports.py:429) | CRUD for `crm_people` |
|
||||
| `GET /crm/client-360/{id}` | ✅ Done | [`routes_crm_imports.py`](backend/api/routes_crm_imports.py:527) | Joins across 8 canonical tables via [`aggregation_service.py`](backend/services/client_graph/aggregation_service.py:102) |
|
||||
| `GET /crm/opportunities` | ✅ Done | [`routes_crm_imports.py`](backend/api/routes_crm_imports.py:544) | Full pipeline list with stage/probability/value |
|
||||
| `GET/POST /crm/tasks` | ✅ Done | [`routes_crm_imports.py`](backend/api/routes_crm_imports.py:603) | Reminder/inbox system |
|
||||
| `GET /crm/kanban` | ✅ Done | [`routes_crm_imports.py`](backend/api/routes_crm_imports.py:697) | Kanban board from canonical data |
|
||||
| `GET /crm/qd/{id}` (Quantum Dynamics scores) | ✅ Done | [`routes_crm_imports.py`](backend/api/routes_crm_imports.py:752) | QD score summary + timeseries |
|
||||
| CSV import column mapping heuristics | ✅ Done | [`ingest_service.py`](backend/services/imports/ingest_service.py:30) — 40+ canonical mappings | Confidence scoring, review_required flags |
|
||||
| CRM Frontend — Contacts view | ✅ Done | [`CRM.tsx`](app/src/components/modules/CRM.tsx:89) — ContactListView with search/filter/pagination |
|
||||
| CRM Frontend — Kanban view | ✅ Done | [`CRM.tsx`](app/src/components/modules/CRM.tsx:282) — PipelineView with drag-ready columns |
|
||||
| CRM Frontend — Opportunities view | ✅ Done | [`CRM.tsx`](app/src/components/modules/CRM.tsx:363) — Deal pipeline table |
|
||||
| CRM Frontend — Tasks view | ✅ Done | [`CRM.tsx`](app/src/components/modules/CRM.tsx:448) — Priority-ordered task list |
|
||||
| CRM Frontend — Import view | ✅ Done | [`CRM.tsx`](app/src/components/modules/CRM.tsx:518) — File picker with live upload |
|
||||
| CRM Frontend — Client 360 panel | ✅ Done | [`CRM.tsx`](app/src/components/modules/CRM.tsx:550) — Slide-over dossier with QD bars, risk flags, recommended actions |
|
||||
| Canonical schema (`schema_crm_canonical.sql`) | ✅ Done | 709 lines — 25+ tables across CRM Core, Intel Graph, Inventory Domain, Workflow Governance |
|
||||
|
||||
**Verdict**: ✅ **Done** — CRM is the most complete module. Both backend and frontend are fully implemented with canonical data model.
|
||||
|
||||
---
|
||||
|
||||
### US-03: CRM — Legacy Layer (routes_crm.py)
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| `GET/POST /leads` | ✅ Done | [`routes_crm.py`](backend/api/routes_crm.py:227) — 631 lines | Legacy leads table (separate from canonical) |
|
||||
| `PUT/DELETE /leads/{id}` | ✅ Done | Same file | Full CRUD |
|
||||
| `POST /leads/seed-synthetic` | ✅ Done | Generates 100 synthetic leads with chat logs |
|
||||
| `GET /chat-logs` | ✅ Done | Chat log endpoints functional |
|
||||
| `GET /kanban/board` | ✅ Done | Legacy kanban board |
|
||||
| `GET /leads/demographics` | ✅ Done | Demographics analytics |
|
||||
| WebSocket CRM events | 🔶 Partial | `_broadcast_crm_event()` helper exists (line 60) but WS server not confirmed |
|
||||
|
||||
**Verdict**: 🔶 **Partial** — Fully coded but legacy. Should be deprecated in favor of canonical layer. Two parallel CRM surfaces exist (`routes_crm.py` vs `routes_crm_imports.py`).
|
||||
|
||||
---
|
||||
|
||||
### US-04: Oracle Canvas System
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| Oracle canvas API (`routes_oracle.py`) | ✅ Done | 107 lines — health, MCP tools, workflow preview, actions/writeback | Mounted router with `persona_service`, `mcp_registry`, `nemoclaw_runtime` |
|
||||
| Oracle template catalog (`routes_oracle_templates.py`) | ✅ Done | 405 lines — chapters, subchapters, component templates, seed examples, synthetic jobs | Full taxonomy CRUD |
|
||||
| Oracle frontend page | ✅ Done | [`app/oracle/page.tsx`](app/oracle/page.tsx) — Full canvas viewport |
|
||||
| Oracle components (BranchBar, CanvasViewport, ComponentRegistry, PromptRail) | ✅ Done | 10+ React components in `oracle/components/` |
|
||||
| Oracle renderers (10 types) | ✅ Done | ActivityStream, BarChart, ErrorNotice, GeoMap, KpiTile, LineChart, PipelineBoard, Table, TextCanvas, Timeline |
|
||||
| Oracle hooks (`useOracleExecution`, `useOraclePage`) | ✅ Done | Execution and page state management |
|
||||
| Oracle canvas TypeScript types | ✅ Done | `oracle/types/canvas.ts` — Full type definitions |
|
||||
| Oracle collaboration service | 🔶 Partial | Test file exists (`test_collaboration_service.py`) but production code not confirmed |
|
||||
| Oracle policy service | 🔶 Partial | Test file exists (`test_policy_service.py`) but production code not confirmed |
|
||||
|
||||
**Verdict**: 🔶 **Partial** — Core canvas API and template system are done. Collaboration and policy services need confirmation of production readiness.
|
||||
|
||||
---
|
||||
|
||||
### US-05: The Catalyst (Marketing Automation)
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| Meta Marketing API integration | ✅ Done | [`routes_catalyst.py`](backend/api/routes_catalyst.py:134) — 513 lines | Campaigns, creative sync, insights, budget/bid, lookalike audiences |
|
||||
| `POST /auth/meta` (OAuth token exchange) | ✅ Done | Meta OAuth flow endpoint |
|
||||
| Google Ads platform support | 🔶 Partial | Platform mappers exist but Google is simulated (not live) |
|
||||
| Campaign Command frontend | ✅ Done | [`Catalyst.tsx`](app/src/components/modules/Catalyst.tsx:537) — KPI cards, spend chart, campaign list |
|
||||
| The Studio (ComfyUI workflow input) | ✅ Done | Ground Truth picker, reference slots, image/video toggle |
|
||||
| Intelligence & ROI tab | ✅ Done | CPA trend chart, ad-set performance bars |
|
||||
| War Room (Meta Graph settings) | ✅ Done | API credential forms, business asset links, required scopes |
|
||||
| Marketing tab | ✅ Done | [`CatalystMarketingTab.tsx`](app/src/components/modules/CatalystMarketingTab.tsx) |
|
||||
| Live Optimization Feed | ✅ Done | Real-time event stream with 6 event types |
|
||||
| Meta SDK integration | ✅ Done | `facebook_business` SDK for live API calls |
|
||||
|
||||
**Verdict**: 🔶 **Partial** — Meta integration is fully functional. Google Ads is simulated. Production Meta credentials required for full operation.
|
||||
|
||||
---
|
||||
|
||||
### US-06: Inventory Pipeline
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| Import batches API | ✅ Done | [`routes_inventory.py`](backend/api/routes_inventory.py:95) — 400 lines | CRUD for `inventory_import_batches` |
|
||||
| Properties CRUD | ✅ Done | Same file — create, list, get, patch, delete |
|
||||
| Media assets management | ✅ Done | Attach/list/delete media to properties |
|
||||
| Inventory frontend | ✅ Done | [`Inventory.tsx`](app/src/components/modules/Inventory.tsx:829) — Grid/list views, 3D viewer, blueprint studio |
|
||||
| 3D model viewer (React Three Fiber) | ✅ Done | GLTF loading, orbit controls, auto-fit |
|
||||
| Blueprint Studio (zoom/pan) | ✅ Done | Wheel zoom, drag pan, fit-to-height |
|
||||
| Unit detail modal | ✅ Done | Full property details with payment plans |
|
||||
| Google Maps embed | ✅ Done | Right-pane map integration |
|
||||
|
||||
**Verdict**: ✅ **Done** — Inventory is fully implemented with rich frontend.
|
||||
|
||||
---
|
||||
|
||||
### US-07: Mobile Edge API
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| Communication events CRUD | ✅ Done | [`routes_mobile_edge.py`](backend/api/routes_mobile_edge.py:133) — 659 lines | All channels (PSTN, WhatsApp, email, FB, IG, VoIP) |
|
||||
| Memory facts (edge_communication_memory_facts) | ✅ Done | List endpoint at line 211 |
|
||||
| Operator-assisted import | ✅ Done | Creates events + triggers transcription jobs |
|
||||
| Quick notes | ✅ Done | Direct fact insertion |
|
||||
| Calendar CRUD | ✅ Done | Full calendar event management |
|
||||
| Transcript retrieval | ✅ Done | Joins `edge_transcription_jobs` → `edge_transcript_segments` |
|
||||
| Insights (recommendations) | ✅ Done | List + act/dismiss endpoints |
|
||||
| Alerts (combined view) | ✅ Done | Aggregates pending insights, upcoming events, pending transcriptions |
|
||||
| Session heartbeat | ✅ Done | Surface session tracking with screen sequence |
|
||||
| iOS Oracle view | ✅ Done | Pipeline, timeline, calendar canvases |
|
||||
| iOS Sentinel view | ✅ Done | Posture cards (pending insights, transcript queue, upcoming 24h) |
|
||||
| iOS Edge apps (iPhone + iPad) | ✅ Done | `velocity-iphone/` — Alerts, Communications, LeadSummary, Notes, Transcriptions |
|
||||
|
||||
**Verdict**: ✅ **Done** — Mobile edge API is comprehensive. Both backend and iOS clients are functional.
|
||||
|
||||
---
|
||||
|
||||
### US-08: Runtime LLM Service
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| Provider listing | ✅ Done | [`routes_runtime_llm.py`](backend/api/routes_runtime_llm.py:53) — `GET /providers` |
|
||||
| Chat completion | ✅ Done | `POST /chat` with provider/model routing |
|
||||
| Batch job submission | ✅ Done | `POST /batch` with persisted job tracking |
|
||||
| Job status/results | ✅ Done | `GET /jobs/{id}` and `GET /jobs/{id}/results` |
|
||||
| `runtime_llm_service.py` | ✅ Done | Service layer with provider abstraction |
|
||||
|
||||
**Verdict**: ✅ **Done** — Runtime LLM surface is complete.
|
||||
|
||||
---
|
||||
|
||||
### US-09: Admin Control Plane
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| System health overview | ✅ Done | [`routes_admin_surface.py`](backend/api/routes_admin_surface.py:86) — DB latency, queue depths, session counts |
|
||||
| Queue visibility | ✅ Done | Transcription, synthetic, inventory, admin action queues |
|
||||
| Install/surface overview | ✅ Done | Surface type + app version breakdown |
|
||||
| Admin actions (audit trail) | ✅ Done | 13 action types with idempotency keys |
|
||||
| Audit log | ✅ Done | `oracle_audit_events` query surface |
|
||||
| Template admin (publish/archive) | ✅ Done | Full template lifecycle management |
|
||||
| Synthetic job admin | ✅ Done | List + cancel synthetic generation jobs |
|
||||
| Admin frontend page | ✅ Done | [`app/admin/page.tsx`](app/admin/page.tsx) |
|
||||
|
||||
**Verdict**: ✅ **Done** — Admin control plane is fully implemented.
|
||||
|
||||
---
|
||||
|
||||
### US-10: Dream Weaver (ComfyUI Engine)
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| ComfyUI workflows | ✅ Done | 8 workflow JSON files in `comfy_engine/workflows/` |
|
||||
| Test inputs (20+ images) | ✅ Done | Diverse test set across room types |
|
||||
| Dream Weaver spec | ✅ Done | `docs/DREAMWEAVER_TECHNICAL_SPEC.md` |
|
||||
| `routes_weaver.py` | ❌ Missing | File exists but is **empty** (0 bytes) |
|
||||
| Weaver gateway (`dw_gateway_v2_min.py`) | 🔶 Partial | Root-level file exists — needs review for integration status |
|
||||
|
||||
**Verdict**: 🔶 **Partial** — ComfyUI engine has workflows and test data. Routes file is empty; gateway file needs integration review.
|
||||
|
||||
---
|
||||
|
||||
### US-11: Sentinel (Biometric Intelligence)
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| Sentinel overview frontend | ✅ Done | [`Sentinel.tsx`](app/src/modules/Sentinel.tsx:321) — Visitor counts, sentiment, dwell time, alerts |
|
||||
| Journey River component | ✅ Done | `components/sentinel/JourneyRiver/` — Path, inspector panel |
|
||||
| Live Session component | ✅ Done | `SentinelLiveSession.tsx` |
|
||||
| Perception player | ✅ Done | `PerceptionPlayer.tsx` |
|
||||
| iOS Sentinel view | 🔶 Partial | Shows posture cards from mobile-edge backend; explicitly notes "No mock feed" — real Sentinel stream route needed |
|
||||
| MediaPipe hooks | 🔶 Partial | `useMediapipeFaceLandmarker` hook exists in frontend |
|
||||
| QD scoring (nemoclaw) | ✅ Done | `nemoclaw_runtime.py` + test file exist |
|
||||
| Auto-mode matcher | ✅ Done | `auto_mode_matcher.py` service |
|
||||
| Sentinel backend routes | ❌ Missing | No dedicated Sentinel API routes found in `backend/api/` |
|
||||
|
||||
**Verdict**: 🔶 **Partial** — Frontend is rich and functional. iOS shows real data from mobile-edge. Backend biometric stream route is missing.
|
||||
|
||||
---
|
||||
|
||||
### US-12: iOS Time & Light Engine
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| AR Sun Overlay | 🔶 Partial | `ARSunOverlayView.swift` exists in both iPad and iPhone apps |
|
||||
| Sunseeker ViewModel | ✅ Done | `SunseekerViewModel.swift` — Solar position calculations |
|
||||
| Simulator Sun overlay | ✅ Done | `SimulatorSunOverlayView.swift` fallback |
|
||||
| Inventory AR features | 🔶 Partial | Connected to Inventory module but needs real-time sun data pipeline |
|
||||
|
||||
**Verdict**: 🔶 **Partial** — Core components exist. Real-time sun data integration needed.
|
||||
|
||||
---
|
||||
|
||||
### US-13: Infrastructure & Deployment
|
||||
| Item | Status | Evidence | Notes |
|
||||
|------|--------|----------|-------|
|
||||
| AWS ingress (t4g.micro) | 🔶 Partial | `infrastructure/aws_scale/` directory exists |
|
||||
| GPU workers (g6.12xlarge) | 🔶 Partial | Referenced in docs but IaC not confirmed |
|
||||
| Caddy reverse proxy | 🔶 Partial | `infrastructure/blackbox_local/` — needs review |
|
||||
| Rathole tunnels | 🔶 Partial | `infrastructure/desineuron_ingress/` — needs review |
|
||||
| Ops control plane | 🔶 Partial | `infrastructure/ops_control_plane/` — needs review |
|
||||
| NVMe-first deployment | 🔶 Partial | `monitor_nvme.py` exists at root |
|
||||
| Deploy scripts | 🔶 Partial | `patch_nemoclaw_service_20260401.sh`, `.oracle_deploy_stage.tar` |
|
||||
|
||||
**Verdict**: 🔶 **Partial** — Infrastructure artifacts exist but need consolidation and review.
|
||||
|
||||
---
|
||||
|
||||
### US-14: Synthetic Data & Testing

| Item | Status | Evidence | Notes |
|------|--------|----------|-------|
| Synthetic CRM v1 dataset | ✅ Done | `db assets/synthetic_crm_v1/` | 360 snapshots, mapping manifest, relationships, transcripts |
| Test suite (10 files) | ✅ Done | `backend/tests/` | catalyst, crm, websocket, nemoclaw, oracle, vault tests |
| Oracle sub-tests | ✅ Done | canvas_service, collaboration_service, persona_service, policy_service, prompt_orchestrator | |

**Verdict**: ✅ **Done**. Testing and synthetic data are comprehensive.

---
## Cross-Reference: Old Fact Table vs Current Codebase

| Claim in Old Fact Table (2026-04-12) | Current Reality | Delta |
|---------------------------------------|-----------------|-------|
| `backend/api/routes_crm.py` = 0 bytes | **631 lines**: full CRUD + seed + demographics + kanban | ✅ Now Done |
| `/api/leads` = Missing | **Fully implemented** in both legacy and canonical layers | ✅ Now Done |
| `/api/chat-logs` = Missing | **Fully implemented** with synthetic data generation | ✅ Now Done |
| Kanban board = Missing | **Implemented in both** `routes_crm.py` (legacy) and `routes_crm_imports.py` (canonical) | ✅ Now Done |
| `backend/api/routes_oracle.py` = 0 bytes | **107 lines**: health, MCP, workflow preview, actions | ✅ Now Done |
| Oracle canvas = Missing | **Fully implemented** with 10+ frontend components + template system | ✅ Now Done |
| CRM imports = Missing | **799-line canonical import pipeline** with CSV parsing, mapping, proposals | ✅ Now Done |
| Inventory API = Partial | **400-line full CRUD** with media assets | ✅ Now Done |
| Mobile edge = Partial | **659-line comprehensive API** with events, calendar, transcripts, insights | ✅ Now Done |

---
## What's Left (Sprint 2+ Priorities)

### BLOCKERS (Must complete before production)

1. **Sentinel biometric stream route**: No dedicated backend endpoint exists for the live CCTV/face detection pipeline.
2. **Dream Weaver routes**: `routes_weaver.py` is empty; the ComfyUI gateway needs integration.
3. **WebSocket server confirmation**: A WS layer exists in hooks, but the backend WS server has not been confirmed.

### HIGH PRIORITY

4. **Google Ads platform**: Currently simulated; needs live Google Ads API integration.
5. **Oracle collaboration service**: Test exists; production code unconfirmed.
6. **Oracle policy service**: Test exists; production code unconfirmed.
7. **Infrastructure consolidation**: 4 infrastructure directories need review and a unified deployment.

### MEDIUM PRIORITY

8. **Legacy CRM deprecation**: Two parallel CRM surfaces (`routes_crm.py` vs `routes_crm_imports.py`) create a maintenance burden.
9. **iOS AR sun data pipeline**: Real-time solar position integration is needed.
10. **CI/CD pipeline**: No build/deploy automation found.

### LOW PRIORITY (Nice to have)

11. **Multi-tenant isolation**: Current code uses `user.role` as tenant_id; proper tenant separation is needed.
12. **Rate limiting**: No rate limiting middleware found.
13. **API documentation**: No OpenAPI/Swagger docs generated.

---
## Module Health Matrix

| Module | Backend | Frontend | iOS | Tests | Overall |
|--------|---------|----------|-----|-------|---------|
| CRM (Canonical) | ✅ Done | ✅ Done | 🔶 Partial | ✅ Done | ✅ **Done** |
| CRM (Legacy) | ✅ Done | N/A | N/A | ✅ Done | 🔶 **Partial** |
| Oracle Canvas | ✅ Done | ✅ Done | ✅ Done | ✅ Done | ✅ **Done** |
| Catalyst | ✅ Done | ✅ Done | N/A | ✅ Done | 🔶 **Partial** |
| Inventory | ✅ Done | ✅ Done | N/A | N/A | ✅ **Done** |
| Mobile Edge | ✅ Done | N/A | ✅ Done | ✅ Done | ✅ **Done** |
| Runtime LLM | ✅ Done | N/A | N/A | ✅ Done | ✅ **Done** |
| Admin Control | ✅ Done | ✅ Done | N/A | ✅ Done | ✅ **Done** |
| Dream Weaver | ❌ Missing | N/A | N/A | N/A | 🔶 **Partial** |
| Sentinel | ❌ Missing | ✅ Done | 🔶 Partial | ✅ Done | 🔶 **Partial** |
| Time & Light | N/A | N/A | 🔶 Partial | N/A | 🔶 **Partial** |
| Infrastructure | 🔶 Partial | N/A | N/A | N/A | 🔶 **Partial** |

---
## Code Quality Notes

### [BLOCKER]

- **Dual CRM surfaces**: Both `routes_crm.py` (legacy) and `routes_crm_imports.py` (canonical) handle leads. Plan the deprecation of the legacy layer.

### [SUGGESTION]

- **SQL injection risk in dynamic WHERE clauses**: [`routes_inventory.py`](backend/api/routes_inventory.py:209-231) and [`routes_mobile_edge.py`](backend/api/routes_mobile_edge.py:334-356) build WHERE clauses with f-strings. Values are passed as query parameters and are safe, but column names are interpolated directly into the SQL text, so ensure no user input reaches them.
- **Hardcoded tenant ID**: [`routes_oracle_templates.py`](backend/api/routes_oracle_templates.py:36) uses `os.getenv("ORACLE_DEFAULT_TENANT_ID", "tenant_velocity")`; consider making this a request-scoped value.

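
The column-name concern in the SQL injection note above can be illustrated with a small allowlist helper. This is only a sketch under stated assumptions: it assumes asyncpg-style `$n` positional placeholders, and the helper name `build_where` and the columns in `ALLOWED_FILTER_COLUMNS` are hypothetical, not taken from `routes_inventory.py` or `routes_mobile_edge.py`.

```python
from typing import Any

# Hypothetical allowlist; the real filterable columns live in the route modules.
ALLOWED_FILTER_COLUMNS = frozenset({"status", "project_id", "unit_type"})


def build_where(
    filters: dict[str, Any],
    allowed: frozenset[str] = ALLOWED_FILTER_COLUMNS,
) -> tuple[str, list[Any]]:
    """Build a WHERE clause with $n placeholders, rejecting unknown columns."""
    clauses: list[str] = []
    params: list[Any] = []
    for column, value in filters.items():
        # Column names are interpolated into SQL text, so they must come
        # from a fixed allowlist and never directly from user input.
        if column not in allowed:
            raise ValueError(f"Unsupported filter column: {column!r}")
        params.append(value)
        # Values stay out of the SQL text; only the placeholder index is interpolated.
        clauses.append(f"{column} = ${len(params)}")
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    return where, params
```

Rejecting unknown columns with an error, rather than silently dropping them, keeps a mistyped or malicious filter visible to the caller; whether that fits the existing routes is a design choice for the reviewers.
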
### [NIT]

- **Import organization**: Several files use inline `import json` inside functions rather than at module level.
- **Magic numbers**: Threshold values (e.g., `30 minutes` in session heartbeat) should be named constants.

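
Both nits above can be shown in one short sketch: `json` imported at module level, and the heartbeat threshold pulled into a named constant. The names `SESSION_HEARTBEAT_TIMEOUT`, `is_session_stale` and `encode_payload` are hypothetical, and the 30 minute value is simply the figure quoted in the note, not confirmed against the code.

```python
import json  # module-level import instead of an inline import inside a function
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

# Assumed value: the note above mentions a 30 minute session heartbeat threshold.
SESSION_HEARTBEAT_TIMEOUT = timedelta(minutes=30)


def is_session_stale(last_heartbeat: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the last heartbeat is older than the named threshold."""
    current = now or datetime.now(timezone.utc)
    return current - last_heartbeat > SESSION_HEARTBEAT_TIMEOUT


def encode_payload(payload: dict[str, Any]) -> str:
    """Serialize a payload using the module-level json import."""
    return json.dumps(payload, sort_keys=True)
```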
---

*Fact table generated by Chanakya (Review Mode) on 2026-04-21 after full codebase audit.*
BIN
android-edge-phone/.gradle/8.9/checksums/checksums.lock
Normal file
Binary file not shown.
BIN
android-edge-phone/.gradle/8.9/checksums/md5-checksums.bin
Normal file
Binary file not shown.
BIN
android-edge-phone/.gradle/8.9/checksums/sha1-checksums.bin
Normal file
Binary file not shown.
BIN
android-edge-phone/.gradle/8.9/fileChanges/last-build.bin
Normal file
Binary file not shown.
BIN
android-edge-phone/.gradle/8.9/fileHashes/fileHashes.lock
Normal file
Binary file not shown.
0
android-edge-phone/.gradle/8.9/gc.properties
Normal file
Binary file not shown.
@@ -0,0 +1,2 @@
|
||||
#Tue Apr 21 00:04:24 IST 2026
|
||||
gradle.version=8.9
|
||||
0
android-edge-phone/.gradle/vcs-1/gc.properties
Normal file
BIN
android-tablet/.gradle/8.9/checksums/checksums.lock
Normal file
Binary file not shown.
BIN
android-tablet/.gradle/8.9/fileChanges/last-build.bin
Normal file
Binary file not shown.
BIN
android-tablet/.gradle/8.9/fileHashes/fileHashes.lock
Normal file
Binary file not shown.
0
android-tablet/.gradle/8.9/gc.properties
Normal file
Binary file not shown.
@@ -0,0 +1,2 @@
|
||||
#Tue Apr 21 00:05:34 IST 2026
|
||||
gradle.version=8.9
|
||||
0
android-tablet/.gradle/vcs-1/gc.properties
Normal file
2
app/dist/index.html
vendored
@@ -4,7 +4,7 @@
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Velocity WebOS</title>
|
||||
<script type="module" crossorigin src="./assets/index-C2Cn6fx_.js"></script>
|
||||
<script type="module" crossorigin src="./assets/index-BbE_azx6.js"></script>
|
||||
<link rel="stylesheet" crossorigin href="./assets/index-CILgAuxv.css">
|
||||
</head>
|
||||
<body>
|
||||
|
||||
8
app/node_modules/.vite/deps/@radix-ui_react-avatar.js
generated
vendored
@@ -1,18 +1,18 @@
|
||||
"use client";
|
||||
import {
|
||||
createSlot
|
||||
} from "./chunk-5HUACAZ7.js";
|
||||
import {
|
||||
useCallbackRef,
|
||||
useLayoutEffect2
|
||||
} from "./chunk-GRXJTWBV.js";
|
||||
import "./chunk-HPBHRBIF.js";
|
||||
import {
|
||||
require_react_dom
|
||||
} from "./chunk-YLZ34CCM.js";
|
||||
import {
|
||||
require_shim
|
||||
} from "./chunk-642Z5WD3.js";
|
||||
import {
|
||||
createSlot
|
||||
} from "./chunk-5HUACAZ7.js";
|
||||
import "./chunk-HPBHRBIF.js";
|
||||
import {
|
||||
require_jsx_runtime
|
||||
} from "./chunk-USXRE7Q2.js";
|
||||
|
||||
6
app/node_modules/.vite/deps/@radix-ui_react-dropdown-menu.js
generated
vendored
@@ -3,13 +3,13 @@ import {
|
||||
useCallbackRef,
|
||||
useLayoutEffect2
|
||||
} from "./chunk-GRXJTWBV.js";
|
||||
import {
|
||||
require_react_dom
|
||||
} from "./chunk-YLZ34CCM.js";
|
||||
import {
|
||||
composeRefs,
|
||||
useComposedRefs
|
||||
} from "./chunk-HPBHRBIF.js";
|
||||
import {
|
||||
require_react_dom
|
||||
} from "./chunk-YLZ34CCM.js";
|
||||
import {
|
||||
require_jsx_runtime
|
||||
} from "./chunk-USXRE7Q2.js";
|
||||
|
||||
6
app/node_modules/.vite/deps/@react-three_drei.js
generated
vendored
@@ -1,9 +1,9 @@
|
||||
import {
|
||||
subscribeWithSelector
|
||||
} from "./chunk-XGWIEMTH.js";
|
||||
import {
|
||||
create
|
||||
} from "./chunk-QJTQF54Q.js";
|
||||
import {
|
||||
subscribeWithSelector
|
||||
} from "./chunk-XGWIEMTH.js";
|
||||
import {
|
||||
Events
|
||||
} from "./chunk-OAEA5FZL.js";
|
||||
|
||||
64
app/node_modules/.vite/deps/_metadata.json
generated
vendored
@@ -7,127 +7,127 @@
|
||||
"react": {
|
||||
"src": "../../react/index.js",
|
||||
"file": "react.js",
|
||||
"fileHash": "44c1ad00",
|
||||
"fileHash": "c178e920",
|
||||
"needsInterop": true
|
||||
},
|
||||
"react-dom": {
|
||||
"src": "../../react-dom/index.js",
|
||||
"file": "react-dom.js",
|
||||
"fileHash": "09fbf9a4",
|
||||
"fileHash": "071b9320",
|
||||
"needsInterop": true
|
||||
},
|
||||
"react/jsx-dev-runtime": {
|
||||
"src": "../../react/jsx-dev-runtime.js",
|
||||
"file": "react_jsx-dev-runtime.js",
|
||||
"fileHash": "ce2da90b",
|
||||
"fileHash": "72ddf78c",
|
||||
"needsInterop": true
|
||||
},
|
||||
"react/jsx-runtime": {
|
||||
"src": "../../react/jsx-runtime.js",
|
||||
"file": "react_jsx-runtime.js",
|
||||
"fileHash": "52be981b",
|
||||
"fileHash": "14b8d385",
|
||||
"needsInterop": true
|
||||
},
|
||||
"@radix-ui/react-avatar": {
|
||||
"src": "../../@radix-ui/react-avatar/dist/index.mjs",
|
||||
"file": "@radix-ui_react-avatar.js",
|
||||
"fileHash": "63b564be",
|
||||
"fileHash": "590b7679",
|
||||
"needsInterop": false
|
||||
},
|
||||
"@radix-ui/react-dropdown-menu": {
|
||||
"src": "../../@radix-ui/react-dropdown-menu/dist/index.mjs",
|
||||
"file": "@radix-ui_react-dropdown-menu.js",
|
||||
"fileHash": "b9686e90",
|
||||
"fileHash": "087b631e",
|
||||
"needsInterop": false
|
||||
},
|
||||
"@radix-ui/react-slot": {
|
||||
"src": "../../@radix-ui/react-slot/dist/index.mjs",
|
||||
"file": "@radix-ui_react-slot.js",
|
||||
"fileHash": "417c3a07",
|
||||
"fileHash": "4e55412b",
|
||||
"needsInterop": false
|
||||
},
|
||||
"@react-three/drei": {
|
||||
"src": "../../@react-three/drei/index.js",
|
||||
"file": "@react-three_drei.js",
|
||||
"fileHash": "b25127e3",
|
||||
"fileHash": "ba800aca",
|
||||
"needsInterop": false
|
||||
},
|
||||
"@react-three/fiber": {
|
||||
"src": "../../@react-three/fiber/dist/react-three-fiber.esm.js",
|
||||
"file": "@react-three_fiber.js",
|
||||
"fileHash": "22a2309e",
|
||||
"fileHash": "12f23541",
|
||||
"needsInterop": false
|
||||
},
|
||||
"class-variance-authority": {
|
||||
"src": "../../class-variance-authority/dist/index.mjs",
|
||||
"file": "class-variance-authority.js",
|
||||
"fileHash": "6e6c6fd0",
|
||||
"fileHash": "0153428f",
|
||||
"needsInterop": false
|
||||
},
|
||||
"clsx": {
|
||||
"src": "../../clsx/dist/clsx.mjs",
|
||||
"file": "clsx.js",
|
||||
"fileHash": "eb68424d",
|
||||
"fileHash": "99f068f1",
|
||||
"needsInterop": false
|
||||
},
|
||||
"framer-motion": {
|
||||
"src": "../../framer-motion/dist/es/index.mjs",
|
||||
"file": "framer-motion.js",
|
||||
"fileHash": "1cbcab3b",
|
||||
"fileHash": "c1fc1ac2",
|
||||
"needsInterop": false
|
||||
},
|
||||
"lucide-react": {
|
||||
"src": "../../lucide-react/dist/esm/lucide-react.js",
|
||||
"file": "lucide-react.js",
|
||||
"fileHash": "6dded310",
|
||||
"fileHash": "4418176c",
|
||||
"needsInterop": false
|
||||
},
|
||||
"react-dom/client": {
|
||||
"src": "../../react-dom/client.js",
|
||||
"file": "react-dom_client.js",
|
||||
"fileHash": "c3a7edc3",
|
||||
"fileHash": "8029f031",
|
||||
"needsInterop": true
|
||||
},
|
||||
"react-router-dom": {
|
||||
"src": "../../react-router-dom/dist/index.mjs",
|
||||
"file": "react-router-dom.js",
|
||||
"fileHash": "e91f778e",
|
||||
"fileHash": "c673e5a0",
|
||||
"needsInterop": false
|
||||
},
|
||||
"recharts": {
|
||||
"src": "../../recharts/es6/index.js",
|
||||
"file": "recharts.js",
|
||||
"fileHash": "d7f9dad1",
|
||||
"fileHash": "41235262",
|
||||
"needsInterop": false
|
||||
},
|
||||
"sonner": {
|
||||
"src": "../../sonner/dist/index.mjs",
|
||||
"file": "sonner.js",
|
||||
"fileHash": "8433c1a9",
|
||||
"fileHash": "c99e6320",
|
||||
"needsInterop": false
|
||||
},
|
||||
"tailwind-merge": {
|
||||
"src": "../../tailwind-merge/dist/bundle-mjs.mjs",
|
||||
"file": "tailwind-merge.js",
|
||||
"fileHash": "772f1bbd",
|
||||
"fileHash": "017ed736",
|
||||
"needsInterop": false
|
||||
},
|
||||
"three": {
|
||||
"src": "../../three/build/three.module.js",
|
||||
"file": "three.js",
|
||||
"fileHash": "490e5c00",
|
||||
"fileHash": "8d6b5e64",
|
||||
"needsInterop": false
|
||||
},
|
||||
"zustand": {
|
||||
"src": "../../zustand/esm/index.mjs",
|
||||
"file": "zustand.js",
|
||||
"fileHash": "315f8e85",
|
||||
"fileHash": "bcef7203",
|
||||
"needsInterop": false
|
||||
},
|
||||
"zustand/middleware": {
|
||||
"src": "../../zustand/esm/middleware.mjs",
|
||||
"file": "zustand_middleware.js",
|
||||
"fileHash": "2563a89b",
|
||||
"fileHash": "1afe1817",
|
||||
"needsInterop": false
|
||||
}
|
||||
},
|
||||
@@ -135,12 +135,12 @@
|
||||
"hls-Q6LDPZPT": {
|
||||
"file": "hls-Q6LDPZPT.js"
|
||||
},
|
||||
"chunk-XGWIEMTH": {
|
||||
"file": "chunk-XGWIEMTH.js"
|
||||
},
|
||||
"chunk-QJTQF54Q": {
|
||||
"file": "chunk-QJTQF54Q.js"
|
||||
},
|
||||
"chunk-XGWIEMTH": {
|
||||
"file": "chunk-XGWIEMTH.js"
|
||||
},
|
||||
"chunk-OAEA5FZL": {
|
||||
"file": "chunk-OAEA5FZL.js"
|
||||
},
|
||||
@@ -150,15 +150,12 @@
|
||||
"chunk-H4GSM2WL": {
|
||||
"file": "chunk-H4GSM2WL.js"
|
||||
},
|
||||
"chunk-5HUACAZ7": {
|
||||
"file": "chunk-5HUACAZ7.js"
|
||||
"chunk-U7P2NEEE": {
|
||||
"file": "chunk-U7P2NEEE.js"
|
||||
},
|
||||
"chunk-GRXJTWBV": {
|
||||
"file": "chunk-GRXJTWBV.js"
|
||||
},
|
||||
"chunk-HPBHRBIF": {
|
||||
"file": "chunk-HPBHRBIF.js"
|
||||
},
|
||||
"chunk-YLZ34CCM": {
|
||||
"file": "chunk-YLZ34CCM.js"
|
||||
},
|
||||
@@ -177,15 +174,18 @@
|
||||
"chunk-642Z5WD3": {
|
||||
"file": "chunk-642Z5WD3.js"
|
||||
},
|
||||
"chunk-5HUACAZ7": {
|
||||
"file": "chunk-5HUACAZ7.js"
|
||||
},
|
||||
"chunk-HPBHRBIF": {
|
||||
"file": "chunk-HPBHRBIF.js"
|
||||
},
|
||||
"chunk-USXRE7Q2": {
|
||||
"file": "chunk-USXRE7Q2.js"
|
||||
},
|
||||
"chunk-ZNKPWGXJ": {
|
||||
"file": "chunk-ZNKPWGXJ.js"
|
||||
},
|
||||
"chunk-U7P2NEEE": {
|
||||
"file": "chunk-U7P2NEEE.js"
|
||||
},
|
||||
"chunk-G3PMV62Z": {
|
||||
"file": "chunk-G3PMV62Z.js"
|
||||
}
|
||||
|
||||
6
app/node_modules/.vite/deps/recharts.js
generated
vendored
@@ -1,15 +1,15 @@
|
||||
import {
|
||||
_extends
|
||||
} from "./chunk-H4GSM2WL.js";
|
||||
import {
|
||||
clsx_default
|
||||
} from "./chunk-U7P2NEEE.js";
|
||||
import {
|
||||
require_react_dom
|
||||
} from "./chunk-YLZ34CCM.js";
|
||||
import {
|
||||
require_react
|
||||
} from "./chunk-ZNKPWGXJ.js";
|
||||
import {
|
||||
clsx_default
|
||||
} from "./chunk-U7P2NEEE.js";
|
||||
import {
|
||||
__commonJS,
|
||||
__export,
|
||||
|
||||
@@ -454,6 +454,7 @@ export default function OraclePage() {
|
||||
page={page}
|
||||
isOpen={shareOpen}
|
||||
onClose={() => setShareOpen(false)}
|
||||
currentUserId={me?.userId ?? null}
|
||||
onShare={handleShare}
|
||||
/>
|
||||
|
||||
|
||||
@@ -39,7 +39,35 @@ function groupBySection(components: CanvasComponent[]): Array<{ sectionId: strin
     sectionMap.get(sid)!.push(comp);
   }

-  return Array.from(sectionMap.entries()).map(([sectionId, comps]) => ({ sectionId, components: comps }));
+  return Array.from(sectionMap.entries())
+    .map(([sectionId, comps]) => ({ sectionId, components: comps }))
+    .sort((a, b) => {
+      const aPrompt = a.sectionId.startsWith('sec_prompt_generated');
+      const bPrompt = b.sectionId.startsWith('sec_prompt_generated');
+      if (aPrompt && bPrompt) {
+        const aCreated = Math.max(...a.components.map((comp) => Date.parse(comp.provenance.createdAt || '1970-01-01T00:00:00Z')));
+        const bCreated = Math.max(...b.components.map((comp) => Date.parse(comp.provenance.createdAt || '1970-01-01T00:00:00Z')));
+        return bCreated - aCreated;
+      }
+      if (aPrompt !== bPrompt) return aPrompt ? -1 : 1;
+      return Math.min(...a.components.map((comp) => comp.layout.orderIndex)) - Math.min(...b.components.map((comp) => comp.layout.orderIndex));
+    });
 }

+function getSectionLabel(sectionId: string, sectionComps: CanvasComponent[]): string {
+  if (SECTION_LABELS[sectionId]) return SECTION_LABELS[sectionId];
+  if (sectionId.startsWith('sec_prompt_generated')) {
+    const planning = sectionComps.find((comp) => comp.type === 'textCanvas');
+    const content = planning?.visualizationParameters?.content;
+    if (typeof content === 'string') {
+      const firstLine = content.split('\n')[0]?.trim();
+      if (firstLine?.startsWith('Oracle received:')) {
+        return firstLine.replace('Oracle received:', '').trim();
+      }
+    }
+    return 'Oracle Response';
+  }
+  return sectionId.replace(/^sec_/, '').replace(/_/g, ' ');
+}
+
 /** CSS content-visibility wrapper for off-screen components, applying width mode to the flex item */
@@ -93,7 +121,7 @@ export function CanvasViewport({
|
||||
<div className="flex items-center gap-3">
|
||||
<div className="w-1 h-4 rounded-full bg-gradient-to-b from-blue-400 to-cyan-500" />
|
||||
<h2 className="text-xs font-semibold uppercase tracking-widest text-zinc-500">
|
||||
{SECTION_LABELS[sectionId] ?? sectionId.replace(/^sec_/, '').replace(/_/g, ' ')}
|
||||
{getSectionLabel(sectionId, sectionComps)}
|
||||
</h2>
|
||||
<div className="flex-1 h-[1px]" style={{ background: 'rgba(255,255,255,0.05)' }} />
|
||||
<span className="text-[10px] text-zinc-700">{sectionComps.length}</span>
|
||||
|
||||
@@ -10,6 +10,7 @@ interface ShareModalProps {
|
||||
page: CanvasPage | null;
|
||||
isOpen: boolean;
|
||||
onClose: () => void;
|
||||
currentUserId?: string | null;
|
||||
onShare: (params: {
|
||||
recipientUserId: string;
|
||||
visibility: 'private' | 'team';
|
||||
@@ -40,7 +41,7 @@ function getInitials(member: VelocityActiveUser): string {
|
||||
.join('') || 'U';
|
||||
}
|
||||
|
||||
export function ShareModal({ page, isOpen, onClose, onShare }: ShareModalProps) {
|
||||
export function ShareModal({ page, isOpen, onClose, currentUserId, onShare }: ShareModalProps) {
|
||||
const [mounted, setMounted] = useState(false);
|
||||
const [teamMembers, setTeamMembers] = useState<VelocityActiveUser[]>([]);
|
||||
const [loadingMembers, setLoadingMembers] = useState(false);
|
||||
@@ -50,6 +51,7 @@ export function ShareModal({ page, isOpen, onClose, onShare }: ShareModalProps)
|
||||
const [message, setMessage] = useState('');
|
||||
const [submitting, setSubmitting] = useState(false);
|
||||
const [success, setSuccess] = useState(false);
|
||||
const [submitError, setSubmitError] = useState<string | null>(null);
|
||||
const [memberDropOpen, setMemberDropOpen] = useState(false);
|
||||
|
||||
useEffect(() => setMounted(true), []);
|
||||
@@ -57,6 +59,7 @@ export function ShareModal({ page, isOpen, onClose, onShare }: ShareModalProps)
|
||||
useEffect(() => {
|
||||
if (!isOpen) {
|
||||
setMemberDropOpen(false);
|
||||
setSubmitError(null);
|
||||
return;
|
||||
}
|
||||
|
||||
@@ -83,6 +86,17 @@ export function ShareModal({ page, isOpen, onClose, onShare }: ShareModalProps)
|
||||
};
|
||||
}, [isOpen]);
|
||||
|
||||
const availableMembers = useMemo(
|
||||
() => teamMembers.filter((member) => member.user_id !== currentUserId),
|
||||
[teamMembers, currentUserId],
|
||||
);
|
||||
|
||||
useEffect(() => {
|
||||
if (recipient && recipient.user_id === currentUserId) {
|
||||
setRecipient(null);
|
||||
}
|
||||
}, [recipient, currentUserId]);
|
||||
|
||||
const selectedRecipientLabel = useMemo(
|
||||
() => (recipient ? getDisplayName(recipient) : 'Select verified teammate...'),
|
||||
[recipient],
|
||||
@@ -91,6 +105,7 @@ export function ShareModal({ page, isOpen, onClose, onShare }: ShareModalProps)
|
||||
const handleShare = async () => {
|
||||
if (!recipient || !page) return;
|
||||
setSubmitting(true);
|
||||
setSubmitError(null);
|
||||
try {
|
||||
await onShare({
|
||||
recipientUserId: recipient.user_id,
|
||||
@@ -105,8 +120,8 @@ export function ShareModal({ page, isOpen, onClose, onShare }: ShareModalProps)
         setRecipient(null);
         setMessage('');
       }, 1800);
-    } catch {
-      // keep modal open and let caller surface the error upstream
+    } catch (error) {
+      setSubmitError(error instanceof Error ? error.message : 'Share failed.');
     } finally {
       setSubmitting(false);
     }
@@ -180,6 +195,17 @@ export function ShareModal({ page, isOpen, onClose, onShare }: ShareModalProps)
|
||||
</div>
|
||||
) : (
|
||||
<div className="space-y-4">
|
||||
{submitError && (
|
||||
<div
|
||||
className="rounded-xl px-3 py-2 text-xs text-red-300"
|
||||
style={{
|
||||
background: 'rgba(239,68,68,0.08)',
|
||||
border: '1px solid rgba(239,68,68,0.2)',
|
||||
}}
|
||||
>
|
||||
{submitError}
|
||||
</div>
|
||||
)}
|
||||
<div>
|
||||
<label className="text-xs font-medium text-zinc-400 mb-1.5 block">Recipient</label>
|
||||
<div className="relative">
|
||||
@@ -217,10 +243,10 @@ export function ShareModal({ page, isOpen, onClose, onShare }: ShareModalProps)
|
||||
{!loadingMembers && membersError && (
|
||||
<div className="px-3 py-3 text-xs text-red-400">{membersError}</div>
|
||||
)}
|
||||
{!loadingMembers && !membersError && teamMembers.length === 0 && (
|
||||
{!loadingMembers && !membersError && availableMembers.length === 0 && (
|
||||
<div className="px-3 py-3 text-xs text-zinc-500">No verified users available.</div>
|
||||
)}
|
||||
{!loadingMembers && !membersError && teamMembers.map((member) => (
|
||||
{!loadingMembers && !membersError && availableMembers.map((member) => (
|
||||
<button
|
||||
key={member.user_id}
|
||||
className="w-full flex items-center gap-3 px-3 py-2.5 hover:bg-white/5 transition-colors text-left"
|
||||
|
||||
@@ -32,6 +32,20 @@ SUPABASE_SERVICE_ROLE_KEY=PLACEHOLDER_your_supabase_service_role_key
|
||||
# Base URL of ComfyUI server running locally or on GPU node
|
||||
COMFY_BASE_URL=http://localhost:8188
|
||||
|
||||
# —— Shared Desineuron coding / Oracle / NemoClaw runtime —————————————————————
|
||||
# Stable OpenAI-compatible SGLang route rendered through ingress.
|
||||
LLM_BASE_URL=https://llm.desineuron.in
|
||||
SGLANG_BASE_URL=https://llm.desineuron.in
|
||||
SGLANG_CHAT_URL=https://llm.desineuron.in/v1/chat/completions
|
||||
SGLANG_MODELS_URL=https://llm.desineuron.in/v1/models
|
||||
SGLANG_MODEL=qwen3.6:35b-a3b
|
||||
SGLANG_API_TOKEN=
|
||||
|
||||
# NemoClaw follows the same routed SGLang runtime.
|
||||
NEMOCLAW_BASE_URL=https://llm.desineuron.in
|
||||
NEMOCLAW_MODEL=qwen3.6:35b-a3b
|
||||
NEMOCLAW_API_TOKEN=
|
||||
|
||||
# ── Backend ───────────────────────────────────────────────────────────────────
|
||||
# CORS origins — comma-separated list of allowed frontend origins
|
||||
CORS_ORIGINS=http://localhost:5173,http://localhost:3000
|
||||
|
||||
@@ -70,6 +70,31 @@ def _json_object(value: Any) -> dict[str, Any]:
     return {}


+def _json_array(value: Any) -> list[Any]:
+    if isinstance(value, list):
+        return value
+    if isinstance(value, str) and value.strip():
+        try:
+            parsed = json.loads(value)
+            if isinstance(parsed, list):
+                return parsed
+        except Exception:
+            logger.warning("canvas_service: failed to parse JSON array field; using empty array")
+    return []
+
+
+def _json_safe(value: Any) -> Any:
+    if isinstance(value, datetime):
+        return value.isoformat()
+    if isinstance(value, dict):
+        return {str(key): _json_safe(val) for key, val in value.items()}
+    if isinstance(value, list):
+        return [_json_safe(item) for item in value]
+    if isinstance(value, tuple):
+        return [_json_safe(item) for item in value]
+    return value
+
+
 def _normalize_component(component: dict[str, Any]) -> dict[str, Any]:
     normalized = deepcopy(component)
     normalized["componentId"] = _stringify(normalized.get("componentId"))
@@ -224,9 +249,15 @@ class CanvasService:
     async def get_first_page_for_owner(self, *, tenant_id: str, owner_id: str) -> dict[str, Any] | None:
         _ensure_ready()
         if _is_demo():
-            for page in _DEMO_PAGES.values():
-                if page["tenantId"] == tenant_id and page["ownerId"] == owner_id:
-                    return {**page, "components": deepcopy(_DEMO_COMPONENTS.get(page["pageId"], []))}
+            candidates = [
+                page
+                for page in _DEMO_PAGES.values()
+                if page["tenantId"] == tenant_id and page["ownerId"] == owner_id
+            ]
+            if candidates:
+                candidates.sort(key=lambda page: page.get("updatedAt", ""), reverse=True)
+                page = candidates[0]
+                return {**page, "components": deepcopy(_DEMO_COMPONENTS.get(page["pageId"], []))}
             return None

         assert asyncpg is not None
@@ -237,7 +268,7 @@ class CanvasService:
|
||||
SELECT *
|
||||
FROM oracle_canvas_pages
|
||||
WHERE tenant_id = $1 AND owner_id = $2
|
||||
ORDER BY created_at ASC
|
||||
ORDER BY updated_at DESC, created_at DESC
|
||||
LIMIT 1
|
||||
""",
|
||||
tenant_id,
|
||||
@@ -310,7 +341,7 @@ class CanvasService:
|
||||
"actorId": actor_id,
|
||||
"executionId": execution_id,
|
||||
"mergeRequestId": merge_request_id,
|
||||
"componentsSnapshot": json.dumps(components),
|
||||
"componentsSnapshot": json.dumps(_json_safe(components)),
|
||||
"idempotencyKey": idempotency_key,
|
||||
"createdAt": _now(),
|
||||
}
|
||||
@@ -346,7 +377,7 @@ class CanvasService:
|
||||
"actorId": existing["actor_id"],
|
||||
"executionId": _stringify(existing["execution_id"]) if existing["execution_id"] else None,
|
||||
"mergeRequestId": _stringify(existing["merge_request_id"]) if existing["merge_request_id"] else None,
|
||||
"componentsSnapshot": json.dumps(existing["components_snapshot"]),
|
||||
"componentsSnapshot": json.dumps(_json_safe(existing["components_snapshot"])),
|
||||
"idempotencyKey": existing["idempotency_key"],
|
||||
"createdAt": existing["created_at"].isoformat(),
|
||||
}
|
||||
@@ -385,7 +416,7 @@ class CanvasService:
|
||||
actor_id,
|
||||
execution_id or "",
|
||||
merge_request_id or "",
|
||||
json.dumps(normalized_components),
|
||||
json.dumps(_json_safe(normalized_components)),
|
||||
idempotency_key,
|
||||
)
|
||||
|
||||
@@ -411,7 +442,7 @@ class CanvasService:
|
||||
"actorId": revision["actor_id"],
|
||||
"executionId": _stringify(revision["execution_id"]) if revision["execution_id"] else None,
|
||||
"mergeRequestId": _stringify(revision["merge_request_id"]) if revision["merge_request_id"] else None,
|
||||
"componentsSnapshot": json.dumps(revision["components_snapshot"]),
|
||||
"componentsSnapshot": json.dumps(_json_safe(revision["components_snapshot"])),
|
||||
"idempotencyKey": revision["idempotency_key"],
|
||||
"createdAt": revision["created_at"].isoformat(),
|
||||
}
|
||||
@@ -462,13 +493,14 @@ class CanvasService:
|
||||
)
|
||||
if not revision:
|
||||
raise ValueError(f"Revision {target_revision} not found for page {page_id}")
|
||||
snapshot = _json_array(revision["components_snapshot"])
|
||||
return await self.commit_revision(
|
||||
page_id=page_id,
|
||||
tenant_id=tenant_id,
|
||||
actor_id=actor_id,
|
||||
commit_kind="rollback",
|
||||
commit_summary=f"Rollback to revision {target_revision}",
|
||||
components=list(revision["components_snapshot"]),
|
||||
components=snapshot,
|
||||
idempotency_key=idempotency_key,
|
||||
)
|
||||
finally:
|
||||
@@ -604,15 +636,15 @@ class CanvasService:
|
||||
component.get("description"),
|
||||
int(component.get("version", 1)),
|
||||
component.get("lifecycleState", "active"),
|
||||
json.dumps(component.get("dataSourceDescriptor", {})),
|
||||
json.dumps(component.get("visualizationParameters", {})),
|
||||
json.dumps(component.get("dataBindings", {})),
|
||||
json.dumps(component.get("provenance", {})),
|
||||
json.dumps(component.get("renderingHints", {})),
|
||||
json.dumps(component.get("layout", {})),
|
||||
json.dumps(component.get("accessControls", {})),
|
||||
json.dumps(component.get("styleSignature", {})),
|
||||
json.dumps(component.get("validationState", {})),
|
||||
json.dumps(_json_safe(component.get("dataSourceDescriptor", {}))),
|
||||
json.dumps(_json_safe(component.get("visualizationParameters", {}))),
|
||||
json.dumps(_json_safe(component.get("dataBindings", {}))),
|
||||
json.dumps(_json_safe(component.get("provenance", {}))),
|
||||
json.dumps(_json_safe(component.get("renderingHints", {}))),
|
||||
json.dumps(_json_safe(component.get("layout", {}))),
|
||||
json.dumps(_json_safe(component.get("accessControls", {}))),
|
||||
json.dumps(_json_safe(component.get("styleSignature", {}))),
|
||||
json.dumps(_json_safe(component.get("validationState", {}))),
|
||||
list(component.get("auditLog", [])),
|
||||
)
|
||||
|
||||
|
||||
@@ -261,13 +261,17 @@ class OracleCodebookService:
         if not prompt_terms:
             prompt_terms = set(_tokenize(prompt.replace("_", " ")))

+        lowered_prompt = prompt.lower()
+        crm_prompt = any(term in lowered_prompt for term in ("client", "clients", "contact", "contacts", "crm", "lead", "account"))
+        interaction_prompt = any(term in lowered_prompt for term in ("interaction", "timeline", "call", "message", "email", "whatsapp", "follow-up"))
+        property_prompt = any(term in lowered_prompt for term in ("property", "properties", "project", "projects", "interest", "interested"))
+
         scored: list[tuple[int, CodebookExample]] = []
         for example in self.load()["examples"]:
             score = 0
             term_set = set(example.score_terms)
             overlap = prompt_terms.intersection(term_set)
             score += len(overlap) * 6
-            lowered_prompt = prompt.lower()
             if example.template_name.lower() in lowered_prompt:
                 score += 24
             if example.subchapter_name.lower() in lowered_prompt:
@@ -280,6 +284,15 @@ class OracleCodebookService:
                 score += 8
             if "live_data_first" in example.policy_tags:
                 score += 4
+            chapter = example.chapter_name.lower()
+            subchapter = example.subchapter_name.lower()
+            title = example.title.lower()
+            if crm_prompt and any(term in " ".join((chapter, subchapter, title, example.template_name.lower())) for term in ("lead", "client", "contact", "crm", "account", "pipeline")):
+                score += 18
+            if interaction_prompt and any(term in " ".join((chapter, subchapter, title, example.template_name.lower())) for term in ("interaction", "timeline", "call", "message", "email", "whatsapp", "follow-up")):
+                score += 16
+            if property_prompt and any(term in " ".join((chapter, subchapter, title, example.template_name.lower())) for term in ("property", "inventory", "interest", "project")):
+                score += 16
             if score > 0:
                 scored.append((score, example))

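The intent boost added to the codebook scoring can be tried in isolation. The following is a minimal sketch, not the service's implementation: `Example` is a hypothetical stand-in for `CodebookExample`, the whitespace tokenizer replaces `_tokenize`, and only the CRM branch is shown. The weights (6 per overlapping term, 18 for a CRM match) are the ones that appear in the diff.

```python
from dataclasses import dataclass, field

CRM_PROMPT_TERMS = ("client", "clients", "contact", "contacts", "crm", "lead", "account")
CRM_EXAMPLE_TERMS = ("lead", "client", "contact", "crm", "account", "pipeline")


@dataclass
class Example:
    # Hypothetical stand-in for CodebookExample; field names mirror the diff.
    title: str
    chapter_name: str
    subchapter_name: str
    template_name: str
    score_terms: list[str] = field(default_factory=list)


def score_example(prompt: str, example: Example) -> int:
    lowered_prompt = prompt.lower()
    # Assumption: a naive whitespace split instead of the real tokenizer.
    prompt_terms = set(lowered_prompt.split())
    score = len(prompt_terms.intersection(example.score_terms)) * 6

    # The prompt intent is computed once, then matched against the example's metadata.
    crm_prompt = any(term in lowered_prompt for term in CRM_PROMPT_TERMS)
    haystack = " ".join(
        (example.chapter_name, example.subchapter_name, example.title, example.template_name)
    ).lower()
    if crm_prompt and any(term in haystack for term in CRM_EXAMPLE_TERMS):
        score += 18
    return score
```

Because the checks are substring matches, a prompt containing "leadership" would also count as a CRM prompt; that behaviour is shared with the code in the diff and may or may not be intended.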
@@ -11,6 +11,8 @@ import uuid
 from datetime import datetime, timezone
 from typing import Any

+from .canvas_service import canvas_service
+
 logger = logging.getLogger(__name__)

 # ── In-memory store (demo mode) ───────────────────────────────────────────────
@@ -23,6 +25,32 @@ def _now() -> str:
     return datetime.now(timezone.utc).isoformat()


+def _clone_components_for_fork(
+    components: list[dict[str, Any]],
+    *,
+    actor_id: str,
+    source_page_id: str,
+    source_branch_id: str,
+    source_revision: int,
+) -> list[dict[str, Any]]:
+    cloned: list[dict[str, Any]] = []
+    for component in components:
+        forked = copy.deepcopy(component)
+        original_component_id = str(forked.get("componentId") or "")
+        forked["componentId"] = str(uuid.uuid4())
+        provenance = dict(forked.get("provenance") or {})
+        provenance["forkedAt"] = _now()
+        provenance["forkedBy"] = actor_id
+        provenance["sourcePageId"] = source_page_id
+        provenance["sourceBranchId"] = source_branch_id
+        provenance["sourceRevision"] = source_revision
+        if original_component_id:
+            provenance["sourceComponentId"] = original_component_id
+        forked["provenance"] = provenance
+        cloned.append(forked)
+    return cloned
+
+
 # ── Three-way diff engine ─────────────────────────────────────────────────────

 def _three_way_diff(
@@ -228,17 +256,50 @@ class CollaborationService:
         Creates a fork from the source_page snapshot at its current headRevision.
         Returns ForkRecord.
         """
+        if recipient_user_id == created_by:
+            raise ValueError("You cannot share a canvas with your own account.")
+
         fork_id = str(uuid.uuid4())
-        fork_page_id = str(uuid.uuid4())
-        fork_branch_id = str(uuid.uuid4())
+        fork_page = await canvas_service.create_page(
+            tenant_id=source_page["tenantId"],
+            owner_id=recipient_user_id,
+            title=f"{source_page['title']} Fork",
+            page_type="fork",
+            branch_name=f"fork-{str(fork_id)[:8]}",
+            sharing_policy={
+                "shareMode": "direct_fork_only",
+                "allowReshare": visibility == "team",
+                "defaultForkVisibility": visibility,
+            },
+        )
+
+        fork_components = _clone_components_for_fork(
+            source_page.get("components", []),
+            actor_id=created_by,
+            source_page_id=source_page["pageId"],
+            source_branch_id=source_page["branchId"],
+            source_revision=source_page["headRevision"],
+        )
+
+        await canvas_service.commit_revision(
+            page_id=fork_page["pageId"],
+            tenant_id=source_page["tenantId"],
+            actor_id=created_by,
+            commit_kind="merge",
+            commit_summary=f"Forked from {source_page['title']} at rev.{source_page['headRevision']}",
+            components=fork_components,
+            execution_id=None,
+            merge_request_id=None,
+            idempotency_key=f"fork_{fork_id}",
+        )

         fork = {
             "forkId": fork_id,
             "sourcePageId": source_page["pageId"],
             "sourceBranchId": source_page["branchId"],
             "sourceRevision": source_page["headRevision"],
-            "forkPageId": fork_page_id,
-            "forkBranchId": fork_branch_id,
+            "forkPageId": fork_page["pageId"],
+            "forkBranchId": fork_page["branchId"],
             "recipientUserId": recipient_user_id,
             "createdBy": created_by,
             "visibility": visibility,

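To see what the fork cloning guarantees, a trimmed, stand-alone version can be exercised directly. This is a sketch under assumptions: `clone_for_fork` is a hypothetical reduction of `_clone_components_for_fork` that keeps only some of the provenance fields, and the sample component is invented for illustration.

```python
import copy
import uuid
from datetime import datetime, timezone
from typing import Any


def clone_for_fork(
    components: list[dict[str, Any]], *, actor_id: str, source_page_id: str
) -> list[dict[str, Any]]:
    cloned: list[dict[str, Any]] = []
    for component in components:
        # Deep copy so nested dicts (layout, provenance, ...) are not shared with the source.
        forked = copy.deepcopy(component)
        original_id = str(forked.get("componentId") or "")
        forked["componentId"] = str(uuid.uuid4())  # every fork gets a fresh identity
        provenance = dict(forked.get("provenance") or {})
        provenance["forkedAt"] = datetime.now(timezone.utc).isoformat()
        provenance["forkedBy"] = actor_id
        provenance["sourcePageId"] = source_page_id
        if original_id:
            provenance["sourceComponentId"] = original_id
        forked["provenance"] = provenance
        cloned.append(forked)
    return cloned


# Invented sample data, only for illustration.
source = [{"componentId": "c1", "layout": {"orderIndex": 100}}]
forks = clone_for_fork(source, actor_id="user-a", source_page_id="page-1")
forks[0]["layout"]["orderIndex"] = 999  # editing the fork must not touch the source
```

After this runs, `source[0]` still has `componentId == "c1"` and `orderIndex == 100`, while the fork carries a new ID and a `sourceComponentId` pointing back at `"c1"`.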
@@ -159,14 +159,20 @@ class DataAccessGateway:
         if dataset == "broker_performance":
             sql = """
                 SELECT
-                    ROW_NUMBER() OVER (ORDER BY COALESCE(revenue_generated, 0) DESC, broker_name ASC)::int AS rank,
-                    broker_name AS name,
-                    deals_closed::int AS deals_closed,
-                    COALESCE(revenue_generated, 0)::float AS revenue_generated,
-                    avatar_url AS avatar
-                FROM broker_performance
-                WHERE tenant_id = $1
-                ORDER BY revenue_generated DESC, broker_name ASC
+                    ROW_NUMBER() OVER (
+                        ORDER BY COUNT(DISTINCT l.person_id) DESC, COALESCE(u.full_name, u.email, u.id::text) ASC
+                    )::int AS rank,
+                    COALESCE(u.full_name, u.email, u.id::text) AS name,
+                    COUNT(DISTINCT l.person_id)::int AS deals_closed,
+                    COALESCE(SUM(o.value), 0)::float AS revenue_generated,
+                    u.avatar_url AS avatar
+                FROM users_and_roles u
+                LEFT JOIN crm_leads l ON l.assigned_user_id = u.id
+                LEFT JOIN crm_opportunities o ON o.lead_id = l.lead_id
+                WHERE u.is_active = TRUE
+                GROUP BY u.id, u.full_name, u.email, u.avatar_url
+                HAVING COUNT(DISTINCT l.person_id) > 0 OR COALESCE(SUM(o.value), 0) > 0
+                ORDER BY revenue_generated DESC, name ASC
                 LIMIT $2
             """
             return sql, [ctx.tenant_id, row_limit]
@@ -245,13 +251,20 @@ class DataAccessGateway:
                     COALESCE(p.primary_phone, '') AS phone,
                     COALESCE(p.city, '') AS city,
                     COALESCE(p.buyer_type, 'unclassified') AS buyer_type,
-                    COALESCE(q.qd_score, 0)::float AS qd_score
+                    COALESCE(q.current_value, 0)::float AS qd_score
                 FROM crm_people p
                 LEFT JOIN LATERAL (
-                    SELECT qd_score
+                    SELECT current_value
                     FROM intel_qd_scores q
                     WHERE q.person_id = p.person_id
-                    ORDER BY q.scored_at DESC
+                    ORDER BY
+                        CASE
+                            WHEN q.score_type = 'engagement_score' THEN 0
+                            WHEN q.score_type = 'intent_score' THEN 1
+                            WHEN q.score_type = 'urgency_score' THEN 2
+                            ELSE 3
+                        END,
+                        q.computed_at DESC
                     LIMIT 1
                 ) q ON TRUE
                 ORDER BY qd_score DESC, p.full_name ASC
@@ -301,6 +314,71 @@ class DataAccessGateway:
             """
             return sql, [row_limit]

+        if dataset == "crm_last_interacted_clients":
+            sql = """
+                SELECT
+                    p.person_id::text AS id,
+                    p.full_name AS name,
+                    COALESCE(p.primary_email, '') AS email,
+                    COALESCE(p.primary_phone, '') AS phone,
+                    COALESCE(MAX(i.happened_at), p.updated_at, p.created_at) AS last_interaction_at,
+                    COUNT(i.interaction_id)::int AS interaction_count,
+                    COALESCE(q.current_value, 0)::float AS qd_score
+                FROM crm_people p
+                LEFT JOIN intel_interactions i ON i.person_id = p.person_id
+                LEFT JOIN LATERAL (
+                    SELECT current_value
+                    FROM intel_qd_scores q
+                    WHERE q.person_id = p.person_id
+                    ORDER BY
+                        CASE
+                            WHEN q.score_type = 'engagement_score' THEN 0
+                            WHEN q.score_type = 'intent_score' THEN 1
+                            WHEN q.score_type = 'urgency_score' THEN 2
+                            ELSE 3
+                        END,
+                        q.computed_at DESC
+                    LIMIT 1
+                ) q ON TRUE
+                GROUP BY p.person_id, p.full_name, p.primary_email, p.primary_phone, p.updated_at, p.created_at, q.current_value
+                ORDER BY last_interaction_at DESC NULLS LAST, interaction_count DESC, p.full_name ASC
+                LIMIT $1
+            """
+            return sql, [row_limit]
+
+        if dataset == "crm_top_interested_clients":
+            sql = """
+                SELECT
+                    p.person_id::text AS id,
+                    p.full_name AS name,
+                    COALESCE(p.primary_email, '') AS email,
+                    COALESCE(p.primary_phone, '') AS phone,
+                    COUNT(pi.interest_id)::int AS interest_count,
+                    STRING_AGG(DISTINCT pi.project_name, ', ' ORDER BY pi.project_name) AS projects,
+                    COALESCE(MAX(pi.created_at), p.updated_at, p.created_at) AS last_interest_at,
+                    COALESCE(q.current_value, 0)::float AS qd_score
+                FROM crm_people p
+                INNER JOIN crm_property_interests pi ON pi.person_id = p.person_id
+                LEFT JOIN LATERAL (
+                    SELECT current_value
+                    FROM intel_qd_scores q
+                    WHERE q.person_id = p.person_id
+                    ORDER BY
+                        CASE
+                            WHEN q.score_type = 'engagement_score' THEN 0
+                            WHEN q.score_type = 'intent_score' THEN 1
+                            WHEN q.score_type = 'urgency_score' THEN 2
+                            ELSE 3
+                        END,
+                        q.computed_at DESC
+                    LIMIT 1
+                ) q ON TRUE
+                GROUP BY p.person_id, p.full_name, p.primary_email, p.primary_phone, p.updated_at, p.created_at, q.current_value
+                ORDER BY interest_count DESC, qd_score DESC, last_interest_at DESC NULLS LAST, p.full_name ASC
+                LIMIT $1
+            """
+            return sql, [row_limit]
+
         if dataset == "crm_interaction_timeline":
             sql = """
                 SELECT

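The `LEFT JOIN LATERAL ... ORDER BY CASE ... LIMIT 1` pattern that now appears in three queries picks one score per person: the preferred score type first, then the most recent `computed_at`. A rough Python equivalent is sketched below to make that selection rule concrete. The row shape (dicts with `score_type`, `current_value`, `computed_at`) and the function name `pick_qd_score` are assumptions for illustration; the column names and the priority order come from the SQL in the diff.

```python
from datetime import datetime, timezone
from typing import Any

# Same priority as the CASE expression; any other score type falls into bucket 3.
SCORE_PRIORITY = {"engagement_score": 0, "intent_score": 1, "urgency_score": 2}


def pick_qd_score(rows: list[dict[str, Any]]) -> float:
    """Pick one score for a person: best score type first, newest row within that type."""
    if not rows:
        return 0.0  # mirrors COALESCE(q.current_value, 0) when the lateral join finds nothing
    best = min(
        rows,
        key=lambda row: (
            SCORE_PRIORITY.get(row["score_type"], 3),
            -row["computed_at"].timestamp(),  # negative so newer rows sort first
        ),
    )
    return float(best["current_value"] or 0)


# Invented sample rows, only for illustration.
rows = [
    {"score_type": "intent_score", "current_value": 0.9,
     "computed_at": datetime(2026, 1, 2, tzinfo=timezone.utc)},
    {"score_type": "engagement_score", "current_value": 0.4,
     "computed_at": datetime(2026, 1, 1, tzinfo=timezone.utc)},
    {"score_type": "engagement_score", "current_value": 0.7,
     "computed_at": datetime(2026, 1, 3, tzinfo=timezone.utc)},
]
selected = pick_qd_score(rows)
```

Here the newest `engagement_score` row wins even though an `intent_score` row has a higher value, which is the behaviour the `CASE` ordering encodes.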
@@ -56,6 +56,18 @@ def _coerce_datetime(value: datetime | str | None) -> datetime | None:

 # ── Execution store ───────────────────────────────────────────────────────────

+def _json_safe(value: Any) -> Any:
+    if isinstance(value, datetime):
+        return value.isoformat()
+    if isinstance(value, dict):
+        return {str(key): _json_safe(val) for key, val in value.items()}
+    if isinstance(value, list):
+        return [_json_safe(item) for item in value]
+    if isinstance(value, tuple):
+        return [_json_safe(item) for item in value]
+    return value
+
+
 _DEMO_EXECUTIONS: dict[str, dict[str, Any]] = {}


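The reason a helper like `_json_safe` is needed is that `json.dumps` rejects `datetime` values, which can appear in plans built from live query rows. The sketch below reproduces the helper under the name `json_safe` and runs it on an invented payload; the payload keys are assumptions, not fields taken from the codebase.

```python
import json
from datetime import datetime, timezone
from typing import Any


def json_safe(value: Any) -> Any:
    # Recursively replace values that json.dumps cannot encode.
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(key): json_safe(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [json_safe(item) for item in value]
    return value


# Invented payload: a datetime plus a tuple, both of which need normalising.
plan = {"generatedAt": datetime(2026, 4, 22, tzinfo=timezone.utc), "rows": ({"n": 1},)}

try:
    json.dumps(plan)
    raw_dump_failed = False
except TypeError:
    raw_dump_failed = True  # datetime is not JSON serializable by default

encoded = json.dumps(json_safe(plan))
# encoded == '{"generatedAt": "2026-04-22T00:00:00+00:00", "rows": [{"n": 1}]}'
```

Other non-JSON types (for example `Decimal`, `UUID`, or `set`) pass through unchanged and would still raise in `json.dumps`; whether those can occur here is not visible from the diff.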
@@ -117,13 +129,13 @@ def _build_demo_retrieval_plan(


 _DATASET_MAP: dict[str, str] = {
-    "pipeline_board": "deals",
-    "bar_chart": "lead_daily_snapshot",
+    "pipeline_board": "crm_opportunity_pipeline",
+    "bar_chart": "crm_property_interest_rollup",
     "geo_map": "lead_geo_interest_rollup",
-    "table": "broker_performance",
-    "line_chart": "inventory_absorption",
+    "table": "crm_contacts_overview",
+    "line_chart": "crm_property_interest_rollup",
     "kpi_tile": "oracle_aggregated_metric",
-    "activity_stream": "lead_activity_log",
+    "activity_stream": "crm_interaction_timeline",
 }

 _CODEBOOK_COMPONENT_MAP: dict[str, str] = {
@@ -162,6 +174,10 @@ def _dataset_for_codebook(example: CodebookExample, prompt: str, component_plan_
             return "crm_interaction_timeline"
         if component_plan_type == "pipeline_board":
             return "crm_opportunity_pipeline"
+        if component_plan_type == "table" and any(term in lowered_prompt for term in ("last interacted", "last interaction", "recently contacted", "recent interaction")):
+            return "crm_last_interacted_clients"
+        if component_plan_type == "table" and any(term in lowered_prompt for term in ("interest", "interested", "project", "property", "properties")) and any(term in lowered_prompt for term in ("client", "clients", "contact", "contacts")):
+            return "crm_top_interested_clients"
         if component_plan_type == "line_chart" and any(term in lowered_prompt for term in ("trend", "time", "history", "growth")):
             return "crm_property_interest_rollup"

@@ -170,8 +186,12 @@ def _dataset_for_codebook(example: CodebookExample, prompt: str, component_plan_
             return "crm_interaction_timeline"
         if "pipeline" in lowered_prompt or "opportunit" in lowered_prompt:
             return "crm_opportunity_pipeline"
+        if ("interest" in lowered_prompt or "project" in lowered_prompt or "property" in lowered_prompt) and ("client" in lowered_prompt or "contact" in lowered_prompt):
+            return "crm_top_interested_clients"
         if "interest" in lowered_prompt or "project" in lowered_prompt or "property" in lowered_prompt:
             return "crm_property_interest_rollup"
+        if "last interacted" in lowered_prompt or "recently contacted" in lowered_prompt or "recent interaction" in lowered_prompt:
+            return "crm_last_interacted_clients"
         return "crm_contacts_overview"

     if "client" in chapter or "client" in subchapter or "contact" in subchapter:
@@ -205,6 +225,7 @@ def _build_codebook_retrieval_plan(
     exemplar = matches[0]
     for component_plan_type in desired_types[:4]:
         dataset = _dataset_for_codebook(exemplar, prompt, component_plan_type)
+        title_hint = _title_for_dataset(dataset, component_plan_type, prompt) or title_hints.get(component_plan_type, exemplar.title)
         components.append(
             {
                 "suggestedType": component_plan_type,
@@ -222,7 +243,7 @@ def _build_codebook_retrieval_plan(
                     "subchapterName": exemplar.subchapter_name,
                     "sourcePack": exemplar.source_pack,
                 },
-                "titleHint": title_hints.get(component_plan_type, exemplar.title),
+                "titleHint": title_hint,
             }
         )

@@ -235,6 +256,24 @@ def _build_codebook_retrieval_plan(
     }


+def _title_for_dataset(dataset: str, component_plan_type: str, prompt: str) -> str | None:
+    lowered_prompt = prompt.lower()
+    dataset_titles = {
+        "crm_contacts_overview": "CRM Contacts Overview",
+        "crm_opportunity_pipeline": "Opportunity Pipeline",
+        "crm_property_interest_rollup": "Property Interest Rollup",
+        "crm_interaction_timeline": "Client Interaction Timeline",
+        "crm_last_interacted_clients": "Last Interacted Clients",
+        "crm_top_interested_clients": "Top Interested Clients",
+        "broker_performance": "Broker Performance",
+    }
+    if dataset == "crm_top_interested_clients" and "top" in lowered_prompt:
+        return "Top Interested Clients"
+    if dataset == "crm_last_interacted_clients" and ("top" in lowered_prompt or "last" in lowered_prompt):
+        return "Last Interacted Clients"
+    return dataset_titles.get(dataset)
+
+
 _RUNTIME_ALLOWED_DATASETS = {
     "deals",
     "lead_daily_snapshot",
@@ -247,6 +286,8 @@ _RUNTIME_ALLOWED_DATASETS = {
     "crm_opportunity_pipeline",
     "crm_property_interest_rollup",
     "crm_interaction_timeline",
+    "crm_last_interacted_clients",
+    "crm_top_interested_clients",
 }


@@ -371,6 +412,11 @@ class PromptOrchestrator:
         execution["status"] = "executing"
         await self._persist_execution(execution)

+        page = await canvas_service.get_page(page_id, tenant_id)
+        existing_comps = page.get("components", []) if page else []
+        next_order_base = self._next_order_base(existing_comps)
+        section_id = f"sec_prompt_generated_{execution_id.replace('-', '')[:12]}"
+
         # ── Step 3: Build visualization plan (component descriptors) ──────────
         viz_plan = await self._build_visualization_plan(
             retrieval_plan=retrieval_plan,
@@ -382,6 +428,8 @@ class PromptOrchestrator:
             placement_mode=placement_mode,
             ctx=ctx,
             persona_plan=persona_plan,
+            base_order=next_order_base,
+            section_id=section_id,
         )
         execution["visualizationPlan"] = viz_plan

@@ -391,9 +439,7 @@ class PromptOrchestrator:

         # Commit a revision bump with the new components
         try:
-            page = await canvas_service.get_page(page_id, tenant_id)
             if page:
-                existing_comps = page.get("components", [])
                 new_comps = existing_comps + viz_plan.get("components", [])
                 revision = await canvas_service.commit_revision(
                     page_id=page_id,
@@ -429,6 +475,8 @@ class PromptOrchestrator:
         placement_mode: str,
         ctx: PolicyContext,
         persona_plan: dict[str, Any],
+        base_order: int,
+        section_id: str,
     ) -> dict[str, Any]:
         """Converts a retrieval plan into a list of CanvasComponent descriptors."""
         components = [
@@ -438,9 +486,10 @@ class PromptOrchestrator:
                 branch_id=branch_id,
                 prompt=prompt,
                 persona_plan=persona_plan,
+                order_index=base_order + 100,
+                section_id=section_id,
             )
         ]
-        base_order = 900 # Append after existing components

         component_plans = retrieval_plan.get("components", [])
         for i, plan in enumerate(component_plans):
@@ -469,7 +518,7 @@ class PromptOrchestrator:
                         "privacyTier": plan.get("privacyTier", "standard"),
                         "cachePolicy": {"mode": "ttl", "ttlSeconds": 120},
                     },
-                    "visualizationParameters": self._default_viz_params(ctype, data_rows),
+                    "visualizationParameters": self._default_viz_params(ctype, dataset, data_rows),
                     "dataBindings": self._default_bindings(ctype),
                     "version": 1,
                     "lifecycleState": "active",
@@ -483,7 +532,7 @@ class PromptOrchestrator:
                     "renderingHints": self._rendering_hints(ctype),
                     "layout": {
                         "orderIndex": base_order + (i + 1) * 100,
-                        "sectionId": "sec_prompt_generated",
+                        "sectionId": section_id,
                         "widthMode": "full" if ctype in ("pipeline_board", "table", "geo_map") else "half",
                         "minHeightPx": 300,
                         "stickyHeader": False,
@@ -520,11 +569,29 @@ class PromptOrchestrator:
                     dataset=dataset,
                     warnings=component_warnings,
                     order_index=base_order + (i + 1) * 100,
+                    section_id=section_id,
                 )
             components.append(comp)

+        if len(components) > 1:
+            planning_component = components.pop(0)
+            planning_component["layout"]["orderIndex"] = base_order + (len(component_plans) + 1) * 100
+            components.append(planning_component)
+
         return {"components": components}

+    @staticmethod
+    def _next_order_base(existing_components: list[dict[str, Any]]) -> int:
+        max_existing = 0
+        for component in existing_components:
+            try:
+                order_index = int((component.get("layout") or {}).get("orderIndex", 0))
+            except (TypeError, ValueError):
+                order_index = 0
+            if order_index > max_existing:
+                max_existing = order_index
+        return ((max_existing // 100) + 1) * 100
+
     @staticmethod
     def _persona_text_canvas(
         *,
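The ordering change replaces the hard-coded `base_order = 900` with a base derived from the page's existing components. The arithmetic can be checked with a small stand-alone sketch. `next_order_base` follows `_next_order_base` from the diff; `planned_order_indexes` is a hypothetical helper that only summarises the index assignments made in `_build_visualization_plan` and does not exist in the codebase.

```python
from typing import Any


def next_order_base(existing_components: list[dict[str, Any]]) -> int:
    """Return the next multiple of 100 strictly above the highest existing orderIndex."""
    max_existing = 0
    for component in existing_components:
        try:
            order_index = int((component.get("layout") or {}).get("orderIndex", 0))
        except (TypeError, ValueError):
            order_index = 0  # missing or malformed values count as 0
        max_existing = max(max_existing, order_index)
    return ((max_existing // 100) + 1) * 100


def planned_order_indexes(base_order: int, data_component_count: int) -> dict[str, Any]:
    # Data components take base+100, base+200, ...
    data = [base_order + (i + 1) * 100 for i in range(data_component_count)]
    # The planning text component starts at base+100 and is moved behind the data
    # components only when at least one data component exists.
    planning = base_order + (data_component_count + 1) * 100 if data else base_order + 100
    return {"data": data, "planning": planning}


base = next_order_base([{"layout": {"orderIndex": 910}}])  # 910 -> 1000
layout = planned_order_indexes(base, 2)
```

With an existing maximum of 910, two new data components land at 1100 and 1200 and the planning note at 1300, so a second prompt run on the same page appends below the first instead of colliding with it.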
@@ -533,13 +600,13 @@ class PromptOrchestrator:
         branch_id: str,
         prompt: str,
         persona_plan: dict[str, Any],
+        order_index: int,
+        section_id: str,
     ) -> dict[str, Any]:
         recommended = ", ".join(persona_plan.get("recommendedTemplates", [])) or "no direct template matches"
         content = (
             f"Oracle received: {prompt}\n\n"
             f"Reusable templates: {recommended}\n\n"
-            "Execution policy: query live CRM data first, reuse matching templates, "
-            "synthesize missing UI blocks, then dispatch the required ComfyUI-backed workflow."
+            "Execution policy: query live CRM data first, pick the strongest-fitting canvas components, "
+            "and synthesize any missing UI blocks before rendering the result."
         )
         return {
             "componentId": str(uuid.uuid4()),
@@ -574,8 +641,8 @@ class PromptOrchestrator:
             },
             "renderingHints": {"estimatedHeightPx": 180, "skeletonVariant": "text", "virtualizationPriority": 4},
             "layout": {
-                "orderIndex": 910,
-                "sectionId": "sec_prompt_generated",
+                "orderIndex": order_index,
+                "sectionId": section_id,
                 "widthMode": "full",
                 "minHeightPx": 180,
                 "stickyHeader": False,
@@ -631,17 +698,34 @@ class PromptOrchestrator:
         return labels.get(comp_type, "Oracle Canvas Component")

     @staticmethod
-    def _default_viz_params(comp_type: str, rows: list[dict[str, Any]]) -> dict[str, Any]:
+    def _default_viz_params(comp_type: str, dataset: str, rows: list[dict[str, Any]]) -> dict[str, Any]:
+        first_row = rows[0] if rows else {}
+        inferred_columns = [key for key in first_row.keys() if key not in {"avatar"}] or ["name", "status"]
+        table_columns_by_dataset: dict[str, list[str]] = {
+            "broker_performance": ["name", "deals_closed", "revenue_generated"],
+            "crm_contacts_overview": ["name", "email", "phone", "city", "buyer_type", "qd_score"],
+            "crm_last_interacted_clients": ["name", "email", "phone", "last_interaction_at", "interaction_count", "qd_score"],
+            "crm_top_interested_clients": ["name", "email", "phone", "interest_count", "projects", "qd_score"],
+        }
         defaults: dict[str, dict[str, Any]] = {
             "bar_chart": {"xAxis": "category", "yAxis": "value", "sort": "desc", "showLabels": True, "legend": False},
             "line_chart": {"showPoints": True, "smooth": True},
             "kpi_tile": {
-                "label": rows[0].get("metric_label", "Result") if rows else "Result",
-                "trend": str(rows[0].get("trend_value", "")) if rows else "",
-                "comparisonLabel": rows[0].get("comparison_label", "") if rows else "",
+                "label": first_row.get("metric_label", "Result"),
+                "trend": str(first_row.get("trend_value", "")),
+                "comparisonLabel": first_row.get("comparison_label", ""),
             },
             "geo_map": {"mapStyle": "dubai_district_heat", "intensityField": "lead_count", "interactive": True, "tooltipFields": ["district", "lead_count", "avg_qd_score"]},
-            "table": {"rankBy": "revenue_generated", "showTopBadge": True, "columns": ["name", "deals_closed", "revenue_generated"]},
+            "table": {
+                "rankBy": "revenue_generated",
+                "showTopBadge": True,
+                "columns": table_columns_by_dataset.get(
+                    dataset,
+                    inferred_columns,
+                ),
+                "emptyStateTitle": "No matching records found",
+                "emptyStateDescription": "The query ran successfully but returned no rows for this prompt.",
+            },
             "pipeline_board": {"showValue": True, "colorByStage": True},
             "activity_stream": {"showUrgencyIndicator": True},
         }
@@ -674,7 +758,8 @@ class PromptOrchestrator:
     def _generate_summary(prompt: str, viz_plan: dict[str, Any]) -> str:
         count = len(viz_plan.get("components", []))
         short_prompt = prompt[:60] + ("…" if len(prompt) > 60 else "")
-        return f'Generated {count} component{"s" if count != 1 else ""} for: "{short_prompt}"'
+        data_component_count = max(count - 1, 0)
+        return f'Generated {data_component_count} component{"s" if data_component_count != 1 else ""} for: "{short_prompt}"'

     @staticmethod
     def _error_component(
@@ -686,6 +771,7 @@ class PromptOrchestrator:
         dataset: str,
         warnings: list[str],
         order_index: int,
+        section_id: str,
     ) -> dict[str, Any]:
         return {
             "componentId": component_id,
@@ -722,7 +808,7 @@ class PromptOrchestrator:
             "renderingHints": {"estimatedHeightPx": 140, "skeletonVariant": "generic", "virtualizationPriority": 5},
             "layout": {
                 "orderIndex": order_index,
-                "sectionId": "sec_prompt_generated",
+                "sectionId": section_id,
                 "widthMode": "full",
                 "minHeightPx": 140,
                 "stickyHeader": False,
@@ -875,8 +961,8 @@ class PromptOrchestrator:
                 execution["status"],
                 execution["modelRuntime"],
                 execution["semanticModelVersion"],
-                json.dumps(execution.get("retrievalPlan") or {}),
-                json.dumps(execution.get("visualizationPlan") or {}),
+                json.dumps(_json_safe(execution.get("retrievalPlan") or {})),
+                json.dumps(_json_safe(execution.get("visualizationPlan") or {})),
                 execution.get("warnings", []),
                 execution.get("summary"),
                 execution.get("componentsCreated", []),

@@ -257,13 +257,16 @@ async def create_fork(
     page = await canvas_service.get_page(page_id, ctx.tenant_id)
     if not page:
         raise HTTPException(status_code=404, detail="Source page not found.")
-    fork = await collaboration_service.create_fork(
-        source_page=page,
-        recipient_user_id=payload.recipientUserId,
-        created_by=ctx.actor_id,
-        visibility=payload.visibility,
-        message=payload.message,
-    )
+    try:
+        fork = await collaboration_service.create_fork(
+            source_page=page,
+            recipient_user_id=payload.recipientUserId,
+            created_by=ctx.actor_id,
+            visibility=payload.visibility,
+            message=payload.message,
+        )
+    except ValueError as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
     return _ok(fork)



@@ -1,394 +1,95 @@
 #!/usr/bin/env bash
-# =============================================================================
-# nemoclaw_deploy.sh
-# Deploys NemoClaw on the AWS G6.12xlarge instance.
-# - All data/install paths on NVMe (/opt/dlami/nvme/)
-# - Configures OpenShell to use existing Ollama (qwen3.5:27b, port 11434)
-# - GPUs 0+1 are Ollama's. Do NOT reassign them.
-# - ComfyUI owns GPUs 2+3. Do NOT touch.
-# - Creates a systemd service for the NemoClaw gateway.
-# =============================================================================
-
 set -euo pipefail
-NVME="/opt/dlami/nvme"
-AGENT_NAME="velocity-sentinel"
-OLLAMA_URL="http://127.0.0.1:11434"
-OLLAMA_MODEL="qwen3.5:27b"
-OPENCLAW_PORT=8080 # Port our FastAPI backend targets

-echo "================================================================"
-echo " Project Velocity — NemoClaw + OpenShell Deploy Script"
-echo " Instance: G6.12xlarge | NVMe: $NVME"
-echo "================================================================"
+# NemoClaw deployment helper for the Desineuron SGLang runtime.
+# This script intentionally avoids Ollama-era assumptions and configures
+# NemoClaw/OpenShell to talk to the shared OpenAI-compatible SGLang endpoint.

-# ──────────────────────────────────────────────────────────────────
-# 0. Safety checks
-# ──────────────────────────────────────────────────────────────────
-if [ "$(id -u)" -ne 0 ]; then
-    echo "[ERROR] Run as root or with sudo"; exit 1
+NVME_ROOT="${NVME_ROOT:-/opt/dlami/nvme/nemoclaw}"
+SGLANG_BASE_URL="${SGLANG_BASE_URL:-https://llm.desineuron.in}"
+SGLANG_MODEL="${SGLANG_MODEL:-qwen3.6:35b-a3b}"
+SGLANG_API_TOKEN="${SGLANG_API_TOKEN:-}"
+OPENSHELL_PORT="${OPENSHELL_PORT:-8080}"
+AGENT_NAME="${AGENT_NAME:-velocity-sentinel}"
+
+if [[ "${EUID}" -ne 0 ]]; then
+    echo "Run this script with sudo or as root."
+    exit 1
 fi

-if ! mountpoint -q "$NVME" 2>/dev/null && [ ! -d "$NVME" ]; then
-    echo "[WARN] NVMe not mounted at $NVME — using /home/ubuntu/nvme as fallback"
-    NVME="/home/ubuntu/nvme"
-    mkdir -p "$NVME"
-fi
+echo "==> Desineuron NemoClaw deploy"
+echo "NVME root : ${NVME_ROOT}"
+echo "SGLang base URL: ${SGLANG_BASE_URL}"
+echo "Model : ${SGLANG_MODEL}"
+echo "Agent : ${AGENT_NAME}"

-echo "[✓] NVMe target: $NVME"
+mkdir -p "${NVME_ROOT}"/{logs,state,home}

-# Confirm Ollama is alive before proceeding
-if ! curl -sf "$OLLAMA_URL/api/tags" | grep -q "qwen"; then
-    echo "[WARN] Ollama at $OLLAMA_URL doesn't show qwen3.5:27b yet — proceeding anyway"
-else
-    echo "[✓] Ollama confirmed running with qwen3.5:27b"
-fi
-
-# ──────────────────────────────────────────────────────────────────
-# 1. Node.js 22 (NemoClaw requirement: >=22.16)
-# ──────────────────────────────────────────────────────────────────
-echo ""
-echo "[1/7] Installing Node.js 22..."
-
-NODE_VERSION=$(node --version 2>/dev/null | sed 's/v//' | cut -d. -f1 || echo "0")
-if [ "$NODE_VERSION" -ge 22 ]; then
-    echo "[✓] Node.js $(node --version) already installed"
-else
+if ! command -v node >/dev/null 2>&1; then
     curl -fsSL https://deb.nodesource.com/setup_22.x | bash -
+    apt-get update -y
     apt-get install -y nodejs
-    echo "[✓] Node.js $(node --version) installed"
 fi

-npm --version
-echo "[✓] npm $(npm --version)"
-
-# ──────────────────────────────────────────────────────────────────
-# 2. Docker (required for OpenShell container runtime)
-# ──────────────────────────────────────────────────────────────────
-echo ""
-echo "[2/7] Ensuring Docker is installed..."
-
-if command -v docker &>/dev/null && docker info &>/dev/null; then
-    echo "[✓] Docker $(docker --version | awk '{print $3}') already running"
-else
-    echo " Installing Docker..."
-    apt-get install -y ca-certificates curl gnupg lsb-release
-    install -m 0755 -d /etc/apt/keyrings
-    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
-    chmod a+r /etc/apt/keyrings/docker.gpg
-    echo \
-      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
-      https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
-      | tee /etc/apt/sources.list.d/docker.list > /dev/null
-    apt-get update -q
-    apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
-    systemctl enable docker
-    systemctl start docker
-    echo "[✓] Docker installed"
+if ! command -v docker >/dev/null 2>&1; then
+    apt-get update -y
+    apt-get install -y docker.io
+    systemctl enable --now docker
 fi

-# Move Docker data root to NVMe so images don't fill root disk
-DOCKER_DAEMON_JSON="/etc/docker/daemon.json"
-if ! grep -q "nvme" "$DOCKER_DAEMON_JSON" 2>/dev/null; then
-    echo " Moving Docker data-root → $NVME/docker"
-    mkdir -p "$NVME/docker"
-    # Preserve existing config if any
-    EXISTING=$(cat "$DOCKER_DAEMON_JSON" 2>/dev/null || echo "{}")
-    python3 -c "
-import json, sys
-cfg = json.loads('''$EXISTING''')
-cfg['data-root'] = '$NVME/docker'
-print(json.dumps(cfg, indent=2))
-" > "$DOCKER_DAEMON_JSON"
-    systemctl restart docker
-    echo "[✓] Docker data-root → $NVME/docker"
+if ! command -v openshell >/dev/null 2>&1; then
+    npm install -g @nvidia/openshell || true
 fi

-# ──────────────────────────────────────────────────────────────────
-# 3. Install NemoClaw (headless via env vars)
-# ──────────────────────────────────────────────────────────────────
-echo ""
-echo "[3/7] Installing NemoClaw..."
-
-# Set HOME so NemoClaw installs to NVMe-backed location
-export NEMOCLAW_HOME="$NVME/nemoclaw"
-export OPENSHELL_HOME="$NVME/openshell"
-export HOME_OVERRIDE="$NVME/home"
-mkdir -p "$NEMOCLAW_HOME" "$OPENSHELL_HOME" "$HOME_OVERRIDE"
-
-# Link ~/.nemoclaw and ~/.openshell to NVMe
-ln -sfn "$NEMOCLAW_HOME" /root/.nemoclaw 2>/dev/null || true
-ln -sfn "$NEMOCLAW_HOME" /home/ubuntu/.nemoclaw 2>/dev/null || true
-ln -sfn "$OPENSHELL_HOME" /root/.openshell 2>/dev/null || true
-ln -sfn "$OPENSHELL_HOME" /home/ubuntu/.openshell 2>/dev/null || true
-
-if command -v nemoclaw &>/dev/null; then
-    echo "[✓] nemoclaw already installed: $(nemoclaw --version 2>/dev/null || echo 'version unknown')"
-else
-    echo " Downloading NemoClaw installer..."
-    INSTALLER_SCRIPT="$NVME/nemoclaw_install.sh"
-    curl -fsSL https://www.nvidia.com/nemoclaw.sh -o "$INSTALLER_SCRIPT"
-    chmod +x "$INSTALLER_SCRIPT"
-
-    # Run the installer non-interactively
-    # NEMOCLAW_SKIP_ONBOARD=1 bypasses the interactive wizard (undocumented but standard pattern)
-    # We'll do manual onboarding after install using CLI flags
-    NEMOCLAW_SKIP_ONBOARD=1 \
-    NEMOCLAW_HOME="$NEMOCLAW_HOME" \
-    bash "$INSTALLER_SCRIPT" || true
-
-    # Reload PATH
-    export PATH="$PATH:/usr/local/bin:/root/.local/bin"
-    source ~/.bashrc 2>/dev/null || true
-
-    if ! command -v nemoclaw &>/dev/null; then
-        echo "[WARN] nemoclaw not in PATH yet — checking common locations..."
-        for p in /usr/local/bin/nemoclaw /root/.local/bin/nemoclaw "$NVME/bin/nemoclaw"; do
-            if [ -f "$p" ]; then
-                ln -sfn "$p" /usr/local/bin/nemoclaw
-                echo "[✓] Linked nemoclaw from $p"
-                break
-            fi
-        done
-    fi
-
-    echo "[✓] nemoclaw installed"
+if ! command -v nemoclaw >/dev/null 2>&1; then
+    npm install -g @nvidia/nemoclaw || true
 fi

-# ──────────────────────────────────────────────────────────────────
-# 4. Onboard the Velocity Sentinel agent sandbox
-# ──────────────────────────────────────────────────────────────────
-echo ""
-echo "[4/7] Onboarding '$AGENT_NAME' NemoClaw sandbox..."
+cat >/etc/default/desineuron-nemoclaw <<EOF
+SGLANG_BASE_URL=${SGLANG_BASE_URL}
+SGLANG_MODEL=${SGLANG_MODEL}
+SGLANG_API_TOKEN=${SGLANG_API_TOKEN}
+NEMOCLAW_BASE_URL=${SGLANG_BASE_URL}
+NEMOCLAW_MODEL=${SGLANG_MODEL}
+NEMOCLAW_API_TOKEN=${SGLANG_API_TOKEN}
+EOF
+chmod 600 /etc/default/desineuron-nemoclaw

-# Check if sandbox already exists
-if nemoclaw "$AGENT_NAME" status &>/dev/null; then
-    echo "[✓] Sandbox '$AGENT_NAME' already exists — skipping creation"
-else
-    echo " Running nemoclaw onboard (this may take a few minutes)..."
-    # --provider compatible-endpoint: use our local Ollama instead of NVIDIA cloud
-    # --yes: skip confirmation prompts
-    nemoclaw onboard \
-        --name "$AGENT_NAME" \
+if command -v openshell >/dev/null 2>&1; then
+    openshell inference set \
         --provider compatible-endpoint \
-        --endpoint "$OLLAMA_URL/v1" \
-        --model "$OLLAMA_MODEL" \
-        --yes \
-        --no-messaging-bridge \
-        --no-skills || {
-        echo "[WARN] Structured onboard failed — trying minimal onboard..."
-        # Fallback: let it run with defaults if flags are not supported in this alpha version
-        yes "" | nemoclaw onboard --name "$AGENT_NAME" 2>&1 | head -60 || true
-    }
-    echo "[✓] Sandbox onboarded"
+        --base-url "${SGLANG_BASE_URL}/v1" \
|
||||
--api-key "${SGLANG_API_TOKEN:-desineuron}" \
|
||||
--model "${SGLANG_MODEL}" \
|
||||
--context-window 8192 \
|
||||
--max-tokens 4096 || true
|
||||
fi
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────
|
||||
# 5. Configure OpenShell to use Ollama (compatible endpoint)
|
||||
# ──────────────────────────────────────────────────────────────────
|
||||
echo ""
|
||||
echo "[5/7] Configuring OpenShell inference → Ollama (qwen3.5:27b)..."
|
||||
|
||||
# Set inference route to our local Ollama
|
||||
openshell inference set \
|
||||
--provider compatible-endpoint \
|
||||
--base-url "$OLLAMA_URL/v1" \
|
||||
--api-key "ollama" \
|
||||
--model "$OLLAMA_MODEL" \
|
||||
--context-window 32768 \
|
||||
--max-tokens 4096 || {
|
||||
echo "[WARN] openshell inference set failed — trying alternate syntax..."
|
||||
openshell inference set \
|
||||
--provider compatible-endpoint \
|
||||
--model "$OLLAMA_MODEL" || true
|
||||
}
|
||||
|
||||
# Also set the context window on the Ollama model side
|
||||
echo " Setting Ollama num_ctx=32768..."
|
||||
curl -s -X POST "$OLLAMA_URL/api/generate" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{\"model\":\"$OLLAMA_MODEL\",\"prompt\":\"\",\"options\":{\"num_ctx\":32768},\"stream\":false}" \
|
||||
> /dev/null 2>&1 || true
|
||||
|
||||
echo "[✓] OpenShell inference configured → $OLLAMA_URL ($OLLAMA_MODEL)"
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────
|
||||
# 6. Write OpenShell network policy (allow Velocity backend egress)
|
||||
# ──────────────────────────────────────────────────────────────────
|
||||
echo ""
|
||||
echo "[6/7] Writing OpenShell network policy..."
|
||||
|
||||
POLICY_DIR="$OPENSHELL_HOME/policy"
|
||||
mkdir -p "$POLICY_DIR"
|
||||
|
||||
cat > "$POLICY_DIR/velocity_egress.yaml" << 'POLICY'
|
||||
# OpenShell Network Egress Policy — Project Velocity Sentinel
|
||||
# Applied to the velocity-sentinel sandbox.
|
||||
# All non-listed hosts are blocked by default.
|
||||
|
||||
version: "1"
|
||||
sandbox: velocity-sentinel
|
||||
|
||||
egress:
|
||||
# Local Ollama inference (Qwen 3.5 27B)
|
||||
- host: "127.0.0.1"
|
||||
ports: [11434]
|
||||
description: "Ollama LLM inference"
|
||||
action: allow
|
||||
|
||||
# OpenShell gateway itself (loopback)
|
||||
- host: "127.0.0.1"
|
||||
ports: [8080, 8081, 8082, 8083, 8084, 8085]
|
||||
description: "OpenShell gateway ports"
|
||||
action: allow
|
||||
|
||||
# Velocity FastAPI backend (same host)
|
||||
- host: "127.0.0.1"
|
||||
ports: [8000, 8001, 8288]
|
||||
description: "Velocity FastAPI backend"
|
||||
action: allow
|
||||
|
||||
# PostgreSQL (same host)
|
||||
- host: "127.0.0.1"
|
||||
ports: [5432]
|
||||
description: "PostgreSQL DB"
|
||||
action: allow
|
||||
|
||||
# Block everything else
|
||||
- host: "*"
|
||||
action: deny
|
||||
description: "Default deny — data sovereignty (India/Abu Dhabi)"
|
||||
POLICY
|
||||
|
||||
# Apply the policy if openshell supports it
|
||||
openshell policy apply "$POLICY_DIR/velocity_egress.yaml" 2>/dev/null || \
|
||||
echo "[WARN] Policy apply not supported yet in this alpha — YAML written for future use"
|
||||
|
||||
echo "[✓] Network policy written → $POLICY_DIR/velocity_egress.yaml"
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────
|
||||
# 7. Write NemoClaw systemd service
|
||||
# ──────────────────────────────────────────────────────────────────
|
||||
echo ""
|
||||
echo "[7/7] Installing systemd service: nemoclaw-velocity.service..."
|
||||
|
||||
NEMOCLAW_BIN=$(command -v nemoclaw || echo "/usr/local/bin/nemoclaw")
|
||||
OPENSHELL_BIN=$(command -v openshell || echo "/usr/local/bin/openshell")
|
||||
|
||||
cat > /etc/systemd/system/nemoclaw-velocity.service << SERVICE
|
||||
cat >/etc/systemd/system/desineuron-nemoclaw-gateway.service <<EOF
|
||||
[Unit]
|
||||
Description=NemoClaw Velocity Sentinel Gateway
|
||||
Documentation=https://github.com/NVIDIA/NemoClaw
|
||||
After=network.target ollama.service docker.service
|
||||
Wants=ollama.service docker.service
|
||||
Description=Desineuron NemoClaw Gateway
|
||||
After=network-online.target
|
||||
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=ubuntu
|
||||
Group=ubuntu
|
||||
WorkingDirectory=$NVME/nemoclaw
|
||||
|
||||
# GPU constraint: NemoClaw itself is CPU-bound (inference goes to Ollama)
|
||||
# Ollama already owns GPUs 0,1. ComfyUI owns GPUs 2,3.
|
||||
Environment=CUDA_VISIBLE_DEVICES=""
|
||||
Environment=NEMOCLAW_HOME=$NVME/nemoclaw
|
||||
Environment=OPENSHELL_HOME=$NVME/openshell
|
||||
Environment=OLLAMA_BASE_URL=http://127.0.0.1:11434
|
||||
Environment=VELOCITY_NEMO_MODEL=qwen3.5:27b
|
||||
Environment=GATEWAY_PORT=$OPENCLAW_PORT
|
||||
|
||||
ExecStart=$NEMOCLAW_BIN $AGENT_NAME connect --gateway-port $OPENCLAW_PORT
|
||||
ExecReload=/bin/kill -HUP \$MAINPID
|
||||
EnvironmentFile=/etc/default/desineuron-nemoclaw
|
||||
WorkingDirectory=${NVME_ROOT}
|
||||
Environment=HOME=${NVME_ROOT}/home
|
||||
ExecStart=/usr/bin/env bash -lc 'nemoclaw serve --name ${AGENT_NAME} --port ${OPENSHELL_PORT}'
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
StandardOutput=append:$NVME/logs/nemoclaw-velocity.log
|
||||
StandardError=append:$NVME/logs/nemoclaw-velocity.log
|
||||
|
||||
# Limits
|
||||
LimitNOFILE=65536
|
||||
TimeoutStopSec=30
|
||||
RestartSec=5
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
SERVICE
|
||||
EOF
|
||||
|
||||
mkdir -p "$NVME/logs"
|
||||
systemctl daemon-reload
|
||||
systemctl enable nemoclaw-velocity.service
|
||||
systemctl start nemoclaw-velocity.service || true # May fail on first boot if onboard not done
|
||||
systemctl enable --now desineuron-nemoclaw-gateway.service
|
||||
systemctl --no-pager --full status desineuron-nemoclaw-gateway.service
|
||||
|
||||
echo "[✓] nemoclaw-velocity.service enabled and started"
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────
|
||||
# Finalize: Detect gateway port & write env file
|
||||
# ──────────────────────────────────────────────────────────────────
|
||||
echo ""
|
||||
echo "================================================================"
|
||||
echo " Writing Velocity backend environment file..."
|
||||
echo "================================================================"
|
||||
|
||||
VELOCITY_ENV="$NVME/velocity/env"
|
||||
mkdir -p "$(dirname "$VELOCITY_ENV")"
|
||||
|
||||
# Detect actual OpenShell gateway URL
|
||||
GATEWAY_URL="http://127.0.0.1:$OPENCLAW_PORT"
|
||||
GATEWAY_CHAT_URL="$GATEWAY_URL/v1/chat/completions"
|
||||
|
||||
# Quick connectivity test (will succeed once nemoclaw starts)
|
||||
echo " Testing gateway at $GATEWAY_CHAT_URL ..."
|
||||
sleep 5
|
||||
HTTP_CODE=$(curl -sf -o /dev/null -w "%{http_code}" \
|
||||
-X POST "$GATEWAY_CHAT_URL" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"model":"qwen3.5:27b","messages":[{"role":"user","content":"ping"}],"max_tokens":5}' \
|
||||
2>/dev/null || echo "000")
|
||||
|
||||
if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "201" ]; then
|
||||
echo "[✓] Gateway responding at $GATEWAY_CHAT_URL (HTTP $HTTP_CODE)"
|
||||
else
|
||||
echo "[WARN] Gateway not yet responding (HTTP $HTTP_CODE) — it may still be starting up"
|
||||
fi
|
||||
|
||||
cat > "$VELOCITY_ENV" << ENV
|
||||
# Project Velocity — Backend Environment
|
||||
# Generated by nemoclaw_deploy.sh
|
||||
# Loaded by: source $VELOCITY_ENV
|
||||
|
||||
# ── NemoClaw / OpenShell Gateway ──────────────────────────────────
|
||||
NEMOCLAW_BASE_URL=$GATEWAY_URL
|
||||
NEMOCLAW_CHAT_URL=$GATEWAY_CHAT_URL
|
||||
NEMOCLAW_MODEL=qwen3.5:27b
|
||||
NEMOCLAW_TIMEOUT_S=30.0
|
||||
NEMOCLAW_TEMPERATURE=0.2
|
||||
|
||||
# ── Ollama (direct fallback if OpenShell gateway not up) ──────────
|
||||
OLLAMA_BASE_URL=http://127.0.0.1:11434
|
||||
|
||||
# ── NemoClaw Prompts ──────────────────────────────────────────────
|
||||
NEMOCLAW_PROMPT_DIR=$NVME/nemoclaw/prompts
|
||||
|
||||
# ── JWT / Auth ────────────────────────────────────────────────────
|
||||
# VELOCITY_JWT_SECRET=<SET_THIS>
|
||||
|
||||
# ── PostgreSQL ────────────────────────────────────────────────────
|
||||
# VELOCITY_DB_DSN=postgresql://velocity_app:<PW>@127.0.0.1:5432/velocity
|
||||
ENV
|
||||
|
||||
echo "[✓] Environment file written → $VELOCITY_ENV"
|
||||
echo ""
|
||||
echo "================================================================"
|
||||
echo " DONE. Summary:"
|
||||
echo ""
|
||||
echo " Agent name : $AGENT_NAME"
|
||||
echo " Gateway URL : $GATEWAY_URL"
|
||||
echo " Chat endpoint: $GATEWAY_CHAT_URL"
|
||||
echo " Model : $OLLAMA_MODEL (via Ollama on port 11434)"
|
||||
echo " GPUs 0,1 : Ollama (unchanged)"
|
||||
echo " GPUs 2,3 : ComfyUI (unchanged)"
|
||||
echo " Env file : $VELOCITY_ENV"
|
||||
echo " Service log : $NVME/logs/nemoclaw-velocity.log"
|
||||
echo ""
|
||||
echo " Next commands to verify:"
|
||||
echo " nemoclaw $AGENT_NAME status"
|
||||
echo " nemoclaw $AGENT_NAME logs --follow"
|
||||
echo " curl $GATEWAY_CHAT_URL (POST with messages[])"
|
||||
echo "================================================================"
|
||||
echo
|
||||
echo "NemoClaw deployment complete."
|
||||
echo "Gateway port : ${OPENSHELL_PORT}"
|
||||
echo "Model : ${SGLANG_MODEL}"
|
||||
echo "Runtime : ${SGLANG_BASE_URL}/v1"
|
||||
|
||||
@@ -1,10 +1,13 @@
|
||||
"""
|
||||
backend/services/nemoclaw_client.py - NemoClaw inference client.
|
||||
|
||||
Primary path:
|
||||
1. NVIDIA-hosted OpenAI-compatible chat completions.
|
||||
2. Optional compatible endpoint via NEMOCLAW_BASE_URL.
|
||||
3. Optional local Ollama fallback only when ALLOW_LOCAL_FALLBACK=true.
|
||||
Production path:
|
||||
1. Shared SGLang / OpenAI-compatible coding runtime.
|
||||
|
||||
Compatibility:
|
||||
- Legacy NEMOCLAW_* env names are still honored.
|
||||
- Legacy OLLAMA_BASE_URL can still seed the base URL, but Ollama is no longer
|
||||
a production fallback path.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
@@ -24,28 +27,23 @@ logger = logging.getLogger("velocity.nemoclaw")
|
||||
NEMOCLAW_TIMEOUT = float(os.getenv("NEMOCLAW_TIMEOUT_S", "45.0"))
|
||||
NEMOCLAW_TEMPERATURE = float(os.getenv("NEMOCLAW_TEMPERATURE", "0.2"))
|
||||
|
||||
NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY", "")
|
||||
NVIDIA_BASE_URL = os.getenv("NVIDIA_BASE_URL", "https://integrate.api.nvidia.com/v1")
|
||||
NVIDIA_CHAT_URL = os.getenv("NVIDIA_CHAT_URL", f"{NVIDIA_BASE_URL}/chat/completions")
|
||||
NVIDIA_MODEL = os.getenv("NVIDIA_MODEL", "nvidia/nemotron-3-super-120b-a12b")
|
||||
NVIDIA_FALLBACK_MODEL = os.getenv(
|
||||
"NVIDIA_FALLBACK_MODEL",
|
||||
"nvidia/llama-3.3-nemotron-super-49b-v1",
|
||||
SGLANG_BASE_URL = os.getenv(
|
||||
"SGLANG_BASE_URL",
|
||||
os.getenv(
|
||||
"NEMOCLAW_BASE_URL",
|
||||
os.getenv("LLM_BASE_URL", os.getenv("OLLAMA_BASE_URL", "https://llm.desineuron.in")),
|
||||
),
|
||||
).rstrip("/")
|
||||
SGLANG_CHAT_URL = os.getenv(
|
||||
"SGLANG_CHAT_URL",
|
||||
os.getenv("NEMOCLAW_CHAT_URL", f"{SGLANG_BASE_URL}/v1/chat/completions"),
|
||||
)
|
||||
|
||||
NEMOCLAW_BASE_URL = os.getenv("NEMOCLAW_BASE_URL", "")
|
||||
NEMOCLAW_CHAT_URL = (
|
||||
os.getenv("NEMOCLAW_CHAT_URL") or f"{NEMOCLAW_BASE_URL}/v1/chat/completions"
|
||||
if NEMOCLAW_BASE_URL
|
||||
else ""
|
||||
SGLANG_MODELS_URL = os.getenv("SGLANG_MODELS_URL", f"{SGLANG_BASE_URL}/v1/models")
|
||||
SGLANG_MODEL = os.getenv(
|
||||
"SGLANG_MODEL",
|
||||
os.getenv("NEMOCLAW_MODEL", os.getenv("OLLAMA_MODEL", "qwen3.6:35b-a3b")),
|
||||
)
|
||||
NEMOCLAW_MODEL = os.getenv("NEMOCLAW_MODEL", NVIDIA_MODEL)
|
||||
NEMOCLAW_API_TOKEN = os.getenv("NEMOCLAW_API_TOKEN", "")
|
||||
|
||||
ALLOW_LOCAL_FALLBACK = os.getenv("ALLOW_LOCAL_FALLBACK", "false").lower() == "true"
|
||||
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://127.0.0.1:11434")
|
||||
OLLAMA_CHAT_URL = f"{OLLAMA_BASE_URL}/v1/chat/completions"
|
||||
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen3.5:27b")
|
||||
SGLANG_API_TOKEN = os.getenv("SGLANG_API_TOKEN", os.getenv("NEMOCLAW_API_TOKEN", ""))
|
||||
|
||||
_PROMPT_DIR = os.getenv("NEMOCLAW_PROMPT_DIR", "/opt/dlami/nvme/nemoclaw/prompts")
|
||||
|
||||
@@ -201,83 +199,40 @@ async def _nemoclaw_chat(
|
||||
user_content: str,
|
||||
timeout: float = NEMOCLAW_TIMEOUT,
|
||||
) -> dict:
|
||||
endpoints: list[tuple[str, str, str, dict[str, str]]] = []
|
||||
if NVIDIA_API_KEY:
|
||||
endpoints.append(
|
||||
(
|
||||
"nvidia_primary",
|
||||
NVIDIA_CHAT_URL,
|
||||
NVIDIA_MODEL,
|
||||
{
|
||||
"Authorization": f"Bearer {NVIDIA_API_KEY}",
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
)
|
||||
)
|
||||
if NVIDIA_FALLBACK_MODEL and NVIDIA_FALLBACK_MODEL != NVIDIA_MODEL:
|
||||
endpoints.append(
|
||||
(
|
||||
"nvidia_fallback",
|
||||
NVIDIA_CHAT_URL,
|
||||
NVIDIA_FALLBACK_MODEL,
|
||||
{
|
||||
"Authorization": f"Bearer {NVIDIA_API_KEY}",
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
)
|
||||
)
|
||||
if NEMOCLAW_CHAT_URL:
|
||||
headers = {"Content-Type": "application/json"}
|
||||
if NEMOCLAW_API_TOKEN:
|
||||
headers["Authorization"] = f"Bearer {NEMOCLAW_API_TOKEN}"
|
||||
endpoints.append(("compatible_endpoint", NEMOCLAW_CHAT_URL, NEMOCLAW_MODEL, headers))
|
||||
if ALLOW_LOCAL_FALLBACK:
|
||||
endpoints.append(
|
||||
("ollama_fallback", OLLAMA_CHAT_URL, OLLAMA_MODEL, {"Content-Type": "application/json"})
|
||||
if not SGLANG_CHAT_URL:
|
||||
raise RuntimeError(
|
||||
"No NemoClaw inference endpoint is configured. Set SGLANG_BASE_URL or NEMOCLAW_BASE_URL."
|
||||
)
|
||||
|
||||
if not endpoints:
|
||||
raise RuntimeError(
|
||||
"No NemoClaw inference endpoint is configured. "
|
||||
"Set NVIDIA_API_KEY or NEMOCLAW_BASE_URL."
|
||||
)
|
||||
headers = {"Content-Type": "application/json"}
|
||||
if SGLANG_API_TOKEN:
|
||||
headers["Authorization"] = f"Bearer {SGLANG_API_TOKEN}"
|
||||
|
||||
t_start = time.monotonic()
|
||||
last_error: Exception | None = None
|
||||
for label, url, model, headers in endpoints:
|
||||
try:
|
||||
result = await _attempt_chat(
|
||||
label=label,
|
||||
url=url,
|
||||
model=model,
|
||||
system_content=system_content,
|
||||
user_content=user_content,
|
||||
timeout=timeout,
|
||||
headers=headers,
|
||||
)
|
||||
logger.info(
|
||||
"NemoClaw inference via %s model=%s elapsed=%.2fs",
|
||||
label,
|
||||
model,
|
||||
time.monotonic() - t_start,
|
||||
)
|
||||
return result
|
||||
except (httpx.ConnectError, httpx.TimeoutException) as exc:
|
||||
logger.warning("NemoClaw %s unreachable (%s), trying next endpoint", label, exc)
|
||||
last_error = exc
|
||||
except httpx.HTTPStatusError as exc:
|
||||
logger.error(
|
||||
"NemoClaw %s HTTP %s: %s",
|
||||
label,
|
||||
exc.response.status_code,
|
||||
exc.response.text[:300],
|
||||
)
|
||||
last_error = exc
|
||||
except (KeyError, IndexError, TypeError, json.JSONDecodeError) as exc:
|
||||
logger.error("NemoClaw %s returned invalid JSON: %s", label, exc)
|
||||
last_error = exc
|
||||
|
||||
raise RuntimeError(f"All NemoClaw endpoints failed. Last error: {last_error}")
|
||||
try:
|
||||
result = await _attempt_chat(
|
||||
label="sglang",
|
||||
url=SGLANG_CHAT_URL,
|
||||
model=SGLANG_MODEL,
|
||||
system_content=system_content,
|
||||
user_content=user_content,
|
||||
timeout=timeout,
|
||||
headers=headers,
|
||||
)
|
||||
logger.info(
|
||||
"NemoClaw inference via sglang model=%s elapsed=%.2fs",
|
||||
SGLANG_MODEL,
|
||||
time.monotonic() - t_start,
|
||||
)
|
||||
return result
|
||||
except (httpx.ConnectError, httpx.TimeoutException) as exc:
|
||||
raise RuntimeError(f"NemoClaw SGLang endpoint unreachable: {exc}") from exc
|
||||
except httpx.HTTPStatusError as exc:
|
||||
raise RuntimeError(
|
||||
f"NemoClaw SGLang HTTP {exc.response.status_code}: {exc.response.text[:300]}"
|
||||
) from exc
|
||||
except (KeyError, IndexError, TypeError, json.JSONDecodeError) as exc:
|
||||
raise RuntimeError(f"NemoClaw SGLang returned invalid JSON: {exc}") from exc
|
||||
|
||||
|
||||
async def score_qd(
|
||||
@@ -368,46 +323,32 @@ async def profile_cctv_visitor(
|
||||
|
||||
|
||||
async def health_check() -> dict:
|
||||
results: dict[str, str] = {}
|
||||
endpoints: list[tuple[str, str, str, dict[str, str]]] = []
|
||||
if NVIDIA_API_KEY:
|
||||
endpoints.append(
|
||||
(
|
||||
"nvidia_primary",
|
||||
NVIDIA_CHAT_URL,
|
||||
NVIDIA_MODEL,
|
||||
{
|
||||
"Authorization": f"Bearer {NVIDIA_API_KEY}",
|
||||
"Content-Type": "application/json",
|
||||
headers = {"Content-Type": "application/json"}
|
||||
if SGLANG_API_TOKEN:
|
||||
headers["Authorization"] = f"Bearer {SGLANG_API_TOKEN}"
|
||||
|
||||
results: dict[str, str] = {
|
||||
"model": SGLANG_MODEL,
|
||||
"primary_url": SGLANG_CHAT_URL,
|
||||
"models_url": SGLANG_MODELS_URL,
|
||||
}
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
models_response = await client.get(SGLANG_MODELS_URL, headers=headers)
|
||||
models_response.raise_for_status()
|
||||
chat_response = await client.post(
|
||||
SGLANG_CHAT_URL,
|
||||
json={
|
||||
"model": SGLANG_MODEL,
|
||||
"messages": [{"role": "user", "content": "ping"}],
|
||||
"max_tokens": 5,
|
||||
},
|
||||
headers=headers,
|
||||
)
|
||||
)
|
||||
if NEMOCLAW_CHAT_URL:
|
||||
headers = {"Content-Type": "application/json"}
|
||||
if NEMOCLAW_API_TOKEN:
|
||||
headers["Authorization"] = f"Bearer {NEMOCLAW_API_TOKEN}"
|
||||
endpoints.append(("compatible_endpoint", NEMOCLAW_CHAT_URL, NEMOCLAW_MODEL, headers))
|
||||
if ALLOW_LOCAL_FALLBACK:
|
||||
endpoints.append(
|
||||
("ollama_fallback", OLLAMA_CHAT_URL, OLLAMA_MODEL, {"Content-Type": "application/json"})
|
||||
)
|
||||
chat_response.raise_for_status()
|
||||
results["sglang"] = "ok"
|
||||
except Exception as exc:
|
||||
results["sglang"] = f"error: {exc}"
|
||||
|
||||
for name, url, model, headers in endpoints:
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
response = await client.post(
|
||||
url,
|
||||
json={
|
||||
"model": model,
|
||||
"messages": [{"role": "user", "content": "ping"}],
|
||||
"max_tokens": 5,
|
||||
},
|
||||
headers=headers,
|
||||
)
|
||||
results[name] = "ok" if response.status_code < 500 else f"http_{response.status_code}"
|
||||
except Exception as exc:
|
||||
results[name] = f"error: {exc}"
|
||||
|
||||
results["model"] = NVIDIA_MODEL if NVIDIA_API_KEY else NEMOCLAW_MODEL
|
||||
results["primary_url"] = NVIDIA_CHAT_URL if NVIDIA_API_KEY else (NEMOCLAW_CHAT_URL or OLLAMA_CHAT_URL)
|
||||
return results
|
||||
|
||||
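The environment-variable precedence that the `nemoclaw_client.py` hunks above introduce (the new `SGLANG_*` names first, then the legacy `NEMOCLAW_*`, `LLM_BASE_URL` and `OLLAMA_BASE_URL` names, then the public default) can be sketched as a pair of pure functions. This is a minimal illustration, not the module's code: `resolve_base_url`, `resolve_chat_url` and the `env` mapping parameter are hypothetical names, chosen so the lookup order can be exercised without touching the real process environment.

```python
from typing import Mapping

# Public default taken from the diff; the helper names below are hypothetical.
DEFAULT_BASE_URL = "https://llm.desineuron.in"
BASE_URL_PRECEDENCE = (
    "SGLANG_BASE_URL",
    "NEMOCLAW_BASE_URL",
    "LLM_BASE_URL",
    "OLLAMA_BASE_URL",
)


def resolve_base_url(env: Mapping[str, str]) -> str:
    """Return the first configured base URL, without a trailing slash.

    Like nested os.getenv(name, default) calls, a variable that is set
    but empty still wins over the names that come after it.
    """
    for name in BASE_URL_PRECEDENCE:
        if name in env:
            return env[name].rstrip("/")
    return DEFAULT_BASE_URL


def resolve_chat_url(env: Mapping[str, str]) -> str:
    """Return the chat completions URL, honoring explicit overrides first."""
    for name in ("SGLANG_CHAT_URL", "NEMOCLAW_CHAT_URL"):
        if name in env:
            return env[name]
    return f"{resolve_base_url(env)}/v1/chat/completions"
```

Passing `os.environ` as `env` would presumably reproduce the module-level lookups; passing a plain dict keeps the precedence easy to unit test.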
@@ -13,15 +13,17 @@ import httpx
|
||||
|
||||
logger = logging.getLogger("velocity.runtime_llm")
|
||||
|
||||
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://127.0.0.1:11434").rstrip("/")
|
||||
OLLAMA_CHAT_URL = os.getenv("OLLAMA_CHAT_URL", f"{OLLAMA_BASE_URL}/v1/chat/completions")
|
||||
OLLAMA_TAGS_URL = os.getenv("OLLAMA_TAGS_URL", f"{OLLAMA_BASE_URL}/api/tags")
|
||||
OLLAMA_DEFAULT_MODEL = os.getenv("OLLAMA_MODEL", "qwen3.5:27b")
|
||||
|
||||
NEMOCLAW_BASE_URL = os.getenv("NEMOCLAW_BASE_URL", "").rstrip("/")
|
||||
NEMOCLAW_CHAT_URL = (os.getenv("NEMOCLAW_CHAT_URL") or f"{NEMOCLAW_BASE_URL}/v1/chat/completions").rstrip("/") if NEMOCLAW_BASE_URL else ""
|
||||
NEMOCLAW_DEFAULT_MODEL = os.getenv("NEMOCLAW_MODEL", "nvidia/nemotron-3-super-120b-a12b")
|
||||
NEMOCLAW_API_TOKEN = os.getenv("NEMOCLAW_API_TOKEN", "")
|
||||
SGLANG_BASE_URL = os.getenv(
|
||||
"SGLANG_BASE_URL",
|
||||
os.getenv("LLM_BASE_URL", os.getenv("OLLAMA_BASE_URL", "https://llm.desineuron.in")),
|
||||
).rstrip("/")
|
||||
SGLANG_CHAT_URL = os.getenv("SGLANG_CHAT_URL", f"{SGLANG_BASE_URL}/v1/chat/completions")
|
||||
SGLANG_MODELS_URL = os.getenv("SGLANG_MODELS_URL", f"{SGLANG_BASE_URL}/v1/models")
|
||||
SGLANG_DEFAULT_MODEL = os.getenv(
|
||||
"SGLANG_MODEL",
|
||||
os.getenv("OLLAMA_MODEL", "qwen3.6:35b-a3b"),
|
||||
)
|
||||
SGLANG_API_TOKEN = os.getenv("SGLANG_API_TOKEN", "")
|
||||
|
||||
RUNTIME_LLM_TIMEOUT_S = float(os.getenv("RUNTIME_LLM_TIMEOUT_S", "90.0"))
|
||||
RUNTIME_LLM_CONCURRENCY = int(os.getenv("RUNTIME_LLM_BATCH_CONCURRENCY", "2"))
|
||||
@@ -57,40 +59,30 @@ class RuntimeLLMService:
|
||||
self._jobs: dict[str, dict[str, Any]] = {}
|
||||
|
||||
def _provider_catalog(self) -> list[RuntimeProvider]:
|
||||
providers: list[RuntimeProvider] = []
|
||||
if OLLAMA_CHAT_URL:
|
||||
providers.append(
|
||||
RuntimeProvider(
|
||||
provider_id="ollama",
|
||||
base_url=OLLAMA_BASE_URL,
|
||||
chat_url=OLLAMA_CHAT_URL,
|
||||
default_model=OLLAMA_DEFAULT_MODEL,
|
||||
)
|
||||
if not SGLANG_CHAT_URL:
|
||||
return []
|
||||
return [
|
||||
RuntimeProvider(
|
||||
provider_id="sglang",
|
||||
base_url=SGLANG_BASE_URL,
|
||||
chat_url=SGLANG_CHAT_URL,
|
||||
default_model=SGLANG_DEFAULT_MODEL,
|
||||
auth_token=SGLANG_API_TOKEN or None,
|
||||
)
|
||||
if NEMOCLAW_CHAT_URL:
|
||||
providers.append(
|
||||
RuntimeProvider(
|
||||
provider_id="nemoclaw",
|
||||
base_url=NEMOCLAW_BASE_URL,
|
||||
chat_url=NEMOCLAW_CHAT_URL,
|
||||
default_model=NEMOCLAW_DEFAULT_MODEL,
|
||||
auth_token=NEMOCLAW_API_TOKEN or None,
|
||||
)
|
||||
)
|
||||
return providers
|
||||
]
|
||||
|
||||
def get_provider(self, provider_id: str | None) -> RuntimeProvider:
|
||||
providers = {provider.provider_id: provider for provider in self._provider_catalog()}
|
||||
if provider_id in {"ollama", "nemoclaw"}:
|
||||
provider_id = "sglang"
|
||||
if provider_id:
|
||||
provider = providers.get(provider_id)
|
||||
if provider is None:
|
||||
raise ValueError(f"Unknown provider '{provider_id}'.")
|
||||
return provider
|
||||
|
||||
if "nemoclaw" in providers:
|
||||
return providers["nemoclaw"]
|
||||
if "ollama" in providers:
|
||||
return providers["ollama"]
|
||||
if "sglang" in providers:
|
||||
return providers["sglang"]
|
||||
raise ValueError("No runtime LLM providers are configured.")
|
||||
|
||||
async def list_providers(self) -> list[dict[str, Any]]:
|
||||
@@ -101,28 +93,18 @@ class RuntimeLLMService:
|
||||
error: str | None = None
|
||||
|
||||
try:
|
||||
if provider.provider_id == "ollama":
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
response = await client.get(OLLAMA_TAGS_URL)
|
||||
response.raise_for_status()
|
||||
payload = response.json()
|
||||
models = [str(item.get("name", "")).strip() for item in payload.get("models", []) if item.get("name")]
|
||||
if provider.default_model not in models:
|
||||
models.insert(0, provider.default_model)
|
||||
status = "online"
|
||||
else:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
response = await client.post(
|
||||
provider.chat_url,
|
||||
json={
|
||||
"model": provider.default_model,
|
||||
"messages": [{"role": "user", "content": "ping"}],
|
||||
"max_tokens": 4,
|
||||
},
|
||||
headers=provider.headers,
|
||||
)
|
||||
response.raise_for_status()
|
||||
status = "online"
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
response = await client.get(SGLANG_MODELS_URL, headers=provider.headers)
|
||||
response.raise_for_status()
|
||||
payload = response.json()
|
||||
models = [
|
||||
str(item.get("id", "")).strip()
|
||||
for item in payload.get("data", [])
|
||||
if item.get("id")
|
||||
]
|
||||
if provider.default_model not in models:
|
||||
models.insert(0, provider.default_model)
|
||||
status = "online"
|
||||
except Exception as exc: # pragma: no cover - network/runtime dependent
|
||||
error = str(exc)
|
||||
|
||||
|
||||
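Two behaviors from the `RuntimeLLMService` hunks above are easy to isolate: legacy provider ids (`ollama`, `nemoclaw`) are mapped onto the single `sglang` provider, and the model list is read from the OpenAI-compatible `/v1/models` payload with the default model inserted at the front when the server does not report it. The sketch below restates both as standalone functions; `normalize_provider_id` and `extract_model_ids` are hypothetical names, and the payload shape (`{"data": [{"id": ...}]}`) is the one the diff itself parses.

```python
from __future__ import annotations

from typing import Any

# Provider ids that older callers may still send (per the diff above).
LEGACY_PROVIDER_IDS = {"ollama", "nemoclaw"}


def normalize_provider_id(provider_id: str | None) -> str | None:
    """Map legacy provider ids onto the shared 'sglang' provider."""
    return "sglang" if provider_id in LEGACY_PROVIDER_IDS else provider_id


def extract_model_ids(payload: dict[str, Any], default_model: str) -> list[str]:
    """Collect model ids from a /v1/models payload.

    Entries without an 'id' are skipped; the default model is put first
    when the server does not list it.
    """
    models = [
        str(item.get("id", "")).strip()
        for item in payload.get("data", [])
        if item.get("id")
    ]
    if default_model not in models:
        models.insert(0, default_model)
    return models
```

Keeping the alias mapping in one place means existing clients that still request `ollama` or `nemoclaw` keep working while only one provider is actually configured.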
@@ -1,11 +1,12 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Dream Weaver — Local LLM Prompt Expander
|
||||
========================================
|
||||
Dream Weaver — Shared Runtime Prompt Expander
|
||||
============================================
|
||||
Converts user keywords + room type into a photorealistic interior design prompt
|
||||
using a local Ollama model (default: qwen3.5:27b).
|
||||
Cloud API calls (Gemini, OpenAI) have been completely removed for data privacy
|
||||
and local inference requirements.
|
||||
using the shared OpenAI-compatible Desineuron runtime (default: SGLang-hosted
|
||||
Qwen 3.6 35B A3B).
|
||||
Cloud API calls (Gemini, OpenAI SaaS) have been removed in favor of the routed
|
||||
Desineuron inference path.
|
||||
|
||||
Usage:
|
||||
from prompt_expander import expand_prompt
|
||||
@@ -126,26 +127,44 @@ class ExpandedPrompt:
|
||||
self.source = source
|
||||
|
||||
|
||||
def _call_ollama(user_message: str) -> str:
|
||||
ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
||||
# Using Qwen 3.5 27B as requested
|
||||
model = os.environ.get("OLLAMA_MODEL", "qwen3.5:27b")
|
||||
full_prompt = f"{SYSTEM_PROMPT}\n\nUSER REQUEST:\n{user_message}\n\nReturn JSON ONLY. No markdown wrapping."
|
||||
def _call_runtime(user_message: str) -> str:
|
||||
runtime_base = os.environ.get(
|
||||
"SGLANG_BASE_URL",
|
||||
os.environ.get(
|
||||
"LLM_BASE_URL",
|
||||
os.environ.get("OLLAMA_URL", "https://llm.desineuron.in"),
|
||||
),
|
||||
).rstrip("/")
|
||||
chat_url = os.environ.get("SGLANG_CHAT_URL", f"{runtime_base}/v1/chat/completions")
|
||||
model = os.environ.get(
|
||||
"SGLANG_MODEL",
|
||||
os.environ.get("OLLAMA_MODEL", "qwen3.6:35b-a3b"),
|
||||
)
|
||||
api_token = os.environ.get("SGLANG_API_TOKEN", "")
|
||||
full_prompt = (
|
||||
f"{SYSTEM_PROMPT}\n\nUSER REQUEST:\n{user_message}\n\nReturn JSON ONLY. No markdown wrapping."
|
||||
)
|
||||
|
||||
headers = {"Content-Type": "application/json"}
|
||||
if api_token:
|
||||
headers["Authorization"] = f"Bearer {api_token}"
|
||||
|
||||
r = requests.post(
|
||||
f"{ollama_url}/api/generate",
|
||||
chat_url,
|
||||
json={
|
||||
"model": model,
|
||||
"prompt": full_prompt,
|
||||
"stream": False,
|
||||
"format": "json",
|
||||
"options": {"temperature": 0.5}
|
||||
"messages": [{"role": "user", "content": full_prompt}],
|
||||
"temperature": 0.5,
|
||||
"response_format": {"type": "json_object"},
|
||||
"max_tokens": 1200,
|
||||
},
|
||||
timeout=180 # Large models take time
|
||||
headers=headers,
|
||||
timeout=180,
|
||||
)
|
||||
r.raise_for_status()
|
||||
resp_json = r.json()
|
||||
return resp_json["response"]
|
||||
message = ((resp_json.get("choices") or [{}])[0].get("message") or {}).get("content", "")
|
||||
return message if isinstance(message, str) else json.dumps(message)
|
||||
|
||||
|
||||
def expand_prompt(keywords: list[str], room_type: str = "living_room", additional_notes: str = "") -> ExpandedPrompt:
|
||||
@@ -164,16 +183,16 @@ AVOID: {ctx['avoid']}
|
||||
{f'NOTES: {additional_notes}' if additional_notes else ''}"""
|
||||
|
||||
try:
|
||||
logger.info("Calling local Ollama LLM...")
|
||||
raw = _call_ollama(user_message).strip()
|
||||
logger.info("Calling shared Desineuron runtime LLM...")
|
||||
raw = _call_runtime(user_message).strip()
|
||||
|
||||
# Log the raw response for debugging
|
||||
logger.info(f"Raw Ollama response length: {len(raw)}")
|
||||
|
||||
# Handle empty response
|
||||
if not raw:
|
||||
logger.error("Empty response from Ollama")
|
||||
raise ValueError("Ollama returned an empty response")
|
||||
logger.error("Empty response from shared runtime")
|
||||
raise ValueError("Shared runtime returned an empty response")
|
||||
|
||||
# Clean string of common junk (control characters, leading/trailing non-bracket junk)
|
||||
raw_cleaned = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]', '', raw)
|
||||
@@ -215,7 +234,7 @@ AVOID: {ctx['avoid']}
|
||||
source="ollama_local"
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error(f"Ollama LLM expansion failed: {e}")
|
||||
logger.error(f"Shared runtime LLM expansion failed: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
# Full fallback if anything goes wrong
|
||||
|
||||
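The prompt expander change above replaces Ollama's `/api/generate` response field (`resp_json["response"]`) with the OpenAI-compatible chat shape (`choices[0].message.content`). The defensive extraction can be sketched on its own as follows; `extract_message_content` is a hypothetical name, and the body mirrors the expression in the diff rather than any additional behavior.

```python
import json
from typing import Any


def extract_message_content(resp_json: dict[str, Any]) -> str:
    """Return choices[0].message.content as a string.

    Missing or empty 'choices' and a missing 'message' both yield "";
    non-string content (for example an already-parsed JSON object) is
    serialized back to a JSON string.
    """
    first_choice = (resp_json.get("choices") or [{}])[0]
    message = (first_choice.get("message") or {}).get("content", "")
    return message if isinstance(message, str) else json.dumps(message)
```

One edge case worth knowing: if the server returns `"content": null`, this expression yields the string `"null"` rather than an empty string, so the caller's empty-response check would not trigger for that case.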
@@ -25,6 +25,25 @@ office.desineuron.in, git.desineuron.in, cloud.desineuron.in, projects.desineuro
    }
}

velocity.desineuron.in {
    log {
        output file /var/log/caddy/access.log
        format json
    }

    import /etc/caddy/managed/llm_upstream.caddy_inc

    reverse_proxy https://127.0.0.1:8443 {
        header_up Host {host}
        header_up X-Forwarded-Host {host}
        header_up X-Forwarded-Proto {scheme}
        header_up X-Forwarded-For {remote_host}
        transport http {
            tls_insecure_skip_verify
        }
    }
}

ops.desineuron.in {
    log {
        output file /var/log/caddy/access.log
@@ -0,0 +1,20 @@
#!/usr/bin/env bash
set -euo pipefail

TARGET_PATH="${TARGET_PATH:-/opt/dlami/nvme/models/cyankiwi-Qwen3.5-122B-A10B-AWQ-4bit}"
MODEL_REPO="${MODEL_REPO:-cyankiwi/Qwen3.5-122B-A10B-AWQ-4bit}"

mkdir -p "${TARGET_PATH}"

if command -v hf >/dev/null 2>&1; then
  hf download "${MODEL_REPO}" --local-dir "${TARGET_PATH}" --max-workers 8
else
  python3 - <<PY
from huggingface_hub import snapshot_download
snapshot_download(repo_id="${MODEL_REPO}", local_dir="${TARGET_PATH}", max_workers=8)
PY
fi

echo "Staged ${MODEL_REPO} under ${TARGET_PATH}"
echo "This is an acquisition/staging path only. The live L4 runtime remains qwen3.6:35b-a3b unless explicitly cut over."
echo "Use MODEL_REPO=txn545/Qwen3.5-122B-A10B-NVFP4 only on hardware validated for NVFP4."
17
infrastructure/desineuron_ingress/deploy_caddy_llm.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
set -ex

# Copy latest config files
sudo scp -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem /tmp/manage_desineuron_routes.py ec2-user@98.87.120.120:/tmp/manage_desineuron_routes.py
sudo scp -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem /tmp/Caddyfile ec2-user@98.87.120.120:/tmp/Caddyfile

# Bootstrap on the proxy target
sudo ssh -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem ec2-user@98.87.120.120 "sudo cp /tmp/manage_desineuron_routes.py /usr/local/bin/manage_desineuron_routes.py && sudo chmod +x /usr/local/bin/manage_desineuron_routes.py && sudo touch /etc/caddy/managed/llm_upstream.caddy_inc && sudo cp /tmp/Caddyfile /etc/caddy/Caddyfile"

# Invoke immediate synchronization pulse to populate llm_upstream.caddy_inc
sudo systemctl start desineuron-llm-route-sync.service

sleep 5

# Safely initiate proxy reload
sudo ssh -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem ec2-user@98.87.120.120 "sudo systemctl reload caddy"
@@ -0,0 +1,9 @@
[Unit]
Description=Sync llm.desineuron.in managed route to current GPU private IP
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
EnvironmentFile=/etc/desineuron-llm-route-sync.env
ExecStart=/usr/local/bin/run_llm_route_sync.sh
@@ -0,0 +1,10 @@
[Unit]
Description=Run LLM route sync on boot and every 2 minutes

[Timer]
OnBootSec=1min
OnUnitActiveSec=2min
Unit=desineuron-llm-route-sync.service

[Install]
WantedBy=timers.target
108
infrastructure/desineuron_ingress/install_gpu_ollama_watchdog.sh
Normal file
@@ -0,0 +1,108 @@
#!/usr/bin/env bash
set -euo pipefail

MODEL_NAME="qwen3.6:35b-a3b"
NVME_ROOT="/opt/dlami/nvme/ollama"
OLLAMA_OVERRIDE_DIR="/etc/systemd/system/ollama.service.d"

# 1. Configure Ollama to use NVME
sudo mkdir -p "${NVME_ROOT}/models" "${NVME_ROOT}/state" "${NVME_ROOT}/logs"
sudo chown -R root:root "${NVME_ROOT}"

echo "Configuring Ollama to use NVME storage at ${NVME_ROOT}/models..."
sudo mkdir -p "${OLLAMA_OVERRIDE_DIR}"
sudo tee "${OLLAMA_OVERRIDE_DIR}/override.conf" >/dev/null <<EOF
[Service]
Environment="OLLAMA_MODELS=${NVME_ROOT}/models"
Environment="OLLAMA_HOST=0.0.0.0"
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now ollama.service

# 2. Write the Hydrate Helper
HYDRATE_HELPER="/usr/local/bin/desineuron-hydrate-qwen36.sh"
echo "Creating Hydrate Helper map at $HYDRATE_HELPER"
sudo tee "$HYDRATE_HELPER" >/dev/null <<EOF
#!/usr/bin/env bash
set -euo pipefail

echo "(\$(date)) Hydrating \$1 model using ollama pull..." | sudo tee -a "${NVME_ROOT}/logs/qwen36_hydrate.log"
# This requires outward access or an Ollama compatible registry proxy
# Note: For S3-based private GGUFs, this would use s5cmd
ollama pull "\$1"
echo "(\$(date)) Hydration complete" | sudo tee -a "${NVME_ROOT}/logs/qwen36_hydrate.log"
EOF
sudo chmod 0755 "$HYDRATE_HELPER"

# 3. Write Watchdog Script
WATCHDOG_SCRIPT="/usr/local/bin/desineuron-ollama-watchdog.sh"
echo "Creating Watchdog Script map at $WATCHDOG_SCRIPT"
sudo tee "$WATCHDOG_SCRIPT" >/dev/null <<EOF
#!/usr/bin/env bash
set -euo pipefail

MODEL_NAME="${MODEL_NAME}"
OLLAMA_URL="http://127.0.0.1:11434"

if ! systemctl is-active --quiet ollama; then
  systemctl restart ollama
  sleep 5
fi

# Try asking Ollama if the tag exists
if ! curl -fsS "\$OLLAMA_URL/api/tags" | grep -q "\$MODEL_NAME"; then
  echo "Expected model \$MODEL_NAME missing. Initiating hydration..."

  # Ensure wiped ephemeral NVMe disks are scaffolded pre-hydration
  sudo mkdir -p "${NVME_ROOT}/logs" "${NVME_ROOT}/models" "${NVME_ROOT}/state"
  sudo chown -R ollama:ollama "${NVME_ROOT}"

  /usr/local/bin/desineuron-hydrate-qwen36.sh "\$MODEL_NAME"
  sleep 5
fi

# Verify final state
if curl -fsS "\$OLLAMA_URL/api/tags" | grep -q "\$MODEL_NAME"; then
  echo "healthy"
  exit 0
else
  echo "unhealthy: Model \$MODEL_NAME failed to register" >&2
  exit 1
fi
EOF
sudo chmod 0755 "$WATCHDOG_SCRIPT"


# 4. Write Watchdog Systemd Service & Timer
sudo tee "/etc/systemd/system/desineuron-ollama-watchdog.service" >/dev/null <<EOF
[Unit]
Description=Desineuron GPU Ollama Watchdog for Model $MODEL_NAME
After=network-online.target

[Service]
Type=oneshot
Environment="HOME=/root"
ExecStart=$WATCHDOG_SCRIPT
EOF

sudo tee "/etc/systemd/system/desineuron-ollama-watchdog.timer" >/dev/null <<EOF
[Unit]
Description=Watchdog run for Ollama Model $MODEL_NAME every 5 mins

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
Unit=desineuron-ollama-watchdog.service

[Install]
WantedBy=timers.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now desineuron-ollama-watchdog.timer
sudo systemctl start desineuron-ollama-watchdog.service

echo "Ollama Watchdog installed and model $MODEL_NAME setup initiated."
sudo systemctl --no-pager status desineuron-ollama-watchdog.timer
104
infrastructure/desineuron_ingress/install_gpu_sglang_runtime.sh
Normal file
@@ -0,0 +1,104 @@
#!/usr/bin/env bash
set -euo pipefail

NVME_ROOT="${NVME_ROOT:-/opt/dlami/nvme/sglang}"
RUNTIME_ROOT="${RUNTIME_ROOT:-/opt/desineuron-sglang}"
VENV_PATH="${RUNTIME_ROOT}/.venv"
PORT="${SGLANG_PORT:-30100}"
HOST="${SGLANG_HOST:-}"
MODEL_ID="${SGLANG_MODEL_ID:-qwen3.6-35b-a3b}"
MODEL_PATH="${SGLANG_MODEL_PATH:-/opt/dlami/nvme/models/Qwen-Qwen3.6-35B-A3B-FP8}"
TP_SIZE="${SGLANG_TP_SIZE:-4}"
CONTEXT_LENGTH="${SGLANG_CONTEXT_LENGTH:-131072}"
MEM_FRACTION_STATIC="${SGLANG_MEM_FRACTION_STATIC:-0.88}"
ATTENTION_BACKEND="${SGLANG_ATTENTION_BACKEND:-flashinfer}"
DIST_INIT_ADDR="${SGLANG_DIST_INIT_ADDR:-127.0.0.1:50000}"

if [[ -z "${HOST}" ]]; then
  IMDS_TOKEN="$(curl -fsS -X PUT http://169.254.169.254/latest/api/token -H 'X-aws-ec2-metadata-token-ttl-seconds: 21600' || true)"
  if [[ -n "${IMDS_TOKEN}" ]]; then
    HOST="$(curl -fsS -H "X-aws-ec2-metadata-token: ${IMDS_TOKEN}" http://169.254.169.254/latest/meta-data/local-ipv4 || true)"
  fi
fi

if [[ -z "${HOST}" ]]; then
  HOST="$(hostname -I | awk '{print $1}')"
fi

if [[ -z "${HOST}" ]]; then
  echo "Unable to resolve GPU private IP for SGLang host binding" >&2
  exit 1
fi

sudo mkdir -p "${NVME_ROOT}"/{cache,logs,state} "${RUNTIME_ROOT}"
python3 -m venv "${VENV_PATH}"
"${VENV_PATH}/bin/pip" install --upgrade pip wheel setuptools
"${VENV_PATH}/bin/pip" install "sglang[all]>=0.5.3" flashinfer-python huggingface_hub

sudo tee /etc/default/desineuron-sglang >/dev/null <<EOF
SGLANG_HOST=${HOST}
SGLANG_PORT=${PORT}
SGLANG_MODEL_ID=${MODEL_ID}
SGLANG_MODEL_PATH=${MODEL_PATH}
SGLANG_TP_SIZE=${TP_SIZE}
SGLANG_CONTEXT_LENGTH=${CONTEXT_LENGTH}
SGLANG_MEM_FRACTION_STATIC=${MEM_FRACTION_STATIC}
SGLANG_ATTENTION_BACKEND=${ATTENTION_BACKEND}
SGLANG_DIST_INIT_ADDR=${DIST_INIT_ADDR}
SGLANG_CACHE_DIR=${NVME_ROOT}/cache
SGLANG_LOG_DIR=${NVME_ROOT}/logs
SGLANG_STATE_DIR=${NVME_ROOT}/state
SGLANG_USE_FLASHINFER=1
SGLANG_ENABLE_PREFIX_CACHE=1
SGLANG_SERVED_MODEL_NAME=${MODEL_ID}
SGLANG_EXTRA_ARGS=
EOF
sudo chmod 600 /etc/default/desineuron-sglang

sudo tee /usr/local/bin/desineuron-sglang-launch.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
source /etc/default/desineuron-sglang
export HF_HOME="${SGLANG_CACHE_DIR}/hf"
export HUGGINGFACE_HUB_CACHE="${SGLANG_CACHE_DIR}/hf"
export CUDA_DEVICE_MAX_CONNECTIONS=1
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export SGLANG_USE_FLASHINFER="${SGLANG_USE_FLASHINFER}"
exec /opt/desineuron-sglang/.venv/bin/sglang serve \
  --host "${SGLANG_HOST}" \
  --port "${SGLANG_PORT}" \
  --model-path "${SGLANG_MODEL_PATH}" \
  --served-model-name "${SGLANG_SERVED_MODEL_NAME}" \
  --tp-size "${SGLANG_TP_SIZE}" \
  --context-length "${SGLANG_CONTEXT_LENGTH}" \
  --mem-fraction-static "${SGLANG_MEM_FRACTION_STATIC}" \
  --attention-backend "${SGLANG_ATTENTION_BACKEND}" \
  --dist-init-addr "${SGLANG_DIST_INIT_ADDR}" \
  --enable-metrics \
  --skip-server-warmup \
  ${SGLANG_EXTRA_ARGS}
EOF
sudo chmod 0755 /usr/local/bin/desineuron-sglang-launch.sh

sudo tee /etc/systemd/system/desineuron-sglang.service >/dev/null <<EOF
[Unit]
Description=Desineuron SGLang Runtime
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
EnvironmentFile=/etc/default/desineuron-sglang
WorkingDirectory=${RUNTIME_ROOT}
ExecStart=/usr/local/bin/desineuron-sglang-launch.sh
Restart=always
RestartSec=5
LimitNOFILE=1048576

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now desineuron-sglang.service
sudo systemctl --no-pager --full status desineuron-sglang.service
@@ -0,0 +1,85 @@
#!/usr/bin/env bash
set -euo pipefail

sudo tee /usr/local/bin/desineuron-sglang-watchdog.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
source /etc/default/desineuron-sglang

# Probe the address SGLang binds to (SGLANG_HOST, the GPU private IP); loopback is only a fallback.
HEALTH_URL="http://${SGLANG_HOST:-127.0.0.1}:${SGLANG_PORT}/v1/models"
HYDRATE_HELPER="/usr/local/bin/desineuron-sglang-hydrate.sh"
STARTUP_GRACE_SECONDS="${SGLANG_STARTUP_GRACE_SECONDS:-900}"
HEALTH_TIMEOUT_SECONDS="${SGLANG_HEALTH_TIMEOUT_SECONDS:-60}"

if [[ ! -d "${SGLANG_MODEL_PATH}" ]]; then
  "${HYDRATE_HELPER}" "${SGLANG_MODEL_ID}" "${SGLANG_MODEL_PATH}"
fi

if ! systemctl is-active --quiet desineuron-sglang.service; then
  systemctl restart desineuron-sglang.service
  sleep 10
fi

main_pid="$(systemctl show -p MainPID --value desineuron-sglang.service || true)"
if [[ -n "${main_pid}" && "${main_pid}" != "0" ]]; then
  runtime_age="$(( $(date +%s) - $(stat -c %Y "/proc/${main_pid}" 2>/dev/null || date +%s) ))"
  if (( runtime_age < STARTUP_GRACE_SECONDS )); then
    echo "startup_grace"
    exit 0
  fi
fi

if ! curl --max-time "${HEALTH_TIMEOUT_SECONDS}" -fsS "${HEALTH_URL}" >/dev/null; then
  systemctl restart desineuron-sglang.service
  sleep 20
fi

curl --max-time "${HEALTH_TIMEOUT_SECONDS}" -fsS "${HEALTH_URL}" >/dev/null
echo "healthy"
EOF
sudo chmod 0755 /usr/local/bin/desineuron-sglang-watchdog.sh

sudo tee /usr/local/bin/desineuron-sglang-hydrate.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
MODEL_ID="${1:?model id required}"
TARGET_PATH="${2:?target path required}"
mkdir -p "$(dirname "${TARGET_PATH}")"
if command -v hf >/dev/null 2>&1; then
  hf download "${MODEL_ID}" --local-dir "${TARGET_PATH}" --max-workers 8
else
  python3 - <<PY
from huggingface_hub import snapshot_download
snapshot_download(repo_id="${MODEL_ID}", local_dir="${TARGET_PATH}", max_workers=8)
PY
fi
EOF
sudo chmod 0755 /usr/local/bin/desineuron-sglang-hydrate.sh

sudo tee /etc/systemd/system/desineuron-sglang-watchdog.service >/dev/null <<EOF
[Unit]
Description=Desineuron SGLang Runtime Watchdog
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/desineuron-sglang-watchdog.sh
EOF

sudo tee /etc/systemd/system/desineuron-sglang-watchdog.timer >/dev/null <<EOF
[Unit]
Description=Run the Desineuron SGLang watchdog every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
Unit=desineuron-sglang-watchdog.service

[Install]
WantedBy=timers.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now desineuron-sglang-watchdog.timer
sudo systemctl start desineuron-sglang-watchdog.service
sudo systemctl --no-pager --full status desineuron-sglang-watchdog.timer
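The SGLang watchdog script in this file makes a three-way decision: stay quiet while the service is still inside its startup grace window, restart it when the `/v1/models` probe fails, and otherwise report healthy. A minimal sketch of that branching as a pure Python function is shown here so the logic can be reasoned about without systemd. The name `watchdog_action` and its return labels are hypothetical and are not part of the installed scripts, and the sketch simplifies the shell flow (it omits the hydration step and the second probe after a restart).

```python
def watchdog_action(runtime_age: int, grace_seconds: int, probe_ok: bool) -> str:
    """Simplified, illustrative mirror of the shell watchdog's branching."""
    # A freshly (re)started server may still be loading weights; do not probe it yet.
    if runtime_age < grace_seconds:
        return "startup_grace"
    # Past the grace window, a failed health probe leads to a restart.
    if not probe_ok:
        return "restart"
    return "healthy"


if __name__ == "__main__":
    # Hypothetical inputs: seconds since service start, grace window, probe result.
    print(watchdog_action(runtime_age=120, grace_seconds=900, probe_ok=False))
    print(watchdog_action(runtime_age=1200, grace_seconds=900, probe_ok=False))
    print(watchdog_action(runtime_age=1200, grace_seconds=900, probe_ok=True))
```

Keeping the decision separate from the side effects (`systemctl restart`, `curl`) is one way to make the grace-window behaviour checkable in isolation.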
@@ -0,0 +1,35 @@
#!/usr/bin/env bash
set -euo pipefail

APP_ROOT=/opt/desineuron-llm-route-sync
VENV_PATH="$APP_ROOT/.venv"
ENV_FILE=/etc/desineuron-llm-route-sync.env
SCRIPT_PATH=/usr/local/bin/sync_llm_route.py
WRAPPER_PATH=/usr/local/bin/run_llm_route_sync.sh
SERVICE_FILE=/etc/systemd/system/desineuron-llm-route-sync.service
TIMER_FILE=/etc/systemd/system/desineuron-llm-route-sync.timer

sudo mkdir -p "$APP_ROOT" /var/lib/desineuron-llm-route-sync
python3 -m venv "$VENV_PATH"
"$VENV_PATH/bin/pip" install --upgrade pip boto3

sudo install -m 0755 /tmp/desineuron_ingress/sync_llm_route.py "$SCRIPT_PATH"
sudo install -m 0755 /tmp/desineuron_ingress/run_llm_route_sync.sh "$WRAPPER_PATH"
sudo install -m 0644 /tmp/desineuron_ingress/desineuron-llm-route-sync.service "$SERVICE_FILE"
sudo install -m 0644 /tmp/desineuron_ingress/desineuron-llm-route-sync.timer "$TIMER_FILE"

sudo tee "$ENV_FILE" >/dev/null <<EOF
OPS_ENV_FILE=/opt/desineuron-ops-control-plane/.env
LLM_ROUTE_HOSTNAME=llm.desineuron.in
LLM_ROUTE_PORT=30100
LLM_INSTANCE_TAG_KEY=DesineuronRole
LLM_INSTANCE_TAG_VALUE=comfyui
LLM_ROUTE_STATE_FILE=/var/lib/desineuron-llm-route-sync/current_target.txt
INGRESS_SSH_KEY_PATH=/opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem
EOF

sudo chmod 600 "$ENV_FILE"
sudo systemctl daemon-reload
sudo systemctl enable --now desineuron-llm-route-sync.timer
sudo systemctl start desineuron-llm-route-sync.service
sudo systemctl --no-pager --full status desineuron-llm-route-sync.service desineuron-llm-route-sync.timer
@@ -0,0 +1,94 @@
#!/usr/bin/env python3
from __future__ import annotations

import json
import sys
from pathlib import Path


STATE_FILE = Path("/etc/caddy/managed/desineuron-routes.json")
SNIPPET_FILE = Path("/etc/caddy/managed/desineuron-routes.caddy")


def load_routes() -> dict[str, dict]:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text(encoding="utf-8"))
    return {}


def save_routes(routes: dict[str, dict]) -> None:
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(routes, indent=2), encoding="utf-8")


def render_routes(routes: dict[str, dict]) -> None:
    lines: list[str] = []
    for hostname, route in sorted(routes.items()):
        lines.extend(
            [
                f"{hostname} {{",
                "\ttls /etc/caddy/tls/fullchain.pem /etc/caddy/tls/privkey.pem",
                "\tlog {",
                "\t\toutput file /var/log/caddy/access.log",
                "\t\tformat json",
                "\t}",
                f"\treverse_proxy {route['scheme']}://{route['target_host']}:{route['target_port']} {{",
                "\t\theader_up Host {host}",
                "\t\theader_up X-Forwarded-Host {host}",
                "\t\theader_up X-Forwarded-Proto {scheme}",
                "\t\theader_up X-Forwarded-For {remote_host}",
                "\t}",
                "}",
                "",
            ]
        )
    SNIPPET_FILE.write_text("\n".join(lines).rstrip() + "\n", encoding="utf-8")

    # Generate a dedicated upstream include exclusively for velocity.desineuron.in/llm
    llm_inc = Path("/etc/caddy/managed/llm_upstream.caddy_inc")
    if "llm.desineuron.in" in routes:
        route = routes["llm.desineuron.in"]
        llm_inc.write_text(
            f"handle_path /llm/* {{\n"
            f"\treverse_proxy {route['scheme']}://{route['target_host']}:{route['target_port']} {{\n"
            f"\t\theader_up Host {{host}}\n"
            f"\t\theader_up X-Forwarded-For {{remote_host}}\n"
            f"\t\tflush_interval -1\n"
            f"\t\theader_down X-Accel-Buffering no\n"
            f"\t}}\n"
            f"}}\n",
            encoding="utf-8",
        )
    else:
        llm_inc.write_text("", encoding="utf-8")


def main() -> int:
    if len(sys.argv) < 2:
        print("usage: manage_desineuron_routes.py <upsert|delete|list> [payload|hostname]")
        return 1
    command = sys.argv[1]
    # upsert and delete need a second argument; fail with a message instead of an IndexError.
    if command in {"upsert", "delete"} and len(sys.argv) < 3:
        print(f"{command} requires a payload or hostname argument")
        return 1
    routes = load_routes()
    if command == "upsert":
        payload = json.loads(sys.argv[2])
        routes[payload["hostname"]] = payload
        save_routes(routes)
        render_routes(routes)
        print(json.dumps({"status": "ok", "action": "upsert", "hostname": payload["hostname"]}))
        return 0
    if command == "delete":
        hostname = sys.argv[2]
        routes.pop(hostname, None)
        save_routes(routes)
        render_routes(routes)
        print(json.dumps({"status": "ok", "action": "delete", "hostname": hostname}))
        return 0
    if command == "list":
        print(json.dumps(routes, indent=2))
        return 0
    print(f"unknown command: {command}")
    return 1


if __name__ == "__main__":
    raise SystemExit(main())
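The route helper in this file rewrites the Caddy snippet and the `llm_upstream.caddy_inc` include in place with `Path.write_text`. If a reload happened to read a file mid-write, Caddy could see a truncated include. One possible hardening, sketched here under that assumption, is to write to a temporary file in the same directory and rename it over the target. `atomic_write_text` is a hypothetical helper name and does not exist in the commit; the `0o644` mode is also an assumption, chosen so a non-root Caddy process can still read the file.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, content: str) -> None:
    """Write content to a temp file next to path, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live on the same filesystem as the target for an atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        # mkstemp creates the file with mode 0600; relax it (assumed requirement).
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this in place, a reader sees either the previous complete file or the new complete file, never a partial one.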
@@ -0,0 +1,34 @@
$ErrorActionPreference = "Stop"

$gpuGroups = @(
    "sg-0b144c17b1b89f4c6",
    "sg-05e4de3fe94ad6558"
)

$ingressGroup = "sg-0721b8b48e12c531d"

try {
    aws ec2 authorize-security-group-ingress `
        --group-id "sg-0b144c17b1b89f4c6" `
        --protocol tcp --port 11434 `
        --source-group $ingressGroup | Out-Null
} catch {
}

foreach ($group in $gpuGroups) {
    foreach ($port in 11434) {
        try {
            aws ec2 revoke-security-group-ingress `
                --group-id $group `
                --protocol tcp `
                --port $port `
                --cidr 0.0.0.0/0 | Out-Null
        } catch {
        }
    }
}

aws ec2 describe-security-groups `
    --group-ids $gpuGroups `
    --query "SecurityGroups[].{GroupId:GroupId,GroupName:GroupName,Ingress:IpPermissions}" `
    --output json
13
infrastructure/desineuron_ingress/run_llm_route_sync.sh
Normal file
@@ -0,0 +1,13 @@
#!/usr/bin/env bash
set -euo pipefail

APP_ROOT=/opt/desineuron-llm-route-sync
SCRIPT_PATH=/usr/local/bin/sync_llm_route.py
VENV_PYTHON="$APP_ROOT/.venv/bin/python"

if [[ ! -x "$VENV_PYTHON" ]]; then
  echo "Missing route-sync venv python at $VENV_PYTHON" >&2
  exit 1
fi

exec "$VENV_PYTHON" "$SCRIPT_PATH"
42
infrastructure/desineuron_ingress/start_gpu.py
Normal file
@@ -0,0 +1,42 @@
import os
import time
from pathlib import Path

import boto3

# Load AWS credentials from the ops control-plane .env file.
d = {}
for line in Path('/opt/desineuron-ops-control-plane/.env').read_text().splitlines():
    if '=' in line and not line.startswith('#'):
        k, v = line.split('=', 1)
        d[k.strip()] = v.strip()
os.environ['AWS_ACCESS_KEY_ID'] = d.get('AWS_ACCESS_KEY_ID', '')
os.environ['AWS_SECRET_ACCESS_KEY'] = d.get('AWS_SECRET_ACCESS_KEY', '')
ec2 = boto3.client('ec2', region_name='us-east-1')


def get_gpu():
    # Return the first instance whose Name tag is 'desineuron-comfy-gpu', or None.
    for r in ec2.describe_instances()['Reservations']:
        for i in r['Instances']:
            if any(t['Key'] == 'Name' and t['Value'] == 'desineuron-comfy-gpu' for t in i.get('Tags', [])):
                return i
    return None


def main():
    # Poll until the GPU instance is running, starting it if it is stopped.
    while True:
        i = get_gpu()
        if not i:
            print('Not found')
            break
        state = i['State']['Name']
        print(f"Instance {i['InstanceId']} is {state}")
        if state == 'stopped':
            print('Starting instance...')
            ec2.start_instances(InstanceIds=[i['InstanceId']])
            time.sleep(5)
        elif state == 'stopping':
            print('Waiting for the instance to finish stopping...')
            time.sleep(10)
        elif state == 'running':
            print('Instance is running on private IP:', i.get('PrivateIpAddress'))
            break
        else:
            print('Waiting for the instance state to change...')
            time.sleep(10)


if __name__ == '__main__':
    main()
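The polling loop in `start_gpu.py` treats every state other than `stopped`, `stopping`, and `running` as "wait", so an instance in `terminated` or `shutting-down` would be polled indefinitely. A small sketch of an explicit state-to-action mapping is given here, assuming the six standard EC2 instance state names. `next_step` and its action labels are hypothetical and are not used by the script.

```python
def next_step(state: str) -> str:
    """Map an EC2 instance state name to the action a start-up poller could take."""
    if state == "running":
        return "done"
    if state == "stopped":
        return "start"
    # Transitional states resolve on their own; poll again later.
    if state in {"pending", "stopping"}:
        return "wait"
    # These states never lead back to running, so polling further is pointless.
    if state in {"shutting-down", "terminated"}:
        return "abort"
    raise ValueError(f"unexpected EC2 instance state: {state}")


if __name__ == "__main__":
    for name in ("stopped", "pending", "running", "terminated"):
        print(name, "->", next_step(name))
```

Raising on an unknown state, instead of silently waiting, is a design choice that surfaces surprises early; a poller could equally log and retry.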
152
infrastructure/desineuron_ingress/sync_llm_route.py
Normal file
@@ -0,0 +1,152 @@
#!/usr/bin/env python3
from __future__ import annotations

import json
import os
import subprocess
import sys
from pathlib import Path

import boto3


def load_env_file(path: Path) -> dict[str, str]:
    data: dict[str, str] = {}
    if not path.exists():
        return data
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        data[key.strip()] = value.strip()
    return data


def env(name: str, default: str = "") -> str:
    return os.environ.get(name, default)


def resolve_target_instance(ec2) -> dict | None:
    explicit_instance_id = env("LLM_INSTANCE_ID")
    if explicit_instance_id:
        reservations = ec2.describe_instances(InstanceIds=[explicit_instance_id])["Reservations"]
        for reservation in reservations:
            for instance in reservation["Instances"]:
                if instance["State"]["Name"] == "running":
                    return instance
        return None

    # We assume the LLM runtime runs on the same GPU instance as comfyui initially
    tag_key = env("LLM_INSTANCE_TAG_KEY", "DesineuronRole")
    tag_value = env("LLM_INSTANCE_TAG_VALUE", "comfyui")
    filters = [
        {"Name": "instance-state-name", "Values": ["running"]},
        {"Name": f"tag:{tag_key}", "Values": [tag_value]},
    ]
    reservations = ec2.describe_instances(Filters=filters)["Reservations"]
    instances = [instance for reservation in reservations for instance in reservation["Instances"]]
    if not instances:
        return None
    instances.sort(key=lambda row: row["LaunchTime"], reverse=True)
    return instances[0]


def upsert_route(hostname: str, private_ip: str, port: int) -> subprocess.CompletedProcess[str]:
    ingress_host = env("INGRESS_SSH_HOST")
    ingress_user = env("INGRESS_SSH_USER", "ec2-user")
    ingress_port = env("INGRESS_SSH_PORT", "22")
    ingress_key = env("INGRESS_SSH_KEY_PATH")
    helper = env("INGRESS_ROUTE_HELPER", "/usr/local/bin/manage_desineuron_routes.py")
    payload = json.dumps(
        {
            "hostname": hostname,
            "scheme": "http",
            "target_host": private_ip,
            "target_port": port,
        }
    )
    command = (
        f"sudo {helper} upsert '{payload}'"
        " && sudo caddy validate --config /etc/caddy/Caddyfile"
        " && sudo systemctl reload caddy"
    )
    return subprocess.run(
        [
            "ssh",
            "-o",
            "StrictHostKeyChecking=no",
            "-o",
            "UserKnownHostsFile=/dev/null",
            "-i",
            ingress_key,
            "-p",
            ingress_port,
            f"{ingress_user}@{ingress_host}",
            command,
        ],
        capture_output=True,
        text=True,
        check=False,
    )


def main() -> int:
    ops_env = load_env_file(Path(env("OPS_ENV_FILE", "/opt/desineuron-ops-control-plane/.env")))
    for key in ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"]:
        if key not in os.environ and key in ops_env:
            os.environ[key] = ops_env[key]
    os.environ.setdefault("AWS_DEFAULT_REGION", ops_env.get("OPS_DEFAULT_REGION", "us-east-1"))
    os.environ.setdefault("INGRESS_SSH_HOST", ops_env.get("OPS_INGRESS_SSH_HOST", ""))
    os.environ.setdefault("INGRESS_SSH_USER", ops_env.get("OPS_INGRESS_SSH_USER", "ec2-user"))
    os.environ.setdefault("INGRESS_SSH_PORT", ops_env.get("OPS_INGRESS_SSH_PORT", "22"))
    normalized_key_path = ops_env.get("OPS_SSH_KEY_PATH", "/opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem")
    if normalized_key_path.startswith("/app/state/"):
        normalized_key_path = normalized_key_path.replace("/app/state/", "/opt/desineuron-ops-control-plane/state/")
    os.environ.setdefault("INGRESS_SSH_KEY_PATH", normalized_key_path)
    os.environ.setdefault("INGRESS_ROUTE_HELPER", ops_env.get("OPS_INGRESS_ROUTE_HELPER", "/usr/local/bin/manage_desineuron_routes.py"))

    region = os.environ["AWS_DEFAULT_REGION"]
    hostname = env("LLM_ROUTE_HOSTNAME", "llm.desineuron.in")
    # Default to the SGLang port written by the installer (30100), not the retired Ollama port.
    port = int(env("LLM_ROUTE_PORT", "30100"))
    state_file = Path(env("LLM_ROUTE_STATE_FILE", "/var/lib/desineuron-llm-route-sync/current_target.txt"))

    ec2 = boto3.client("ec2", region_name=region)
    instance = resolve_target_instance(ec2)
    if not instance:
        print("No running LLM target instance found", file=sys.stderr)
        return 1

    private_ip = instance.get("PrivateIpAddress")
    if not private_ip:
        print("Target instance has no private IP", file=sys.stderr)
        return 1

    desired_state = f"{private_ip}:{port}"
    current = state_file.read_text(encoding="utf-8").strip() if state_file.exists() else ""
    if current == desired_state:
        print(
            json.dumps(
                {"status": "noop", "hostname": hostname, "target_host": private_ip, "target_port": port}
            )
        )
        return 0

    result = upsert_route(hostname, private_ip, port)
    if result.returncode != 0:
        print(result.stdout)
        print(result.stderr, file=sys.stderr)
        return result.returncode

    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(desired_state, encoding="utf-8")
    print(
        json.dumps(
            {"status": "updated", "hostname": hostname, "target_host": private_ip, "target_port": port}
        )
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
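`upsert_route` in `sync_llm_route.py` builds the remote shell command by wrapping the JSON payload in literal single quotes. That holds for the current payload fields, but it would break if any value ever contained a single quote. A hedged sketch of the same command built with `shlex.quote` from the standard library follows. `build_remote_command` is a hypothetical helper name introduced only for illustration, and the `note` field in the usage below is invented to exercise the quoting.

```python
import json
import shlex


def build_remote_command(helper: str, payload: dict) -> str:
    """Build the remote upsert command with shell-safe quoting of every argument."""
    # shlex.quote escapes embedded single quotes, unlike a plain '...' wrapper.
    quoted_payload = shlex.quote(json.dumps(payload))
    return (
        f"sudo {shlex.quote(helper)} upsert {quoted_payload}"
        " && sudo caddy validate --config /etc/caddy/Caddyfile"
        " && sudo systemctl reload caddy"
    )


if __name__ == "__main__":
    # Illustrative payload; the IP and the "note" field are made-up values.
    example = {
        "hostname": "llm.desineuron.in",
        "scheme": "http",
        "target_host": "10.0.0.5",
        "target_port": 30100,
        "note": "it's quoted",
    }
    print(build_remote_command("/usr/local/bin/manage_desineuron_routes.py", example))
```

The resulting string can be passed as the final `ssh` argument exactly as the existing `command` variable is.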
21
infrastructure/desineuron_ingress/update_ingress_tls.sh
Normal file
@@ -0,0 +1,21 @@
#!/bin/bash
set -ex

# Push the Caddyfile configuration
sudo scp -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem /tmp/Caddyfile ec2-user@98.87.120.120:/tmp/Caddyfile
sudo ssh -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem ec2-user@98.87.120.120 'sudo cp /tmp/Caddyfile /etc/caddy/Caddyfile'

# Write the Cloudflare token from the environment instead of hardcoding it in the repo.
# CLOUDFLARE_API_TOKEN is an assumed variable name; export it before running this script.
sudo mkdir -p /etc/letsencrypt/.secrets/
set +x  # keep the token out of the xtrace output
echo "dns_cloudflare_api_token = ${CLOUDFLARE_API_TOKEN:?CLOUDFLARE_API_TOKEN must be set}" | sudo tee /etc/letsencrypt/.secrets/cloudflare.ini > /dev/null
set -x
sudo chmod 600 /etc/letsencrypt/.secrets/cloudflare.ini

# Renew and expand Let's Encrypt certificates locally on velocity-linux using Cloudflare DNS
sudo certbot certonly --cert-name desineuron-infra --dns-cloudflare --dns-cloudflare-credentials /etc/letsencrypt/.secrets/cloudflare.ini -d '*.desineuron.in' -d desineuron.in --expand --non-interactive --agree-tos

# Copy the fresh certs directly to the proxy host
sudo scp -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem /etc/letsencrypt/live/desineuron-infra/fullchain.pem ec2-user@98.87.120.120:/tmp/fullchain.pem
sudo scp -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem /etc/letsencrypt/live/desineuron-infra/privkey.pem ec2-user@98.87.120.120:/tmp/privkey.pem

# Apply to Caddy
sudo ssh -o StrictHostKeyChecking=no -i /opt/desineuron-ops-control-plane/state/desineuron-l4-node.pem ec2-user@98.87.120.120 'sudo cp /tmp/fullchain.pem /etc/caddy/tls/fullchain.pem && sudo cp /tmp/privkey.pem /etc/caddy/tls/privkey.pem && sudo systemctl reload caddy'
@@ -11,6 +11,17 @@ server {
    access_log /var/log/nginx/velocity.desineuron.in.access.log;
    error_log /var/log/nginx/velocity.desineuron.in.error.log;

    location /api/ {
        proxy_pass http://127.0.0.1:8001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location / {
        try_files $uri $uri/ /index.html;
    }