# Project Velocity: Truthbook

> **What this is:** The single source of truth for Project Velocity. If it's written down here, it's how the system works, not how someone hoped it would work.

---

## Table of Contents

1. [What Is Project Velocity](#what-is-project-velocity)
2. [Quick Start](#quick-start)
3. [Architecture Overview](#architecture-overview)
4. [Runtime Truth](#runtime-truth)
5. [Team Setup](#team-setup)
6. [GPU & Model Runtime](#gpu--model-runtime)
7. [Infrastructure](#infrastructure)
8. [Runbooks](#runbooks)
9. [API Reference](#api-reference)
10. [Contributing](#contributing)

---

## What Is Project Velocity

Project Velocity is a multi-agent AI development platform. It orchestrates intelligent agents (powered by Qwen 3.6 35B A3B and other models) to collaborate on software engineering tasks (code generation, review, testing, deployment) as a coordinated team rather than isolated tools.

**Why it exists:** Single-agent coding tools hit a ceiling. They lack context persistence, cross-task coordination, and operational reliability.
Velocity solves this by:

- **Multi-agent collaboration:** Agents communicate via WebSocket channels and shared memory
- **Persistent state:** PostgreSQL backs user data, CRM records, and agent memory
- **GPU-accelerated inference:** Local Ollama runtime on NVIDIA GPU hardware
- **Role-based access control:** Admin and standard user tiers with avatar support
- **Live event broadcasting:** Real-time campaign and catalyst events via WebSocket

**Core stack:**

| Layer | Technology |
|-------|-----------|
| Backend API | Python / FastAPI |
| Database | PostgreSQL (via `databases` library with connection pooling) |
| Frontend | React 19 + TypeScript + Vite + Tailwind CSS + Framer Motion |
| Inference | Ollama (Qwen 3.6 35B A3B primary model) |
| Real-time | WebSocket (Catalyst channel, CRM channel) |
| Deployment | systemd services on Linux with NVIDIA GPU |

---

## Quick Start

### Prerequisites

- **GPU Machine:** NVIDIA GPU with sufficient VRAM (16GB minimum, 24GB+ recommended for Qwen 3.6 35B A3B)
- **NVMe Storage:** For model weights and cache
- **Linux OS:** Ubuntu 22.04+ or equivalent
- **Python 3.11+:** Backend runtime
- **Node.js 18+:** Frontend build
- **Ollama:** Latest stable with Qwen 3.6 35B A3B model pulled
- **PostgreSQL 15+:** Database backend

### One-Line Bootstrap

```bash
bash bootstrap/setup.sh
```

This script handles:

1. GPU driver verification
2. Ollama installation and model pull
3. PostgreSQL setup
4. Backend dependency installation
5. Frontend dependency installation
6. systemd service creation

### Manual Setup

#### 1. GPU & Ollama

```bash
# Verify GPU
nvidia-smi

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the primary model
ollama pull qwen3.6:35b-a3b

# Verify the model is registered with Ollama
curl http://localhost:11434/api/tags | jq '.models[] | select(.name == "qwen3.6:35b-a3b")'
```

#### 2. Database

```bash
# Start PostgreSQL
sudo systemctl start postgresql

# Create database and user
psql -U postgres -c "CREATE DATABASE velocity;"
psql -U postgres -c "CREATE USER velocity WITH PASSWORD 'secure_password';"
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE velocity TO velocity;"
```

#### 3. Backend

```bash
cd Project_Velocity/backend

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your database credentials and secrets

# Run migrations
python migrate.py

# Start server
uvicorn main:app --host 0.0.0.0 --port 8000
```

#### 4. Frontend

```bash
cd Project_Velocity/app

# Install dependencies
npm install

# Start dev server
npm run dev
```

Frontend is now available at `http://localhost:5173`.

#### 5. Verify Everything

```bash
# Backend health
curl http://localhost:8000/health

# Model availability
curl http://localhost:11434/api/tags

# Frontend
open http://localhost:5173
```

---

## Architecture Overview

### System Diagram

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  React UI   │────▶│   FastAPI    │────▶│ PostgreSQL  │
│ (Port 5173) │◀────│ (Port 8000)  │◀────│ (Port 5432) │
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │    Ollama    │
                    │ (Port 11434) │
                    │ Qwen 3.6 35B │
                    └──────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  NVIDIA GPU  │
                    └──────────────┘
```

### Component Breakdown

#### Backend (`backend/`)

[`main.py`](Project_Velocity/backend/main.py) is the FastAPI application, with:

- **Auth system:** Login, profile lookup, user listing, avatar upload
- **WebSocket managers:** [`_CatalystManager()`](Project_Velocity/backend/main.py:296) and [`_CRMManager()`](Project_Velocity/backend/main.py:320) for real-time event broadcasting
- **Connection pooling:** PostgreSQL via `databases` library with async context management
- **Lifespan hooks:** [`lifespan()`](Project_Velocity/backend/main.py:83) initializes and cleans up resources

Key endpoints:

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/auth/login` | POST | Authenticate user |
| `/api/auth/me` | GET | Get current user profile |
| `/api/auth/users` | GET | List all users (admin) |
| `/api/auth/profile/avatar` | POST | Upload profile avatar |
| `/ws/catalyst` | WS | Catalyst event channel |
| `/ws/crm` | WS | CRM event channel |
| `/health` | GET | Health check |

#### Frontend (`app/`)

[`App.tsx`](Project_Velocity/app/src/App.tsx) is the React application, with:

- **Protected routes:** [`ProtectedRoute()`](Project_Velocity/app/src/App.tsx:66) wraps authenticated paths
- **Route module sync:** [`RouteModuleSync()`](Project_Velocity/app/src/App.tsx:90) handles dynamic route loading
- **Main layout:** [`MainLayout()`](Project_Velocity/app/src/App.tsx:90) provides chrome (header, sidebar, content area)
- **Role rendering:** [`formatRoleLabel()`](Project_Velocity/app/src/App.tsx:379) converts role codes to display labels
- **Auth state management:** Dual `useEffect` hooks handle token persistence and user fetch

#### Agent Context (`.Agent Context/`)

Documents that define how agents operate within Velocity:

- [`Qwen 3.6 35B A3B Ollama Access, Recovery, and Team Setup.md`](Project_Velocity/.Agent%20Context/Qwen%203.6%2035B%20A3B%20Ollama%20Access,%20Recovery,%20and%20Team%20Setup.md): Model runtime, recovery policies, team onboarding
- `README.md`: This file

#### Infrastructure (`.Infrastructure/`)

Deployment and operational documentation:

- systemd unit files for backend, frontend, Ollama services
- Network configuration and ingress rules
- Monitoring and alerting setup

---

## Runtime Truth

### What "Works" Means in Velocity

Velocity has three runtime layers, each with different failure modes:

#### Layer A: Fast Runtime Recovery

If the API crashes or restarts:

- PostgreSQL connection pool rebuilds automatically via [`lifespan()`](Project_Velocity/backend/main.py:83)
- WebSocket managers reinitialize and accept new connections
- No data loss: all state is in
PostgreSQL

#### Layer B: Model Rehydration Recovery

If Ollama loses the Qwen model:

- Watchdog systemd unit detects absence via `/api/tags`
- Auto-registers model from NVMe cache or S3 artifact storage
- **Production requirement:** Same-run auto-hydration logic must complete before any agent request

#### Layer C: Full System Recovery

If everything goes down:

1. PostgreSQL recovers WAL logs
2. Ollama watchdog restores model
3. Backend systemd unit restarts API
4. Frontend rebuilds if artifacts are corrupted

### Critical Contracts

**Auth contract:**

```
Client → POST /api/auth/login {email, password}
       → 200 OK {token, user}

Client → GET /api/auth/me (Authorization: Bearer <token>)
       → 200 OK {id, email, role, avatar_url}
       → 401 Unauthorized
```

**WebSocket contract:**

```
Client → WS /ws/catalyst
       → Accepts live events: {event_type, campaign_name, value, timestamp}

Client → WS /ws/crm
       → Accepts CRM events: {type, payload, timestamp}
```

**Model contract:**

```
Ollama → GET /api/tags returns qwen3.6:35b-a3b
       → Context window: 131072 tokens
       → Provider: OpenAI-compatible interface at http://localhost:11434/v1
```

---

## Team Setup

### Developer Onboarding

#### 1. Clone & Bootstrap

```bash
git clone <repo-url>
cd Project_Velocity
bash bootstrap/setup.sh
```

#### 2. VS Code / Roo Code Configuration

Edit `.vscode/settings.json`:

```json
{
  "roo-cline.provider": "openai-compatible",
  "roo-cline.baseUrl": "http://localhost:11434/v1",
  "roo-cline.modelId": "qwen3.6:35b-a3b",
  "roo-cline.contextWindow": 131072,
  "roo-cline.temperature": 0.7
}
```

#### 3. Verify Team Access

```bash
# Backend health
curl http://localhost:8000/health
# Expected: {"status": "ok"}

# Model registered
curl http://localhost:11434/api/tags | jq -r '.models[].name'
# Expected: qwen3.6:35b-a3b

# Frontend
open http://localhost:5173
# Expected: Login screen
```

### Role Definitions

| Role | Access Level | Can Do |
|------|-------------|--------|
| `admin` | Full | User management, system config, agent orchestration |
| `developer` | Standard | Code generation, review, testing |
| `viewer` | Read-only | Dashboard, campaign monitoring |

### Performance Expectations

| Scenario | Tokens/sec | Latency |
|----------|-----------|---------|
| Single-stream (local GPU) | ~80-120 tok/s | ~200ms first token |
| Two concurrent requests | ~60-90 tok/s each | ~300ms first token |
| Four-way batch | ~40-60 tok/s each | ~500ms first token |

*Numbers vary by GPU hardware. Measure your setup.*

---

## GPU & Model Runtime

### Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| GPU VRAM | 16GB | 24GB+ |
| GPU Compute | Turing architecture | Ada Lovelace / Hopper |
| NVMe Storage | 50GB free | 100GB+ NVMe Gen4 |
| RAM | 32GB | 64GB+ |

### Ollama Watchdog

The watchdog is a systemd-managed service that ensures the Qwen model stays available:

**Location:** `.Infrastructure/systemd/ollama-watchdog.service`

**Behavior:**

1. Every 60 seconds, queries `http://localhost:11434/api/tags`
2. If `qwen3.6:35b-a3b` is absent, triggers rehydration
3. Rehydration priority: NVMe cache → S3 artifact → remote pull
4. Logs all actions to journalctl

**Manual watchdog check:**

```bash
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago"
```

### Model Hydration Strategies

| Strategy | Speed | Use Case |
|----------|-------|----------|
| NVMe local registration | ~2 seconds | Primary recovery path |
| Local manifest `ollama create` | ~5 seconds | Fresh hydration from extracted weights |
| S3 cold hydrate | ~60-300 seconds | No local cache available |

### Critical: What Watchdog Must NOT Do

- ❌ Delete model layers during recovery
- ❌ Modify GPU memory directly
- ❌ Block agent requests during hydration (graceful degradation only)
- ❌ Restart Ollama process unless absolutely necessary

---

## Infrastructure

### Deployment Topology

```
┌─────────────────────────────────────────────────┐
│                 Production Host                 │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │
│  │ Backend  │  │ Frontend │  │    Ollama    │   │
│  │  :8000   │  │  :5173   │  │    :11434    │   │
│  │ systemd  │  │  nginx   │  │   systemd    │   │
│  └────┬─────┘  └────┬─────┘  └──────┬───────┘   │
│       │             │               │           │
│       └─────────────┴───────────────┘           │
│                     │                           │
│              ┌──────▼───────┐                   │
│              │  PostgreSQL  │                   │
│              │    :5432     │                   │
│              │   systemd    │                   │
│              └──────────────┘                   │
│                                                 │
│  ┌──────────────────────────────────────────┐   │
│  │       NVIDIA GPU (CUDA + TensorRT)       │   │
│  └──────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘
```

### systemd Services

| Service | File | Restart Policy |
|---------|------|---------------|
| Backend API | `velocity-backend.service` | always |
| Frontend (nginx) | `velocity-frontend.service` | always |
| Ollama | `ollama.service` | on-failure |
| Watchdog | `ollama-watchdog.service` | always |
| PostgreSQL | `postgresql.service` | on-failure |

### Network Rules

| Port | Protocol | Service | External Access |
|------|----------|---------|-----------------|
| 80 | HTTP | nginx → frontend | Yes (public) |
| 443 | HTTPS | nginx → frontend | Yes (public) |
| 8000 | TCP | FastAPI backend | No (internal only) |
| 5173 | TCP | Vite dev server | No (dev only) |
| 5432 | TCP | PostgreSQL | No (internal only) |
| 11434 | TCP | Ollama API | No (internal only) |

### Monitoring

```bash
# All service health
systemctl status velocity-backend ollama postgresql

# GPU utilization
nvidia-smi -l 1

# Model inference logs
journalctl -u ollama -f

# API health
curl -s http://localhost:8000/health | jq .
```

---

## Runbooks

### Runbook: Backend Crashes at 2 AM

**Symptom:** Frontend shows 500 errors on API calls.

**Steps:**

```bash
# 1. Check backend status
sudo systemctl status velocity-backend
# Expected: active (running)

# 2. If stopped, restart
sudo systemctl restart velocity-backend

# 3. Check logs for root cause
sudo journalctl -u velocity-backend --since "30 minutes ago" --no-pager

# 4. Verify recovery
curl http://localhost:8000/health
# Expected: {"status": "ok"}

# 5. If crash repeats, check database connectivity
psql -U velocity -d velocity -c "SELECT 1;"
# Expected: 1
```

**If still broken:**

1. Check disk space: `df -h /`
2. Check memory: `free -h`
3. Check PostgreSQL: `sudo systemctl status postgresql`
4. Escalate with logs from step 3

---

### Runbook: Ollama Model Disappeared

**Symptom:** Agents return empty responses or errors.

**Steps:**

```bash
# 1. Check if Ollama is running
sudo systemctl status ollama
# Expected: active (running)

# 2. Check registered models
curl http://localhost:11434/api/tags | jq '.models[].name'
# Expected: qwen3.6:35b-a3b

# 3. If model is missing, check watchdog
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago" --no-pager

# 4. Manual recovery if watchdog failed
ollama pull qwen3.6:35b-a3b

# 5. Verify model is usable
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.6:35b-a3b",
  "prompt": "Hello",
  "stream": false
}' | jq .done
# Expected: true
```

---

### Runbook: Database Connection Failures

**Symptom:** Backend logs show `connection refused` or `pool exhausted`.

**Steps:**

```bash
# 1. Check PostgreSQL status
sudo systemctl status postgresql
# Expected: active (running)

# 2. Check connection count
psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"
# Should be < max_connections (default 100)

# 3. Check disk space for WAL files
df -h /var/lib/postgresql

# 4. Restart if hung
sudo systemctl restart postgresql

# 5. Verify backend reconnects
sudo journalctl -u velocity-backend --since "1 minute ago" | grep -i "connected\|error"
```

---

### Runbook: GPU Memory Exhaustion

**Symptom:** Ollama returns `out of memory` errors.

**Steps:**

```bash
# 1. Check current GPU usage
nvidia-smi
# Note: PID, memory usage, temperature

# 2. Kill non-essential GPU processes if needed
nvidia-smi --id=0 --query-compute-apps=pid,name,used_memory --format=csv
kill <PID>

# 3. Check Ollama memory allocation
ollama show qwen3.6:35b-a3b | grep -i "layer\|memory"

# 4. If still exhausted, switch to a smaller quantization
ollama pull qwen3.6:35b-a3b-q4_0

# 5. Monitor recovery
watch -n 1 nvidia-smi
```

---

## API Reference

### Auth Endpoints

#### `POST /api/auth/login`

Authenticate a user and receive a JWT token.

**Request:**

```json
{
  "email": "user@example.com",
  "password": "secure_password"
}
```

**Response (200 OK):**

```json
{
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "user": {
    "id": "uuid-here",
    "email": "user@example.com",
    "role": "developer",
    "avatar_url": null
  }
}
```

**Errors:**

| Status | Meaning |
|--------|---------|
| 401 | Invalid credentials |
| 422 | Malformed request body |

---

#### `GET /api/auth/me`

Get the current authenticated user's profile.

**Headers:**

```
Authorization: Bearer <token>
```

**Response (200 OK):**

```json
{
  "id": "uuid-here",
  "email": "user@example.com",
  "role": "developer",
  "avatar_url": "https://cdn.example.com/avatars/user.png"
}
```

**Errors:**

| Status | Meaning |
|--------|---------|
| 401 | Token missing or invalid |
| 403 | Token expired |

---

#### `GET /api/auth/users`

List all users in the system. Admin only.

**Headers:**

```
Authorization: Bearer <token>
```

**Response (200 OK):**

```json
[
  {
    "id": "uuid-1",
    "email": "admin@example.com",
    "role": "admin",
    "avatar_url": null
  },
  {
    "id": "uuid-2",
    "email": "dev@example.com",
    "role": "developer",
    "avatar_url": "https://cdn.example.com/avatars/dev.png"
  }
]
```

**Errors:**

| Status | Meaning |
|--------|---------|
| 403 | User is not admin |

---

#### `POST /api/auth/profile/avatar`

Upload a profile avatar image.

**Headers:**

```
Authorization: Bearer <token>
Content-Type: multipart/form-data
```

**Form Data:**

| Field | Type | Required |
|-------|------|----------|
| avatar | file (image/jpeg, image/png) | Yes |

**Response (200 OK):**

```json
{
  "avatar_url": "https://cdn.example.com/avatars/new-avatar.png"
}
```

**Errors:**

| Status | Meaning |
|--------|---------|
| 401 | Not authenticated |
| 422 | Invalid file type or size > 5MB |

---

### WebSocket Endpoints

#### `WS /ws/catalyst`

Real-time channel for Catalyst events (agent coordination, task updates).

**Connection:**

```javascript
const ws = new WebSocket('ws://localhost:8000/ws/catalyst');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.event_type, data.campaign_name, data.value);
};
```

**Event Format:**

```json
{
  "event_type": "task_complete",
  "campaign_name": "codegen-sprint-42",
  "value": 0.97,
  "timestamp": "2026-04-21T16:00:00Z"
}
```

---

#### `WS /ws/crm`

Real-time channel for CRM events (customer interactions, lead updates).

**Connection:**

```javascript
const ws = new WebSocket('ws://localhost:8000/ws/crm');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.type, data.payload);
};
```

**Event Format:**

```json
{
  "type": "lead_created",
  "payload": {
    "id": "crm-uuid",
    "name": "Acme Corp",
    "status": "new"
  },
  "timestamp": "2026-04-21T16:00:00Z"
}
```

---

### Health Check

#### `GET /health`

Verify system health.

**Response (200 OK):**

```json
{
  "status": "ok",
  "database": "connected",
  "ollama": "available",
  "gpu": "present"
}
```

---

## Contributing

### Code Structure

```
Project_Velocity/
├── .Agent Context/          # Agent documentation, model specs
├── .Infrastructure/         # Deployment configs, systemd units
├── backend/                 # FastAPI backend
│   ├── main.py              # Application entry point
│   ├── requirements.txt     # Python dependencies
│   └── migrate.py           # Database migrations
├── app/                     # React frontend
│   ├── src/
│   │   ├── App.tsx          # Root component
│   │   └── ...              # Components, routes, utils
│   ├── package.json         # Node dependencies
│   └── vite.config.ts       # Build config
├── bootstrap/               # Setup scripts
│   └── setup.sh             # One-line bootstrap
└── README.md                # This file
```

### Making a Contribution

1. **Fork and branch**

   ```bash
   git checkout -b feature/your-feature-name
   ```

2. **Make changes**
   - Backend: Follow FastAPI conventions, add type hints
   - Frontend: Follow React + TypeScript patterns, use existing components
   - Docs: Update this README if behavior changes

3. **Test locally**

   ```bash
   # Backend tests
   cd backend && pytest

   # Frontend checks
   cd app && npm run build
   ```

4. **Submit PR**
   - Title: Clear, action-oriented
   - Description: What + Why + How to test
   - Link any related issues

### Documentation Standards

- **Every endpoint:** Document inputs, outputs, errors
- **Every component:** JSDoc for public APIs
- **Every runbook:** Write as if for on-call at 2am
- **Every decision:** Record in `DECISIONS.md` with rationale

---

## Appendix

### A. Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `SECRET_KEY` | Yes | JWT signing key |
| `OLLAMA_BASE_URL` | No | Ollama API URL (default: `http://localhost:11434`) |
| `GPU_ENABLED` | No | Enable GPU path (default: `true`) |
| `LOG_LEVEL` | No | Logging level (default: `INFO`) |

### B. Troubleshooting Matrix

| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| Frontend blank screen | Backend down | `curl http://localhost:8000/health` |
| 401 on all calls | Token expired | Re-login |
| Agent returns empty | Model unloaded | `ollama pull qwen3.6:35b-a3b` |
| Slow responses | GPU not used | Check `nvidia-smi`, verify CUDA |
| Database errors | Pool exhausted | Check `max_connections`, restart backend |
| WebSocket disconnects | Network issue | Check firewall, reverse proxy config |

### C. Useful Commands Cheat Sheet

```bash
# Full system status
systemctl status velocity-backend ollama postgresql ollama-watchdog

# Real-time GPU monitoring
watch -n 1 nvidia-smi

# Model check
curl http://localhost:11434/api/tags | jq '.models[].name'

# API health
curl -s http://localhost:8000/health | jq .

# Database connection test
psql -U velocity -d velocity -c "SELECT version();"

# Frontend rebuild
cd app && npm run build && cp -r dist/* ../nginx/html/

# Restart everything (nuclear option)
sudo systemctl restart velocity-backend ollama postgresql
```

---

> **Last verified:** 2026-04-21
> **Maintained by:** Velocity Team
> **If this doc is wrong, the system is broken. Fix the doc first.**
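
### D. Model Check in Python (Sketch)

The model check used throughout this document (query `/api/tags`, then look for `qwen3.6:35b-a3b` among `.models[].name`) can also be expressed in Python, for example as a starting point for a custom health probe. This is a minimal sketch, not the watchdog's actual implementation: the helper names `model_is_registered` and `fetch_tags` are hypothetical, and the payload shape is assumed from the `jq '.models[].name'` filter in the cheat sheet.

```python
import json
import urllib.request

# Default taken from the OLLAMA_BASE_URL row of the Environment Variables table.
OLLAMA_BASE_URL = "http://localhost:11434"
PRIMARY_MODEL = "qwen3.6:35b-a3b"


def model_is_registered(tags_payload: dict, model_name: str = PRIMARY_MODEL) -> bool:
    """Return True if model_name appears in a decoded /api/tags response.

    Assumes the shape {"models": [{"name": ...}, ...]}, mirroring the
    jq filter `.models[].name`. A missing or null "models" key counts
    as "not registered" rather than raising.
    """
    models = tags_payload.get("models") or []
    return any(entry.get("name") == model_name for entry in models)


def fetch_tags(base_url: str = OLLAMA_BASE_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode GET {base_url}/api/tags (hypothetical helper).

    Raises OSError (including urllib.error.URLError) if Ollama is unreachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

On a host where Ollama is running, `model_is_registered(fetch_tags())` gives the same yes/no answer as the `curl | jq` check. Keeping the payload check separate from the HTTP call means the matching logic can be unit-tested without a live Ollama instance.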