# Project Velocity — Truthbook

> **What this is:** The single source of truth for Project Velocity. If it's written down here, it's how the system works — not how someone hoped it would work.

---

## Table of Contents

1. [What Is Project Velocity](#what-is-project-velocity)
2. [Quick Start](#quick-start)
3. [Architecture Overview](#architecture-overview)
4. [Runtime Truth](#runtime-truth)
5. [Team Setup](#team-setup)
6. [GPU & Model Runtime](#gpu--model-runtime)
7. [Infrastructure](#infrastructure)
8. [Runbooks](#runbooks)
9. [API Reference](#api-reference)
10. [Contributing](#contributing)

---

## What Is Project Velocity

Project Velocity is a multi-agent AI development platform. It orchestrates intelligent agents (powered by Qwen 3.6 35B A3B and other models) to collaborate on software engineering tasks — code generation, review, testing, deployment — as a coordinated team rather than isolated tools.

**Why it exists:** Single-agent coding tools hit a ceiling. They lack context persistence, cross-task coordination, and operational reliability. Velocity addresses these gaps with:

- **Multi-agent collaboration** — Agents communicate via WebSocket channels and shared memory
- **Persistent state** — PostgreSQL backs user data, CRM records, and agent memory
- **GPU-accelerated inference** — Local Ollama runtime on NVIDIA GPU hardware
- **Role-based access control** — Admin and standard user tiers with avatar support
- **Live event broadcasting** — Real-time campaign and catalyst events via WebSocket

**Core stack:**

| Layer | Technology |
|-------|-----------|
| Backend API | Python / FastAPI |
| Database | PostgreSQL (via `databases` library with connection pooling) |
| Frontend | React 19 + TypeScript + Vite + Tailwind CSS + Framer Motion |
| Inference | Ollama (Qwen 3.6 35B A3B primary model) |
| Real-time | WebSocket (Catalyst channel, CRM channel) |
| Deployment | systemd services on Linux with NVIDIA GPU |

---

## Quick Start

### Prerequisites

- **GPU Machine:** NVIDIA GPU with at least 16GB of VRAM for Qwen 3.6 35B A3B (24GB+ recommended)
- **NVMe Storage:** For model weights and cache
- **Linux OS:** Ubuntu 22.04+ or equivalent
- **Python 3.11+:** Backend runtime
- **Node.js 18+:** Frontend build
- **Ollama:** Latest stable with Qwen 3.6 35B A3B model pulled
- **PostgreSQL 15+:** Database backend

### One-Line Bootstrap

```bash
bash bootstrap/setup.sh
```

This script handles:
1. GPU driver verification
2. Ollama installation and model pull
3. PostgreSQL setup
4. Backend dependency installation
5. Frontend dependency installation
6. systemd service creation

### Manual Setup

#### 1. GPU & Ollama

```bash
# Verify GPU
nvidia-smi

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the primary model
ollama pull qwen3.6:35b-a3b

# Verify the model is available locally (/api/tags lists pulled models, not those loaded in memory)
curl -s http://localhost:11434/api/tags | jq '.models[] | select(.name == "qwen3.6:35b-a3b")'
```

#### 2. Database

```bash
# Start PostgreSQL
sudo systemctl start postgresql

# Create the user first, then a database it owns (replace the placeholder password).
# On PostgreSQL 15+, GRANT ... ON DATABASE alone does not allow creating tables in the public schema.
sudo -u postgres psql -c "CREATE USER velocity WITH PASSWORD 'secure_password';"
sudo -u postgres psql -c "CREATE DATABASE velocity OWNER velocity;"
```

#### 3. Backend

```bash
cd Project_Velocity/backend

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your database credentials and secrets

# Run migrations
python migrate.py

# Start server
uvicorn main:app --host 0.0.0.0 --port 8000
```

#### 4. Frontend

```bash
cd Project_Velocity/app

# Install dependencies
npm install

# Start dev server
npm run dev
```

Frontend is now available at `http://localhost:5173`.

#### 5. Verify Everything

```bash
# Backend health
curl http://localhost:8000/health

# Model availability
curl http://localhost:11434/api/tags

# Frontend (xdg-open on Linux; use `open` on macOS)
xdg-open http://localhost:5173
```

---

## Architecture Overview

### System Diagram

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   React UI  │────▶│  FastAPI     │────▶│  PostgreSQL │
│  (Port 5173)│◀────│  (Port 8000) │◀────│  (Port 5432)│
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │   Ollama     │
                    │ (Port 11434) │
                    │ Qwen 3.6 35B │
                    └──────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  NVIDIA GPU  │
                    └──────────────┘
```

### Component Breakdown

#### Backend (`backend/`)

[`main.py`](Project_Velocity/backend/main.py) — FastAPI application with:

- **Auth system** — Login, profile lookup, user listing, avatar upload
- **WebSocket managers** — [`_CatalystManager()`](Project_Velocity/backend/main.py:296) and [`_CRMManager()`](Project_Velocity/backend/main.py:320) for real-time event broadcasting
- **Connection pooling** — PostgreSQL via `databases` library with async context management
- **Lifespan hooks** — [`lifespan()`](Project_Velocity/backend/main.py:83) initializes and cleans up resources

Key endpoints:

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/auth/login` | POST | Authenticate user |
| `/api/auth/me` | GET | Get current user profile |
| `/api/auth/users` | GET | List all users (admin) |
| `/api/auth/profile/avatar` | POST | Upload profile avatar |
| `/ws/catalyst` | WS | Catalyst event channel |
| `/ws/crm` | WS | CRM event channel |
| `/health` | GET | Health check |

#### Frontend (`app/`)

[`App.tsx`](Project_Velocity/app/src/App.tsx) — React application with:

- **Protected routes** — [`ProtectedRoute()`](Project_Velocity/app/src/App.tsx:66) wraps authenticated paths
- **Route module sync** — [`RouteModuleSync()`](Project_Velocity/app/src/App.tsx:90) handles dynamic route loading
- **Main layout** — [`MainLayout()`](Project_Velocity/app/src/App.tsx:90) provides chrome (header, sidebar, content area)
- **Role rendering** — [`formatRoleLabel()`](Project_Velocity/app/src/App.tsx:379) converts role codes to display labels
- **Auth state management** — Dual `useEffect` hooks handle token persistence and user fetch

#### Agent Context (`.Agent Context/`)

Documents that define how agents operate within Velocity:

- [`Qwen 3.6 35B A3B Ollama Access, Recovery, and Team Setup.md`](Project_Velocity/.Agent%20Context/Qwen%203.6%2035B%20A3B%20Ollama%20Access,%20Recovery,%20and%20Team%20Setup.md) — Model runtime, recovery policies, team onboarding
- `README.md` — This file

#### Infrastructure (`.Infrastructure/`)

Deployment and operational documentation:

- systemd unit files for backend, frontend, Ollama services
- Network configuration and ingress rules
- Monitoring and alerting setup

---

## Runtime Truth

### What "Works" Means in Velocity

Velocity has three runtime layers, each with different failure modes:

#### Layer A: Fast Runtime Recovery

If the API crashes or restarts:
- PostgreSQL connection pool rebuilds automatically via [`lifespan()`](Project_Velocity/backend/main.py:83)
- WebSocket managers reinitialize and accept new connections
- No data loss — all state is in PostgreSQL
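
The startup and shutdown behavior above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `main.py`: `FakePool` is a hypothetical stand-in for the `databases` connection pool, and the real FastAPI lifespan function receives the application object rather than a pool.

```python
import asyncio
from contextlib import asynccontextmanager


class FakePool:
    """Hypothetical stand-in for a real connection pool."""

    def __init__(self) -> None:
        self.connected = False

    async def connect(self) -> None:
        self.connected = True

    async def disconnect(self) -> None:
        self.connected = False


@asynccontextmanager
async def lifespan(pool: FakePool):
    # Startup: build the pool before any request is served.
    await pool.connect()
    try:
        yield pool
    finally:
        # Shutdown: always release connections, even after an error.
        await pool.disconnect()


async def run_once(pool: FakePool) -> bool:
    """Enter the lifespan and report whether the pool was usable inside it."""
    async with lifespan(pool) as active:
        return active.connected
```

Because the pool is rebuilt on every process start, a systemd restart of the backend goes through the same path as a first boot.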

#### Layer B: Model Rehydration Recovery

If Ollama loses the Qwen model:
- Watchdog systemd unit detects absence via `/api/tags`
- Auto-registers model from NVMe cache or S3 artifact storage
- **Production requirement:** Same-run auto-hydration logic must complete before any agent request

#### Layer C: Full System Recovery

If everything goes down:
1. PostgreSQL recovers WAL logs
2. Ollama watchdog restores model
3. Backend systemd unit restarts API
4. Frontend rebuilds if artifacts are corrupted

### Critical Contracts

**Auth contract:**
```
Client → POST /api/auth/login {email, password}
       → 200 OK {token, user}

Client → GET /api/auth/me (Authorization: Bearer <token>)
       → 200 OK {id, email, role, avatar_url}
       → 401 Unauthorized
```
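
A minimal client-side sketch of the auth contract, using only the standard library. The helper names `build_login_request` and `build_me_request` are hypothetical. They build the requests without sending them, so the shape of each call can be inspected or tested offline.

```python
import json
import urllib.request


def build_login_request(base_url: str, email: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/auth/login request."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_me_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET /api/auth/me request."""
    return urllib.request.Request(
        f"{base_url}/api/auth/me",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

To send either request, pass it to `urllib.request.urlopen`. A 401 response surfaces as `urllib.error.HTTPError`.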

**WebSocket contract:**
```
Client → WS /ws/catalyst
       → Receives live events: {event_type, campaign_name, value, timestamp}

Client → WS /ws/crm
       → Receives CRM events: {type, payload, timestamp}
```

**Model contract:**
```
Ollama → GET /api/tags returns qwen3.6:35b-a3b
       → Context window: 131072 tokens
       → Provider: OpenAI-compatible interface at http://localhost:11434/v1
```
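
One way to check the first line of the model contract programmatically is sketched below. The function names are hypothetical. `/api/tags` returns a JSON object with a `models` array whose entries carry a `name` field, which is what the check relies on.

```python
import json
import urllib.request

PRIMARY_MODEL = "qwen3.6:35b-a3b"


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the /api/tags payload from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_available(tags_payload: dict, model_name: str) -> bool:
    """Return True if model_name appears in an /api/tags payload."""
    models = tags_payload.get("models") or []
    return any(entry.get("name") == model_name for entry in models)
```

Typical use is `model_available(fetch_tags(), PRIMARY_MODEL)`. The check confirms the model is registered with Ollama, not that it is currently loaded into GPU memory.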

---

## Team Setup

### Developer Onboarding

#### 1. Clone & Bootstrap

```bash
git clone <repo-url>
cd Project_Velocity
bash bootstrap/setup.sh
```

#### 2. VS Code / Roo Code Configuration

Edit `.vscode/settings.json`:

```json
{
  "roo-cline.provider": "openai-compatible",
  "roo-cline.baseUrl": "http://localhost:11434/v1",
  "roo-cline.modelId": "qwen3.6:35b-a3b",
  "roo-cline.contextWindow": 131072,
  "roo-cline.temperature": 0.7
}
```

#### 3. Verify Team Access

```bash
# Backend health
curl http://localhost:8000/health
# Expected: JSON with "status": "ok"

# Model available to Ollama
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
# Expected: output includes qwen3.6:35b-a3b

# Frontend (xdg-open on Linux; use `open` on macOS)
xdg-open http://localhost:5173
# Expected: Login screen
```

### Role Definitions

| Role | Access Level | Can Do |
|------|-------------|--------|
| `admin` | Full | User management, system config, agent orchestration |
| `developer` | Standard | Code generation, review, testing |
| `viewer` | Read-only | Dashboard, campaign monitoring |
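
The role table can be expressed as a small lookup, sketched here. The capability names are hypothetical labels derived from the "Can Do" column, and treating `admin` as a superset of the other roles is an assumption based on its "Full" access level. The backend's actual checks may differ.

```python
from enum import Enum


class Role(str, Enum):
    ADMIN = "admin"
    DEVELOPER = "developer"
    VIEWER = "viewer"


# Hypothetical capability names taken from the "Can Do" column.
_ADMIN_ONLY = frozenset({"user_management", "system_config", "agent_orchestration"})
_DEVELOPER = frozenset({"code_generation", "review", "testing"})
_VIEWER = frozenset({"dashboard", "campaign_monitoring"})

CAPABILITIES = {
    # Assumption: "Full" access means admin also holds every other capability.
    Role.ADMIN: _ADMIN_ONLY | _DEVELOPER | _VIEWER,
    Role.DEVELOPER: _DEVELOPER,
    Role.VIEWER: _VIEWER,
}


def can(role: str, capability: str) -> bool:
    """Return True if the role code grants the capability; unknown roles get nothing."""
    try:
        return capability in CAPABILITIES[Role(role)]
    except ValueError:
        return False
```

Denying unknown role codes by default keeps a typo in the database from silently granting access.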

### Performance Expectations

| Scenario | Tokens/sec | Latency |
|----------|-----------|---------|
| Single-stream (local GPU) | ~80-120 tok/s | ~200ms first token |
| Two concurrent requests | ~60-90 tok/s each | ~300ms first token |
| Four-way batch | ~40-60 tok/s each | ~500ms first token |

*Numbers vary by GPU hardware. Measure your setup.*
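
To measure throughput on your own hardware, the non-streaming `/api/generate` response from Ollama includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds). A small helper, with a hypothetical name, converts them to tokens per second:

```python
def tokens_per_second(generate_response: dict) -> float:
    """Compute decode throughput from an Ollama /api/generate response."""
    count = generate_response["eval_count"]
    duration_ns = generate_response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # eval_duration is in nanoseconds, so scale to seconds.
    return count * 1e9 / duration_ns
```

For example, a response with `eval_count` of 200 and `eval_duration` of 2,000,000,000 corresponds to 100 tokens per second. This covers generation only and excludes model load and prompt evaluation time.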

---

## GPU & Model Runtime

### Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| GPU VRAM | 16GB | 24GB+ |
| GPU Compute | Turing architecture | Ada Lovelace / Hopper |
| NVMe Storage | 50GB free | 100GB+ NVMe Gen4 |
| RAM | 32GB | 64GB+ |

### Ollama Watchdog

The watchdog is a systemd-managed service that ensures the Qwen model stays loaded:

**Location:** `.Infrastructure/systemd/ollama-watchdog.service`

**Behavior:**
1. Every 60 seconds, queries `http://localhost:11434/api/tags`
2. If `qwen3.6:35b-a3b` is absent, triggers rehydration
3. Rehydration priority: NVMe cache → S3 artifact → remote pull
4. Logs all actions to journalctl
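
The check-then-fallback behavior above might look like the following sketch. It is not the deployed watchdog. `watchdog_tick` and `rehydrate` are hypothetical names, and the strategies are injected as callables so that the priority order (NVMe cache, then S3 artifact, then remote pull) is the only logic shown.

```python
from typing import Callable, Iterable, Optional, Tuple

Strategy = Tuple[str, Callable[[], bool]]


def rehydrate(strategies: Iterable[Strategy]) -> Optional[str]:
    """Try each strategy in priority order; return the name of the first that succeeds."""
    for name, attempt in strategies:
        try:
            if attempt():
                return name
        except Exception:
            # A failing strategy must not stop the fallback chain.
            continue
    return None


def watchdog_tick(model_present: Callable[[], bool], strategies: Iterable[Strategy]) -> str:
    """Run one check; a real service would repeat this every 60 seconds."""
    if model_present():
        return "ok"
    recovered_by = rehydrate(strategies)
    return f"rehydrated:{recovered_by}" if recovered_by else "failed"
```

Returning a status string rather than raising keeps each tick easy to log, which matches the requirement that all actions reach journalctl.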

**Manual watchdog check:**
```bash
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago"
```

### Model Hydration Strategies

| Strategy | Speed | Use Case |
|----------|-------|----------|
| NVMe local registration | ~2 seconds | Primary recovery path |
| Local manifest `ollama create` | ~5 seconds | Fresh hydration from extracted weights |
| S3 cold hydrate | ~60-300 seconds | No local cache available |

### Critical: What Watchdog Must NOT Do

- ❌ Delete model layers during recovery
- ❌ Modify GPU memory directly
- ❌ Block agent requests during hydration (graceful degradation only)
- ❌ Restart Ollama process unless absolutely necessary

---

## Infrastructure

### Deployment Topology

```
┌─────────────────────────────────────────────────┐
│                  Production Host                 │
│                                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │
│  │ Backend  │  │ Frontend │  │   Ollama     │  │
│  │ :8000    │  │ :80/443  │  │  :11434      │  │
│  │ systemd  │  │ nginx    │  │  systemd     │  │
│  └────┬─────┘  └────┬─────┘  └──────┬───────┘  │
│       │             │               │           │
│       └─────────────┴───────────────┘           │
│                         │                        │
│                  ┌──────▼───────┐               │
│                  │  PostgreSQL  │               │
│                  │   :5432      │               │
│                  │  systemd     │               │
│                  └──────────────┘               │
│                                                  │
│  ┌──────────────────────────────────────────┐    │
│  │            NVIDIA GPU (CUDA)             │    │
│  └──────────────────────────────────────────┘    │
└─────────────────────────────────────────────────┘
```

### systemd Services

| Service | File | Restart Policy |
|---------|------|---------------|
| Backend API | `velocity-backend.service` | always |
| Frontend (nginx) | `velocity-frontend.service` | always |
| Ollama | `ollama.service` | on-failure |
| Watchdog | `ollama-watchdog.service` | always |
| PostgreSQL | `postgresql.service` | on-failure |

### Network Rules

| Port | Protocol | Service | External Access |
|------|----------|---------|-----------------|
| 80 | HTTP | nginx → frontend | Yes (public) |
| 443 | HTTPS | nginx → frontend | Yes (public) |
| 8000 | TCP | FastAPI backend | No (internal only) |
| 5173 | TCP | Vite dev server | No (dev only) |
| 5432 | TCP | PostgreSQL | No (internal only) |
| 11434 | TCP | Ollama API | No (internal only) |

### Monitoring

```bash
# All service health
systemctl status velocity-backend ollama postgresql

# GPU utilization
nvidia-smi -l 1

# Model inference logs
journalctl -u ollama -f

# API health (reports component status, not an error rate)
curl -s http://localhost:8000/health | jq .
```

---

## Runbooks

### Runbook: Backend Crashes at 2 AM

**Symptom:** Frontend shows 500 errors on API calls.

**Steps:**

```bash
# 1. Check backend status
sudo systemctl status velocity-backend
# Expected: active (running)

# 2. If stopped, restart
sudo systemctl restart velocity-backend

# 3. Check logs for root cause
sudo journalctl -u velocity-backend --since "30 minutes ago" --no-pager

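# 4. Verify recovery
curl http://localhost:8000/health
# Expected: JSON with "status": "ok"

# 5. If crash repeats, check database connectivity (-h localhost connects over TCP, where password auth normally applies)
psql -h localhost -U velocity -d velocity -c "SELECT 1;"
# Expected: a single row containing 1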
```

**If still broken:**
1. Check disk space: `df -h /`
2. Check memory: `free -h`
3. Check PostgreSQL: `sudo systemctl status postgresql`
4. Escalate with logs from step 3

---

### Runbook: Ollama Model Disappeared

**Symptom:** Agents return empty responses or errors.

**Steps:**

```bash
# 1. Check if Ollama is running
sudo systemctl status ollama
# Expected: active (running)

# 2. Check which models Ollama has available
curl -s http://localhost:11434/api/tags | jq '.models[].name'
# Expected: output includes "qwen3.6:35b-a3b"

# 3. If model is missing, check watchdog
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago" --no-pager

# 4. Manual recovery if watchdog failed
ollama pull qwen3.6:35b-a3b

# 5. Verify model is usable
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.6:35b-a3b",
  "prompt": "Hello",
  "stream": false
}' | jq .done
# Expected: true
```

---

### Runbook: Database Connection Failures

**Symptom:** Backend logs show `connection refused` or `pool exhausted`.

**Steps:**

```bash
# 1. Check PostgreSQL status
sudo systemctl status postgresql
# Expected: active (running)

# 2. Check connection count
sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity;"
# Should be < max_connections (default 100)

# 3. Check disk space for WAL files
df -h /var/lib/postgresql

# 4. Restart if hung
sudo systemctl restart postgresql

# 5. Verify backend reconnects
sudo journalctl -u velocity-backend --since "1 minute ago" | grep -i "connected\|error"
```

---

### Runbook: GPU Memory Exhaustion

**Symptom:** Ollama returns `out of memory` errors.

**Steps:**

```bash
# 1. Check current GPU usage
nvidia-smi
# Note: PID, memory usage, temperature

# 2. Kill non-essential GPU processes if needed
nvidia-smi --id=0 --query-compute-apps=pid,name,used_memory --format=csv
kill <PID>

# 3. Check how much of the loaded model sits on the GPU versus the CPU
ollama ps

# 4. If still exhausted, switch to a smaller quantization
#    (example tag; confirm it exists in the Ollama library before pulling)
ollama pull qwen3.6:35b-a3b-q4_0

# 5. Monitor recovery
watch -n 1 nvidia-smi
```

---

## API Reference

### Auth Endpoints

#### `POST /api/auth/login`

Authenticate a user and receive a JWT token.

**Request:**
```json
{
  "email": "user@example.com",
  "password": "secure_password"
}
```

**Response (200 OK):**
```json
{
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "user": {
    "id": "uuid-here",
    "email": "user@example.com",
    "role": "developer",
    "avatar_url": null
  }
}
```

**Errors:**
| Status | Meaning |
|--------|---------|
| 401 | Invalid credentials |
| 422 | Malformed request body |

---

#### `GET /api/auth/me`

Get the current authenticated user's profile.

**Headers:**
```
Authorization: Bearer <token>
```

**Response (200 OK):**
```json
{
  "id": "uuid-here",
  "email": "user@example.com",
  "role": "developer",
  "avatar_url": "https://cdn.example.com/avatars/user.png"
}
```

**Errors:**
| Status | Meaning |
|--------|---------|
| 401 | Token missing or invalid |
| 403 | Token expired |

---

#### `GET /api/auth/users`

List all users in the system. Admin only.

**Headers:**
```
Authorization: Bearer <admin_token>
```

**Response (200 OK):**
```json
[
  {
    "id": "uuid-1",
    "email": "admin@example.com",
    "role": "admin",
    "avatar_url": null
  },
  {
    "id": "uuid-2",
    "email": "dev@example.com",
    "role": "developer",
    "avatar_url": "https://cdn.example.com/avatars/dev.png"
  }
]
```

**Errors:**
| Status | Meaning |
|--------|---------|
| 403 | User is not admin |

---

#### `POST /api/auth/profile/avatar`

Upload a profile avatar image.

**Headers:**
```
Authorization: Bearer <token>
Content-Type: multipart/form-data
```

**Form Data:**
| Field | Type | Required |
|-------|------|----------|
| avatar | file (image/jpeg, image/png) | Yes |

**Response (200 OK):**
```json
{
  "avatar_url": "https://cdn.example.com/avatars/new-avatar.png"
}
```

**Errors:**
| Status | Meaning |
|--------|---------|
| 401 | Not authenticated |
| 422 | Invalid file type or size > 5MB |
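
The documented upload rules can be checked before sending a file. The sketch below is a hypothetical client-side pre-check, not the backend's validator, and it assumes "5MB" means 5 MiB.

```python
from typing import Optional

ALLOWED_TYPES = frozenset({"image/jpeg", "image/png"})
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5MB" is read as 5 MiB


def avatar_rejection_reason(content_type: str, size_bytes: int) -> Optional[str]:
    """Return why an upload would likely get a 422, or None if it looks acceptable."""
    if content_type.lower() not in ALLOWED_TYPES:
        return f"unsupported type: {content_type}"
    if size_bytes > MAX_BYTES:
        return f"file too large: {size_bytes} bytes"
    return None
```

Running this check in the client avoids a round trip for uploads the server would reject anyway. The server remains the authority on what is accepted.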

---

### WebSocket Endpoints

#### `WS /ws/catalyst`

Real-time channel for Catalyst events (agent coordination, task updates).

**Connection:**
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/catalyst');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.event_type, data.campaign_name, data.value);
};
```

**Event Format:**
```json
{
  "event_type": "task_complete",
  "campaign_name": "codegen-sprint-42",
  "value": 0.97,
  "timestamp": "2026-04-21T16:00:00Z"
}
```
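
For Python consumers, a message in this format can be parsed into a typed record. `CatalystEvent` and `parse_catalyst_event` are hypothetical names used for illustration. The field names come from the event format above.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CatalystEvent:
    event_type: str
    campaign_name: str
    value: float
    timestamp: datetime


def parse_catalyst_event(raw: str) -> CatalystEvent:
    """Parse one JSON message from /ws/catalyst; raises KeyError or ValueError on bad input."""
    data = json.loads(raw)
    # Normalize the trailing "Z" so parsing also works on Python versions before 3.11.
    ts = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    return CatalystEvent(
        event_type=str(data["event_type"]),
        campaign_name=str(data["campaign_name"]),
        value=float(data["value"]),
        timestamp=ts,
    )
```

Parsing into a frozen dataclass surfaces missing fields at the boundary instead of deep inside agent logic.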

---

#### `WS /ws/crm`

Real-time channel for CRM events (customer interactions, lead updates).

**Connection:**
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/crm');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.type, data.payload);
};
```

**Event Format:**
```json
{
  "type": "lead_created",
  "payload": {
    "id": "crm-uuid",
    "name": "Acme Corp",
    "status": "new"
  },
  "timestamp": "2026-04-21T16:00:00Z"
}
```

---

### Health Check

#### `GET /health`

Verify system health.

**Response (200 OK):**
```json
{
  "status": "ok",
  "database": "connected",
  "ollama": "available",
  "gpu": "present"
}
```
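
A monitoring script can compare a health response against the documented healthy values. The helper below is a hypothetical sketch that assumes the four fields shown above are the complete set.

```python
from typing import Dict, List

# Healthy values as documented in the response example.
EXPECTED: Dict[str, str] = {
    "status": "ok",
    "database": "connected",
    "ollama": "available",
    "gpu": "present",
}


def unhealthy_components(payload: dict) -> List[str]:
    """Return the sorted field names that are missing or differ from the healthy value."""
    return sorted(key for key, want in EXPECTED.items() if payload.get(key) != want)
```

An empty list means every documented component reported healthy. Any other result names the fields to investigate first.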

---

## Contributing

### Code Structure

```
Project_Velocity/
├── .Agent Context/          # Agent documentation, model specs
├── .Infrastructure/         # Deployment configs, systemd units
├── backend/                 # FastAPI backend
│   ├── main.py              # Application entry point
│   ├── requirements.txt     # Python dependencies
│   └── migrate.py           # Database migrations
├── app/                     # React frontend
│   ├── src/
│   │   ├── App.tsx          # Root component
│   │   └── ...              # Components, routes, utils
│   ├── package.json         # Node dependencies
│   └── vite.config.ts       # Build config
├── bootstrap/               # Setup scripts
│   └── setup.sh             # One-line bootstrap
└── README.md                # This file
```

### Making a Contribution

1. **Fork and branch**
   ```bash
   git checkout -b feature/your-feature-name
   ```

2. **Make changes**
   - Backend: Follow FastAPI conventions, add type hints
   - Frontend: Follow React + TypeScript patterns, use existing components
   - Docs: Update this README if behavior changes

3. **Test locally**
   ```bash
   # Backend tests (the subshell keeps the working directory at the repo root)
   (cd backend && pytest)

   # Frontend checks
   (cd app && npm run build)
   ```

4. **Submit PR**
   - Title: Clear, action-oriented
   - Description: What + Why + How to test
   - Link any related issues

### Documentation Standards

- **Every endpoint:** Document inputs, outputs, errors
- **Every component:** JSDoc for public APIs
- **Every runbook:** Write as if for on-call at 2am
- **Every decision:** Record in `DECISIONS.md` with rationale

---

## Appendix

### A. Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `SECRET_KEY` | Yes | JWT signing key |
| `OLLAMA_BASE_URL` | No | Ollama API URL (default: `http://localhost:11434`) |
| `GPU_ENABLED` | No | Enable GPU path (default: `true`) |
| `LOG_LEVEL` | No | Logging level (default: `INFO`) |
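
The table can be read as a settings loader along these lines. This is a sketch, not the backend's actual configuration code: `Settings` and `load_settings` are hypothetical names, and the set of strings treated as true for `GPU_ENABLED` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    ollama_base_url: str
    gpu_enabled: bool
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, failing fast when a required one is missing."""
    missing = [name for name in ("DATABASE_URL", "SECRET_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        secret_key=env["SECRET_KEY"],
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        # Assumption: "1", "true", and "yes" (any case) all enable the GPU path.
        gpu_enabled=env.get("GPU_ENABLED", "true").strip().lower() in {"1", "true", "yes"},
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.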

### B. Troubleshooting Matrix

| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| Frontend blank screen | Backend down | `curl http://localhost:8000/health` |
| 401 or 403 on all calls | Token invalid or expired | Re-login |
| Agent returns empty | Model missing from Ollama | `ollama pull qwen3.6:35b-a3b` |
| Slow responses | GPU not used | Check `nvidia-smi`, verify CUDA |
| Database errors | Pool exhausted | Check `max_connections`, restart backend |
| WebSocket disconnects | Network issue | Check firewall, reverse proxy config |

### C. Useful Commands Cheat Sheet

```bash
# Full system status
systemctl status velocity-backend ollama postgresql ollama-watchdog

# Real-time GPU monitoring
watch -n 1 nvidia-smi

# Model check
curl http://localhost:11434/api/tags | jq '.models[].name'

# API health
curl -s http://localhost:8000/health | jq .

# Database connection test
psql -h localhost -U velocity -d velocity -c "SELECT version();"

# Frontend rebuild
cd app && npm run build && cp -r dist/* ../nginx/html/

# Restart everything (nuclear option)
sudo systemctl restart velocity-backend ollama postgresql
```

---

> **Last verified:** 2026-04-21
> **Maintained by:** Velocity Team
> **If this doc is wrong, the system is broken. Fix the doc first.**