Project Velocity — Truthbook

What this is: The single source of truth for Project Velocity. If it's written down here, it's how the system works — not how someone hoped it would work.


Table of Contents

  1. What Is Project Velocity
  2. Quick Start
  3. Architecture Overview
  4. Runtime Truth
  5. Team Setup
  6. GPU & Model Runtime
  7. Infrastructure
  8. Runbooks
  9. API Reference
  10. Contributing

What Is Project Velocity

Project Velocity is a multi-agent AI development platform. It orchestrates intelligent agents (powered by Qwen 3.6 35B A3B and other models) to collaborate on software engineering tasks — code generation, review, testing, deployment — as a coordinated team rather than isolated tools.

Why it exists: Single-agent coding tools hit a ceiling. They lack context persistence, cross-task coordination, and operational reliability. Velocity addresses these gaps with:

  • Multi-agent collaboration: Agents communicate via WebSocket channels and shared memory
  • Persistent state: PostgreSQL backs user data, CRM records, and agent memory
  • GPU-accelerated inference: Local Ollama runtime on NVIDIA GPU hardware
  • Role-based access control: Admin, developer, and viewer roles, with avatar support
  • Live event broadcasting: Real-time campaign and catalyst events via WebSocket

Core stack:

| Layer | Technology |
|---|---|
| Backend API | Python / FastAPI |
| Database | PostgreSQL (via databases library with connection pooling) |
| Frontend | React 19 + TypeScript + Vite + Tailwind CSS + Framer Motion |
| Inference | Ollama (Qwen 3.6 35B A3B primary model) |
| Real-time | WebSocket (Catalyst channel, CRM channel) |
| Deployment | systemd services on Linux with NVIDIA GPU |

Quick Start

Prerequisites

  • GPU Machine: NVIDIA GPU with at least 16GB VRAM (24GB+ recommended for Qwen 3.6 35B A3B)
  • NVMe Storage: For model weights and cache
  • Linux OS: Ubuntu 22.04+ or equivalent
  • Python 3.11+: Backend runtime
  • Node.js 18+: Frontend build
  • Ollama: Latest stable with Qwen 3.6 35B A3B model pulled
  • PostgreSQL 15+: Database backend

One-Line Bootstrap

bash bootstrap/setup.sh

This script handles:

  1. GPU driver verification
  2. Ollama installation and model pull
  3. PostgreSQL setup
  4. Backend dependency installation
  5. Frontend dependency installation
  6. systemd service creation

Manual Setup

1. GPU & Ollama

# Verify GPU
nvidia-smi

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the primary model
ollama pull qwen3.6:35b-a3b

# Verify the model is available locally (/api/tags lists pulled models, not models resident in memory)
curl http://localhost:11434/api/tags | jq '.models[] | select(.name == "qwen3.6:35b-a3b")'

2. Database

# Start PostgreSQL
sudo systemctl start postgresql

# Create user and database (replace 'secure_password' with a real secret)
psql -U postgres -c "CREATE USER velocity WITH PASSWORD 'secure_password';"
# Owning the database lets the user create tables in the public schema on PostgreSQL 15+
psql -U postgres -c "CREATE DATABASE velocity OWNER velocity;"

3. Backend

cd Project_Velocity/backend

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your database credentials and secrets

# Run migrations
python migrate.py

# Start server
uvicorn main:app --host 0.0.0.0 --port 8000

4. Frontend

cd Project_Velocity/app

# Install dependencies
npm install

# Start dev server
npm run dev

Frontend is now available at http://localhost:5173.

5. Verify Everything

# Backend health
curl http://localhost:8000/health

# Model availability
curl http://localhost:11434/api/tags

# Frontend
open http://localhost:5173

Architecture Overview

System Diagram

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   React UI  │────▶│  FastAPI     │────▶│  PostgreSQL │
│  (Port 5173)│◀────│  (Port 8000) │◀────│  (Port 5432)│
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │   Ollama     │
                    │ (Port 11434) │
                    │ Qwen 3.6 35B │
                    └──────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  NVIDIA GPU  │
                    └──────────────┘

Component Breakdown

Backend (backend/)

main.py — FastAPI application with:

  • Auth system: Login, profile lookup, user listing, avatar upload
  • WebSocket managers: _CatalystManager() and _CRMManager() for real-time event broadcasting
  • Connection pooling: PostgreSQL via databases library with async context management
  • Lifespan hooks: lifespan() initializes and cleans up resources
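
The broadcast pattern behind managers such as _CatalystManager() can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the class name BroadcastManager, its methods, and the FakeSocket test double are hypothetical. The only assumption about a connection is that it exposes an async send_text(), as FastAPI's WebSocket does.

```python
import asyncio
import json


class BroadcastManager:
    """Hypothetical stand-in for a manager such as _CatalystManager()."""

    def __init__(self) -> None:
        self._connections: set = set()

    def connect(self, ws) -> None:
        self._connections.add(ws)

    def disconnect(self, ws) -> None:
        self._connections.discard(ws)

    async def broadcast(self, event: dict) -> int:
        """Send one event to every client; drop clients whose send fails."""
        message = json.dumps(event)
        delivered = 0
        for ws in list(self._connections):  # copy: the set may shrink below
            try:
                await ws.send_text(message)
                delivered += 1
            except Exception:
                self.disconnect(ws)
        return delivered


class FakeSocket:
    """Test double exposing the one method the manager relies on."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.sent = []

    async def send_text(self, message: str) -> None:
        if self.fail:
            raise ConnectionError("client went away")
        self.sent.append(message)
```

Dropping a connection on a failed send keeps one dead client from blocking delivery to the rest, which is one plausible way to keep a broadcast channel healthy.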

Key endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| /api/auth/login | POST | Authenticate user |
| /api/auth/me | GET | Get current user profile |
| /api/auth/users | GET | List all users (admin) |
| /api/auth/profile/avatar | POST | Upload profile avatar |
| /ws/catalyst | WS | Catalyst event channel |
| /ws/crm | WS | CRM event channel |
| /health | GET | Health check |

Frontend (app/)

App.tsx — React application with:

  • Protected routes: ProtectedRoute() wraps authenticated paths
  • Route module sync: RouteModuleSync() handles dynamic route loading
  • Main layout: MainLayout() provides chrome (header, sidebar, content area)
  • Role rendering: formatRoleLabel() converts role codes to display labels
  • Auth state management: Two useEffect hooks handle token persistence and user fetch

Agent Context (.Agent Context/)

Documents that define how agents operate within Velocity, including agent documentation and model specs.

Infrastructure (.Infrastructure/)

Deployment and operational documentation:

  • systemd unit files for backend, frontend, Ollama services
  • Network configuration and ingress rules
  • Monitoring and alerting setup

Runtime Truth

What "Works" Means in Velocity

Velocity has three runtime layers, each with different failure modes:

Layer A: Fast Runtime Recovery

If the API crashes or restarts:

  • PostgreSQL connection pool rebuilds automatically via lifespan()
  • WebSocket managers reinitialize and accept new connections
  • No data loss — all state is in PostgreSQL
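
The recovery behavior above follows from the lifespan pattern: everything before the yield runs on each start, everything after it runs on shutdown. Here is a minimal sketch using only the standard library. FakePool is a hypothetical stand-in for the real connection pool, and simulate_restart() exists only to show the ordering.

```python
import asyncio
from contextlib import asynccontextmanager


class FakePool:
    """Hypothetical stand-in for a connection pool object."""

    def __init__(self) -> None:
        self.connected = False

    async def connect(self) -> None:
        self.connected = True

    async def disconnect(self) -> None:
        self.connected = False


pool = FakePool()


@asynccontextmanager
async def lifespan(app):
    """Startup before the yield, shutdown after it."""
    await pool.connect()  # every (re)start rebuilds the pool
    try:
        yield
    finally:
        await pool.disconnect()  # runs on graceful shutdown


async def simulate_restart() -> list:
    """Drive the context manager the way an ASGI server would."""
    states = [pool.connected]
    async with lifespan(app=None):
        states.append(pool.connected)
    states.append(pool.connected)
    return states
```

With FastAPI, a function of this shape is passed as FastAPI(lifespan=lifespan). Because the pool is rebuilt on every start, a crashed process needs no manual reconnection step.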

Layer B: Model Rehydration Recovery

If Ollama loses the Qwen model:

  • Watchdog systemd unit detects absence via /api/tags
  • Auto-registers model from NVMe cache or S3 artifact storage
  • Production requirement: Same-run auto-hydration logic must complete before any agent request

Layer C: Full System Recovery

If everything goes down:

  1. PostgreSQL replays its write-ahead log (WAL)
  2. Ollama watchdog restores model
  3. Backend systemd unit restarts API
  4. Frontend rebuilds if artifacts are corrupted

Critical Contracts

Auth contract:

Client → POST /api/auth/login {email, password}
       → 200 OK {token, user}
       
Client → GET /api/auth/me (Authorization: Bearer <token>)
       → 200 OK {id, email, role, avatar_url}
       → 401 Unauthorized
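
To make the token half of this contract concrete, here is a minimal sketch of issuing and verifying an HS256-signed JWT with the standard library only. It is an illustration under assumptions, not the backend's implementation: the function names, the sub and exp claims, and the one-hour default lifetime are hypothetical, and a production service would normally use a maintained JWT library. The secret plays the role of SECRET_KEY.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def issue_token(user_id: str, secret: str, ttl_seconds: int = 3600) -> str:
    """Build header.payload.signature with an expiry claim."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    signature = _b64url(_sign(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def verify_token(token: str, secret: str) -> Optional[dict]:
    """Return the claims, or None where the API would reject the token."""
    try:
        header, payload, signature = token.split(".")
        expected = _sign(f"{header}.{payload}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(signature)):
            return None
        claims = json.loads(_b64url_decode(payload))
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    if claims.get("exp", 0) < time.time():
        return None  # expired
    return claims
```

The constant-time comparison via hmac.compare_digest avoids leaking how much of a forged signature matched.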

WebSocket contract:

Client → WS /ws/catalyst
       → Receives live events: {event_type, campaign_name, value, timestamp}

Client → WS /ws/crm
       → Receives CRM events: {type, payload, timestamp}

Model contract:

Ollama → GET /api/tags returns qwen3.6:35b-a3b
       → Context window: 131072 tokens
       → Provider: OpenAI-compatible interface at http://localhost:11434/v1
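
The first line of the model contract can be checked programmatically. The sketch below separates the pure check from the network call so the check is easy to test. The function names are hypothetical, and the code assumes the response keeps the models[].name layout that the jq commands in this document rely on.

```python
import json
from urllib.request import urlopen

REQUIRED_MODEL = "qwen3.6:35b-a3b"


def model_available(tags_payload: dict, name: str = REQUIRED_MODEL) -> bool:
    """True if `name` appears in a parsed /api/tags response."""
    return any(m.get("name") == name for m in tags_payload.get("models", []))


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch and parse /api/tags from a running Ollama instance."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

On a host with Ollama running, model_available(fetch_tags()) returns True only when the primary model is listed.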

Team Setup

Developer Onboarding

1. Clone & Bootstrap

git clone <repo-url>
cd Project_Velocity
bash bootstrap/setup.sh

2. VS Code / Roo Code Configuration

Edit .vscode/settings.json:

{
  "roo-cline.provider": "openai-compatible",
  "roo-cline.baseUrl": "http://localhost:11434/v1",
  "roo-cline.modelId": "qwen3.6:35b-a3b",
  "roo-cline.contextWindow": 131072,
  "roo-cline.temperature": 0.7
}

3. Verify Team Access

# Backend health
curl http://localhost:8000/health
# Expected: {"status": "ok"}

# Model available
curl http://localhost:11434/api/tags | jq -r '.models[].name'
# Expected: qwen3.6:35b-a3b

# Frontend
open http://localhost:5173
# Expected: Login screen

Role Definitions

| Role | Access Level | Can Do |
|---|---|---|
| admin | Full | User management, system config, agent orchestration |
| developer | Standard | Code generation, review, testing |
| viewer | Read-only | Dashboard, campaign monitoring |
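
One way to express these roles in backend code is a role-to-permission map. This is a sketch only: the permission names are hypothetical labels derived from the "Can Do" column, and treating the admin "Full" access level as a superset of the other roles is an assumption rather than documented behavior.

```python
from enum import Enum


class Role(str, Enum):
    ADMIN = "admin"
    DEVELOPER = "developer"
    VIEWER = "viewer"


# Hypothetical permission names derived from the "Can Do" column.
_DEVELOPER = {"code_generation", "review", "testing"}
_VIEWER = {"dashboard", "campaign_monitoring"}
_ADMIN_ONLY = {"user_management", "system_config", "agent_orchestration"}

PERMISSIONS = {
    # Assumption: "Full" access means admin also holds every other permission.
    Role.ADMIN: _ADMIN_ONLY | _DEVELOPER | _VIEWER,
    Role.DEVELOPER: _DEVELOPER,
    Role.VIEWER: _VIEWER,
}


def can(role: str, permission: str) -> bool:
    """True if the role code grants the permission; unknown roles get nothing."""
    try:
        return permission in PERMISSIONS[Role(role)]
    except ValueError:  # role code not in the table
        return False
```

Denying unknown role codes by default keeps a typo in stored data from granting access.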

Performance Expectations

| Scenario | Tokens/sec | First-token latency |
|---|---|---|
| Single-stream (local GPU) | ~80-120 tok/s | ~200ms |
| Two concurrent requests | ~60-90 tok/s each | ~300ms |
| Four-way batch | ~40-60 tok/s each | ~500ms |

Numbers vary by GPU hardware. Measure your setup.
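
To measure your own setup, the timing fields in a non-streaming Ollama /api/generate response are usually enough. The sketch below computes throughput from eval_count and eval_duration (reported in nanoseconds). The helper names are hypothetical, and treating load time plus prompt evaluation as first-token latency is a rough approximation that ignores network and queueing time.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def first_token_latency_ms(response: dict) -> float:
    """Approximate time before generation starts: model load + prompt eval."""
    ns = response.get("load_duration", 0) + response.get("prompt_eval_duration", 0)
    return ns / 1e6
```

Pass in the parsed JSON from a request made with "stream": false, and repeat the measurement several times, since the first call after a model load includes the load time.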


GPU & Model Runtime

Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 16GB | 24GB+ |
| GPU Compute | Turing architecture | Ada Lovelace / Hopper |
| NVMe Storage | 50GB free | 100GB+ NVMe Gen4 |
| RAM | 32GB | 64GB+ |

Ollama Watchdog

The watchdog is a systemd-managed service that ensures the Qwen model stays loaded:

Location: .Infrastructure/systemd/ollama-watchdog.service

Behavior:

  1. Every 60 seconds, queries http://localhost:11434/api/tags
  2. If qwen3.6:35b-a3b is absent, triggers rehydration
  3. Rehydration priority: NVMe cache → S3 artifact → remote pull
  4. Logs all actions to journalctl
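
The check-then-fall-back behavior above can be sketched as a small priority chain. This is an illustration, not the shipped watchdog: the function names are hypothetical, and each strategy is modeled as a plain callable so the ordering logic stays independent of how NVMe, S3, or remote pulls are actually performed.

```python
from typing import Callable, List, Optional, Tuple

# Each strategy returns True once the model is registered with Ollama.
Strategy = Callable[[], bool]


def rehydrate(strategies: List[Tuple[str, Strategy]]) -> Optional[str]:
    """Try strategies in priority order; return the first one that succeeds."""
    for name, attempt in strategies:
        try:
            if attempt():
                return name
        except Exception as exc:  # a failed source must not stop the chain
            print(f"rehydration via {name} failed: {exc}")
    return None


def watchdog_tick(model_present: bool, strategies: List[Tuple[str, Strategy]]) -> str:
    """One polling cycle: do nothing if the model is listed, else rehydrate."""
    if model_present:
        return "ok"
    source = rehydrate(strategies)
    return f"rehydrated:{source}" if source else "failed"
```

A caller would build the list in the documented order (NVMe cache, then S3 artifact, then remote pull) and invoke watchdog_tick() once per 60-second interval.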

Manual watchdog check:

sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago"

Model Hydration Strategies

| Strategy | Speed | Use Case |
|---|---|---|
| NVMe local registration | ~2 seconds | Primary recovery path |
| Local manifest (ollama create) | ~5 seconds | Fresh hydration from extracted weights |
| S3 cold hydrate | ~60-300 seconds | No local cache available |

Critical: What Watchdog Must NOT Do

  • Delete model layers during recovery
  • Modify GPU memory directly
  • Block agent requests during hydration (graceful degradation only)
  • Restart Ollama process unless absolutely necessary

Infrastructure

Deployment Topology

┌─────────────────────────────────────────────────┐
│                  Production Host                 │
│                                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │
│  │ Backend  │  │ Frontend │  │   Ollama     │  │
│  │ :8000    │  │ :5173    │  │  :11434      │  │
│  │ systemd  │  │ nginx    │  │  systemd     │  │
│  └────┬─────┘  └────┬─────┘  └──────┬───────┘  │
│       │             │               │           │
│       └─────────────┴───────────────┘           │
│                         │                        │
│                  ┌──────▼───────┐               │
│                  │  PostgreSQL  │               │
│                  │   :5432      │               │
│                  │  systemd     │               │
│                  └──────────────┘               │
│                                                  │
│  ┌──────────────────────────────────────────┐    │
│  │        NVIDIA GPU (CUDA + TensorRT)      │    │
│  └──────────────────────────────────────────┘    │
└─────────────────────────────────────────────────┘

systemd Services

| Service | File | Restart Policy |
|---|---|---|
| Backend API | velocity-backend.service | always |
| Frontend (nginx) | velocity-frontend.service | always |
| Ollama | ollama.service | on-failure |
| Watchdog | ollama-watchdog.service | always |
| PostgreSQL | postgresql.service | on-failure |

Network Rules

| Port | Protocol | Service | External Access |
|---|---|---|---|
| 80 | HTTP | nginx → frontend | Yes (public) |
| 443 | HTTPS | nginx → frontend | Yes (public) |
| 8000 | TCP | FastAPI backend | No (internal only) |
| 5173 | TCP | Vite dev server | No (dev only) |
| 5432 | TCP | PostgreSQL | No (internal only) |
| 11434 | TCP | Ollama API | No (internal only) |

Monitoring

# All service health
systemctl status velocity-backend ollama postgresql

# GPU utilization
nvidia-smi -l 1

# Model inference logs
journalctl -u ollama -f

# API health (the /health endpoint reports component status, not an error rate)
curl -s http://localhost:8000/health | jq .

Runbooks

Runbook: Backend Crashes at 2 AM

Symptom: Frontend shows 5xx errors (or failed requests) on API calls.

Steps:

# 1. Check backend status
sudo systemctl status velocity-backend
# Expected: active (running)

# 2. If stopped, restart
sudo systemctl restart velocity-backend

# 3. Check logs for root cause
sudo journalctl -u velocity-backend --since "30 minutes ago" --no-pager

# 4. Verify recovery
curl http://localhost:8000/health
# Expected: {"status": "ok"}

# 5. If crash repeats, check database connectivity
psql -U velocity -d velocity -c "SELECT 1;"
# Expected: 1

If still broken:

  1. Check disk space: df -h /
  2. Check memory: free -h
  3. Check PostgreSQL: sudo systemctl status postgresql
  4. Escalate with the backend logs collected in step 3 of the steps above

Runbook: Ollama Model Disappeared

Symptom: Agents return empty responses or errors.

Steps:

# 1. Check if Ollama is running
sudo systemctl status ollama
# Expected: active (running)

# 2. Check available models
curl http://localhost:11434/api/tags | jq '.models[].name'
# Expected: qwen3.6:35b-a3b

# 3. If model is missing, check watchdog
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago" --no-pager

# 4. Manual recovery if watchdog failed
ollama pull qwen3.6:35b-a3b

# 5. Verify model is usable
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.6:35b-a3b",
  "prompt": "Hello",
  "stream": false
}' | jq .done
# Expected: true

Runbook: Database Connection Failures

Symptom: Backend logs show connection refused or pool exhausted.

Steps:

# 1. Check PostgreSQL status
sudo systemctl status postgresql
# Expected: active (running)

# 2. Check connection count
psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"
# Should be < max_connections (default 100)

# 3. Check disk space for WAL files
df -h /var/lib/postgresql

# 4. Restart if hung
sudo systemctl restart postgresql

# 5. Verify backend reconnects
sudo journalctl -u velocity-backend --since "1 minute ago" | grep -i "connected\|error"

Runbook: GPU Memory Exhaustion

Symptom: Ollama returns out of memory errors.

Steps:

# 1. Check current GPU usage
nvidia-smi
# Note: PID, memory usage, temperature

# 2. Kill non-essential GPU processes if needed
nvidia-smi --id=0 --query-compute-apps=pid,name,used_memory --format=csv
kill <PID>

# 3. Check Ollama memory allocation (size and CPU/GPU split of each loaded model)
ollama ps

# 4. If still exhausted, switch to a more aggressively quantized variant
#    (the tag below is illustrative; confirm it exists before pulling)
ollama pull qwen3.6:35b-a3b-q4_0

# 5. Monitor recovery
watch -n 1 nvidia-smi

API Reference

Auth Endpoints

POST /api/auth/login

Authenticate a user and receive a JWT token.

Request:

{
  "email": "user@example.com",
  "password": "secure_password"
}

Response (200 OK):

{
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "user": {
    "id": "uuid-here",
    "email": "user@example.com",
    "role": "developer",
    "avatar_url": null
  }
}

Errors:

| Status | Meaning |
|---|---|
| 401 | Invalid credentials |
| 422 | Malformed request body |

GET /api/auth/me

Get the current authenticated user's profile.

Headers:

Authorization: Bearer <token>

Response (200 OK):

{
  "id": "uuid-here",
  "email": "user@example.com",
  "role": "developer",
  "avatar_url": "https://cdn.example.com/avatars/user.png"
}

Errors:

| Status | Meaning |
|---|---|
| 401 | Token missing or invalid |
| 403 | Token expired |

GET /api/auth/users

List all users in the system. Admin only.

Headers:

Authorization: Bearer <admin_token>

Response (200 OK):

[
  {
    "id": "uuid-1",
    "email": "admin@example.com",
    "role": "admin",
    "avatar_url": null
  },
  {
    "id": "uuid-2",
    "email": "dev@example.com",
    "role": "developer",
    "avatar_url": "https://cdn.example.com/avatars/dev.png"
  }
]

Errors:

| Status | Meaning |
|---|---|
| 403 | User is not admin |

POST /api/auth/profile/avatar

Upload a profile avatar image.

Headers:

Authorization: Bearer <token>
Content-Type: multipart/form-data

Form Data:

| Field | Type | Required |
|---|---|---|
| avatar | file (image/jpeg, image/png) | Yes |

Response (200 OK):

{
  "avatar_url": "https://cdn.example.com/avatars/new-avatar.png"
}

Errors:

| Status | Meaning |
|---|---|
| 401 | Not authenticated |
| 422 | Invalid file type or size > 5MB |
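
The 422 conditions for this endpoint can be sketched as a small validation helper. The function name and error strings are hypothetical, and reading "5MB" as 5 × 1024 × 1024 bytes is an assumption, since the document does not say whether the limit is decimal or binary.

```python
from typing import Optional

ALLOWED_TYPES = {"image/jpeg", "image/png"}
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5MB" is interpreted as 5 MiB


def validate_avatar(content_type: str, size_bytes: int) -> Optional[str]:
    """Return None if the upload is acceptable, else a message for a 422 body."""
    if content_type not in ALLOWED_TYPES:
        return f"unsupported file type: {content_type}"
    if size_bytes > MAX_BYTES:
        return f"file too large: {size_bytes} bytes (limit {MAX_BYTES})"
    return None
```

A handler would call this before storing the file and return 422 with the message when the result is not None.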

WebSocket Endpoints

WS /ws/catalyst

Real-time channel for Catalyst events (agent coordination, task updates).

Connection:

const ws = new WebSocket('ws://localhost:8000/ws/catalyst');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.event_type, data.campaign_name, data.value);
};

Event Format:

{
  "event_type": "task_complete",
  "campaign_name": "codegen-sprint-42",
  "value": 0.97,
  "timestamp": "2026-04-21T16:00:00Z"
}
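
For Python consumers, such as an agent process subscribing to this channel, an event frame can be parsed into a typed object. This is a minimal sketch: CatalystEvent and parse_catalyst_event() are hypothetical names, and the field set is taken from the event format shown here.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CatalystEvent:
    event_type: str
    campaign_name: str
    value: float
    timestamp: datetime


def parse_catalyst_event(raw: str) -> CatalystEvent:
    """Parse one text frame; raises KeyError or ValueError on malformed input."""
    data = json.loads(raw)
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    ts = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    return CatalystEvent(
        event_type=data["event_type"],
        campaign_name=data["campaign_name"],
        value=float(data["value"]),
        timestamp=ts,
    )
```

Failing loudly on a missing field makes contract drift between backend and consumers visible early.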

WS /ws/crm

Real-time channel for CRM events (customer interactions, lead updates).

Connection:

const ws = new WebSocket('ws://localhost:8000/ws/crm');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.type, data.payload);
};

Event Format:

{
  "type": "lead_created",
  "payload": {
    "id": "crm-uuid",
    "name": "Acme Corp",
    "status": "new"
  },
  "timestamp": "2026-04-21T16:00:00Z"
}

Health Check

GET /health

Verify system health.

Response (200 OK):

{
  "status": "ok",
  "database": "connected",
  "ollama": "available",
  "gpu": "present"
}
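
A monitoring script can turn this payload into a list of unhealthy components. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and the healthy values are simply the ones in the example response, since the document does not list the other values each field can take.

```python
# Healthy value for each field, taken from the example response.
HEALTHY = {
    "status": "ok",
    "database": "connected",
    "ollama": "available",
    "gpu": "present",
}


def degraded_components(health: dict) -> list:
    """Names of fields that are missing or differ from the healthy value."""
    return [key for key, want in HEALTHY.items() if health.get(key) != want]
```

An empty list means every reported component matches the healthy example; anything else names the layer to check first.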

Contributing

Code Structure

Project_Velocity/
├── .Agent Context/          # Agent documentation, model specs
├── .Infrastructure/         # Deployment configs, systemd units
├── backend/                 # FastAPI backend
│   ├── main.py              # Application entry point
│   ├── requirements.txt     # Python dependencies
│   └── migrate.py           # Database migrations
├── app/                     # React frontend
│   ├── src/
│   │   ├── App.tsx          # Root component
│   │   └── ...              # Components, routes, utils
│   ├── package.json         # Node dependencies
│   └── vite.config.ts       # Build config
├── bootstrap/               # Setup scripts
│   └── setup.sh             # One-line bootstrap
└── README.md                # This file

Making a Contribution

  1. Fork and branch

    git checkout -b feature/your-feature-name
    
  2. Make changes

    • Backend: Follow FastAPI conventions, add type hints
    • Frontend: Follow React + TypeScript patterns, use existing components
    • Docs: Update this README if behavior changes
  3. Test locally

    # Backend tests
    cd backend && pytest
    
    # Frontend checks
    cd app && npm run build
    
  4. Submit PR

    • Title: Clear, action-oriented
    • Description: What + Why + How to test
    • Link any related issues

Documentation Standards

  • Every endpoint: Document inputs, outputs, errors
  • Every component: JSDoc for public APIs
  • Every runbook: Write as if for on-call at 2am
  • Every decision: Record in DECISIONS.md with rationale

Appendix

A. Environment Variables

| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | PostgreSQL connection string |
| SECRET_KEY | Yes | JWT signing key |
| OLLAMA_BASE_URL | No | Ollama API URL (default: http://localhost:11434) |
| GPU_ENABLED | No | Enable GPU path (default: true) |
| LOG_LEVEL | No | Logging level (default: INFO) |
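
These variables could be loaded into a typed settings object along the following lines. This is a sketch, not the backend's actual configuration code: Settings and load_settings() are hypothetical names, and the set of strings accepted as true for GPU_ENABLED is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    ollama_base_url: str = "http://localhost:11434"
    gpu_enabled: bool = True
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment; fail fast on missing required values."""
    missing = [k for k in ("DATABASE_URL", "SECRET_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        secret_key=env["SECRET_KEY"],
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        # Assumption: "true", "1", or "yes" (any case) enable the GPU path.
        gpu_enabled=env.get("GPU_ENABLED", "true").strip().lower() in {"true", "1", "yes"},
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Failing at startup when a required variable is absent surfaces a bad .env file immediately instead of at the first database call.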

B. Troubleshooting Matrix

| Symptom | Likely Cause | Fix |
|---|---|---|
| Frontend blank screen | Backend down | curl http://localhost:8000/health |
| 401 or 403 on all calls | Token invalid or expired | Re-login |
| Agent returns empty | Model unloaded | ollama pull qwen3.6:35b-a3b |
| Slow responses | GPU not used | Check nvidia-smi, verify CUDA |
| Database errors | Pool exhausted | Check max_connections, restart backend |
| WebSocket disconnects | Network issue | Check firewall, reverse proxy config |

C. Useful Commands Cheat Sheet

# Full system status
systemctl status velocity-backend ollama postgresql ollama-watchdog

# Real-time GPU monitoring
watch -n 1 nvidia-smi

# Model check
curl http://localhost:11434/api/tags | jq '.models[].name'

# API health
curl -s http://localhost:8000/health | jq .

# Database connection test
psql -U velocity -d velocity -c "SELECT version();"

# Frontend rebuild
cd app && npm run build && cp -r dist/* ../nginx/html/

# Restart everything (nuclear option)
sudo systemctl restart velocity-backend ollama postgresql

Last verified: 2026-04-21
Maintained by: Velocity Team

If this doc is wrong, the system is broken. Fix the doc first.