23 KiB
Project Velocity — Truthbook
What this is: The single source of truth for Project Velocity. If it's written down here, it's how the system works — not how someone hoped it would work.
Table of Contents
- What Is Project Velocity
- Quick Start
- Architecture Overview
- Runtime Truth
- Team Setup
- GPU & Model Runtime
- Infrastructure
- Runbooks
- API Reference
- Contributing
What Is Project Velocity
Project Velocity is a multi-agent AI development platform. It orchestrates intelligent agents (powered by Qwen 3.6 35B A3B and other models) to collaborate on software engineering tasks — code generation, review, testing, deployment — as a coordinated team rather than isolated tools.
Why it exists: Single-agent coding tools hit a ceiling. They lack context persistence, cross-task coordination, and operational reliability. Velocity solves this by:
- Multi-agent collaboration — Agents communicate via WebSocket channels and shared memory
- Persistent state — PostgreSQL backs user data, CRM records, and agent memory
- GPU-accelerated inference — Local Ollama runtime on NVIDIA GPU hardware
- Role-based access control — Admin and standard user tiers with avatar support
- Live event broadcasting — Real-time campaign and catalyst events via WebSocket
Core stack:
| Layer | Technology |
|---|---|
| Backend API | Python / FastAPI |
| Database | PostgreSQL (via databases library with connection pooling) |
| Frontend | React 19 + TypeScript + Vite + Tailwind CSS + Framer Motion |
| Inference | Ollama (Qwen 3.6 35B A3B primary model) |
| Real-time | WebSocket (Catalyst channel, CRM channel) |
| Deployment | systemd services on Linux with NVIDIA GPU |
Quick Start
Prerequisites
- GPU Machine: NVIDIA GPU with sufficient VRAM (≥16GB recommended for Qwen 3.6 35B A3B)
- NVMe Storage: For model weights and cache
- Linux OS: Ubuntu 22.04+ or equivalent
- Python 3.11+: Backend runtime
- Node.js 18+: Frontend build
- Ollama: Latest stable with Qwen 3.6 35B A3B model pulled
- PostgreSQL 15+: Database backend
One-Line Bootstrap
bash bootstrap/setup.sh
This script handles:
- GPU driver verification
- Ollama installation and model pull
- PostgreSQL setup
- Backend dependency installation
- Frontend dependency installation
- systemd service creation
Manual Setup
1. GPU & Ollama
# Verify GPU
nvidia-smi
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the primary model
ollama pull qwen3.6:35b-a3b
# Verify model is loaded
curl http://localhost:11434/api/tags | jq '.models[] | select(.name == "qwen3.6:35b-a3b")'
2. Database
# Start PostgreSQL
sudo systemctl start postgresql
# Create database and user
psql -U postgres -c "CREATE DATABASE velocity;"
psql -U postgres -c "CREATE USER velocity WITH PASSWORD 'secure_password';"
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE velocity TO velocity;"
3. Backend
cd Project_Velocity/backend
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your database credentials and secrets
# Run migrations
python migrate.py
# Start server
uvicorn main:app --host 0.0.0.0 --port 8000
4. Frontend
cd Project_Velocity/app
# Install dependencies
npm install
# Start dev server
npm run dev
Frontend is now available at http://localhost:5173.
5. Verify Everything
# Backend health
curl http://localhost:8000/health
# Model availability
curl http://localhost:11434/api/tags
# Frontend
open http://localhost:5173
Architecture Overview
System Diagram
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ React UI │────▶│ FastAPI │────▶│ PostgreSQL │
│ (Port 5173)│◀────│ (Port 8000) │◀────│ (Port 5432)│
└─────────────┘ └──────┬───────┘ └─────────────┘
│
▼
┌──────────────┐
│ Ollama │
│ (Port 11434) │
│ Qwen 3.6 35B │
└──────────────┘
│
▼
┌──────────────┐
│ NVIDIA GPU │
└──────────────┘
Component Breakdown
Backend (backend/)
main.py — FastAPI application with:
- Auth system — Login, profile lookup, user listing, avatar upload
- WebSocket managers —
_CatalystManager()and_CRMManager()for real-time event broadcasting - Connection pooling — PostgreSQL via
databaseslibrary with async context management - Lifespan hooks —
lifespan()initializes and cleans up resources
Key endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
/api/auth/login |
POST | Authenticate user |
/api/auth/me |
GET | Get current user profile |
/api/auth/users |
GET | List all users (admin) |
/api/auth/profile/avatar |
POST | Upload profile avatar |
/ws/catalyst |
WS | Catalyst event channel |
/ws/crm |
WS | CRM event channel |
/health |
GET | Health check |
Frontend (app/)
App.tsx — React application with:
- Protected routes —
ProtectedRoute()wraps authenticated paths - Route module sync —
RouteModuleSync()handles dynamic route loading - Main layout —
MainLayout()provides chrome (header, sidebar, content area) - Role rendering —
formatRoleLabel()converts role codes to display labels - Auth state management — Dual
useEffecthooks handle token persistence and user fetch
Agent Context (.Agent Context/)
Documents that define how agents operate within Velocity:
Qwen 3.6 35B A3B Ollama Access, Recovery, and Team Setup.md— Model runtime, recovery policies, team onboardingREADME.md— This file
Infrastructure (.Infrastructure/)
Deployment and operational documentation:
- systemd unit files for backend, frontend, Ollama services
- Network configuration and ingress rules
- Monitoring and alerting setup
Runtime Truth
What "Works" Means in Velocity
Velocity has three runtime layers, each with different failure modes:
Layer A: Fast Runtime Recovery
If the API crashes or restarts:
- PostgreSQL connection pool rebuilds automatically via
lifespan() - WebSocket managers reinitialize and accept new connections
- No data loss — all state is in PostgreSQL
Layer B: Model Rehydration Recovery
If Ollama loses the Qwen model:
- Watchdog systemd unit detects absence via
/api/tags - Auto-registers model from NVMe cache or S3 artifact storage
- Production requirement: Same-run auto-hydration logic must complete before any agent request
Layer C: Full System Recovery
If everything goes down:
- PostgreSQL recovers WAL logs
- Ollama watchdog restores model
- Backend systemd unit restarts API
- Frontend rebuilds if artifacts are corrupted
Critical Contracts
Auth contract:
Client → POST /api/auth/login {email, password}
→ 200 OK {token, user}
Client → GET /api/auth/me (Authorization: Bearer <token>)
→ 200 OK {id, email, role, avatar_url}
→ 401 Unauthorized
WebSocket contract:
Client → WS /ws/catalyst
→ Accepts live events: {event_type, campaign_name, value, timestamp}
Client → WS /ws/crm
→ Accepts CRM events: {type, payload, timestamp}
Model contract:
Ollama → GET /api/tags returns qwen3.6:35b-a3b
→ Context window: 131072 tokens
→ Provider: OpenAI-compatible interface at http://localhost:11434/v1
Team Setup
Developer Onboarding
1. Clone & Bootstrap
git clone <repo-url>
cd Project_Velocity
bash bootstrap/setup.sh
2. VS Code / Roo Code Configuration
Edit .vscode/settings.json:
{
"roo-cline.provider": "openai-compatible",
"roo-cline.baseUrl": "http://localhost:11434/v1",
"roo-cline.modelId": "qwen3.6:35b-a3b",
"roo-cline.contextWindow": 131072,
"roo-cline.temperature": 0.7
}
3. Verify Team Access
# Backend health
curl http://localhost:8000/health
# Expected: {"status": "ok"}
# Model loaded
curl http://localhost:11434/api/tags | jq -r '.models[].name'
# Expected: qwen3.6:35b-a3b
# Frontend
open http://localhost:5173
# Expected: Login screen
Role Definitions
| Role | Access Level | Can Do |
|---|---|---|
admin |
Full | User management, system config, agent orchestration |
developer |
Standard | Code generation, review, testing |
viewer |
Read-only | Dashboard, campaign monitoring |
Performance Expectations
| Scenario | Tokens/sec | Latency |
|---|---|---|
| Single-stream (local GPU) | ~80-120 tok/s | ~200ms first token |
| Two concurrent requests | ~60-90 tok/s each | ~300ms first token |
| Four-way batch | ~40-60 tok/s each | ~500ms first token |
Numbers vary by GPU hardware. Measure your setup.
GPU & Model Runtime
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 16GB | 24GB+ |
| GPU Compute | Turing architecture | Ada Lovelace / Hopper |
| NVMe Storage | 50GB free | 100GB+ NVMe Gen4 |
| RAM | 32GB | 64GB+ |
Ollama Watchdog
The watchdog is a systemd-managed service that ensures the Qwen model stays loaded:
Location: .Infrastructure/systemd/ollama-watchdog.service
Behavior:
- Every 60 seconds, queries
http://localhost:11434/api/tags - If
qwen3.6:35b-a3bis absent, triggers rehydration - Rehydration priority: NVMe cache → S3 artifact → remote pull
- Logs all actions to journalctl
Manual watchdog check:
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago"
Model Hydration Strategies
| Strategy | Speed | Use Case |
|---|---|---|
| NVMe local registration | ~2 seconds | Primary recovery path |
Local manifest ollama create |
~5 seconds | Fresh hydration from extracted weights |
| S3 cold hydrate | ~60-300 seconds | No local cache available |
Critical: What Watchdog Must NOT Do
- ❌ Delete model layers during recovery
- ❌ Modify GPU memory directly
- ❌ Block agent requests during hydration (graceful degradation only)
- ❌ Restart Ollama process unless absolutely necessary
Infrastructure
Deployment Topology
┌─────────────────────────────────────────────────┐
│ Production Host │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Backend │ │ Frontend │ │ Ollama │ │
│ │ :8000 │ │ :5173 │ │ :11434 │ │
│ │ systemd │ │ nginx │ │ systemd │ │
│ └────┬─────┘ └────┬─────┘ └──────┬───────┘ │
│ │ │ │ │
│ └─────────────┴───────────────┘ │
│ │ │
│ ┌──────▼───────┐ │
│ │ PostgreSQL │ │
│ │ :5432 │ │
│ │ systemd │ │
│ └──────────────┘ │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ NVIDIA GPU (CUDA + TensorRT) │ │
│ └──────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
systemd Services
| Service | File | Restart Policy |
|---|---|---|
| Backend API | velocity-backend.service |
always |
| Frontend (nginx) | velocity-frontend.service |
always |
| Ollama | ollama.service |
on-failure |
| Watchdog | ollama-watchdog.service |
always |
| PostgreSQL | postgresql.service |
on-failure |
Network Rules
| Port | Protocol | Service | External Access |
|---|---|---|---|
| 80 | HTTP | nginx → frontend | Yes (public) |
| 443 | HTTPS | nginx → frontend | Yes (public) |
| 8000 | TCP | FastAPI backend | No (internal only) |
| 5173 | TCP | Vite dev server | No (dev only) |
| 5432 | TCP | PostgreSQL | No (internal only) |
| 11434 | TCP | Ollama API | No (internal only) |
Monitoring
# All service health
systemctl status velocity-backend ollama postgresql
# GPU utilization
nvidia-smi -l 1
# Model inference logs
journalctl -u ollama -f
# API error rate
curl -s http://localhost:8000/health | jq .
Runbooks
Runbook: Backend Crashes at 2 AM
Symptom: Frontend shows 500 errors on API calls.
Steps:
# 1. Check backend status
sudo systemctl status velocity-backend
# Expected: active (running)
# 2. If stopped, restart
sudo systemctl restart velocity-backend
# 3. Check logs for root cause
sudo journalctl -u velocity-backend --since "30 minutes ago" --no-pager
# 4. Verify recovery
curl http://localhost:8000/health
# Expected: {"status": "ok"}
# 5. If crash repeats, check database connectivity
psql -U velocity -d velocity -c "SELECT 1;"
# Expected: 1
If still broken:
- Check disk space:
df -h / - Check memory:
free -h - Check PostgreSQL:
sudo systemctl status postgresql - Escalate with logs from step 3
Runbook: Ollama Model Disappeared
Symptom: Agents return empty responses or errors.
Steps:
# 1. Check if Ollama is running
sudo systemctl status ollama
# Expected: active (running)
# 2. Check loaded models
curl http://localhost:11434/api/tags | jq '.models[].name'
# Expected: qwen3.6:35b-a3b
# 3. If model is missing, check watchdog
sudo systemctl status ollama-watchdog
journalctl -u ollama-watchdog --since "1 hour ago" --no-pager
# 4. Manual recovery if watchdog failed
ollama pull qwen3.6:35b-a3b
# 5. Verify model is usable
curl http://localhost:11434/api/generate -d '{
"model": "qwen3.6:35b-a3b",
"prompt": "Hello",
"stream": false
}' | jq .done
# Expected: true
Runbook: Database Connection Failures
Symptom: Backend logs show connection refused or pool exhausted.
Steps:
# 1. Check PostgreSQL status
sudo systemctl status postgresql
# Expected: active (running)
# 2. Check connection count
psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"
# Should be < max_connections (default 100)
# 3. Check disk space for WAL files
df -h /var/lib/postgresql
# 4. Restart if hung
sudo systemctl restart postgresql
# 5. Verify backend reconnects
sudo journalctl -u velocity-backend --since "1 minute ago" | grep -i "connected\|error"
Runbook: GPU Memory Exhaustion
Symptom: Ollama returns out of memory errors.
Steps:
# 1. Check current GPU usage
nvidia-smi
# Note: PID, memory usage, temperature
# 2. Kill non-essential GPU processes if needed
nvidia-smi --id=0 --query-compute-apps=pid,name,used_memory --format=csv
kill <PID>
# 3. Check Ollama memory allocation
ollama show qwen3.6:35b-a3b | grep -i "layer\|memory"
# 4. If still exhausted, reduce model quantization
ollama pull qwen3.6:35b-a3b-q4_0
# 5. Monitor recovery
watch -n 1 nvidia-smi
API Reference
Auth Endpoints
POST /api/auth/login
Authenticate a user and receive a JWT token.
Request:
{
"email": "user@example.com",
"password": "secure_password"
}
Response (200 OK):
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"user": {
"id": "uuid-here",
"email": "user@example.com",
"role": "developer",
"avatar_url": null
}
}
Errors:
| Status | Meaning |
|---|---|
| 401 | Invalid credentials |
| 422 | Malformed request body |
GET /api/auth/me
Get the current authenticated user's profile.
Headers:
Authorization: Bearer <token>
Response (200 OK):
{
"id": "uuid-here",
"email": "user@example.com",
"role": "developer",
"avatar_url": "https://cdn.example.com/avatars/user.png"
}
Errors:
| Status | Meaning |
|---|---|
| 401 | Token missing or invalid |
| 403 | Token expired |
GET /api/auth/users
List all users in the system. Admin only.
Headers:
Authorization: Bearer <admin_token>
Response (200 OK):
[
{
"id": "uuid-1",
"email": "admin@example.com",
"role": "admin",
"avatar_url": null
},
{
"id": "uuid-2",
"email": "dev@example.com",
"role": "developer",
"avatar_url": "https://cdn.example.com/avatars/dev.png"
}
]
Errors:
| Status | Meaning |
|---|---|
| 403 | User is not admin |
POST /api/auth/profile/avatar
Upload a profile avatar image.
Headers:
Authorization: Bearer <token>
Content-Type: multipart/form-data
Form Data:
| Field | Type | Required |
|---|---|---|
| avatar | file (image/jpeg, image/png) | Yes |
Response (200 OK):
{
"avatar_url": "https://cdn.example.com/avatars/new-avatar.png"
}
Errors:
| Status | Meaning |
|---|---|
| 401 | Not authenticated |
| 422 | Invalid file type or size > 5MB |
WebSocket Endpoints
WS /ws/catalyst
Real-time channel for Catalyst events (agent coordination, task updates).
Connection:
const ws = new WebSocket('ws://localhost:8000/ws/catalyst');
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log(data.event_type, data.campaign_name, data.value);
};
Event Format:
{
"event_type": "task_complete",
"campaign_name": "codegen-sprint-42",
"value": 0.97,
"timestamp": "2026-04-21T16:00:00Z"
}
WS /ws/crm
Real-time channel for CRM events (customer interactions, lead updates).
Connection:
const ws = new WebSocket('ws://localhost:8000/ws/crm');
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log(data.type, data.payload);
};
Event Format:
{
"type": "lead_created",
"payload": {
"id": "crm-uuid",
"name": "Acme Corp",
"status": "new"
},
"timestamp": "2026-04-21T16:00:00Z"
}
Health Check
GET /health
Verify system health.
Response (200 OK):
{
"status": "ok",
"database": "connected",
"ollama": "available",
"gpu": "present"
}
Contributing
Code Structure
Project_Velocity/
├── .Agent Context/ # Agent documentation, model specs
├── .Infrastructure/ # Deployment configs, systemd units
├── backend/ # FastAPI backend
│ ├── main.py # Application entry point
│ ├── requirements.txt # Python dependencies
│ └── migrate.py # Database migrations
├── app/ # React frontend
│ ├── src/
│ │ ├── App.tsx # Root component
│ │ └── ... # Components, routes, utils
│ ├── package.json # Node dependencies
│ └── vite.config.ts # Build config
├── bootstrap/ # Setup scripts
│ └── setup.sh # One-line bootstrap
└── README.md # This file
Making a Contribution
-
Fork and branch
git checkout -b feature/your-feature-name -
Make changes
- Backend: Follow FastAPI conventions, add type hints
- Frontend: Follow React + TypeScript patterns, use existing components
- Docs: Update this README if behavior changes
-
Test locally
# Backend tests cd backend && pytest # Frontend checks cd app && npm run build -
Submit PR
- Title: Clear, action-oriented
- Description: What + Why + How to test
- Link any related issues
Documentation Standards
- Every endpoint: Document inputs, outputs, errors
- Every component: JSDoc for public APIs
- Every runbook: Write as if for on-call at 2am
- Every decision: Record in
DECISIONS.mdwith rationale
Appendix
A. Environment Variables
| Variable | Required | Description |
|---|---|---|
DATABASE_URL |
Yes | PostgreSQL connection string |
SECRET_KEY |
Yes | JWT signing key |
OLLAMA_BASE_URL |
No | Ollama API URL (default: http://localhost:11434) |
GPU_ENABLED |
No | Enable GPU path (default: true) |
LOG_LEVEL |
No | Logging level (default: INFO) |
B. Troubleshooting Matrix
| Symptom | Likely Cause | Fix |
|---|---|---|
| Frontend blank screen | Backend down | curl http://localhost:8000/health |
| 401 on all calls | Token expired | Re-login |
| Agent returns empty | Model unloaded | ollama pull qwen3.6:35b-a3b |
| Slow responses | GPU not used | Check nvidia-smi, verify CUDA |
| Database errors | Pool exhausted | Check max_connections, restart backend |
| WebSocket disconnects | Network issue | Check firewall, reverse proxy config |
C. Useful Commands Cheat Sheet
# Full system status
systemctl status velocity-backend ollama postgresql ollama-watchdog
# GPU实时监控
watch -n 1 nvidia-smi
# Model check
curl http://localhost:11434/api/tags | jq '.models[].name'
# API health
curl -s http://localhost:8000/health | jq .
# Database connection test
psql -U velocity -d velocity -c "SELECT version();"
# Frontend rebuild
cd app && npm run build && cp -r dist/* ../nginx/html/
# Restart everything (nuclear option)
sudo systemctl restart velocity-backend ollama postgresql
Last verified: 2026-04-21 Maintained by: Velocity Team If this doc is wrong, the system is broken. Fix the doc first.