Built the Sentinel Tab

2026-04-12 02:02:58 +05:30
parent fb656d1443
commit 075ab280ad
526 changed files with 17646 additions and 70931 deletions
--- a/infrastructure/desineuron_ingress/Caddyfile
+++ b/infrastructure/desineuron_ingress/Caddyfile
@@ -0,0 +1,59 @@
+{
+	email admin@desineuron.in
+	log {
+		output file /var/log/caddy/admin.log
+		format json
+	}
+}
+
+office.desineuron.in, git.desineuron.in, cloud.desineuron.in, projects.desineuron.in, talk.desineuron.in, vpn.desineuron.in {
+	tls /etc/caddy/tls/fullchain.pem /etc/caddy/tls/privkey.pem
+
+	log {
+		output file /var/log/caddy/access.log
+		format json
+	}
+
+	reverse_proxy https://127.0.0.1:8443 {
+		header_up Host {host}
+		header_up X-Forwarded-Host {host}
+		header_up X-Forwarded-Proto {scheme}
+		header_up X-Forwarded-For {remote_host}
+		transport http {
+			tls_insecure_skip_verify
+		}
+	}
+}
+
+ops.desineuron.in {
+	log {
+		output file /var/log/caddy/access.log
+		format json
+	}
+
+	reverse_proxy https://127.0.0.1:8443 {
+		header_up Host {host}
+		header_up X-Forwarded-Host {host}
+		header_up X-Forwarded-Proto {scheme}
+		header_up X-Forwarded-For {remote_host}
+		transport http {
+			tls_insecure_skip_verify
+		}
+	}
+}
+
+comfy.desineuron.in {
+	log {
+		output file /var/log/caddy/access.log
+		format json
+	}
+
+	reverse_proxy http://172.31.46.190:8188 {
+		header_up Host {host}
+		header_up X-Forwarded-Host {host}
+		header_up X-Forwarded-Proto {scheme}
+		header_up X-Forwarded-For {remote_host}
+	}
+}
+
+import /etc/caddy/managed/*.caddy
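The Caddyfile above ends by importing `/etc/caddy/managed/*.caddy`, so further routes can be dropped in as separate files. A minimal sketch of generating one such site block in the same shape as the blocks above follows. `render_site_block` is a hypothetical helper that is not part of this commit, and the hostname and upstream in the usage line are placeholders.

```python
def render_site_block(hostname: str, upstream: str, via_relay: bool = False) -> str:
    """Render one Caddy site block in the style of the Caddyfile above.

    via_relay=True adds the `tls_insecure_skip_verify` transport used for the
    local rathole relay socket (https://127.0.0.1:8443).
    """
    lines = [
        f"{hostname} {{",
        "\tlog {",
        "\t\toutput file /var/log/caddy/access.log",
        "\t\tformat json",
        "\t}",
        "",
        f"\treverse_proxy {upstream} {{",
        "\t\theader_up Host {host}",
        "\t\theader_up X-Forwarded-Host {host}",
        "\t\theader_up X-Forwarded-Proto {scheme}",
        "\t\theader_up X-Forwarded-For {remote_host}",
    ]
    if via_relay:
        lines += ["\t\ttransport http {", "\t\t\ttls_insecure_skip_verify", "\t\t}"]
    lines += ["\t}", "}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder hostname and private address, for illustration only.
    print(render_site_block("newapp.example.com", "http://10.0.0.5:9000"))
```

The output would still need `caddy validate` before a reload; the sketch only mirrors the text layout and does not check Caddy syntax.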
--- a/infrastructure/desineuron_ingress/README.md
+++ b/infrastructure/desineuron_ingress/README.md
@@ -0,0 +1,38 @@
+# Desineuron Ingress
+
+This directory contains the reproducible bootstrap artifacts for the
+`desineuron-ingress-01` EC2 node.
+
+Architecture:
+- EC2 `t4g.micro` on-demand in `us-east-1`
+- Amazon Linux 2023 ARM64
+- `20 GB` gp3 root volume
+- `Caddy` as the public HTTPS edge
+- `rathole` as the reverse TCP relay from the Linux origin box
+
+Traffic model:
+- Public DNS stays in Cloudflare
+- Public HTTPS terminates on EC2
+- All six public hostnames proxy through EC2 to one local relay socket
+- Linux origin continues to serve the actual apps on `https://localhost:443`
+
+Key files:
+- `user_data.sh`: first-boot provisioning for the EC2 ingress node
+- `Caddyfile`: public edge routing
+- `rathole-server.toml`: EC2-side relay config
+- `rathole-client.toml`: Linux-side relay config template
+- `install_linux_rathole_client.sh`: Linux-side installer/service script
+- `sync_ingress_home_ip.py`: detects current home public IP and updates the ingress SSH allowlist rule
+- `desineuron-ingress-home-ip-sync.service`: systemd oneshot service for the IP sync
+- `desineuron-ingress-home-ip-sync.timer`: systemd timer that runs the sync 45 seconds after boot and then every 5 minutes
+- `install_linux_ingress_ip_sync.sh`: Linux-side installer for the IP sync service
+
+Manual Cloudflare work still required unless API credentials are provided:
+- set the six hostnames to DNS-only
+- point them to the ingress Elastic IP
+- retire the Cloudflare Tunnel routes once public validation passes
+
+Dynamic home IP handling:
+- `rathole` control port `2333/tcp` is intentionally open on the ingress so public services do not break when the ISP IP changes
+- SSH fallback on the ingress remains restricted to the current home public IP on `22/tcp`
+- the Linux-side IP sync service keeps that SSH fallback rule current after ISP churn or reboot
--- a/infrastructure/desineuron_ingress/TEAM_HANDOFF_2026-04-08.md
+++ b/infrastructure/desineuron_ingress/TEAM_HANDOFF_2026-04-08.md
@@ -0,0 +1,540 @@
+## Desineuron Stable Ingress Handoff
+
+Date: 2026-04-08
+
+### Chapters
+
+1. Outcome
+2. Final Architecture
+3. AWS Resources
+4. Linux Origin State
+5. Migration Changes Applied
+6. Validation Results
+7. ComfyUI Recovery and GPU Route
+8. Files and Config Artifacts
+9. Dynamic Home IP Sync
+10. Operational Commands
+11. Future Service Mapping Runbook
+12. Security Notes
+13. Remaining Improvement Ideas
+14. Rollback
+15. Team Summary
+16. Current Status Snapshot - 2026-04-11
+17. Linux Ops Control Plane
+
+### Outcome
+
+The Cloudflare Tunnel dependency for the six public `desineuron.in` services has been replaced with a self-hosted AWS ingress layer:
+
+- Public edge: AWS EC2 `t4g.micro`
+- Stable public IP: `98.87.120.120`
+- TLS termination: `Caddy` on the ingress node
+- Private backend relay: `rathole`
+- Origin: Linux box at `192.168.1.4`
+- DNS: Cloudflare, `DNS only`
+
+Public hostnames now served through the AWS ingress (the six migrated from Cloudflare Tunnel, plus the `comfy` and `ops` routes):
+
+- `office.desineuron.in`
+- `git.desineuron.in`
+- `cloud.desineuron.in`
+- `projects.desineuron.in`
+- `talk.desineuron.in`
+- `vpn.desineuron.in`
+- `comfy.desineuron.in` (ingress route created for AWS GPU ComfyUI)
+- `ops.desineuron.in` (private operator control surface on the Linux box)
+
+### Final Architecture
+
+```text
+Internet
+  -> Cloudflare DNS
+  -> 98.87.120.120
+  -> EC2 ingress: desineuron-ingress-01
+     -> Caddy :443
+     -> rathole server (control on 2333, local relay on 127.0.0.1:8443)
+     -> Linux origin tunnel client
+        -> Linux nginx :443
+        -> per-host upstream routing
+           -> Gitea
+           -> Nextcloud
+           -> Taiga
+           -> OnlyOffice
+           -> NetBird
+  -> comfy.desineuron.in
+     -> EC2 ingress Caddy
+     -> private proxy to AWS GPU box `172.31.46.190:8188`
+     -> ComfyUI endpoints on systemd-managed GPU service
+```
+
+### AWS Resources
+
+- Instance name: `desineuron-ingress-01`
+- Instance ID: `i-094df09acafb72494`
+- Type: `t4g.micro`
+- Region: `us-east-1`
+- Subnet: `subnet-03d684ed15f327151`
+- VPC: `vpc-081d2397920aad268`
+- Root disk: `20 GB gp3`
+- Elastic IP: `98.87.120.120`
+- IAM role: `desineuron-ingress-role`
+- Instance profile: `desineuron-ingress-profile`
+- Security group: `sg-0721b8b48e12c531d`
+
+Current GPU worker:
+
+- Instance ID: `i-0e4eab5fe67cf9abe`
+- Type: `g6.12xlarge`
+- Region: `us-east-1`
+- Private IP: `172.31.46.190`
+- Current public IP: `18.208.176.121`
+- Launch time: `2026-04-11T06:14:04Z`
+
+Open ingress ports:
+
+- `80/tcp` from internet
+- `443/tcp` from internet
+- `22/tcp` restricted to the current home public IP and auto-synced from the Linux origin
+- `2333/tcp` from internet for `rathole` control and data relay
+
+GPU node security posture for ComfyUI:
+
+- public `8118/tcp` removed
+- public `8188/tcp` removed
+- `8188/tcp` now allowed only from ingress security group `sg-0721b8b48e12c531d`
+
+### Linux Origin State
+
+Services exposed to local nginx:
+
+- `git.desineuron.in` -> `127.0.0.1:3000` (`gitea`)
+- `cloud.desineuron.in` -> `127.0.0.1:11000` (`nextcloud_app`)
+- `talk.desineuron.in` -> `127.0.0.1:11000` (`nextcloud_app`, Talk-focused hostname)
+- `projects.desineuron.in` -> `127.0.0.1:9100` (`taiga-gateway`)
+- `office.desineuron.in` -> `127.0.0.1:9980` (`nextcloud_onlyoffice`)
+- `vpn.desineuron.in` -> `127.0.0.1:8080` / `127.0.0.1:8081` (`netbird`)
+
+Tunnel state:
+
+- `rathole-client.service` active on Linux
+- `rathole-server.service` active on AWS
+- `cloudflared` inactive on Linux
+
+### Migration Changes Applied
+
+#### Cloudflare
+
+Old CNAME tunnel records were removed for the six public hostnames.
+
+New records were created:
+
+- Type: `A`
+- Value: `98.87.120.120`
+- Proxy status: `DNS only`
+- TTL: `300`
+
+#### AWS Ingress
+
+Installed and configured:
+
+- `Caddy`
+- `rathole`
+- `amazon-ssm-agent`
+- Linux-driven SSH allowlist sync for the ingress node
+
+TLS:
+
+- Existing valid certificate/key pair from the Linux origin was copied to the ingress node.
+- Caddy now terminates HTTPS at the edge.
+
+#### Linux Origin
+
+nginx was already routing by hostname and remains the origin router.
+
+Nextcloud was adjusted so `talk.desineuron.in` no longer canonicalizes to `cloud.desineuron.in`:
+
+- removed `overwritehost` pin
+- added `talk.desineuron.in` to trusted domains
+- restarted `nextcloud_app`
+
+### Validation Results
+
+Public hostname checks through the new ingress:
+
+- `office.desineuron.in` -> `200 /welcome/`
+- `git.desineuron.in` -> `200`
+- `cloud.desineuron.in` -> `200 /login`
+- `projects.desineuron.in` -> `200`
+- `talk.desineuron.in` -> `200 /login` on `talk.desineuron.in`
+- `vpn.desineuron.in` -> `200`
+- `ops.desineuron.in/login` -> `200`
+- `comfy.desineuron.in` -> `502`
+
+Important note:
+
+- `talk.desineuron.in` now stays on the `talk` hostname.
+- It is still backed by the same Nextcloud origin and presents the Nextcloud login flow, which is expected given the current Linux-side app layout.
+
+### ComfyUI Recovery and GPU Route
+
+Root cause of the earlier `502`:
+
+- ingress route and TLS were correct
+- the GPU spot node had lost the actual `/opt/dlami/nvme/ComfyUI` app tree
+- nothing was listening on `172.31.46.190:8188`
+
+Fix applied for that incident (the route is failing again; see the current public result below):
+
+- restored `/opt/dlami/nvme/ComfyUI` from upstream source control
+- installed ComfyUI Python requirements on the GPU node
+- created `systemd` unit `comfyui.service`
+- enabled `comfyui.service` at boot with automatic restart
+- kept `comfy.desineuron.in` mapped through ingress Caddy
+- removed direct public access to `8118` and `8188`
+- allowed `8188` only from ingress security group
+
+Current live path:
+
+- `https://comfy.desineuron.in`
+  -> ingress `98.87.120.120`
+  -> Caddy reverse proxy
+  -> GPU private IP `172.31.46.190:8188`
+  -> `comfyui.service`
+
+Current public result:
+
+- `comfy.desineuron.in` currently returns `502 Bad Gateway`
+- ingress route is present and Caddy is healthy
+- the current GPU backend is not yet listening on `172.31.46.190:8188`, so this is a backend readiness issue, not a DNS or edge-TLS issue
+
+Current GPU service:
+
+- `comfyui.service`
+- app path: `/opt/dlami/nvme/ComfyUI`
+- log path: `/var/log/comfyui/service.log`
+- port: `8188/tcp`
+
+Current backend state on `2026-04-11`:
+
+- `comfyui.service` is `activating`
+- latest log shows ComfyUI startup and `Starting server`
+- the process is still not binding `8188`, so ingress sees the backend as unavailable
+
+Expected endpoints:
+
+- `https://comfy.desineuron.in/`
+- `https://comfy.desineuron.in/prompt`
+- `https://comfy.desineuron.in/history/{prompt_id}`
+- `https://comfy.desineuron.in/queue`
+- `https://comfy.desineuron.in/upload/image`
+
+### Files and Config Artifacts
+
+Infrastructure artifacts in repo:
+
+- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/README.md)
+- [Caddyfile](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/Caddyfile)
+- [rathole-server.toml](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/rathole-server.toml)
+- [rathole-client.toml](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/rathole-client.toml)
+- [install_linux_rathole_client.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_rathole_client.sh)
+- [user_data.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/user_data.sh)
+- [install_gpu_comfyui_service.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_gpu_comfyui_service.sh)
+- [map_gpu_comfy_security.ps1](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/map_gpu_comfy_security.ps1)
+- [sync_ingress_home_ip.py](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/sync_ingress_home_ip.py)
+- [desineuron-ingress-home-ip-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.service)
+- [desineuron-ingress-home-ip-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.timer)
+- [install_linux_ingress_ip_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_ingress_ip_sync.sh)
+- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
+- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)
+
+Linux origin files touched:
+
+- `/etc/nginx/sites-enabled/desineuron.conf`
+- `/mnt/ServerStorage/docker_apps/nextcloud/.env`
+- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/config.php`
+- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php`
+
+Backups created on Linux:
+
+- `/mnt/ServerStorage/docker_apps/nextcloud/.env.pre_ingress_backup_2026-04-08`
+- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php.pre_ingress_backup_2026-04-08`
+
+### Dynamic Home IP Sync
+
+Purpose:
+
+- Keep ingress `22/tcp` restricted to the current Airtel public IP even when the ISP changes it
+- Prevent SSH fallback outages caused by stale home-IP security-group rules that would otherwise need manual correction
+
+Design:
+
+- Linux origin runs `desineuron-ingress-home-ip-sync.timer`
+- Timer fires 45 seconds after boot and then every 5 minutes (`OnBootSec=45s`, `OnUnitActiveSec=5min`)
+- Service resolves the current home public IP via `https://api.ipify.org`
+- Service updates only the ingress security group `sg-0721b8b48e12c531d`
+- Only the SSH fallback rule is mutated
+- `rathole` is no longer dependent on the Airtel IP because `2333/tcp` remains open on the ingress
+
+Installed Linux paths:
+
+- `/usr/local/bin/sync_ingress_home_ip.py`
+- `/etc/systemd/system/desineuron-ingress-home-ip-sync.service`
+- `/etc/systemd/system/desineuron-ingress-home-ip-sync.timer`
+- `/etc/desineuron-ingress-home-ip-sync.env`
+- `/opt/desineuron-ingress-ip-sync/.venv`
+- `/var/lib/desineuron-ingress-ip-sync/current_ip.txt`
+
+Current state:
+
+- Timer: enabled and active
+- Last recorded home public IP: `223.185.28.89`
+- Ingress SSH rule CIDR: `223.185.28.89/32`
+
+### Operational Commands
+
+Check AWS ingress status:
+
+```powershell
+aws ec2 describe-instances --instance-ids i-094df09acafb72494 --region us-east-1
+aws ec2 describe-addresses --allocation-ids eipalloc-0d54fc0f827450e7b --region us-east-1
+```
+
+Check ingress services:
+
+```powershell
+aws ssm send-command --region us-east-1 --instance-ids i-094df09acafb72494 --document-name AWS-RunShellScript --parameters commands="sudo systemctl status caddy rathole-server --no-pager"
+```
+
+Check GPU ComfyUI service:
+
+```powershell
+aws ssm send-command --region us-east-1 --instance-ids i-0e4eab5fe67cf9abe --document-name AWS-RunShellScript --parameters commands="sudo systemctl status comfyui --no-pager","ss -ltnp | grep 8188 || true","tail -n 40 /var/log/comfyui/service.log || true"
+```
+
+Check Linux origin services:
+
+```powershell
+ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status rathole-client nginx"
+ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ingress-home-ip-sync.service desineuron-ingress-home-ip-sync.timer"
+ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S journalctl -u desineuron-ingress-home-ip-sync -n 50 --no-pager"
+ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ops-control-plane.service --no-pager"
+ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S docker compose -f /opt/desineuron-ops-control-plane/docker-compose.yml ps"
+```
+
+Public endpoint validation:
+
+```powershell
+curl.exe -I https://office.desineuron.in
+curl.exe -I https://git.desineuron.in
+curl.exe -I https://cloud.desineuron.in
+curl.exe -I https://projects.desineuron.in
+curl.exe -I https://talk.desineuron.in
+curl.exe -I https://vpn.desineuron.in
+curl.exe -I https://comfy.desineuron.in
+curl.exe -I https://ops.desineuron.in/login
+```
+
+### Future Service Mapping Runbook
+
+Use this pattern for any future public service behind the stable ingress layer.
+
+1. Decide the backend location.
+
+- Linux origin behind `rathole`
+- AWS GPU/private EC2 node
+- another private backend later
+
+2. Decide whether the service should terminate TLS at ingress.
+
+- default: yes
+- Caddy on ingress should own the public hostname and certificate
+
+3. Create the DNS record in Cloudflare.
+
+- type: `A`
+- value: `98.87.120.120`
+- proxy mode: `DNS only`
+- low TTL during rollout
+
+4. Add the ingress route in [`Caddyfile`](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/Caddyfile).
+
+Patterns:
+
+- Linux-origin service:
+  - proxy to `https://127.0.0.1:8443`
+  - preserve `Host`
+- private AWS backend service:
+  - proxy to `http://<private-ip>:<port>` or `https://<private-ip>:<port>`
+
+5. Restrict backend network access.
+
+- never leave backend app ports open to `0.0.0.0/0` unless absolutely necessary
+- prefer security-group rule allowing traffic only from ingress security group
+- for home-origin services, keep them private behind `rathole`
+
+6. Reload ingress.
+
+```powershell
+ssh -i "F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\desineuron-l4-node.pem" ec2-user@98.87.120.120 "sudo caddy validate --config /etc/caddy/Caddyfile && sudo systemctl reload caddy"
+```
+
+7. Validate TLS and app response.
+
+- check certificate subject matches hostname
+- check `curl -I https://<host>`
+- check login page or health endpoint
+- check browser behavior
+
+8. If the backend is stateful, create a persistent service.
+
+- prefer `systemd`
+- enable restart on failure
+- log to a stable path
+- record service name, working directory, ports, and restart policy in this handoff doc
+
+9. Update team docs immediately.
+
+- hostname
+- DNS record type
+- ingress route target
+- backend service owner
+- service name
+- health check command
+- rollback step
+
+### Security Notes
+
+- Public traffic terminates only at the AWS edge.
+- The Linux box no longer needs Cloudflare Tunnel for these six routes.
+- The Linux origin is reached through an outbound tunnel, not by directly exposing the home machine to the public for app traffic.
+- SSH on the Linux box remains key-only.
+- The AWS ingress IAM role is limited to SSM core.
+- ComfyUI is no longer directly exposed on the GPU public IP; only the ingress layer can reach `8188`.
+- Ingress `22/tcp` stays restricted and is now auto-synced from the Linux origin.
+- Ingress `2333/tcp` is intentionally open so `rathole` survives Airtel IP changes without operator action.
+
+### Remaining Improvement Ideas
+
+- Issue and renew certificates on the AWS edge permanently instead of copying the existing certificate from the Linux origin.
+- Clean up nginx warnings about duplicated protocol options.
+- Separate `talk.desineuron.in` more fully from general Nextcloud if a distinct Talk-only UX is desired.
+- Add authentication in front of `comfy.desineuron.in` before broader team rollout; internet scanners started hitting the route immediately after it went live.
+  - candidate mechanisms: Basic Auth or an allowlist
+- Add monitoring and alerting on:
+  - `caddy`
+  - `rathole-server`
+  - `rathole-client`
+  - public HTTPS checks
+- Add infrastructure-as-code for the EC2 ingress node if this should be reproducible by the team without manual AWS CLI steps.
+
+### Rollback
+
+If rollback is needed:
+
+1. Recreate Cloudflare CNAME/tunnel routes or repoint the DNS records away from `98.87.120.120`.
+2. Stop `caddy` and `rathole-server` on AWS.
+3. Stop `rathole-client` on Linux.
+4. Restore Nextcloud files from:
+   - `.env.pre_ingress_backup_2026-04-08`
+   - `reverse-proxy.config.php.pre_ingress_backup_2026-04-08`
+5. Restart `nextcloud_app` and nginx.
+
+### Team Summary
+
+The migration of the six Cloudflare Tunnel hostnames is complete.
+
+Cloudflare Tunnel is no longer the production path for the six public service hostnames. The stable production ingress is now the AWS `t4g.micro` node with Elastic IP `98.87.120.120`, and the Linux machine remains the private origin behind `rathole`.
+
+Additional mapped route:
+
+- `comfy.desineuron.in` now terminates on the same stable ingress and forwards to the GPU node's private address `172.31.46.190:8188`.
+- No further DNS change is needed for ComfyUI.
+- The backend is supervised by `systemd`, but the current worker is not yet binding `8188`, so public access is currently degraded with `502`.
+- Once the backend is listening on `8188`, the team can use:
+  - `https://comfy.desineuron.in/prompt`
+  - `https://comfy.desineuron.in/history/{prompt_id}`
+  - `https://comfy.desineuron.in/queue`
+  - `https://comfy.desineuron.in/upload/image`
+
+### Current Status Snapshot - 2026-04-11
+
+Live public service state:
+
+- `office.desineuron.in` -> `200`
+- `git.desineuron.in` -> `200`
+- `cloud.desineuron.in` -> `200`
+- `projects.desineuron.in` -> `200`
+- `talk.desineuron.in` -> `200`
+- `vpn.desineuron.in` -> `200`
+- `ops.desineuron.in/login` -> `200`
+- `comfy.desineuron.in` -> `502`
+
+Linux-origin health:
+
+- `nginx.service` -> `active`
+- `rathole-client.service` -> `active`
+- `desineuron-ingress-home-ip-sync.timer` -> `active`
+- `desineuron-ops-control-plane.service` -> `active`
+
+Linux ops stack containers:
+
+- `desineuron-ops-api` -> `Up`
+- `desineuron-ops-db` -> `Up (healthy)`
+- `desineuron-ops-worker` -> `Up`
+
+Ingress health:
+
+- `caddy` -> `active`
+- `rathole-server` -> `active`
+- `comfy.desineuron.in` Caddy route is present in `/etc/caddy/Caddyfile`
+
+GPU ComfyUI state:
+
+- `comfyui.service` -> `activating`
+- latest logs show the ComfyUI startup sequence reaching `Starting server`
+- no active listener on `8188` yet
+- ingress cannot connect to `172.31.46.190:8188`, which is why the public result is `502`
+
+### Linux Ops Control Plane
+
+The Linux box now also hosts the private AWS control surface for the team.
+
+Public operator URL:
+
+- `https://ops.desineuron.in/login`
+
+Purpose:
+
+- launch/stop/terminate AWS machines
+- view spot/on-demand market data
+- track runtime and estimated cost
+- ingest model directories from the Linux box into S3
+- hydrate models from S3 to AWS GPU nodes
+- manage ingress routes through the `t4g.micro`
+- export session/cost CSVs
+
+Linux runtime paths:
+
+- stack root: `/opt/desineuron-ops-control-plane`
+- env file: `/opt/desineuron-ops-control-plane/.env`
+- exports: `/opt/desineuron-ops-control-plane/exports`
+- state: `/opt/desineuron-ops-control-plane/state`
+
+Canonical S3 bucket:
+
+- `desineuron-ops-control-plane-819079556187-us-east-1`
+
+Model library source on Linux:
+
+- `/mnt/ServerStorage/ai-models/models`
+
+Current operator accounts:
+
+- `sagnik@desineuron.in`
+- `sayan@desineuron.in`
+- `sourik@desineuron.in`
+
+Reference docs:
+
+- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
+- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)
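The handoff above validates public endpoints with `curl.exe -I` and reads a `502` as a backend readiness problem rather than a DNS or edge-TLS problem. A minimal sketch of the same triage in Python follows. The function names are hypothetical, and treating `503` and `504` the same way as `502` is an assumption, not something the handoff states. Unlike `curl -I`, `urllib` follows redirects, so a redirecting host reports the status of the final page.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request; 0 means no HTTP response at all."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The edge answered, but with an error status such as 502.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, TLS failure, or timeout.
        return 0


def classify(status: int) -> str:
    if status == 0:
        return "unreachable"
    if status in (502, 503, 504):  # 503/504 grouped with 502 by assumption
        return "backend-down"
    if 200 <= status < 400:
        return "ok"
    return "unexpected"


def check_hosts(urls: Iterable[str], fetch: Callable[[str], int] = head_status) -> Dict[str, str]:
    """Map each URL to a coarse health label; `fetch` is injectable for testing."""
    return {url: classify(fetch(url)) for url in urls}
```

Calling `check_hosts(["https://git.desineuron.in", "https://comfy.desineuron.in"])` would perform live requests; a `backend-down` label for the ComfyUI route would match the state recorded in the status snapshot.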
--- a/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.service
+++ b/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.service
@@ -0,0 +1,12 @@
+[Unit]
+Description=Update ingress SSH allowlist to current home public IP
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=oneshot
+EnvironmentFile=/etc/desineuron-ingress-home-ip-sync.env
+ExecStart=/opt/desineuron-ingress-ip-sync/.venv/bin/python /usr/local/bin/sync_ingress_home_ip.py
+WorkingDirectory=/var/lib/desineuron-ingress-ip-sync
+User=root
+Group=root
--- a/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.timer
+++ b/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.timer
@@ -0,0 +1,11 @@
+[Unit]
+Description=Run ingress home IP sync on boot and every 5 minutes
+
+[Timer]
+OnBootSec=45s
+OnUnitActiveSec=5min
+Unit=desineuron-ingress-home-ip-sync.service
+Persistent=true
+
+[Install]
+WantedBy=timers.target
--- a/infrastructure/desineuron_ingress/install_gpu_comfyui_service.sh
+++ b/infrastructure/desineuron_ingress/install_gpu_comfyui_service.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+COMFY_DIR="/opt/dlami/nvme/ComfyUI"
+SERVICE_NAME="comfyui"
+LOG_DIR="/var/log/comfyui"
+
+if ! command -v git >/dev/null 2>&1; then
+  sudo apt-get update
+  sudo apt-get install -y git
+fi
+
+if [ ! -d "${COMFY_DIR}/.git" ]; then
+  sudo mkdir -p /opt/dlami/nvme
+  sudo chown -R ubuntu:ubuntu /opt/dlami/nvme
+  git clone https://github.com/comfyanonymous/ComfyUI.git "${COMFY_DIR}"
+else
+  git -C "${COMFY_DIR}" pull --ff-only
+fi
+
+python3 -m pip install -r "${COMFY_DIR}/requirements.txt"
+
+sudo mkdir -p "${LOG_DIR}"
+sudo chown -R ubuntu:ubuntu "${LOG_DIR}"
+
+sudo tee /etc/systemd/system/${SERVICE_NAME}.service >/dev/null <<'EOF'
+[Unit]
+Description=ComfyUI GPU Service
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User=ubuntu
+Group=ubuntu
+WorkingDirectory=/opt/dlami/nvme/ComfyUI
+Environment=HOME=/home/ubuntu
+Environment=PYTHONUNBUFFERED=1
+ExecStart=/usr/bin/python3 /opt/dlami/nvme/ComfyUI/main.py --listen 0.0.0.0 --port 8188 --disable-auto-launch
+Restart=always
+RestartSec=5
+StandardOutput=append:/var/log/comfyui/service.log
+StandardError=append:/var/log/comfyui/service.log
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now "${SERVICE_NAME}.service"
+sleep 5
+sudo systemctl --no-pager --full status "${SERVICE_NAME}.service"
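The install script above finishes with `systemctl status`, which reports the unit state but not whether anything is accepting connections on `8188`. A minimal sketch of a TCP listener probe, equivalent in spirit to the `ss -ltnp | grep 8188` check in the handoff, might look like the following. `port_is_listening` is a hypothetical helper, not part of this commit.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Run on the GPU node itself; from the ingress node the private IP would be used instead.
    print(port_is_listening("127.0.0.1", 8188))
```

A successful connect only shows that a socket is bound and accepting; it does not confirm that ComfyUI is serving valid HTTP responses.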
--- a/infrastructure/desineuron_ingress/install_linux_ingress_ip_sync.sh
+++ b/infrastructure/desineuron_ingress/install_linux_ingress_ip_sync.sh
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+if [[ $# -ne 2 ]]; then
+  echo "Usage: $0 <aws_access_key_id> <aws_secret_access_key>" >&2
+  exit 1
+fi
+
+AWS_ACCESS_KEY_ID="$1"
+AWS_SECRET_ACCESS_KEY="$2"
+INSTALL_ROOT="/opt/desineuron-ingress-ip-sync"
+VENV_PATH="${INSTALL_ROOT}/.venv"
+
+sudo apt-get update
+sudo apt-get install -y python3-venv
+sudo mkdir -p "${INSTALL_ROOT}"
+sudo python3 -m venv "${VENV_PATH}"
+sudo "${VENV_PATH}/bin/pip" install --upgrade pip boto3
+
+sudo install -m 0755 /tmp/sync_ingress_home_ip.py /usr/local/bin/sync_ingress_home_ip.py
+sudo install -m 0644 /tmp/desineuron-ingress-home-ip-sync.service /etc/systemd/system/desineuron-ingress-home-ip-sync.service
+sudo install -m 0644 /tmp/desineuron-ingress-home-ip-sync.timer /etc/systemd/system/desineuron-ingress-home-ip-sync.timer
+
+sudo mkdir -p /var/lib/desineuron-ingress-ip-sync
+sudo install -m 0600 /dev/null /etc/desineuron-ingress-home-ip-sync.env && sudo tee /etc/desineuron-ingress-home-ip-sync.env >/dev/null <<EOF
+AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
+AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
+AWS_REGION=us-east-1
+INGRESS_SECURITY_GROUP_ID=sg-0721b8b48e12c531d
+INGRESS_SSH_PORT=22
+INGRESS_SSH_RULE_DESCRIPTION=SSH fallback from origin network
+INGRESS_IP_STATE_FILE=/var/lib/desineuron-ingress-ip-sync/current_ip.txt
+EOF
+sudo chmod 600 /etc/desineuron-ingress-home-ip-sync.env
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now desineuron-ingress-home-ip-sync.timer
+sudo systemctl start desineuron-ingress-home-ip-sync.service
+sudo systemctl --no-pager --full status desineuron-ingress-home-ip-sync.service
+sudo systemctl --no-pager --full status desineuron-ingress-home-ip-sync.timer
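The installer above deploys `sync_ingress_home_ip.py`, which compares the detected home IP with the SSH rules already tagged in the security group. The comparison step can be sketched as a pure function that is easy to test without AWS access. `plan_rule_changes` is a hypothetical name, and validating the detected value with `ipaddress` is an added assumption: the committed script uses the response body as-is. The IPv4 restriction reflects that the script writes to `IpRanges`/`CidrIp`, which hold IPv4 CIDRs.

```python
import ipaddress
from typing import Iterable, List, Tuple


def plan_rule_changes(detected_ip: str, existing_cidrs: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (cidrs_to_authorize, cidrs_to_revoke) for the SSH fallback rule."""
    # Raises ValueError if the IP service returned something other than an address.
    address = ipaddress.ip_address(detected_ip.strip())
    if address.version != 4:
        raise ValueError(f"expected an IPv4 address, got {address}")

    desired = f"{address}/32"
    existing = list(existing_cidrs)

    # Authorize only if the desired rule is missing; revoke every other tagged rule.
    to_authorize = [] if desired in existing else [desired]
    to_revoke = [cidr for cidr in existing if cidr != desired]
    return to_authorize, to_revoke


if __name__ == "__main__":
    # The recorded home IP is already allowed, so no change is planned.
    print(plan_rule_changes("223.185.28.89", ["223.185.28.89/32"]))
```

Separating the plan from the API calls also makes the order of operations an explicit choice for the caller, for example authorizing the new CIDR before revoking stale ones.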
--- a/infrastructure/desineuron_ingress/install_linux_rathole_client.sh
+++ b/infrastructure/desineuron_ingress/install_linux_rathole_client.sh
@@ -0,0 +1,44 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+RATHOLE_VERSION="${RATHOLE_VERSION:-v0.4.3}"
+RATHOLE_URL="${RATHOLE_URL:-https://github.com/rapiz1/rathole/releases/download/${RATHOLE_VERSION}/rathole-x86_64-unknown-linux-gnu.zip}"
+CONFIG_SOURCE="${CONFIG_SOURCE:-/tmp/rathole-client.toml}"
+
+sudo install -d -m 0755 /etc/rathole
+sudo install -d -m 0755 /opt/rathole
+
+tmp_dir="$(mktemp -d)"
+trap 'rm -rf "$tmp_dir"' EXIT
+
+cd "$tmp_dir"
+curl -fL "$RATHOLE_URL" -o rathole.zip
+python3 - <<'PY'
+import zipfile
+z = zipfile.ZipFile("rathole.zip")
+z.extractall(".")
+PY
+
+sudo install -m 0755 rathole /usr/local/bin/rathole
+sudo install -m 0600 "$CONFIG_SOURCE" /etc/rathole/client.toml
+
+cat <<'EOF' | sudo tee /etc/systemd/system/rathole-client.service >/dev/null
+[Unit]
+Description=Desineuron Rathole Client
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+ExecStart=/usr/local/bin/rathole /etc/rathole/client.toml
+Restart=always
+RestartSec=5
+User=root
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now rathole-client.service
+sudo systemctl status --no-pager rathole-client.service || true
--- a/infrastructure/desineuron_ingress/map_gpu_comfy_security.ps1
+++ b/infrastructure/desineuron_ingress/map_gpu_comfy_security.ps1
@@ -0,0 +1,33 @@
+$ErrorActionPreference = "Stop"
+
+$gpuGroups = @(
+  "sg-0b144c17b1b89f4c6",
+  "sg-05e4de3fe94ad6558"
+)
+
+$ingressGroup = "sg-0721b8b48e12c531d"
+
+try {
+  aws ec2 authorize-security-group-ingress `
+    --group-id "sg-0b144c17b1b89f4c6" `
+    --ip-permissions "IpProtocol=tcp,FromPort=8188,ToPort=8188,UserIdGroupPairs=[{GroupId=$ingressGroup,Description='Allow ComfyUI from ingress'}]" | Out-Null
+} catch {
+}
+
+foreach ($group in $gpuGroups) {
+  foreach ($port in 8118, 8188) {
+    try {
+      aws ec2 revoke-security-group-ingress `
+        --group-id $group `
+        --protocol tcp `
+        --port $port `
+        --cidr 0.0.0.0/0 | Out-Null
+    } catch {
+    }
+  }
+}
+
+aws ec2 describe-security-groups `
+  --group-ids $gpuGroups `
+  --query "SecurityGroups[].{GroupId:GroupId,GroupName:GroupName,Ingress:IpPermissions}" `
+  --output json
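The PowerShell script above grants `8188/tcp` to the ingress security group by passing an `IpPermissions` structure on the command line, where shell quoting is easy to get wrong. A minimal sketch of building the same structure as plain data in Python follows. `sg_source_permission` is a hypothetical helper; the field names follow the EC2 `IpPermissions` shape that `sync_ingress_home_ip.py` in this commit also uses.

```python
import json
from typing import Any, Dict, List


def sg_source_permission(port: int, source_group_id: str, description: str) -> List[Dict[str, Any]]:
    """Build an IpPermissions list allowing one TCP port from a source security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [
                {"GroupId": source_group_id, "Description": description}
            ],
        }
    ]


if __name__ == "__main__":
    permissions = sg_source_permission(
        8188, "sg-0721b8b48e12c531d", "Allow ComfyUI from ingress"
    )
    # Printed as JSON for inspection; no AWS call is made here.
    print(json.dumps(permissions, indent=2))
```

The resulting list could be passed as `IpPermissions=` to boto3's `authorize_security_group_ingress`, the same call the IP sync script makes, which avoids command-line escaping altogether.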
--- a/infrastructure/desineuron_ingress/rathole-client.toml
+++ b/infrastructure/desineuron_ingress/rathole-client.toml
@@ -0,0 +1,12 @@
+[client]
+remote_addr = "__INGRESS_HOST__:2333"
+default_token = "__RATHOLE_TOKEN__"
+
+[client.transport]
+type = "noise"
+
+[client.transport.noise]
+remote_public_key = "__RATHOLE_SERVER_PUBLIC_KEY__"
+
+[client.services.https_origin]
+local_addr = "127.0.0.1:443"
--- a/infrastructure/desineuron_ingress/rathole-server.toml
+++ b/infrastructure/desineuron_ingress/rathole-server.toml
@@ -0,0 +1,12 @@
+[server]
+bind_addr = "0.0.0.0:2333"
+default_token = "__RATHOLE_TOKEN__"
+
+[server.transport]
+type = "noise"
+
+[server.transport.noise]
+local_private_key = "__RATHOLE_SERVER_PRIVATE_KEY__"
+
+[server.services.https_origin]
+bind_addr = "127.0.0.1:8443"
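Both rathole configs above are templates with `__NAME__` placeholders (`__INGRESS_HOST__`, `__RATHOLE_TOKEN__`, and the Noise key fields). How they are filled in is not shown in this part of the commit, so the following is only a sketch of one way to do it, with a hypothetical `render_template` helper that fails loudly if a placeholder has no value.

```python
import re
from typing import Mapping

# Matches placeholders such as __INGRESS_HOST__ or __RATHOLE_SERVER_PUBLIC_KEY__.
PLACEHOLDER = re.compile(r"__([A-Z0-9_]+?)__")


def render_template(template: str, values: Mapping[str, str]) -> str:
    """Replace every __NAME__ placeholder; raise KeyError if a value is missing."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder __{name}__")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    sample = 'remote_addr = "__INGRESS_HOST__:2333"\n'
    # 203.0.113.10 is a documentation address used as a stand-in.
    print(render_template(sample, {"INGRESS_HOST": "203.0.113.10"}), end="")
```

The sketch does no TOML escaping, so values containing `"` or `\` would need handling before being written into a quoted TOML string.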
--- a/infrastructure/desineuron_ingress/sync_ingress_home_ip.py
+++ b/infrastructure/desineuron_ingress/sync_ingress_home_ip.py
@@ -0,0 +1,110 @@
+#!/usr/bin/env python3
+import json
+import os
+import sys
+import urllib.request
+from pathlib import Path
+
+import boto3
+
+
+SECURITY_GROUP_ID = os.environ["INGRESS_SECURITY_GROUP_ID"]
+RULE_DESCRIPTION = os.environ.get("INGRESS_SSH_RULE_DESCRIPTION", "SSH fallback from origin network")
+PORT = int(os.environ.get("INGRESS_SSH_PORT", "22"))
+STATE_FILE = Path(os.environ.get("INGRESS_IP_STATE_FILE", "/var/lib/desineuron-ingress-ip-sync/current_ip.txt"))
+
+
+def get_public_ip() -> str:
+    with urllib.request.urlopen("https://api.ipify.org", timeout=15) as response:
+        return response.read().decode("utf-8").strip()
+
+
+def get_security_group():
+    ec2 = boto3.client("ec2", region_name=os.environ.get("AWS_REGION", "us-east-1"))
+    response = ec2.describe_security_groups(GroupIds=[SECURITY_GROUP_ID])
+    return ec2, response["SecurityGroups"][0]
+
+
+def find_existing_ssh_rules(ip_permissions):
+    matches = []
+    for permission in ip_permissions:
+        if permission.get("IpProtocol") != "tcp":
+            continue
+        if permission.get("FromPort") != PORT or permission.get("ToPort") != PORT:
+            continue
+        for ip_range in permission.get("IpRanges", []):
+            if ip_range.get("Description") == RULE_DESCRIPTION:
+                matches.append(ip_range["CidrIp"])
+    return matches
+
+
+def revoke_old_rules(ec2, cidrs):
+    for cidr in cidrs:
+        ec2.revoke_security_group_ingress(
+            GroupId=SECURITY_GROUP_ID,
+            IpPermissions=[
+                {
+                    "IpProtocol": "tcp",
+                    "FromPort": PORT,
+                    "ToPort": PORT,
+                    "IpRanges": [{"CidrIp": cidr}],
+                }
+            ],
+        )
+
+
+def authorize_new_rule(ec2, cidr):
+    ec2.authorize_security_group_ingress(
+        GroupId=SECURITY_GROUP_ID,
+        IpPermissions=[
+            {
+                "IpProtocol": "tcp",
+                "FromPort": PORT,
+                "ToPort": PORT,
+                "IpRanges": [{"CidrIp": cidr, "Description": RULE_DESCRIPTION}],
+            }
+        ],
+    )
+
+
+def write_state(ip: str):
+    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
+    STATE_FILE.write_text(ip + "\n", encoding="utf-8")
+
+
+def main() -> int:
+    public_ip = get_public_ip()
+    desired_cidr = f"{public_ip}/32"
+
+    ec2, group = get_security_group()
+    existing_rules = find_existing_ssh_rules(group["IpPermissions"])
+
+    if existing_rules == [desired_cidr]:
+        write_state(public_ip)
+        print(json.dumps({"status": "noop", "public_ip": public_ip, "cidr": desired_cidr}))
+        return 0
+
+    if existing_rules:
+        revoke_old_rules(ec2, existing_rules)
+
+    authorize_new_rule(ec2, desired_cidr)
+    write_state(public_ip)
+    print(
+        json.dumps(
+            {
+                "status": "updated",
+                "public_ip": public_ip,
+                "cidr": desired_cidr,
+                "replaced": existing_rules,
+            }
+        )
+    )
+    return 0
+
+
+if __name__ == "__main__":
+    try:
+        raise SystemExit(main())
+    except Exception as exc:
+        print(json.dumps({"status": "error", "error": str(exc)}), file=sys.stderr)
+        raise
--- a/infrastructure/desineuron_ingress/user_data.sh
+++ b/infrastructure/desineuron_ingress/user_data.sh
@@ -0,0 +1,102 @@
+#!/bin/bash
+set -euxo pipefail
+
+exec > >(tee /var/log/desineuron-ingress-bootstrap.log | logger -t user-data -s 2>/dev/console) 2>&1
+
+dnf update -y
+dnf install -y curl tar gzip unzip jq policycoreutils-python-utils
+
+systemctl enable amazon-ssm-agent
+systemctl restart amazon-ssm-agent
+
+useradd --system --home /var/lib/caddy --shell /sbin/nologin caddy || true
+install -d -o caddy -g caddy -m 0755 /etc/caddy /var/lib/caddy /var/log/caddy
+install -d -m 0755 /etc/rathole /opt/rathole
+
+cat >/etc/ssh/sshd_config.d/10-desineuron-hardening.conf <<'EOF'
+PasswordAuthentication no
+KbdInteractiveAuthentication no
+PermitRootLogin no
+PubkeyAuthentication yes
+EOF
+systemctl restart sshd
+
+CADDY_VERSION="v2.10.2"
+CADDY_URL="https://github.com/caddyserver/caddy/releases/download/${CADDY_VERSION}/caddy_${CADDY_VERSION#v}_linux_arm64.tar.gz"
+RATHOLE_VERSION="v0.4.3"
+RATHOLE_URL="https://github.com/rapiz1/rathole/releases/download/${RATHOLE_VERSION}/rathole-aarch64-unknown-linux-musl.zip"
+
+tmp_dir="$(mktemp -d)"
+cd "$tmp_dir"
+curl -fL "$CADDY_URL" -o caddy.tar.gz
+tar -xzf caddy.tar.gz
+install -m 0755 caddy /usr/local/bin/caddy
+setcap cap_net_bind_service=+ep /usr/local/bin/caddy || true
+
+curl -fL "$RATHOLE_URL" -o rathole.zip
+python3 - <<'PY'
+import zipfile
+z = zipfile.ZipFile("rathole.zip")
+z.extractall(".")
+PY
+install -m 0755 rathole /usr/local/bin/rathole
+rm -rf "$tmp_dir"
+
+cat >/etc/systemd/system/caddy.service <<'EOF'
+[Unit]
+Description=Caddy
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+User=caddy
+Group=caddy
+ExecStart=/usr/local/bin/caddy run --environ --config /etc/caddy/Caddyfile
+ExecReload=/usr/local/bin/caddy reload --config /etc/caddy/Caddyfile
+TimeoutStopSec=5s
+LimitNOFILE=1048576
+PrivateTmp=true
+ProtectSystem=full
+AmbientCapabilities=CAP_NET_BIND_SERVICE
+CapabilityBoundingSet=CAP_NET_BIND_SERVICE
+NoNewPrivileges=true
+Restart=on-failure
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+cat >/etc/systemd/system/rathole-server.service <<'EOF'
+[Unit]
+Description=Desineuron Rathole Server
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+ExecStart=/usr/local/bin/rathole /etc/rathole/server.toml
+Restart=always
+RestartSec=5
+User=root
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+cat >/etc/logrotate.d/caddy <<'EOF'
+/var/log/caddy/*.log {
+  daily
+  rotate 14
+  compress
+  missingok
+  notifempty
+  copytruncate
+}
+EOF
+
+touch /etc/caddy/Caddyfile
+touch /etc/rathole/server.toml
+
+systemctl daemon-reload
+systemctl enable caddy.service
+systemctl enable rathole-server.service
--- a/infrastructure/ops_control_plane/.env.example
+++ b/infrastructure/ops_control_plane/.env.example
@@ -0,0 +1,37 @@
+OPS_DB_NAME=desineuron_ops
+OPS_DB_USER=desineuron_ops
+OPS_DB_PASSWORD=change-me
+OPS_DATABASE_URL=postgresql+psycopg://desineuron_ops:change-me@ops-db:5432/desineuron_ops
+OPS_SESSION_SECRET=change-me
+OPS_ADMIN_USERNAME=sagnik
+OPS_ADMIN_PASSWORD=change-me
+OPS_TEAM_USERS_JSON=[]
+OPS_DEFAULT_REGION=us-east-1
+OPS_VISIBLE_REGIONS=us-east-1,ap-south-1,eu-west-1
+OPS_BUCKET_NAME=
+OPS_BUCKET_REGION=us-east-1
+OPS_SSH_KEY_PATH=/app/state/desineuron-l4-node.pem
+OPS_GPU_SSH_USER=ubuntu
+OPS_INGRESS_SSH_HOST=98.87.120.120
+OPS_INGRESS_SSH_USER=ec2-user
+OPS_INGRESS_PRIVATE_IP=172.31.41.26
+OPS_INGRESS_SSH_PORT=22
+OPS_LINUX_PUBLIC_BASE_URL=https://ops.desineuron.in
+OPS_PRICE_EBS_GP3_PER_GB_MONTH=0.08
+OPS_PRICE_PUBLIC_IPV4_PER_HOUR=0.005
+OPS_ALLOWED_MACHINE_IDS=i-094df09acafb72494,i-0e4eab5fe67cf9abe
+OPS_GPU_SUBNET_ID=subnet-03d684ed15f327151
+OPS_GPU_SECURITY_GROUP_IDS=sg-05e4de3fe94ad6558,sg-0b144c17b1b89f4c6
+OPS_GPU_KEY_NAME=desineuron-l4-node
+OPS_GPU_AMI_ID=ami-0016081b488c7376d
+OPS_GPU_INSTANCE_PROFILE=Synapse-Training-Profile
+OPS_GPU_ROOT_VOLUME_GB=300
+OPS_GPU_WORKER_SCRIPT_PATH=/app/ops_control_plane/worker.py
+OPS_CSV_EXPORT_DIR=/app/exports
+OPS_LOG_DIR=/app/logs
+OPS_STATE_DIR=/app/state
+OPS_MODEL_LIBRARY_HOST_PATH=/mnt/ServerStorage/ai-models/models
+OPS_MODEL_LIBRARY_ROOT=/model-library
+OPS_INGRESS_ROUTE_HELPER=/usr/local/bin/manage_desineuron_routes.py
+OPS_CLOUDFLARE_ZONE_NAME=desineuron.in
+OPS_CLOUDFLARE_API_TOKEN=
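
Note on the two `OPS_PRICE_*` values above: in `aws_control.py` they feed the hourly estimate in `calculate_machine_cost`, which spreads the monthly EBS price over 730 hours and adds the public IPv4 charge when an address is attached. The sketch below reproduces only that arithmetic. The function name `estimate_hourly_usd` and the compute rate of `1.0` are hypothetical placeholders, not values taken from this commit or from AWS pricing.

```python
HOURS_PER_MONTH = 730.0  # same divisor used in calculate_machine_cost


def estimate_hourly_usd(
    compute_rate: float,
    volume_gb: int,
    ebs_gp3_per_gb_month: float = 0.08,    # OPS_PRICE_EBS_GP3_PER_GB_MONTH
    public_ipv4_per_hour: float = 0.005,   # OPS_PRICE_PUBLIC_IPV4_PER_HOUR
    public_ipv4_attached: bool = True,
) -> float:
    """Hourly estimate: compute + prorated gp3 storage + optional public IPv4."""
    storage_hourly = (volume_gb * ebs_gp3_per_gb_month) / HOURS_PER_MONTH
    ip_hourly = public_ipv4_per_hour if public_ipv4_attached else 0.0
    return round(compute_rate + storage_hourly + ip_hourly, 4)


if __name__ == "__main__":
    # Placeholder compute rate of 1.0 USD/hour with the 300 GB root volume
    # from OPS_GPU_ROOT_VOLUME_GB.
    print(estimate_hourly_usd(compute_rate=1.0, volume_gb=300))
```

With these defaults, storage adds roughly 0.033 USD per hour for a 300 GB volume, so the per-GB price matters mostly for long-lived machines.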
--- a/infrastructure/ops_control_plane/README.md
+++ b/infrastructure/ops_control_plane/README.md
@@ -0,0 +1,78 @@
+# Desineuron Ops Control Plane
+
+Internal Linux-hosted control surface for:
+
+- AWS machine lifecycle
+- S3-backed model ingest with generated manifests and checksums
+- model hydration from S3
+- runtime and estimated cost tracking
+- ingress route management
+- session logging and CSV export
+
+Main deployment target:
+
+- Linux box at `192.168.1.4`
+
+Primary public route:
+
+- `ops.desineuron.in`
+
+Canonical S3 bucket:
+
+- `desineuron-ops-control-plane-819079556187-us-east-1`
+
+Related AWS nodes:
+
+- ingress: `i-094df09acafb72494`
+- current GPU worker: `i-0e4eab5fe67cf9abe`
+
+Core runtime:
+
+- FastAPI web + API surface
+- background worker
+- PostgreSQL
+- Docker Compose
+- systemd wrapper on Linux
+
+Key files:
+
+- `docker-compose.yml`
+- `.env.example`
+- `app/ops_control_plane/main.py`
+- `app/ops_control_plane/worker.py`
+- `app/ops_control_plane/cli.py`
+- `manage_desineuron_routes.py`
+- `install_linux_ops_control_plane.sh`
+
+Runtime paths on Linux:
+
+- stack root: `/opt/desineuron-ops-control-plane`
+- env file: `/opt/desineuron-ops-control-plane/.env`
+- exports: `/opt/desineuron-ops-control-plane/exports`
+- state: `/opt/desineuron-ops-control-plane/state`
+
+Access:
+
+- login route: `https://ops.desineuron.in/login`
+- operator logins are provisioned as email-style usernames
+- admin password is stored in the protected `.env` file on Linux and should be retrieved locally rather than copied into repo notes
+
+Validated live behaviors:
+
+- market pricing API returns live on-demand and spot views
+- session and cost tracking persist in PostgreSQL and export to CSV
+- spot launch failures are recorded cleanly instead of crashing the UI
+- on-demand GPU launch was validated with a `g6.xlarge` lifecycle test
+- managed ingress route upsert/delete was validated through the helper on the `t4g.micro` ingress
+- model ingest from Linux model library to S3 was validated with `ops-smoke-model`, including manifest generation and catalog registration
+
+Operator retrieval commands:
+
+- admin password:
+  - `sudo sed -n 's/^OPS_ADMIN_PASSWORD=//p' /opt/desineuron-ops-control-plane/.env`
+- latest CSV export:
+  - `ls -lah /opt/desineuron-ops-control-plane/exports`
+
+Installer safety note:
+
+- `install_linux_ops_control_plane.sh` intentionally excludes runtime directories (`data/`, `exports/`, `logs/`, `state/`, `.env`) from code sync so redeploys do not corrupt Postgres state or overwrite secrets
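
Note on the manifest and checksum claim in this README: `build_model_manifest` in `aws_control.py` records `path`, `sha256`, and `size_bytes` for every file, while the visible part of `hydrate_model` only checks that each file exists on the worker (`test -f`). A minimal sketch of a stricter check against the same manifest shape follows. The helper names `sha256_of` and `verify_against_manifest` are hypothetical and do not appear in this commit.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so large model weights are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(root: Path, manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means every file matched."""
    problems: list[str] = []
    for entry in manifest.get("files", []):
        target = root / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
        elif target.stat().st_size != entry["size_bytes"]:
            # Cheap check first; avoids hashing files that are obviously wrong.
            problems.append(f"size mismatch: {entry['path']}")
        elif sha256_of(target) != entry["sha256"]:
            problems.append(f"checksum mismatch: {entry['path']}")
    return problems
```

Run on the worker after hydration, a check like this would catch truncated or corrupted downloads that a presence test alone cannot see, at the cost of re-reading every file once.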
--- a/infrastructure/ops_control_plane/app/Dockerfile
+++ b/infrastructure/ops_control_plane/app/Dockerfile
@@ -0,0 +1,16 @@
+FROM python:3.12-slim
+
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+
+WORKDIR /app
+
+COPY requirements.txt /app/requirements.txt
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends openssh-client curl ca-certificates \
+    && rm -rf /var/lib/apt/lists/*
+RUN pip install --no-cache-dir -r /app/requirements.txt
+
+COPY ops_control_plane /app/ops_control_plane
+
+CMD ["python", "-m", "ops_control_plane.main"]
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__init__.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__init__.py
@@ -0,0 +1 @@
+__all__ = ["main"]
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/__init__.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/__init__.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/aws_control.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/aws_control.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/cli.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/cli.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/config.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/config.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/database.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/database.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/main.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/main.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/models.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/models.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/route_control.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/route_control.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/security.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/security.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/seed.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/seed.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/worker.cpython-314.pyc
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/__pycache__/worker.cpython-314.pyc
--- a/infrastructure/ops_control_plane/app/ops_control_plane/aws_control.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/aws_control.py
@@ -0,0 +1,549 @@
+from __future__ import annotations
+
+import csv
+import hashlib
+import io
+import json
+import shlex
+import subprocess
+from pathlib import Path
+from collections.abc import Iterable
+from datetime import datetime, timezone
+
+import boto3
+from botocore.exceptions import ClientError
+from sqlalchemy import select
+from sqlalchemy.orm import Session
+
+from .config import settings
+from .models import AuditEvent, Machine, MachineModelCache, MachineProfile, MarketSnapshot, ModelCatalog, RouteBinding, Session as RuntimeSession, SessionCost
+
+
+REGION_LOCATION_MAP = {
+    "us-east-1": "US East (N. Virginia)",
+    "ap-south-1": "Asia Pacific (Mumbai)",
+    "eu-west-1": "EU (Ireland)",
+}
+
+ON_DEMAND_PRICE_FALLBACKS = {
+    ("us-east-1", "t4g.micro"): 0.0084,
+}
+
+
+def utcnow() -> datetime:
+    return datetime.now(timezone.utc)
+
+
+def ec2_client(region: str):
+    return boto3.client("ec2", region_name=region)
+
+
+def pricing_client():
+    return boto3.client("pricing", region_name="us-east-1")
+
+
+def s3_client(region: str | None = None):
+    return boto3.client("s3", region_name=region or settings.bucket_region)
+
+
+def ensure_bucket(bucket_name: str, region: str) -> None:
+    client = s3_client(region)
+    try:
+        client.head_bucket(Bucket=bucket_name)
+    except ClientError as exc:
+        code = exc.response.get("Error", {}).get("Code", "")
+        if code in {"404", "NoSuchBucket", "NotFound"}:
+            if region == "us-east-1":
+                client.create_bucket(Bucket=bucket_name)
+            else:
+                client.create_bucket(
+                    Bucket=bucket_name,
+                    CreateBucketConfiguration={"LocationConstraint": region},
+                )
+        elif code not in {"301", "403"}:
+            raise
+    client.put_bucket_versioning(Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"})
+    client.put_bucket_encryption(
+        Bucket=bucket_name,
+        ServerSideEncryptionConfiguration={
+            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
+        },
+    )
+
+
+def seed_bucket_prefixes(bucket_name: str) -> None:
+    client = s3_client()
+    for prefix in [
+        "models/",
+        "workflows/",
+        "references/",
+        "outputs/",
+        "manifests/",
+        "bootstrap/",
+    ]:
+        client.put_object(Bucket=bucket_name, Key=prefix)
+
+
+def resolve_model_source_dir(source_relative_path: str) -> Path:
+    source = (settings.model_library_root / source_relative_path).resolve()
+    root = settings.model_library_root.resolve()
+    if root not in source.parents and source != root:
+        raise ValueError("Model source path escapes configured model library root")
+    if not source.exists() or not source.is_dir():
+        raise FileNotFoundError(f"Model source directory not found: {source}")
+    return source
+
+
+def build_model_manifest(source_dir: Path) -> dict:
+    files: list[dict] = []
+    total_size = 0
+    for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
+        rel = path.relative_to(source_dir).as_posix()
+        sha256 = hashlib.sha256()
+        with path.open("rb") as handle:
+            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
+                sha256.update(chunk)
+        size_bytes = path.stat().st_size
+        total_size += size_bytes
+        files.append({"path": rel, "sha256": sha256.hexdigest(), "size_bytes": size_bytes})
+    return {
+        "generated_at": utcnow().isoformat(),
+        "file_count": len(files),
+        "total_size_bytes": total_size,
+        "files": files,
+    }
+
+
+def upload_model_directory(bucket_name: str, model_key: str, source_relative_path: str, label: str, workload_tags: list[str] | None = None, compatibility_tags: list[str] | None = None) -> dict:
+    source_dir = resolve_model_source_dir(source_relative_path)
+    manifest = build_model_manifest(source_dir)
+    client = s3_client()
+    s3_prefix = f"models/{model_key}/"
+    for file_entry in manifest["files"]:
+        local_path = source_dir / Path(file_entry["path"])
+        client.upload_file(str(local_path), bucket_name, s3_prefix + file_entry["path"])
+    manifest_key = f"manifests/models/{model_key}.json"
+    client.put_object(
+        Bucket=bucket_name,
+        Key=manifest_key,
+        Body=json.dumps(manifest, indent=2).encode("utf-8"),
+        ContentType="application/json",
+    )
+    return {
+        "model_key": model_key,
+        "label": label,
+        "source_dir": str(source_dir),
+        "s3_prefix": s3_prefix,
+        "manifest_key": manifest_key,
+        "manifest": manifest,
+        "workload_tags": workload_tags or [],
+        "compatibility_tags": compatibility_tags or [],
+    }
+
+
+def fetch_on_demand_price(region: str, instance_type: str) -> float | None:
+    location = REGION_LOCATION_MAP.get(region)
+    if not location:
+        return None
+    response = pricing_client().get_products(
+        ServiceCode="AmazonEC2",
+        Filters=[
+            {"Type": "TERM_MATCH", "Field": "instanceType", "Value": instance_type},
+            {"Type": "TERM_MATCH", "Field": "location", "Value": location},
+            {"Type": "TERM_MATCH", "Field": "operatingSystem", "Value": "Linux"},
+            {"Type": "TERM_MATCH", "Field": "tenancy", "Value": "Shared"},
+            {"Type": "TERM_MATCH", "Field": "preInstalledSw", "Value": "NA"},
+            {"Type": "TERM_MATCH", "Field": "capacitystatus", "Value": "Used"},
+        ],
+        MaxResults=1,
+    )
+    for price_item in response.get("PriceList", []):
+        item = json.loads(price_item)
+        terms = item.get("terms", {}).get("OnDemand", {})
+        for term in terms.values():
+            for dimension in term.get("priceDimensions", {}).values():
+                price = dimension.get("pricePerUnit", {}).get("USD")
+                if price:
+                    return float(price)
+    return ON_DEMAND_PRICE_FALLBACKS.get((region, instance_type))
+
+
+def refresh_market_snapshots(db: Session, regions: Iterable[str], profile_rows: Iterable[MachineProfile]) -> None:
+    # profile_rows may be a one-shot iterable and is scanned once per region below,
+    # so materialize it up front instead of consuming it in an unused pre-pass.
+    profile_rows = list(profile_rows)
+
+    for region in regions:
+        region_profiles = [p for p in profile_rows if p.region == region]
+        instance_types = {p.instance_type for p in region_profiles}
+        if not instance_types:
+            continue
+        ec2 = ec2_client(region)
+        offerings = ec2.describe_instance_type_offerings(
+            LocationType="region",
+            Filters=[{"Name": "instance-type", "Values": sorted(instance_types)}],
+        )["InstanceTypeOfferings"]
+        available = {item["InstanceType"] for item in offerings}
+        for instance_type in instance_types:
+            on_demand_price = fetch_on_demand_price(region, instance_type)
+            db.add(
+                MarketSnapshot(
+                    region=region,
+                    instance_type=instance_type,
+                    lifecycle="on-demand",
+                    offering_available=instance_type in available,
+                    hourly_price_usd=on_demand_price,
+                    raw_payload={"instance_type": instance_type, "region": region},
+                )
+            )
+            try:
+                spot_history = ec2.describe_spot_price_history(
+                    InstanceTypes=[instance_type],
+                    ProductDescriptions=["Linux/UNIX"],
+                    StartTime=utcnow(),
+                    MaxResults=1,
+                )["SpotPriceHistory"]
+                spot_price = float(spot_history[0]["SpotPrice"]) if spot_history else None
+            except ClientError:
+                spot_price = None
+            db.add(
+                MarketSnapshot(
+                    region=region,
+                    instance_type=instance_type,
+                    lifecycle="spot",
+                    offering_available=instance_type in available and spot_price is not None,
+                    hourly_price_usd=spot_price,
+                    raw_payload={"instance_type": instance_type, "region": region},
+                )
+            )
+
+
+def latest_market_price(db: Session, region: str, instance_type: str, lifecycle: str) -> float:
+    row = db.scalar(
+        select(MarketSnapshot)
+        .where(
+            MarketSnapshot.region == region,
+            MarketSnapshot.instance_type == instance_type,
+            MarketSnapshot.lifecycle == lifecycle,
+        )
+        .order_by(MarketSnapshot.observed_at.desc())
+    )
+    return row.hourly_price_usd if row and row.hourly_price_usd is not None else 0.0
+
+
+def sync_instances(db: Session, regions: Iterable[str]) -> None:
+    for region in regions:
+        ec2 = ec2_client(region)
+        reservations = ec2.describe_instances()["Reservations"]
+        for reservation in reservations:
+            for instance in reservation["Instances"]:
+                instance_id = instance["InstanceId"]
+                launch_time = instance.get("LaunchTime")
+                if launch_time and launch_time.tzinfo is None:
+                    launch_time = launch_time.replace(tzinfo=timezone.utc)
+                public_ip = instance.get("PublicIpAddress")
+                private_ip = instance.get("PrivateIpAddress")
+                state_name = instance["State"]["Name"]
+                volume_size = 0
+                if instance.get("BlockDeviceMappings"):
+                    try:
+                        volume_ids = [b["Ebs"]["VolumeId"] for b in instance["BlockDeviceMappings"] if "Ebs" in b]
+                        if volume_ids:
+                            volumes = ec2.describe_volumes(VolumeIds=volume_ids)["Volumes"]
+                            volume_size = sum(v.get("Size", 0) for v in volumes)
+                    except ClientError:
+                        volume_size = 0
+                existing = db.scalar(select(Machine).where(Machine.aws_instance_id == instance_id))
+                tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
+                payload = {
+                    "key_name": instance.get("KeyName"),
+                    "subnet_id": instance.get("SubnetId"),
+                    "security_groups": instance.get("SecurityGroups", []),
+                    "image_id": instance.get("ImageId"),
+                    "iam_instance_profile": instance.get("IamInstanceProfile", {}).get("Arn"),
+                    "availability_zone": instance.get("Placement", {}).get("AvailabilityZone"),
+                    "public_dns": instance.get("PublicDnsName"),
+                }
+                if existing:
+                    existing.name = tags.get("Name", instance_id)
+                    existing.region = region
+                    existing.instance_type = instance["InstanceType"]
+                    existing.lifecycle = instance.get("InstanceLifecycle", "on-demand")
+                    existing.state = state_name
+                    existing.public_ip = public_ip
+                    existing.private_ip = private_ip
+                    existing.launch_time = launch_time
+                    existing.volume_gb = volume_size
+                    existing.public_ipv4_attached = bool(public_ip)
+                    existing.details = payload
+                else:
+                    db.add(
+                        Machine(
+                            aws_instance_id=instance_id,
+                            name=tags.get("Name", instance_id),
+                            region=region,
+                            profile_name=tags.get("DesineuronProfile"),
+                            instance_type=instance["InstanceType"],
+                            lifecycle=instance.get("InstanceLifecycle", "on-demand"),
+                            state=state_name,
+                            public_ip=public_ip,
+                            private_ip=private_ip,
+                            launch_time=launch_time,
+                            volume_gb=volume_size,
+                            public_ipv4_attached=bool(public_ip),
+                            details=payload,
+                        )
+                    )
+
+
+def candidate_subnet_ids(region: str, preferred_subnet_id: str) -> list[str]:
+    if not preferred_subnet_id:
+        return []
+    ec2 = ec2_client(region)
+    subnet_response = ec2.describe_subnets(SubnetIds=[preferred_subnet_id])["Subnets"]
+    if not subnet_response:
+        return [preferred_subnet_id]
+    preferred = subnet_response[0]
+    vpc_id = preferred["VpcId"]
+    subnets = ec2.describe_subnets(
+        Filters=[
+            {"Name": "vpc-id", "Values": [vpc_id]},
+            {"Name": "state", "Values": ["available"]},
+        ]
+    )["Subnets"]
+    ranked: list[tuple[int, str, str]] = []
+    for subnet in subnets:
+        subnet_id = subnet["SubnetId"]
+        az = subnet.get("AvailabilityZone", "")
+        score = 2
+        if subnet_id == preferred_subnet_id:
+            score = 0
+        elif subnet.get("MapPublicIpOnLaunch"):
+            score = 1
+        ranked.append((score, az, subnet_id))
+    return [subnet_id for _, _, subnet_id in sorted(ranked)]
+
+
+def calculate_machine_cost(machine: Machine, hourly_rate: float) -> dict:
+    if not machine.launch_time:
+        runtime_hours = 0.0
+    else:
+        runtime_hours = max((utcnow() - machine.launch_time).total_seconds() / 3600.0, 0.0)
+    compute_cost = runtime_hours * hourly_rate
+    storage_hourly = (machine.volume_gb * settings.ebs_gp3_per_gb_month) / 730.0
+    storage_cost = runtime_hours * storage_hourly
+    public_ip_cost = runtime_hours * settings.public_ipv4_per_hour if machine.public_ipv4_attached else 0.0
+    return {
+        "runtime_hours": round(runtime_hours, 3),
+        "compute_cost_usd": round(compute_cost, 4),
+        "storage_cost_usd": round(storage_cost, 4),
+        "public_ip_cost_usd": round(public_ip_cost, 4),
+        "total_cost_usd": round(compute_cost + storage_cost + public_ip_cost, 4),
+        "hourly_price_usd": round(hourly_rate + storage_hourly + (settings.public_ipv4_per_hour if machine.public_ipv4_attached else 0.0), 4),
+    }
+
+
+def upsert_session_cost(db: Session, session_row: RuntimeSession, machine: Machine) -> None:
+    hourly_rate = latest_market_price(db, machine.region, machine.instance_type, machine.lifecycle or "on-demand")
+    cost_payload = calculate_machine_cost(machine, hourly_rate)
+    record = db.scalar(
+        select(SessionCost).where(SessionCost.session_id == session_row.id).order_by(SessionCost.calculated_at.desc())
+    )
+    if record:
+        record.runtime_hours = cost_payload["runtime_hours"]
+        record.compute_cost_usd = cost_payload["compute_cost_usd"]
+        record.storage_cost_usd = cost_payload["storage_cost_usd"]
+        record.public_ip_cost_usd = cost_payload["public_ip_cost_usd"]
+        record.total_cost_usd = cost_payload["total_cost_usd"]
+        record.calculated_at = utcnow()
+    else:
+        db.add(SessionCost(session_id=session_row.id, **cost_payload))
+
+
+def create_managed_instance(db: Session, profile: MachineProfile, actor: str, lifecycle: str) -> RuntimeSession:
+    ec2 = ec2_client(profile.region)
+    launch_config = profile.launch_config
+    base_run_args = {
+        "ImageId": launch_config["ami_id"],
+        "InstanceType": profile.instance_type,
+        "SecurityGroupIds": launch_config["security_group_ids"],
+        "KeyName": launch_config["key_name"],
+        "IamInstanceProfile": {"Name": launch_config["instance_profile"]},
+        "MinCount": 1,
+        "MaxCount": 1,
+        "BlockDeviceMappings": [
+            {
+                "DeviceName": "/dev/sda1",
+                "Ebs": {
+                    "VolumeSize": int(launch_config.get("root_volume_gb", settings.gpu_root_volume_gb)),
+                    "VolumeType": "gp3",
+                    "DeleteOnTermination": True,
+                },
+            }
+        ],
+        "TagSpecifications": [
+            {
+                "ResourceType": "instance",
+                "Tags": [
+                    {"Key": "Name", "Value": f"desineuron-{profile.name}-{int(utcnow().timestamp())}"},
+                    {"Key": "ManagedBy", "Value": "DesineuronOps"},
+                    {"Key": "DesineuronProfile", "Value": profile.name},
+                ],
+            }
+        ],
+    }
+    if lifecycle == "spot":
+        base_run_args["InstanceMarketOptions"] = {
+            "MarketType": "spot",
+            "SpotOptions": {"SpotInstanceType": "one-time", "InstanceInterruptionBehavior": "terminate"},
+        }
+    subnet_ids = candidate_subnet_ids(profile.region, launch_config["subnet_id"]) or [launch_config["subnet_id"]]
+    last_exc: Exception | None = None
+    response = None
+    chosen_subnet = launch_config["subnet_id"]
+    for subnet_id in subnet_ids:
+        run_args = dict(base_run_args)
+        run_args["SubnetId"] = subnet_id
+        try:
+            response = ec2.run_instances(**run_args)
+            chosen_subnet = subnet_id
+            break
+        except ClientError as exc:
+            last_exc = exc
+            error_code = exc.response.get("Error", {}).get("Code")
+            if error_code not in {"InsufficientInstanceCapacity", "MaxSpotInstanceCountExceeded", "Unsupported"}:
+                raise
+            continue
+    if response is None:
+        assert last_exc is not None
+        raise last_exc
+    instance = response["Instances"][0]
+    machine = Machine(
+        aws_instance_id=instance["InstanceId"],
+        name=f"desineuron-{profile.name}",
+        region=profile.region,
+        profile_name=profile.name,
+        instance_type=profile.instance_type,
+        lifecycle=lifecycle,
+        state=instance["State"]["Name"],
+        public_ip=instance.get("PublicIpAddress"),
+        private_ip=instance.get("PrivateIpAddress"),
+        launch_time=instance.get("LaunchTime"),
+        volume_gb=int(launch_config.get("root_volume_gb", settings.gpu_root_volume_gb)),
+        public_ipv4_attached=True,
+        details={"launched_by": actor, "chosen_subnet_id": chosen_subnet},
+    )
+    db.add(machine)
+    db.flush()
+    session_row = RuntimeSession(machine_id=machine.id, actor=actor, workload_name=profile.name, status="active")
+    db.add(session_row)
+    db.add(AuditEvent(actor=actor, action="launch_machine", entity_type="machine", entity_id=machine.aws_instance_id, payload={"profile": profile.name, "lifecycle": lifecycle}))
+    return session_row
+
+
+def stop_machine(db: Session, machine: Machine, actor: str) -> None:
+    ec2 = ec2_client(machine.region)
+    ec2.stop_instances(InstanceIds=[machine.aws_instance_id])
+    machine.state = "stopping"
+    db.add(AuditEvent(actor=actor, action="stop_machine", entity_type="machine", entity_id=machine.aws_instance_id, payload={}))
+
+
+def terminate_machine(db: Session, machine: Machine, actor: str) -> None:
+    ec2 = ec2_client(machine.region)
+    ec2.terminate_instances(InstanceIds=[machine.aws_instance_id])
+    machine.state = "shutting-down"
+    db.add(AuditEvent(actor=actor, action="terminate_machine", entity_type="machine", entity_id=machine.aws_instance_id, payload={}))
+
+
+def ssh_run(host: str, user: str, command: str) -> subprocess.CompletedProcess[str]:
+    return subprocess.run(
+        [
+            "ssh",
+            "-o",
+            "StrictHostKeyChecking=no",
+            "-o",
+            "UserKnownHostsFile=/dev/null",
+            "-i",
+            str(settings.ssh_key_path),
+            f"{user}@{host}",
+            command,
+        ],
+        capture_output=True,
+        text=True,
+        check=False,
+    )
+
+
+def hydrate_model(machine: Machine, model_prefix: str, actor: str, bucket_name: str) -> dict:
+    if not machine.public_ip:
+        raise RuntimeError("Machine has no public IP for hydration")
+    install_cmd = (
+        "command -v s5cmd >/dev/null 2>&1 || "
+        "{ curl -L https://github.com/peak/s5cmd/releases/download/v2.3.0/s5cmd_2.3.0_Linux-64bit.tar.gz "
+        "| tar -xz -C /tmp && sudo mv /tmp/s5cmd /usr/local/bin/s5cmd; }"
+    )
+    ssh_run(machine.public_ip, settings.gpu_ssh_user, install_cmd)
+    remote_dir = f"/opt/dlami/nvme/models/{model_prefix.rstrip('/').split('/')[-1]}"
+    copy_cmd = (
+        f"mkdir -p {shlex.quote(remote_dir)} && "
+        f"s5cmd cp {shlex.quote(f's3://{bucket_name}/{model_prefix}*')} {shlex.quote(remote_dir + '/')}"
+    )
+    result = ssh_run(machine.public_ip, settings.gpu_ssh_user, copy_cmd)
+    verify_result = None
+    manifest_key = f"manifests/models/{model_prefix.rstrip('/').split('/')[-1]}.json"
+    try:
+        manifest_obj = s3_client().get_object(Bucket=bucket_name, Key=manifest_key)
+        manifest = json.loads(manifest_obj["Body"].read().decode("utf-8"))
+        checks = " && ".join(
+            f"test -f {shlex.quote(remote_dir + '/' + entry['path'])}"
+            for entry in manifest.get("files", [])
+        ) or "true"
+        verify = ssh_run(machine.public_ip, settings.gpu_ssh_user, checks)
+        verify_result = {"stdout": verify.stdout, "stderr": verify.stderr, "returncode": verify.returncode}
+    except ClientError:
+        verify_result = {"stdout": "", "stderr": "manifest_missing", "returncode": 1}
+    return {
+        "stdout": result.stdout,
+        "stderr": result.stderr,
+        "returncode": result.returncode,
+        "remote_dir": remote_dir,
+        "verify": verify_result,
+    }
+
+
+def start_service(machine: Machine, service_name: str) -> dict:
+    if not machine.public_ip:
+        raise RuntimeError("Machine has no public IP")
+    result = ssh_run(machine.public_ip, settings.gpu_ssh_user, f"sudo systemctl start {shlex.quote(service_name)} && sudo systemctl is-active {shlex.quote(service_name)}")
+    return {"stdout": result.stdout, "stderr": result.stderr, "returncode": result.returncode}
+
+
+def stop_service(machine: Machine, service_name: str) -> dict:
+    if not machine.public_ip:
+        raise RuntimeError("Machine has no public IP")
+    result = ssh_run(machine.public_ip, settings.gpu_ssh_user, f"sudo systemctl stop {shlex.quote(service_name)}")
+    return {"stdout": result.stdout, "stderr": result.stderr, "returncode": result.returncode}
+
+
+def export_sessions_csv(db: Session, target_path: str) -> str:
+    rows = db.execute(
+        select(
+            RuntimeSession.id,
+            RuntimeSession.actor,
+            RuntimeSession.workload_name,
+            RuntimeSession.status,
+            RuntimeSession.started_at,
+            RuntimeSession.ended_at,
+            SessionCost.runtime_hours,
+            SessionCost.compute_cost_usd,
+            SessionCost.storage_cost_usd,
+            SessionCost.public_ip_cost_usd,
+            SessionCost.total_cost_usd,
+        ).join(SessionCost, SessionCost.session_id == RuntimeSession.id, isouter=True)
+    )
+    with open(target_path, "w", newline="", encoding="utf-8") as handle:
+        writer = csv.writer(handle)
+        writer.writerow(["session_id", "actor", "workload", "status", "started_at", "ended_at", "runtime_hours", "compute_cost_usd", "storage_cost_usd", "public_ip_cost_usd", "total_cost_usd"])
+        for row in rows:
+            writer.writerow(row)
+    return target_path
--- a/infrastructure/ops_control_plane/app/ops_control_plane/cli.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/cli.py
@@ -0,0 +1,79 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import typer
+from sqlalchemy import select
+
+from .aws_control import calculate_machine_cost, create_managed_instance, export_sessions_csv, latest_market_price, stop_machine, terminate_machine
+from .database import Base, engine, session_scope
+from .models import AuditEvent, Machine, MachineProfile, Session as RuntimeSession
+
+
+app = typer.Typer(help="Desineuron Ops CLI")
+
+
+@app.command("machine-list")
+def machine_list():
+    with session_scope() as db:
+        machines = db.scalars(select(Machine).order_by(Machine.updated_at.desc())).all()
+        for machine in machines:
+            hourly_rate = latest_market_price(db, machine.region, machine.instance_type, machine.lifecycle)
+            cost = calculate_machine_cost(machine, hourly_rate)
+            typer.echo(f"{machine.aws_instance_id} {machine.instance_type} {machine.state} ${cost['total_cost_usd']:.4f}")
+
+
+@app.command("machine-launch")
+def machine_launch(profile_name: str, lifecycle: str = "spot", actor: str = "cli"):
+    with session_scope() as db:
+        profile = db.scalar(select(MachineProfile).where(MachineProfile.name == profile_name))
+        if not profile:
+            raise typer.BadParameter(f"Unknown profile: {profile_name}")
+        session_row = create_managed_instance(db, profile, actor, lifecycle)
+        typer.echo(json.dumps({"session_id": session_row.id, "profile": profile_name, "lifecycle": lifecycle}))
+
+
+@app.command("machine-stop")
+def machine_stop(machine_id: str, actor: str = "cli"):
+    with session_scope() as db:
+        machine = db.scalar(select(Machine).where(Machine.aws_instance_id == machine_id))
+        if not machine:
+            raise typer.BadParameter(f"Unknown machine: {machine_id}")
+        stop_machine(db, machine, actor)
+        active_session = db.scalar(select(RuntimeSession).where(RuntimeSession.machine_id == machine.id, RuntimeSession.status == "active"))
+        if active_session:
+            active_session.status = "stopped"
+        typer.echo(json.dumps({"machine": machine_id, "status": "stopping"}))
+
+
+@app.command("machine-terminate")
+def machine_terminate(machine_id: str, actor: str = "cli"):
+    with session_scope() as db:
+        machine = db.scalar(select(Machine).where(Machine.aws_instance_id == machine_id))
+        if not machine:
+            raise typer.BadParameter(f"Unknown machine: {machine_id}")
+        terminate_machine(db, machine, actor)
+        active_session = db.scalar(select(RuntimeSession).where(RuntimeSession.machine_id == machine.id, RuntimeSession.status == "active"))
+        if active_session:
+            active_session.status = "terminated"
+        typer.echo(json.dumps({"machine": machine_id, "status": "terminating"}))
+
+
+@app.command("audit-tail")
+def audit_tail(limit: int = 20):
+    with session_scope() as db:
+        events = db.scalars(select(AuditEvent).order_by(AuditEvent.created_at.desc()).limit(limit)).all()
+        for event in events:
+            typer.echo(json.dumps({"actor": event.actor, "action": event.action, "entity": event.entity_id, "created_at": event.created_at.isoformat()}))
+
+
+@app.command("export-sessions")
+def export_sessions(output: Path = Path("/app/exports/sessions_cli.csv")):
+    with session_scope() as db:
+        export_sessions_csv(db, str(output))
+    typer.echo(str(output))
+
+
+if __name__ == "__main__":
+    app()
--- a/infrastructure/ops_control_plane/app/ops_control_plane/config.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/config.py
@@ -0,0 +1,51 @@
+from __future__ import annotations
+
+import os
+import json
+from dataclasses import field
+from dataclasses import dataclass
+from pathlib import Path
+
+
+@dataclass(frozen=True)
+class Settings:
+    database_url: str = os.environ["OPS_DATABASE_URL"]
+    session_secret: str = os.environ["OPS_SESSION_SECRET"]
+    admin_username: str = os.environ.get("OPS_ADMIN_USERNAME", "sagnik")
+    admin_password: str = os.environ["OPS_ADMIN_PASSWORD"]
+    team_users_json: str = os.environ.get("OPS_TEAM_USERS_JSON", "[]")
+    default_region: str = os.environ.get("OPS_DEFAULT_REGION", "us-east-1")
+    visible_regions: tuple[str, ...] = tuple(
+        region.strip() for region in os.environ.get("OPS_VISIBLE_REGIONS", "us-east-1").split(",") if region.strip()
+    )
+    bucket_name: str = os.environ.get("OPS_BUCKET_NAME", "")
+    bucket_region: str = os.environ.get("OPS_BUCKET_REGION", "us-east-1")
+    ssh_key_path: Path = Path(os.environ.get("OPS_SSH_KEY_PATH", "/app/state/desineuron-l4-node.pem"))
+    gpu_ssh_user: str = os.environ.get("OPS_GPU_SSH_USER", "ubuntu")
+    ingress_ssh_host: str = os.environ.get("OPS_INGRESS_SSH_HOST", "")
+    ingress_ssh_user: str = os.environ.get("OPS_INGRESS_SSH_USER", "ec2-user")
+    ingress_ssh_port: int = int(os.environ.get("OPS_INGRESS_SSH_PORT", "22"))
+    ingress_route_helper: str = os.environ.get("OPS_INGRESS_ROUTE_HELPER", "/usr/local/bin/manage_desineuron_routes.py")
+    public_base_url: str = os.environ.get("OPS_LINUX_PUBLIC_BASE_URL", "https://ops.desineuron.in")
+    ebs_gp3_per_gb_month: float = float(os.environ.get("OPS_PRICE_EBS_GP3_PER_GB_MONTH", "0.08"))
+    public_ipv4_per_hour: float = float(os.environ.get("OPS_PRICE_PUBLIC_IPV4_PER_HOUR", "0.005"))
+    allowed_machine_ids: tuple[str, ...] = tuple(
+        machine.strip() for machine in os.environ.get("OPS_ALLOWED_MACHINE_IDS", "").split(",") if machine.strip()
+    )
+    gpu_subnet_id: str = os.environ.get("OPS_GPU_SUBNET_ID", "")
+    gpu_security_group_ids: tuple[str, ...] = tuple(
+        group.strip() for group in os.environ.get("OPS_GPU_SECURITY_GROUP_IDS", "").split(",") if group.strip()
+    )
+    gpu_key_name: str = os.environ.get("OPS_GPU_KEY_NAME", "")
+    gpu_ami_id: str = os.environ.get("OPS_GPU_AMI_ID", "")
+    gpu_instance_profile: str = os.environ.get("OPS_GPU_INSTANCE_PROFILE", "")
+    gpu_root_volume_gb: int = int(os.environ.get("OPS_GPU_ROOT_VOLUME_GB", "300"))
+    export_dir: Path = Path(os.environ.get("OPS_CSV_EXPORT_DIR", "/app/exports"))
+    log_dir: Path = Path(os.environ.get("OPS_LOG_DIR", "/app/logs"))
+    state_dir: Path = Path(os.environ.get("OPS_STATE_DIR", "/app/state"))
+    model_library_root: Path = Path(os.environ.get("OPS_MODEL_LIBRARY_ROOT", "/model-library"))
+    cloudflare_zone_name: str = os.environ.get("OPS_CLOUDFLARE_ZONE_NAME", "desineuron.in")
+    cloudflare_api_token: str = os.environ.get("OPS_CLOUDFLARE_API_TOKEN", "")
+
+
+settings = Settings()
--- a/infrastructure/ops_control_plane/app/ops_control_plane/database.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/database.py
@@ -0,0 +1,41 @@
+from __future__ import annotations
+
+from contextlib import contextmanager
+
+from sqlalchemy import create_engine
+from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker
+
+from .config import settings
+
+
+engine = create_engine(settings.database_url, pool_pre_ping=True)
+SessionLocal = sessionmaker(bind=engine, autoflush=False, autocommit=False, expire_on_commit=False)
+
+
+class Base(DeclarativeBase):
+    pass
+
+
+def get_db():
+    db = SessionLocal()
+    try:
+        yield db
+        db.commit()
+    except Exception:
+        db.rollback()
+        raise
+    finally:
+        db.close()
+
+
+@contextmanager
+def session_scope():
+    session = SessionLocal()
+    try:
+        yield session
+        session.commit()
+    except Exception:
+        session.rollback()
+        raise
+    finally:
+        session.close()
--- a/infrastructure/ops_control_plane/app/ops_control_plane/main.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/main.py
@@ -0,0 +1,598 @@
+from __future__ import annotations
+
+import os
+from datetime import datetime, timedelta, timezone
+from pathlib import Path
+
+from botocore.exceptions import ClientError
+from fastapi import Depends, FastAPI, Form, HTTPException, Request
+from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
+from fastapi.staticfiles import StaticFiles
+from fastapi.templating import Jinja2Templates
+from sqlalchemy import func, select
+from sqlalchemy.orm import Session
+from starlette.middleware.sessions import SessionMiddleware
+
+from .aws_control import calculate_machine_cost, create_managed_instance, ensure_bucket, export_sessions_csv, hydrate_model, latest_market_price, seed_bucket_prefixes, start_service, stop_machine, stop_service, sync_instances, terminate_machine, upload_model_directory
+from .config import settings
+from .database import Base, engine, get_db, session_scope
+from .models import AuditEvent, CsvExport, Job, Machine, MachineProfile, MarketSnapshot, ModelCatalog, RouteBinding, Session as RuntimeSession, SessionCost, User, WorkloadProfile
+from .route_control import apply_route, remove_route
+from .seed import seed_defaults
+from .security import get_current_user, verify_password
+
+
+app = FastAPI(title="Desineuron Ops Control Plane")
+app.add_middleware(SessionMiddleware, secret_key=settings.session_secret)
+template_dir = Path(__file__).parent / "templates"
+static_dir = Path(__file__).parent / "static"
+templates = Jinja2Templates(directory=str(template_dir))
+app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")
+
+
+def utcnow() -> datetime:
+    return datetime.now(timezone.utc)
+
+
+def recent_totals(db: Session) -> dict:
+    now = utcnow()
+    day_start = now - timedelta(days=1)
+    month_start = now - timedelta(days=30)
+    day_total = db.scalar(
+        select(func.coalesce(func.sum(SessionCost.total_cost_usd), 0.0))
+        .join(RuntimeSession, RuntimeSession.id == SessionCost.session_id)
+        .where(SessionCost.calculated_at >= day_start)
+    )
+    month_total = db.scalar(
+        select(func.coalesce(func.sum(SessionCost.total_cost_usd), 0.0))
+        .join(RuntimeSession, RuntimeSession.id == SessionCost.session_id)
+        .where(SessionCost.calculated_at >= month_start)
+    )
+    return {
+        "last_24h_usd": round(float(day_total or 0.0), 4),
+        "last_30d_usd": round(float(month_total or 0.0), 4),
+    }
+
+
+def pop_flash(request: Request) -> dict | None:
+    return request.session.pop("flash", None)
+
+
+def set_flash(request: Request, level: str, message: str) -> None:
+    request.session["flash"] = {"level": level, "message": message}
+
+
+def parse_tag_list(raw: str) -> list[str]:
+    return [item.strip() for item in raw.split(",") if item.strip()]
+
+
+@app.on_event("startup")
+def startup() -> None:
+    Base.metadata.create_all(bind=engine)
+    settings.export_dir.mkdir(parents=True, exist_ok=True)
+    settings.log_dir.mkdir(parents=True, exist_ok=True)
+    settings.state_dir.mkdir(parents=True, exist_ok=True)
+    with session_scope() as db:
+        seed_defaults(db)
+    if settings.bucket_name:
+        ensure_bucket(settings.bucket_name, settings.bucket_region)
+        seed_bucket_prefixes(settings.bucket_name)
+
+
+@app.get("/", response_class=HTMLResponse)
+def root(request: Request):
+    if request.session.get("username"):
+        return RedirectResponse("/dashboard", status_code=302)
+    return RedirectResponse("/login", status_code=302)
+
+
+@app.get("/login", response_class=HTMLResponse)
+def login_page(request: Request):
+    return templates.TemplateResponse("login.html", {"request": request, "error": None})
+
+
+@app.post("/login", response_class=HTMLResponse)
+def login(request: Request, username: str = Form(...), password: str = Form(...), db: Session = Depends(get_db)):
+    user = db.scalar(select(User).where(User.username == username, User.is_active.is_(True)))
+    if not user or not verify_password(password, user.password_hash):
+        return templates.TemplateResponse("login.html", {"request": request, "error": "Invalid credentials"}, status_code=401)
+    request.session["username"] = user.username
+    return RedirectResponse("/dashboard", status_code=302)
+
+
+@app.get("/logout")
+def logout(request: Request):
+    request.session.clear()
+    return RedirectResponse("/login", status_code=302)
+
+
+@app.get("/dashboard", response_class=HTMLResponse)
+def dashboard(request: Request, current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    machines = db.scalars(select(Machine).order_by(Machine.updated_at.desc())).all()
+    profiles = db.scalars(select(MachineProfile).order_by(MachineProfile.name)).all()
+    workloads = db.scalars(select(WorkloadProfile).order_by(WorkloadProfile.name)).all()
+    models = db.scalars(select(ModelCatalog).order_by(ModelCatalog.model_key)).all()
+    routes = db.scalars(select(RouteBinding).order_by(RouteBinding.hostname)).all()
+    jobs = db.scalars(select(Job).order_by(Job.created_at.desc()).limit(20)).all()
+    sessions = db.scalars(select(RuntimeSession).order_by(RuntimeSession.started_at.desc()).limit(20)).all()
+    market_rows = db.scalars(select(MarketSnapshot).order_by(MarketSnapshot.observed_at.desc()).limit(100)).all()
+    audits = db.scalars(select(AuditEvent).order_by(AuditEvent.created_at.desc()).limit(20)).all()
+    costs = []
+    total_hourly = 0.0
+    total_estimated = 0.0
+    for machine in machines:
+        hourly_rate = latest_market_price(db, machine.region, machine.instance_type, machine.lifecycle)
+        machine_cost = calculate_machine_cost(machine, hourly_rate)
+        total_hourly += machine_cost["hourly_price_usd"]
+        total_estimated += machine_cost["total_cost_usd"]
+        costs.append((machine.aws_instance_id, machine_cost))
+    summary = {
+        "machine_count": len(machines),
+        "active_sessions": sum(1 for session in sessions if session.status == "active"),
+        "active_jobs": sum(1 for job in jobs if job.status in {"queued", "running"}),
+        "routes_active": sum(1 for route in routes if route.status == "active"),
+        "hourly_burn_usd": round(total_hourly, 4),
+        "fleet_estimated_cost_usd": round(total_estimated, 4),
+        **recent_totals(db),
+    }
+    return templates.TemplateResponse(
+        "index.html",
+        {
+            "request": request,
+            "user": current_user,
+            "machines": machines,
+            "profiles": profiles,
+            "workloads": workloads,
+            "models": models,
+            "routes": routes,
+            "jobs": jobs,
+            "sessions": sessions,
+            "market_rows": market_rows,
+            "audits": audits,
+            "costs": dict(costs),
+            "summary": summary,
+            "flash": pop_flash(request),
+            "bucket_name": settings.bucket_name,
+            "regions": settings.visible_regions,
+        },
+    )
+
+
+@app.get("/api/markets/instances")
+def get_markets(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    profiles = db.scalars(select(MachineProfile).order_by(MachineProfile.name)).all()
+    payload = []
+    for profile in profiles:
+        per_region = {}
+        for region in settings.visible_regions:
+            on_demand = db.scalar(
+                select(MarketSnapshot)
+                .where(MarketSnapshot.region == region, MarketSnapshot.instance_type == profile.instance_type, MarketSnapshot.lifecycle == "on-demand")
+                .order_by(MarketSnapshot.observed_at.desc()).limit(1)
+            )
+            spot = db.scalar(
+                select(MarketSnapshot)
+                .where(MarketSnapshot.region == region, MarketSnapshot.instance_type == profile.instance_type, MarketSnapshot.lifecycle == "spot")
+                .order_by(MarketSnapshot.observed_at.desc()).limit(1)
+            )
+            per_region[region] = {
+                "on_demand": on_demand.hourly_price_usd if on_demand else None,
+                "on_demand_available": bool(on_demand and on_demand.offering_available),
+                "spot": spot.hourly_price_usd if spot else None,
+                "spot_available": bool(spot and spot.offering_available),
+                "last_seen": max(
+                    [stamp for stamp in [on_demand.observed_at if on_demand else None, spot.observed_at if spot else None] if stamp],
+                    default=None,
+                ),
+            }
+        payload.append(
+            {
+                "profile": profile.name,
+                "instance_type": profile.instance_type,
+                "gpu_label": profile.gpu_label,
+                "vcpu": profile.vcpu,
+                "memory_gib": profile.memory_gib,
+                "regions": per_region,
+            }
+        )
+    return payload
+
+
+@app.get("/api/machines")
+def get_machines(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    machines = db.scalars(select(Machine).order_by(Machine.updated_at.desc())).all()
+    payload = []
+    for machine in machines:
+        hourly_rate = latest_market_price(db, machine.region, machine.instance_type, machine.lifecycle)
+        payload.append(
+            {
+                "id": machine.id,
+                "aws_instance_id": machine.aws_instance_id,
+                "name": machine.name,
+                "region": machine.region,
+                "state": machine.state,
+                "instance_type": machine.instance_type,
+                "lifecycle": machine.lifecycle,
+                "public_ip": machine.public_ip,
+                "private_ip": machine.private_ip,
+                "cost": calculate_machine_cost(machine, hourly_rate),
+            }
+        )
+    return payload
+
+
+@app.post("/api/machines/launch")
+def launch_machine(request: Request, profile_name: str = Form(...), lifecycle: str = Form(...), db: Session = Depends(get_db), current_user: User = Depends(get_current_user)):
+    profile = db.scalar(select(MachineProfile).where(MachineProfile.name == profile_name))
+    if not profile:
+        raise HTTPException(status_code=404, detail="Profile not found")
+    job = Job(job_type="launch_machine", status="running", actor=current_user.username, payload={"profile_name": profile_name, "lifecycle": lifecycle}, started_at=utcnow())
+    db.add(job)
+    db.flush()
+    try:
+        session_row = create_managed_instance(db, profile, current_user.username, lifecycle)
+    except Exception as exc:
+        error_code = exc.response.get("Error", {}).get("Code") if isinstance(exc, ClientError) else exc.__class__.__name__
+        job.status = "failed"
+        job.finished_at = utcnow()
+        job.result = {"error": str(exc), "code": error_code}
+        db.add(AuditEvent(actor=current_user.username, action="launch_machine_failed", entity_type="profile", entity_id=profile.name, payload=job.result))
+        set_flash(request, "error", f"Launch failed for {profile.name}: {error_code}")
+        return RedirectResponse("/dashboard", status_code=302)
+    job.status = "completed"
+    job.session_id = session_row.id
+    job.finished_at = utcnow()
+    job.result = {"session_id": session_row.id}
+    set_flash(request, "success", f"Launched {profile.name} as {lifecycle}.")
+    return RedirectResponse("/dashboard", status_code=302)
+
+
+@app.post("/api/machines/{machine_id}/stop")
+def api_stop_machine(machine_id: int, request: Request, db: Session = Depends(get_db), current_user: User = Depends(get_current_user)):
+    machine = db.get(Machine, machine_id)
+    if not machine:
+        raise HTTPException(status_code=404, detail="Machine not found")
+    job = Job(job_type="stop_machine", status="running", actor=current_user.username, machine_id=machine_id, payload={"aws_instance_id": machine.aws_instance_id}, started_at=utcnow())
+    db.add(job)
+    stop_machine(db, machine, current_user.username)
+    active_session = db.scalar(select(RuntimeSession).where(RuntimeSession.machine_id == machine.id, RuntimeSession.status == "active"))
+    if active_session:
+        active_session.status = "stopped"
+        active_session.ended_at = utcnow()
+    job.status = "completed"
+    job.finished_at = utcnow()
+    job.result = {"status": "stopping"}
+    if "text/html" in request.headers.get("accept", ""):
+        set_flash(request, "success", f"Stop requested for {machine.aws_instance_id}.")
+        return RedirectResponse("/dashboard", status_code=302)
+    return {"status": "stopping"}
+
+
+@app.post("/api/machines/{machine_id}/terminate")
+def api_terminate_machine(machine_id: int, request: Request, db: Session = Depends(get_db), current_user: User = Depends(get_current_user)):
+    machine = db.get(Machine, machine_id)
+    if not machine:
+        raise HTTPException(status_code=404, detail="Machine not found")
+    job = Job(job_type="terminate_machine", status="running", actor=current_user.username, machine_id=machine_id, payload={"aws_instance_id": machine.aws_instance_id}, started_at=utcnow())
+    db.add(job)
+    terminate_machine(db, machine, current_user.username)
+    active_session = db.scalar(select(RuntimeSession).where(RuntimeSession.machine_id == machine.id, RuntimeSession.status == "active"))
+    if active_session:
+        active_session.status = "terminated"
+        active_session.ended_at = utcnow()
+    job.status = "completed"
+    job.finished_at = utcnow()
+    job.result = {"status": "terminating"}
+    if "text/html" in request.headers.get("accept", ""):
+        set_flash(request, "success", f"Terminate requested for {machine.aws_instance_id}.")
+        return RedirectResponse("/dashboard", status_code=302)
+    return {"status": "terminating"}
+
+
+@app.post("/api/models/hydrate")
+def api_hydrate_model(request: Request, machine_id: int = Form(...), model_key: str = Form(...), db: Session = Depends(get_db), current_user: User = Depends(get_current_user)):
+    machine = db.get(Machine, machine_id)
+    model = db.scalar(select(ModelCatalog).where(ModelCatalog.model_key == model_key))
+    if not machine or not model:
+        raise HTTPException(status_code=404, detail="Machine or model not found")
+    if not settings.bucket_name:
+        raise HTTPException(status_code=400, detail="Bucket is not configured")
+    job = Job(job_type="hydrate_model", status="running", actor=current_user.username, machine_id=machine_id, payload={"model_key": model_key}, started_at=utcnow())
+    db.add(job)
+    result = hydrate_model(machine, model.s3_prefix, current_user.username, settings.bucket_name)
+    db.add(AuditEvent(actor=current_user.username, action="hydrate_model", entity_type="machine", entity_id=machine.aws_instance_id, payload={"model_key": model.model_key, "result": result}))
+    job.status = "completed" if result.get("returncode") == 0 else "failed"
+    job.finished_at = utcnow()
+    job.result = result
+    if "text/html" in request.headers.get("accept", ""):
+        set_flash(request, "success" if result.get("returncode") == 0 else "error", f"Hydration {'completed' if result.get('returncode') == 0 else 'failed'} for {model.label} on {machine.aws_instance_id}.")
+        return RedirectResponse("/dashboard", status_code=302)
+    return JSONResponse(result)
+
+
+@app.post("/api/models/register")
+def api_register_model(
+    request: Request,
+    model_key: str = Form(...),
+    label: str = Form(...),
+    source_relative_path: str = Form(...),
+    workload_tags: str = Form(""),
+    compatibility_tags: str = Form(""),
+    db: Session = Depends(get_db),
+    current_user: User = Depends(get_current_user),
+):
+    if not settings.bucket_name:
+        raise HTTPException(status_code=400, detail="Bucket is not configured")
+    job = Job(
+        job_type="register_model",
+        status="running",
+        actor=current_user.username,
+        payload={
+            "model_key": model_key,
+            "label": label,
+            "source_relative_path": source_relative_path,
+            "workload_tags": workload_tags,
+            "compatibility_tags": compatibility_tags,
+        },
+        started_at=utcnow(),
+    )
+    db.add(job)
+    try:
+        result = upload_model_directory(
+            settings.bucket_name,
+            model_key=model_key,
+            source_relative_path=source_relative_path,
+            label=label,
+            workload_tags=parse_tag_list(workload_tags),
+            compatibility_tags=parse_tag_list(compatibility_tags),
+        )
+    except Exception as exc:
+        job.status = "failed"
+        job.finished_at = utcnow()
+        job.result = {"error": str(exc)}
+        db.add(AuditEvent(actor=current_user.username, action="register_model_failed", entity_type="model", entity_id=model_key, payload=job.result))
+        if "text/html" in request.headers.get("accept", ""):
+            set_flash(request, "error", f"Model ingest failed for {model_key}: {exc}")
+            return RedirectResponse("/dashboard", status_code=302)
+        raise HTTPException(status_code=500, detail=str(exc))
+
+    existing = db.scalar(select(ModelCatalog).where(ModelCatalog.model_key == model_key))
+    if existing:
+        existing.label = label
+        existing.s3_prefix = result["s3_prefix"]
+        existing.expected_manifest = result["manifest"]
+        existing.checksums = {entry["path"]: entry["sha256"] for entry in result["manifest"]["files"]}
+        existing.compatibility_tags = result["compatibility_tags"]
+        existing.workload_tags = result["workload_tags"]
+        existing.size_gb = round(result["manifest"]["total_size_bytes"] / (1024 ** 3), 3)
+    else:
+        db.add(
+            ModelCatalog(
+                model_key=model_key,
+                label=label,
+                s3_prefix=result["s3_prefix"],
+                expected_manifest=result["manifest"],
+                checksums={entry["path"]: entry["sha256"] for entry in result["manifest"]["files"]},
+                compatibility_tags=result["compatibility_tags"],
+                workload_tags=result["workload_tags"],
+                size_gb=round(result["manifest"]["total_size_bytes"] / (1024 ** 3), 3),
+            )
+        )
+    job.status = "completed"
+    job.finished_at = utcnow()
+    job.result = {"manifest_key": result["manifest_key"], "file_count": result["manifest"]["file_count"]}
+    db.add(AuditEvent(actor=current_user.username, action="register_model", entity_type="model", entity_id=model_key, payload=job.result))
+    if "text/html" in request.headers.get("accept", ""):
+        set_flash(request, "success", f"Model {model_key} uploaded to S3 and manifest stored.")
+        return RedirectResponse("/dashboard", status_code=302)
+    return JSONResponse(job.result)
+
+
+@app.post("/api/workloads/start")
+def api_start_workload(request: Request, machine_id: int = Form(...), workload_name: str = Form(...), auto_route: bool = Form(False), db: Session = Depends(get_db), current_user: User = Depends(get_current_user)):
+    machine = db.get(Machine, machine_id)
+    workload = db.scalar(select(WorkloadProfile).where(WorkloadProfile.name == workload_name))
+    if not machine or not workload:
+        raise HTTPException(status_code=404, detail="Machine or workload not found")
+    job = Job(job_type="start_workload", status="running", actor=current_user.username, machine_id=machine_id, payload={"workload_name": workload_name, "auto_route": auto_route}, started_at=utcnow())
+    db.add(job)
+    result = start_service(machine, workload.name)
+    route_result = None
+    if result.get("returncode") == 0 and auto_route and workload.route_hostname and workload.default_port and machine.private_ip:
+        route_result = apply_route(workload.route_hostname, "http", machine.private_ip, workload.default_port)
+        existing = db.scalar(select(RouteBinding).where(RouteBinding.hostname == workload.route_hostname))
+        if existing:
+            existing.scheme = "http"
+            existing.target_host = machine.private_ip
+            existing.target_port = workload.default_port
+            existing.status = "active"
+            existing.details = {"managed_by": "ops_control_plane", "machine_id": machine.aws_instance_id}
+        else:
+            db.add(RouteBinding(hostname=workload.route_hostname, target_type="managed", target_host=machine.private_ip, target_port=workload.default_port, scheme="http", status="active", details={"managed_by": "ops_control_plane", "machine_id": machine.aws_instance_id}))
+    db.add(AuditEvent(actor=current_user.username, action="start_workload", entity_type="machine", entity_id=machine.aws_instance_id, payload={"workload": workload.name, "result": result}))
+    job.status = "completed" if result.get("returncode") == 0 else "failed"
+    job.finished_at = utcnow()
+    job.result = {"service": result, "route": route_result}
+    if "text/html" in request.headers.get("accept", ""):
+        set_flash(request, "success" if result.get("returncode") == 0 else "error", f"Start workload {'completed' if result.get('returncode') == 0 else 'failed'} for {workload.name} on {machine.aws_instance_id}.")
+        return RedirectResponse("/dashboard", status_code=302)
+    return JSONResponse({"service": result, "route": route_result})
+
+
+@app.post("/api/workloads/{machine_id}/stop")
+def api_stop_workload(machine_id: int, request: Request, workload_name: str = Form(...), db: Session = Depends(get_db), current_user: User = Depends(get_current_user)):
+    machine = db.get(Machine, machine_id)
+    if not machine:
+        raise HTTPException(status_code=404, detail="Machine not found")
+    job = Job(job_type="stop_workload", status="running", actor=current_user.username, machine_id=machine_id, payload={"workload_name": workload_name}, started_at=utcnow())
+    db.add(job)
+    result = stop_service(machine, workload_name)
+    db.add(AuditEvent(actor=current_user.username, action="stop_workload", entity_type="machine", entity_id=machine.aws_instance_id, payload={"workload": workload_name, "result": result}))
+    job.status = "completed" if result.get("returncode") == 0 else "failed"
+    job.finished_at = utcnow()
+    job.result = result
+    if "text/html" in request.headers.get("accept", ""):
+        set_flash(request, "success" if result.get("returncode") == 0 else "error", f"Stop workload {'completed' if result.get('returncode') == 0 else 'failed'} for {workload_name} on {machine.aws_instance_id}.")
+        return RedirectResponse("/dashboard", status_code=302)
+    return JSONResponse(result)
+
+
+@app.post("/api/routes/map")
+def api_map_route(request: Request, hostname: str = Form(...), scheme: str = Form(...), target_host: str = Form(...), target_port: int = Form(...), db: Session = Depends(get_db), current_user: User = Depends(get_current_user)):
+    job = Job(job_type="map_route", status="running", actor=current_user.username, payload={"hostname": hostname, "scheme": scheme, "target_host": target_host, "target_port": target_port}, started_at=utcnow())
+    db.add(job)
+    result = apply_route(hostname, scheme, target_host, target_port)
+    existing = db.scalar(select(RouteBinding).where(RouteBinding.hostname == hostname))
+    if existing:
+        existing.scheme = scheme
+        existing.target_host = target_host
+        existing.target_port = target_port
+        existing.status = "active"
+    else:
+        db.add(RouteBinding(hostname=hostname, target_type="managed", target_host=target_host, target_port=target_port, scheme=scheme, status="active"))
+    db.add(AuditEvent(actor=current_user.username, action="map_route", entity_type="route", entity_id=hostname, payload=result))
+    job.status = "completed" if result.get("returncode") == 0 else "failed"
+    job.finished_at = utcnow()
+    job.result = result
+    if "text/html" in request.headers.get("accept", ""):
+        set_flash(request, "success" if result.get("returncode") == 0 else "error", f"Route {'mapped' if result.get('returncode') == 0 else 'map failed'} for {hostname}.")
+        return RedirectResponse("/dashboard", status_code=302)
+    return JSONResponse(result)
+
+
+@app.post("/api/routes/unmap")
+def api_unmap_route(request: Request, hostname: str = Form(...), db: Session = Depends(get_db), current_user: User = Depends(get_current_user)):
+    job = Job(job_type="unmap_route", status="running", actor=current_user.username, payload={"hostname": hostname}, started_at=utcnow())
+    db.add(job)
+    result = remove_route(hostname)
+    existing = db.scalar(select(RouteBinding).where(RouteBinding.hostname == hostname))
+    if existing and result.get("returncode") == 0:
+        existing.status = "removed"
+    db.add(AuditEvent(actor=current_user.username, action="unmap_route", entity_type="route", entity_id=hostname, payload=result))
+    job.status = "completed" if result.get("returncode") == 0 else "failed"
+    job.finished_at = utcnow()
+    job.result = result
+    if "text/html" in request.headers.get("accept", ""):
+        set_flash(request, "success" if result.get("returncode") == 0 else "error", f"Route {'removed' if result.get('returncode') == 0 else 'removal failed'} for {hostname}.")
+        return RedirectResponse("/dashboard", status_code=302)
+    return JSONResponse(result)
+
+
+@app.get("/api/markets/pricing")
+def get_market_pricing(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    rows = db.scalars(select(MarketSnapshot).order_by(MarketSnapshot.observed_at.desc()).limit(200)).all()
+    return [
+        {
+            "region": row.region,
+            "instance_type": row.instance_type,
+            "lifecycle": row.lifecycle,
+            "offering_available": row.offering_available,
+            "hourly_price_usd": row.hourly_price_usd,
+            "observed_at": row.observed_at,
+        }
+        for row in rows
+    ]
+
+
+@app.get("/api/sessions")
+def get_sessions(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    sessions = db.scalars(select(RuntimeSession).order_by(RuntimeSession.started_at.desc()).limit(200)).all()
+    payload = []
+    for session_row in sessions:
+        machine = db.get(Machine, session_row.machine_id) if session_row.machine_id else None
+        latest_cost = db.scalar(select(SessionCost).where(SessionCost.session_id == session_row.id).order_by(SessionCost.calculated_at.desc()).limit(1))
+        payload.append(
+            {
+                "id": session_row.id,
+                "actor": session_row.actor,
+                "workload_name": session_row.workload_name,
+                "status": session_row.status,
+                "started_at": session_row.started_at,
+                "ended_at": session_row.ended_at,
+                "notes": session_row.notes,
+                "machine": machine.aws_instance_id if machine else None,
+                "cost": latest_cost.total_cost_usd if latest_cost else None,
+                "runtime_hours": latest_cost.runtime_hours if latest_cost else None,
+            }
+        )
+    return payload
+
+
+@app.get("/api/costs")
+def api_costs(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    machines = db.scalars(select(Machine)).all()
+    total = 0.0
+    items = []
+    for machine in machines:
+        hourly_rate = latest_market_price(db, machine.region, machine.instance_type, machine.lifecycle)
+        cost = calculate_machine_cost(machine, hourly_rate)
+        total += cost["total_cost_usd"]
+        items.append({"machine": machine.aws_instance_id, **cost})
+    return {"machines": items, "total_estimated_cost_usd": round(total, 4), **recent_totals(db)}
+
+
+@app.get("/api/models")
+def api_models(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    models = db.scalars(select(ModelCatalog).order_by(ModelCatalog.model_key)).all()
+    return [
+        {
+            "model_key": model.model_key,
+            "label": model.label,
+            "s3_prefix": model.s3_prefix,
+            "size_gb": model.size_gb,
+            "workload_tags": model.workload_tags,
+            "compatibility_tags": model.compatibility_tags,
+            "file_count": (model.expected_manifest or {}).get("file_count", 0),
+        }
+        for model in models
+    ]
+
+
+@app.get("/api/audit")
+def api_audit(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    events = db.scalars(select(AuditEvent).order_by(AuditEvent.created_at.desc()).limit(100)).all()
+    return [
+        {
+            "actor": event.actor,
+            "action": event.action,
+            "entity_type": event.entity_type,
+            "entity_id": event.entity_id,
+            "payload": event.payload,
+            "created_at": event.created_at,
+        }
+        for event in events
+    ]
+
+
+@app.get("/api/jobs")
+def api_jobs(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    jobs = db.scalars(select(Job).order_by(Job.created_at.desc()).limit(200)).all()
+    return [
+        {
+            "id": job.id,
+            "job_type": job.job_type,
+            "status": job.status,
+            "actor": job.actor,
+            "machine_id": job.machine_id,
+            "session_id": job.session_id,
+            "payload": job.payload,
+            "result": job.result,
+            "created_at": job.created_at,
+            "finished_at": job.finished_at,
+        }
+        for job in jobs
+    ]
+
+
+@app.get("/api/exports/csv")
+def api_export_csv(current_user: User = Depends(get_current_user), db: Session = Depends(get_db)):
+    target = settings.export_dir / "sessions_latest.csv"
+    export_sessions_csv(db, str(target))
+    db.add(CsvExport(actor=current_user.username, export_type="sessions", path=str(target), details={"format": "csv"}))
+    return {"path": str(target)}
+
+
+if __name__ == "__main__":
+    import uvicorn
+
+    uvicorn.run("ops_control_plane.main:app", host="0.0.0.0", port=8080, reload=False)
--- a/infrastructure/ops_control_plane/app/ops_control_plane/models.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/models.py
@@ -0,0 +1,192 @@
+from __future__ import annotations
+
+from datetime import datetime, timezone
+
+from sqlalchemy import Boolean, DateTime, Float, ForeignKey, Integer, JSON, String, Text
+from sqlalchemy.orm import Mapped, mapped_column, relationship
+
+from .database import Base
+
+
+def utcnow() -> datetime:
+    return datetime.now(timezone.utc)
+
+
+class User(Base):
+    __tablename__ = "users"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    username: Mapped[str] = mapped_column(String(64), unique=True, index=True)
+    password_hash: Mapped[str] = mapped_column(String(255))
+    role: Mapped[str] = mapped_column(String(32), default="admin")
+    is_active: Mapped[bool] = mapped_column(Boolean, default=True)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
+
+
+class MachineProfile(Base):
+    __tablename__ = "machine_profiles"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    name: Mapped[str] = mapped_column(String(64), unique=True)
+    region: Mapped[str] = mapped_column(String(32))
+    instance_type: Mapped[str] = mapped_column(String(32))
+    gpu_label: Mapped[str] = mapped_column(String(64))
+    vcpu: Mapped[int] = mapped_column(Integer)
+    memory_gib: Mapped[float] = mapped_column(Float)
+    preferred_lifecycle: Mapped[str] = mapped_column(String(16), default="spot")
+    launch_config: Mapped[dict] = mapped_column(JSON, default=dict)
+    intended_workloads: Mapped[list] = mapped_column(JSON, default=list)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
+
+
+class MarketSnapshot(Base):
+    __tablename__ = "market_snapshots"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    region: Mapped[str] = mapped_column(String(32), index=True)
+    instance_type: Mapped[str] = mapped_column(String(32), index=True)
+    lifecycle: Mapped[str] = mapped_column(String(16), index=True)
+    offering_available: Mapped[bool] = mapped_column(Boolean, default=False)
+    hourly_price_usd: Mapped[float | None] = mapped_column(Float, nullable=True)
+    source: Mapped[str] = mapped_column(String(32), default="aws")
+    raw_payload: Mapped[dict] = mapped_column(JSON, default=dict)
+    observed_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, index=True)
+
+
+class Machine(Base):
+    __tablename__ = "machines"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    aws_instance_id: Mapped[str] = mapped_column(String(32), unique=True, index=True)
+    name: Mapped[str] = mapped_column(String(128))
+    region: Mapped[str] = mapped_column(String(32))
+    profile_name: Mapped[str | None] = mapped_column(String(64), nullable=True)
+    instance_type: Mapped[str] = mapped_column(String(32))
+    lifecycle: Mapped[str] = mapped_column(String(16))
+    state: Mapped[str] = mapped_column(String(32))
+    public_ip: Mapped[str | None] = mapped_column(String(64), nullable=True)
+    private_ip: Mapped[str | None] = mapped_column(String(64), nullable=True)
+    launch_time: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+    volume_gb: Mapped[int] = mapped_column(Integer, default=0)
+    public_ipv4_attached: Mapped[bool] = mapped_column(Boolean, default=False)
+    details: Mapped[dict] = mapped_column(JSON, default=dict)
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow)
+
+
+class WorkloadProfile(Base):
+    __tablename__ = "workload_profiles"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    name: Mapped[str] = mapped_column(String(64), unique=True)
+    service_type: Mapped[str] = mapped_column(String(32))
+    model_requirements: Mapped[list] = mapped_column(JSON, default=list)
+    default_port: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    start_command: Mapped[str | None] = mapped_column(Text, nullable=True)
+    stop_command: Mapped[str | None] = mapped_column(Text, nullable=True)
+    healthcheck_path: Mapped[str | None] = mapped_column(String(255), nullable=True)
+    route_hostname: Mapped[str | None] = mapped_column(String(255), nullable=True)
+
+
+class Job(Base):
+    __tablename__ = "jobs"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    job_type: Mapped[str] = mapped_column(String(32), index=True)
+    status: Mapped[str] = mapped_column(String(32), index=True, default="queued")
+    payload: Mapped[dict] = mapped_column(JSON, default=dict)
+    result: Mapped[dict] = mapped_column(JSON, default=dict)
+    actor: Mapped[str | None] = mapped_column(String(64), nullable=True)
+    machine_id: Mapped[int | None] = mapped_column(ForeignKey("machines.id"), nullable=True)
+    session_id: Mapped[int | None] = mapped_column(ForeignKey("sessions.id"), nullable=True)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
+    started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+    finished_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+
+
+class Session(Base):
+    __tablename__ = "sessions"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    machine_id: Mapped[int | None] = mapped_column(ForeignKey("machines.id"), nullable=True)
+    actor: Mapped[str | None] = mapped_column(String(64), nullable=True)
+    workload_name: Mapped[str | None] = mapped_column(String(64), nullable=True)
+    status: Mapped[str] = mapped_column(String(32), default="active")
+    started_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
+    ended_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+    notes: Mapped[str | None] = mapped_column(Text, nullable=True)
+    cost_records: Mapped[list["SessionCost"]] = relationship(back_populates="session")
+
+
+class SessionCost(Base):
+    __tablename__ = "session_costs"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    session_id: Mapped[int] = mapped_column(ForeignKey("sessions.id"))
+    runtime_hours: Mapped[float] = mapped_column(Float, default=0.0)
+    compute_cost_usd: Mapped[float] = mapped_column(Float, default=0.0)
+    storage_cost_usd: Mapped[float] = mapped_column(Float, default=0.0)
+    public_ip_cost_usd: Mapped[float] = mapped_column(Float, default=0.0)
+    total_cost_usd: Mapped[float] = mapped_column(Float, default=0.0)
+    calculated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
+    session: Mapped[Session] = relationship(back_populates="cost_records")
+
+
+class ModelCatalog(Base):
+    __tablename__ = "model_catalog"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    model_key: Mapped[str] = mapped_column(String(128), unique=True)
+    label: Mapped[str] = mapped_column(String(255))
+    s3_prefix: Mapped[str] = mapped_column(String(512))
+    expected_manifest: Mapped[dict] = mapped_column(JSON, default=dict)
+    checksums: Mapped[dict] = mapped_column(JSON, default=dict)
+    compatibility_tags: Mapped[list] = mapped_column(JSON, default=list)
+    workload_tags: Mapped[list] = mapped_column(JSON, default=list)
+    size_gb: Mapped[float | None] = mapped_column(Float, nullable=True)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
+
+
+class MachineModelCache(Base):
+    __tablename__ = "machine_model_cache"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    machine_id: Mapped[int] = mapped_column(ForeignKey("machines.id"))
+    model_key: Mapped[str] = mapped_column(String(128))
+    status: Mapped[str] = mapped_column(String(32), default="pending")
+    path_on_machine: Mapped[str | None] = mapped_column(String(512), nullable=True)
+    hydrated_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
+    details: Mapped[dict] = mapped_column(JSON, default=dict)
+
+
+class RouteBinding(Base):
+    __tablename__ = "route_bindings"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    hostname: Mapped[str] = mapped_column(String(255), unique=True)
+    target_type: Mapped[str] = mapped_column(String(32))
+    target_host: Mapped[str] = mapped_column(String(255))
+    target_port: Mapped[int] = mapped_column(Integer)
+    scheme: Mapped[str] = mapped_column(String(16), default="http")
+    status: Mapped[str] = mapped_column(String(32), default="active")
+    details: Mapped[dict] = mapped_column(JSON, default=dict)
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow)
+
+
+class ServiceState(Base):
+    __tablename__ = "service_states"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    machine_id: Mapped[int | None] = mapped_column(ForeignKey("machines.id"), nullable=True)
+    service_name: Mapped[str] = mapped_column(String(64))
+    status: Mapped[str] = mapped_column(String(32))
+    details: Mapped[dict] = mapped_column(JSON, default=dict)
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow)
+
+
+class AuditEvent(Base):
+    __tablename__ = "audit_events"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    actor: Mapped[str | None] = mapped_column(String(64), nullable=True)
+    action: Mapped[str] = mapped_column(String(64))
+    entity_type: Mapped[str] = mapped_column(String(64))
+    entity_id: Mapped[str | None] = mapped_column(String(128), nullable=True)
+    payload: Mapped[dict] = mapped_column(JSON, default=dict)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
+
+
+class CsvExport(Base):
+    __tablename__ = "csv_exports"
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    actor: Mapped[str | None] = mapped_column(String(64), nullable=True)
+    export_type: Mapped[str] = mapped_column(String(64))
+    path: Mapped[str] = mapped_column(String(512))
+    details: Mapped[dict] = mapped_column(JSON, default=dict)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
--- a/infrastructure/ops_control_plane/app/ops_control_plane/route_control.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/route_control.py
@@ -0,0 +1,49 @@
+from __future__ import annotations
+
+import json
+import os
+import shlex
+import subprocess
+
+from .config import settings
+
+
+def run_ingress_command(command: str) -> subprocess.CompletedProcess[str]:
+    return subprocess.run(
+        [
+            "ssh",
+            "-o",
+            "StrictHostKeyChecking=no",
+            "-o",
+            # os.devnull resolves to /dev/null on POSIX and nul on Windows.
+            f"UserKnownHostsFile={os.devnull}",
+            "-i",
+            str(settings.ssh_key_path),
+            "-p",
+            str(settings.ingress_ssh_port),
+            f"{settings.ingress_ssh_user}@{settings.ingress_ssh_host}",
+            command,
+        ],
+        capture_output=True,
+        text=True,
+        check=False,
+    )
+
+
+def apply_route(hostname: str, scheme: str, target_host: str, target_port: int) -> dict:
+    payload = json.dumps(
+        {"hostname": hostname, "scheme": scheme, "target_host": target_host, "target_port": target_port}
+    )
+    # Quote user-supplied values so they cannot break out of the remote shell command.
+    result = run_ingress_command(
+        f"sudo {settings.ingress_route_helper} upsert {shlex.quote(payload)} && sudo systemctl reload caddy"
+    )
+    return {"stdout": result.stdout, "stderr": result.stderr, "returncode": result.returncode}
+
+
+def remove_route(hostname: str) -> dict:
+    result = run_ingress_command(
+        f"sudo {settings.ingress_route_helper} delete {shlex.quote(hostname)} && sudo systemctl reload caddy"
+    )
+    return {"stdout": result.stdout, "stderr": result.stderr, "returncode": result.returncode}
+
--- a/infrastructure/ops_control_plane/app/ops_control_plane/security.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/security.py
@@ -0,0 +1,30 @@
+from __future__ import annotations
+
+from fastapi import Depends, HTTPException, Request, status
+from passlib.context import CryptContext
+from sqlalchemy import select
+from sqlalchemy.orm import Session
+
+from .database import get_db
+from .models import User
+
+
+pwd_context = CryptContext(schemes=["pbkdf2_sha256"], deprecated="auto")
+
+
+def hash_password(password: str) -> str:
+    return pwd_context.hash(password)
+
+
+def verify_password(password: str, password_hash: str) -> bool:
+    return pwd_context.verify(password, password_hash)
+
+
+def get_current_user(request: Request, db: Session = Depends(get_db)) -> User:
+    username = request.session.get("username")
+    if not username:
+        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED)
+    user = db.scalar(select(User).where(User.username == username, User.is_active.is_(True)))
+    if not user:
+        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED)
+    return user
--- a/infrastructure/ops_control_plane/app/ops_control_plane/seed.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/seed.py
@@ -0,0 +1,160 @@
+from __future__ import annotations
+
+import json
+
+from sqlalchemy import select
+from sqlalchemy.orm import Session
+
+from .config import settings
+from .models import MachineProfile, ModelCatalog, User, WorkloadProfile
+from .security import hash_password
+
+
+DEFAULT_MACHINE_PROFILES = [
+    {
+        "name": "t4g-micro-ingress",
+        "region": "us-east-1",
+        "instance_type": "t4g.micro",
+        "gpu_label": "Ingress CPU",
+        "vcpu": 2,
+        "memory_gib": 1.0,
+        "preferred_lifecycle": "on-demand",
+        "intended_workloads": ["ingress"],
+    },
+    {
+        "name": "g6-xlarge",
+        "region": "us-east-1",
+        "instance_type": "g6.xlarge",
+        "gpu_label": "1x NVIDIA L4",
+        "vcpu": 4,
+        "memory_gib": 16.0,
+        "preferred_lifecycle": "spot",
+        "intended_workloads": ["light-comfy", "qwen-edit"],
+    },
+    {
+        "name": "g6-2xlarge",
+        "region": "us-east-1",
+        "instance_type": "g6.2xlarge",
+        "gpu_label": "1x NVIDIA L4",
+        "vcpu": 8,
+        "memory_gib": 32.0,
+        "preferred_lifecycle": "spot",
+        "intended_workloads": ["comfyui", "qwen-edit"],
+    },
+    {
+        "name": "g6-4xlarge",
+        "region": "us-east-1",
+        "instance_type": "g6.4xlarge",
+        "gpu_label": "1x NVIDIA L4",
+        "vcpu": 16,
+        "memory_gib": 64.0,
+        "preferred_lifecycle": "spot",
+        "intended_workloads": ["comfyui", "wan-video", "qwen-edit"],
+    },
+    {
+        "name": "g6-12xlarge",
+        "region": "us-east-1",
+        "instance_type": "g6.12xlarge",
+        "gpu_label": "4x NVIDIA L4",
+        "vcpu": 48,
+        "memory_gib": 192.0,
+        "preferred_lifecycle": "spot",
+        "intended_workloads": ["comfyui", "batch-storyboard", "qwen-edit", "multi-gpu"],
+    },
+]
+
+
+DEFAULT_WORKLOADS = [
+    {
+        "name": "comfyui",
+        "service_type": "systemd",
+        "model_requirements": [],
+        "default_port": 8188,
+        "start_command": "sudo systemctl start comfyui",
+        "stop_command": "sudo systemctl stop comfyui",
+        "healthcheck_path": "/",
+        "route_hostname": "comfy.desineuron.in",
+    },
+]
+
+
+DEFAULT_MODELS = [
+    {
+        "model_key": "qwen-image-edit-2511",
+        "label": "Qwen Image Edit 2511",
+        "s3_prefix": "models/qwen-image-edit-2511/",
+        "compatibility_tags": ["qwen", "image-edit"],
+        "workload_tags": ["comfyui", "qwen-edit"],
+    },
+    {
+        "model_key": "qwen-image-2512",
+        "label": "Qwen Image 2512",
+        "s3_prefix": "models/qwen-image-2512/",
+        "compatibility_tags": ["qwen", "image"],
+        "workload_tags": ["comfyui", "qwen-image"],
+    },
+]
+
+
+def seed_defaults(db: Session) -> None:
+    if not db.scalar(select(User).where(User.username == settings.admin_username)):
+        db.add(
+            User(
+                username=settings.admin_username,
+                password_hash=hash_password(settings.admin_password),
+                role="admin",
+            )
+        )
+
+    try:
+        team_users = json.loads(settings.team_users_json)
+    except (json.JSONDecodeError, TypeError):
+        team_users = []
+    for row in team_users:
+        username = row.get("username")
+        password = row.get("password")
+        role = row.get("role", "operator")
+        if not username or not password:
+            continue
+        existing_user = db.scalar(select(User).where(User.username == username))
+        if existing_user:
+            existing_user.role = role
+            existing_user.is_active = True
+            if row.get("reset_password"):
+                existing_user.password_hash = hash_password(password)
+            continue
+        db.add(User(username=username, password_hash=hash_password(password), role=role))
+
+    for profile in DEFAULT_MACHINE_PROFILES:
+        existing = db.scalar(select(MachineProfile).where(MachineProfile.name == profile["name"]))
+        if existing:
+            existing.launch_config = {
+                "ami_id": settings.gpu_ami_id,
+                "subnet_id": settings.gpu_subnet_id,
+                "security_group_ids": list(settings.gpu_security_group_ids),
+                "key_name": settings.gpu_key_name,
+                "instance_profile": settings.gpu_instance_profile,
+                "root_volume_gb": settings.gpu_root_volume_gb,
+            }
+            continue
+        db.add(
+            MachineProfile(
+                **profile,
+                launch_config={
+                    "ami_id": settings.gpu_ami_id,
+                    "subnet_id": settings.gpu_subnet_id,
+                    "security_group_ids": list(settings.gpu_security_group_ids),
+                    "key_name": settings.gpu_key_name,
+                    "instance_profile": settings.gpu_instance_profile,
+                    "root_volume_gb": settings.gpu_root_volume_gb,
+                },
+            )
+        )
+
+    for workload in DEFAULT_WORKLOADS:
+        if not db.scalar(select(WorkloadProfile).where(WorkloadProfile.name == workload["name"])):
+            db.add(WorkloadProfile(**workload))
+
+    for model in DEFAULT_MODELS:
+        if not db.scalar(select(ModelCatalog).where(ModelCatalog.model_key == model["model_key"])):
+            db.add(ModelCatalog(**model))
--- a/infrastructure/ops_control_plane/app/ops_control_plane/static/style.css
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/static/style.css
@@ -0,0 +1,209 @@
+html{color-scheme:dark}
+body{
+  font-family:"Segoe UI",system-ui,sans-serif;
+  background:
+    radial-gradient(circle at top right, rgba(220,38,38,.18), transparent 28%),
+    radial-gradient(circle at left 20%, rgba(239,68,68,.09), transparent 24%),
+    linear-gradient(180deg, #020202 0%, #070707 100%);
+  color:#f5f5f5;
+  margin:0;
+  min-height:100vh;
+}
+.hud-grid{
+  position:fixed;
+  inset:0;
+  pointer-events:none;
+  background-image:
+    linear-gradient(rgba(255,255,255,.02) 1px, transparent 1px),
+    linear-gradient(90deg, rgba(255,255,255,.02) 1px, transparent 1px);
+  background-size:32px 32px;
+  mask-image:linear-gradient(180deg, rgba(0,0,0,.35), rgba(0,0,0,.85));
+}
+.topbar{
+  position:sticky;
+  top:0;
+  z-index:10;
+  display:flex;
+  justify-content:space-between;
+  align-items:center;
+  padding:22px 30px;
+  background:rgba(10,10,10,.9);
+  backdrop-filter:blur(18px);
+  border-bottom:1px solid rgba(255,255,255,.07);
+  box-shadow:0 10px 40px rgba(0,0,0,.4);
+}
+.topbar h1{
+  margin:0;
+  font-size:24px;
+  letter-spacing:.04em;
+  text-transform:uppercase;
+}
+.topbar p{
+  margin:5px 0 0;
+  color:#b8b8b8;
+  max-width:760px;
+}
+.topbar-actions{
+  display:flex;
+  gap:12px;
+  align-items:center;
+}
+.user-chip{
+  display:inline-flex;
+  align-items:center;
+  padding:8px 12px;
+  border:1px solid rgba(248,113,113,.45);
+  border-radius:999px;
+  color:#fca5a5;
+  background:rgba(127,29,29,.22);
+  box-shadow:0 0 24px rgba(220,38,38,.15) inset;
+}
+.topbar-actions a,.button,button{
+  display:inline-flex;
+  align-items:center;
+  justify-content:center;
+  gap:8px;
+  background:linear-gradient(180deg, #ef4444 0%, #991b1b 100%);
+  color:#fff;
+  border:1px solid rgba(248,113,113,.5);
+  border-radius:12px;
+  padding:10px 14px;
+  text-decoration:none;
+  cursor:pointer;
+  box-shadow:0 0 24px rgba(220,38,38,.18);
+}
+.button.secondary,button.secondary{
+  background:rgba(255,255,255,.04);
+  border-color:rgba(255,255,255,.14);
+  color:#fff;
+  box-shadow:none;
+}
+.button.danger,button.danger{
+  background:linear-gradient(180deg, #dc2626 0%, #7f1d1d 100%);
+}
+.page{
+  position:relative;
+  padding:26px;
+}
+.grid{display:grid;gap:20px}
+.grid.two{grid-template-columns:repeat(2,minmax(0,1fr))}
+.grid.three{grid-template-columns:repeat(3,minmax(0,1fr))}
+.summary-grid{display:grid;grid-template-columns:repeat(4,minmax(0,1fr));gap:20px;margin-bottom:20px}
+.card{
+  position:relative;
+  overflow:hidden;
+  background:linear-gradient(180deg, rgba(16,16,16,.88) 0%, rgba(8,8,8,.92) 100%);
+  border:1px solid rgba(255,255,255,.08);
+  border-radius:20px;
+  padding:22px;
+  margin-bottom:20px;
+  box-shadow:
+    0 16px 40px rgba(0,0,0,.45),
+    0 0 0 1px rgba(255,255,255,.02) inset;
+}
+.card::after{
+  content:"";
+  position:absolute;
+  inset:auto -20% -60% auto;
+  width:180px;
+  height:180px;
+  background:radial-gradient(circle, rgba(220,38,38,.16), transparent 65%);
+  pointer-events:none;
+}
+.card h2{
+  margin:0 0 16px;
+  font-size:18px;
+  letter-spacing:.04em;
+  text-transform:uppercase;
+}
+.card.narrow{max-width:460px;margin:90px auto}
+.card.stat strong{
+  display:block;
+  font-size:30px;
+  margin:8px 0;
+  color:#fff;
+}
+.eyebrow{
+  color:#f87171;
+  font-size:11px;
+  letter-spacing:.18em;
+  text-transform:uppercase;
+}
+.flash{
+  display:flex;
+  gap:12px;
+  align-items:center;
+}
+.flash.success{
+  border-color:rgba(248,113,113,.35);
+  background:linear-gradient(180deg, rgba(127,29,29,.25) 0%, rgba(18,18,18,.95) 100%);
+}
+.flash.error{
+  border-color:rgba(248,113,113,.6);
+  background:linear-gradient(180deg, rgba(69,10,10,.55) 0%, rgba(18,18,18,.95) 100%);
+}
+.stack{display:grid;gap:12px}
+.action-stack{display:grid;gap:8px}
+.plain-list{padding-left:18px;margin:0;display:grid;gap:8px;color:#d6d6d6}
+.kv-list{display:grid;gap:10px}
+.kv-list div{display:flex;justify-content:space-between;gap:12px}
+.checkbox-row{
+  display:flex;
+  align-items:center;
+  gap:10px;
+  color:#f5f5f5;
+}
+label{display:grid;gap:6px;color:#d0d0d0}
+input,select{
+  padding:11px 12px;
+  border-radius:12px;
+  border:1px solid rgba(255,255,255,.12);
+  background:rgba(255,255,255,.03);
+  color:#fff;
+  outline:none;
+}
+input:focus,select:focus{
+  border-color:rgba(248,113,113,.75);
+  box-shadow:0 0 0 3px rgba(220,38,38,.16);
+}
+table{width:100%;border-collapse:collapse}
+th,td{
+  padding:12px 10px;
+  border-bottom:1px solid rgba(255,255,255,.08);
+  text-align:left;
+  vertical-align:top;
+}
+th{
+  color:#fca5a5;
+  font-weight:600;
+  font-size:12px;
+  letter-spacing:.08em;
+  text-transform:uppercase;
+}
+.pill{
+  display:inline-block;
+  padding:4px 10px;
+  border-radius:999px;
+  font-size:12px;
+  background:rgba(255,255,255,.06);
+  color:#f3f3f3;
+}
+.pill.available{
+  background:rgba(127,29,29,.45);
+  color:#fecaca;
+  border:1px solid rgba(248,113,113,.3);
+}
+.pill.unavailable{
+  background:rgba(31,31,31,.9);
+  color:#d4d4d4;
+}
+.pill.unknown{
+  background:rgba(55,65,81,.5);
+  color:#e5e7eb;
+}
+.muted{color:#a3a3a3;font-size:12px}
+.error{color:#fca5a5}
+
+@media (max-width: 1100px){
+  .grid.two,.grid.three,.summary-grid{grid-template-columns:1fr}
+}
--- a/infrastructure/ops_control_plane/app/ops_control_plane/templates/base.html
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/templates/base.html
@@ -0,0 +1,27 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <title>{{ title or "Desineuron Ops" }}</title>
+    <link rel="stylesheet" href="/static/style.css">
+  </head>
+  <body>
+    <div class="hud-grid" aria-hidden="true"></div>
+    <header class="topbar">
+      <div>
+        <h1>Desineuron Ops Control Plane</h1>
+        <p>Linux-hosted AWS control surface for machines, models, routes, and cost</p>
+      </div>
+      {% if user %}
+      <div class="topbar-actions">
+        <span class="user-chip">{{ user.username }}</span>
+        <a href="/logout">Logout</a>
+      </div>
+      {% endif %}
+    </header>
+    <main class="page">
+      {% block content %}{% endblock %}
+    </main>
+  </body>
+</html>
--- a/infrastructure/ops_control_plane/app/ops_control_plane/templates/index.html
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/templates/index.html
@@ -0,0 +1,355 @@
+{% extends "base.html" %}
+{% block content %}
+{% if flash %}
+<section class="card flash {{ flash.level }}">
+  <strong>{{ flash.level|capitalize }}</strong>
+  <span>{{ flash.message }}</span>
+</section>
+{% endif %}
+
+<section class="summary-grid">
+  <article class="card stat">
+    <span class="eyebrow">Machines</span>
+    <strong>{{ summary.machine_count }}</strong>
+    <span class="muted">Known AWS nodes</span>
+  </article>
+  <article class="card stat">
+    <span class="eyebrow">Hourly Burn</span>
+    <strong>${{ summary.hourly_burn_usd }}</strong>
+    <span class="muted">Estimated live blended hourly cost</span>
+  </article>
+  <article class="card stat">
+    <span class="eyebrow">24h Cost</span>
+    <strong>${{ summary.last_24h_usd }}</strong>
+    <span class="muted">Rolling 24 hour estimate</span>
+  </article>
+  <article class="card stat">
+    <span class="eyebrow">30d Cost</span>
+    <strong>${{ summary.last_30d_usd }}</strong>
+    <span class="muted">Rolling 30 day estimate</span>
+  </article>
+</section>
+
+<div class="grid three">
+  <section class="card">
+    <h2>Control Surface</h2>
+    <div class="kv-list">
+      <div><span>Bucket</span><strong>{{ bucket_name or "not configured" }}</strong></div>
+      <div><span>Visible regions</span><strong>{{ regions|join(", ") }}</strong></div>
+      <div><span>Active sessions</span><strong>{{ summary.active_sessions }}</strong></div>
+      <div><span>Active jobs</span><strong>{{ summary.active_jobs }}</strong></div>
+      <div><span>Active routes</span><strong>{{ summary.routes_active }}</strong></div>
+      <div><span>Fleet est. cost</span><strong>${{ summary.fleet_estimated_cost_usd }}</strong></div>
+    </div>
+  </section>
+
+  <section class="card">
+    <h2>Launch Machine</h2>
+    <form method="post" action="/api/machines/launch" class="stack">
+      <label>Profile
+        <select name="profile_name">
+          {% for profile in profiles %}
+          <option value="{{ profile.name }}">{{ profile.name }} | {{ profile.instance_type }} | {{ profile.gpu_label }}</option>
+          {% endfor %}
+        </select>
+      </label>
+      <label>Lifecycle
+        <select name="lifecycle">
+          <option value="spot">spot</option>
+          <option value="on-demand">on-demand</option>
+        </select>
+      </label>
+      <button type="submit">Launch Selected Machine</button>
+    </form>
+  </section>
+
+  <section class="card">
+    <h2>Runbooks</h2>
+    <ol class="plain-list">
+      <li>Launch preferred GPU profile.</li>
+      <li>Hydrate required model from S3.</li>
+      <li>Start workload and optionally map route.</li>
+      <li>Monitor runtime and estimated cost.</li>
+      <li>Stop or terminate the node when done.</li>
+    </ol>
+    <a class="button secondary" href="/api/exports/csv">Export Sessions CSV</a>
+  </section>
+</div>
+
+<section class="card">
+  <h2>Markets</h2>
+  <table>
+    <thead>
+      <tr>
+        <th>Profile</th>
+        <th>Instance</th>
+        <th>GPU</th>
+        <th>vCPU / RAM</th>
+        <th>Region</th>
+        <th>On-Demand</th>
+        <th>Spot</th>
+        <th>Preferred Use</th>
+      </tr>
+    </thead>
+    <tbody>
+      {% for profile in profiles %}
+        {% for region in regions %}
+          {% set ns = namespace(on_demand='-', on_demand_status='unknown', spot='-', spot_status='unknown') %}
+          {% for market in market_rows %}
+            {% if market.region == region and market.instance_type == profile.instance_type and market.lifecycle == 'on-demand' %}
+              {% set ns.on_demand = '$' ~ market.hourly_price_usd if market.hourly_price_usd is not none else '-' %}
+              {% set ns.on_demand_status = 'available' if market.offering_available else 'unavailable' %}
+            {% endif %}
+            {% if market.region == region and market.instance_type == profile.instance_type and market.lifecycle == 'spot' %}
+              {% set ns.spot = '$' ~ market.hourly_price_usd if market.hourly_price_usd is not none else '-' %}
+              {% set ns.spot_status = 'available' if market.offering_available else 'unavailable' %}
+            {% endif %}
+          {% endfor %}
+          <tr>
+            <td>{{ profile.name }}</td>
+            <td>{{ profile.instance_type }}</td>
+            <td>{{ profile.gpu_label }}</td>
+            <td>{{ profile.vcpu }} / {{ profile.memory_gib }} GiB</td>
+            <td>{{ region }}</td>
+            <td><span class="pill {{ ns.on_demand_status }}">{{ ns.on_demand }}</span></td>
+            <td><span class="pill {{ ns.spot_status }}">{{ ns.spot }}</span></td>
+            <td>{{ profile.intended_workloads|join(", ") }}</td>
+          </tr>
+        {% endfor %}
+      {% endfor %}
+    </tbody>
+  </table>
+</section>
+
+<section class="card">
+  <h2>Machines</h2>
+  <table>
+    <thead>
+      <tr>
+        <th>Name</th>
+        <th>Type</th>
+        <th>State</th>
+        <th>IPs</th>
+        <th>Runtime</th>
+        <th>Cost</th>
+        <th>Actions</th>
+      </tr>
+    </thead>
+    <tbody>
+      {% for machine in machines %}
+      <tr>
+        <td>
+          <strong>{{ machine.name }}</strong>
+          <div class="muted">{{ machine.aws_instance_id }}</div>
+        </td>
+        <td>
+          <div>{{ machine.instance_type }}</div>
+          <div class="muted">{{ machine.lifecycle }} / {{ machine.region }}</div>
+        </td>
+        <td>{{ machine.state }}</td>
+        <td>
+          <div>{{ machine.public_ip or "-" }}</div>
+          <div class="muted">{{ machine.private_ip or "-" }}</div>
+        </td>
+        <td>{{ costs[machine.aws_instance_id].runtime_hours ~ " h" if machine.aws_instance_id in costs else "-" }}</td>
+        <td>
+          <div>{{ "$" ~ costs[machine.aws_instance_id].total_cost_usd if machine.aws_instance_id in costs else "-" }}</div>
+          <div class="muted">{{ "$" ~ costs[machine.aws_instance_id].hourly_price_usd ~ "/hr" if machine.aws_instance_id in costs else "-" }}</div>
+        </td>
+        <td>
+          <div class="action-stack">
+            <form method="post" action="/api/machines/{{ machine.id }}/stop">
+              <button type="submit" class="button secondary">Stop</button>
+            </form>
+            <form method="post" action="/api/machines/{{ machine.id }}/terminate">
+              <button type="submit" class="button danger">Terminate</button>
+            </form>
+          </div>
+        </td>
+      </tr>
+      {% endfor %}
+    </tbody>
+  </table>
+</section>
+
+<div class="grid two">
+  <section class="card">
+    <h2>Model Library Ingest</h2>
+    <form method="post" action="/api/models/register" class="stack">
+      <label>Model Key <input type="text" name="model_key" placeholder="qwen-image-edit-2511" required></label>
+      <label>Label <input type="text" name="label" placeholder="Qwen Image Edit 2511" required></label>
+      <label>Source Path Under Linux Model Library <input type="text" name="source_relative_path" placeholder="Qwen-Image-Edit-2511" required></label>
+      <label>Workload Tags <input type="text" name="workload_tags" placeholder="comfyui, qwen-edit"></label>
+      <label>Compatibility Tags <input type="text" name="compatibility_tags" placeholder="qwen, image-edit"></label>
+      <button type="submit">Upload to S3 + Generate Manifest</button>
+    </form>
+  </section>
+
+  <section class="card">
+    <h2>Hydrate Model</h2>
+    <form method="post" action="/api/models/hydrate" class="stack">
+      <label>Machine
+        <select name="machine_id">
+          {% for machine in machines %}
+          <option value="{{ machine.id }}">{{ machine.name }} ({{ machine.aws_instance_id }})</option>
+          {% endfor %}
+        </select>
+      </label>
+      <label>Model
+        <select name="model_key">
+          {% for model in models %}
+          <option value="{{ model.model_key }}">{{ model.label }}</option>
+          {% endfor %}
+        </select>
+      </label>
+      <button type="submit">Hydrate from S3</button>
+    </form>
+  </section>
+
+  <section class="card">
+    <h2>Start Workload</h2>
+    <form method="post" action="/api/workloads/start" class="stack">
+      <label>Machine
+        <select name="machine_id">
+          {% for machine in machines %}
+          <option value="{{ machine.id }}">{{ machine.name }}</option>
+          {% endfor %}
+        </select>
+      </label>
+      <label>Workload
+        <select name="workload_name">
+          {% for workload in workloads %}
+          <option value="{{ workload.name }}">{{ workload.name }}</option>
+          {% endfor %}
+        </select>
+      </label>
+      <label class="checkbox-row"><input type="checkbox" name="auto_route" value="true"> Auto-map workload hostname via ingress</label>
+      <button type="submit">Start Workload</button>
+    </form>
+  </section>
+</div>
+
+<section class="card">
+  <h2>Registered Models</h2>
+  <table>
+    <thead>
+      <tr><th>Model</th><th>S3 Prefix</th><th>Size</th><th>Files</th><th>Tags</th></tr>
+    </thead>
+    <tbody>
+      {% for model in models %}
+      <tr>
+        <td>
+          <strong>{{ model.label }}</strong>
+          <div class="muted">{{ model.model_key }}</div>
+        </td>
+        <td>{{ model.s3_prefix }}</td>
+        <td>{{ model.size_gb ~ " GiB" if model.size_gb else "-" }}</td>
+        <td>{{ model.expected_manifest.file_count if model.expected_manifest else "-" }}</td>
+        <td>
+          <div>{{ model.workload_tags|join(", ") }}</div>
+          <div class="muted">{{ model.compatibility_tags|join(", ") }}</div>
+        </td>
+      </tr>
+      {% endfor %}
+    </tbody>
+  </table>
+</section>
+
+<div class="grid two">
+  <section class="card">
+    <h2>Route Management</h2>
+    <form method="post" action="/api/routes/map" class="stack">
+      <label>Hostname <input type="text" name="hostname" placeholder="gpu-ui.desineuron.in" required></label>
+      <label>Scheme
+        <select name="scheme">
+          <option value="http">http</option>
+          <option value="https">https</option>
+        </select>
+      </label>
+      <label>Target Host <input type="text" name="target_host" placeholder="172.31.x.x" required></label>
+      <label>Target Port <input type="number" name="target_port" value="8188" required></label>
+      <button type="submit">Map Route</button>
+    </form>
+    <table>
+      <thead>
+        <tr><th>Hostname</th><th>Target</th><th>Status</th><th>Action</th></tr>
+      </thead>
+      <tbody>
+        {% for route in routes %}
+        <tr>
+          <td>{{ route.hostname }}</td>
+          <td>{{ route.scheme }}://{{ route.target_host }}:{{ route.target_port }}</td>
+          <td>{{ route.status }}</td>
+          <td>
+            <form method="post" action="/api/routes/unmap">
+              <input type="hidden" name="hostname" value="{{ route.hostname }}">
+              <button type="submit" class="button secondary">Unmap</button>
+            </form>
+          </td>
+        </tr>
+        {% endfor %}
+      </tbody>
+    </table>
+  </section>
+
+  <section class="card">
+    <h2>Recent Sessions</h2>
+    <table>
+      <thead>
+        <tr><th>Actor</th><th>Workload</th><th>Status</th><th>Started</th></tr>
+      </thead>
+      <tbody>
+        {% for session in sessions %}
+        <tr>
+          <td>{{ session.actor }}</td>
+          <td>{{ session.workload_name }}</td>
+          <td>{{ session.status }}</td>
+          <td>{{ session.started_at }}</td>
+        </tr>
+        {% endfor %}
+      </tbody>
+    </table>
+  </section>
+</div>
+
+<div class="grid two">
+  <section class="card">
+    <h2>Recent Jobs</h2>
+    <table>
+      <thead>
+        <tr><th>ID</th><th>Type</th><th>Status</th><th>Actor</th><th>Created</th></tr>
+      </thead>
+      <tbody>
+        {% for job in jobs %}
+        <tr>
+          <td>{{ job.id }}</td>
+          <td>{{ job.job_type }}</td>
+          <td>{{ job.status }}</td>
+          <td>{{ job.actor or "-" }}</td>
+          <td>{{ job.created_at }}</td>
+        </tr>
+        {% endfor %}
+      </tbody>
+    </table>
+  </section>
+
+  <section class="card">
+    <h2>Audit</h2>
+    <table>
+      <thead>
+        <tr><th>Actor</th><th>Action</th><th>Entity</th><th>Time</th></tr>
+      </thead>
+      <tbody>
+        {% for event in audits %}
+        <tr>
+          <td>{{ event.actor or "-" }}</td>
+          <td>{{ event.action }}</td>
+          <td>{{ event.entity_type }} / {{ event.entity_id }}</td>
+          <td>{{ event.created_at }}</td>
+        </tr>
+        {% endfor %}
+      </tbody>
+    </table>
+  </section>
+</div>
+{% endblock %}
--- a/infrastructure/ops_control_plane/app/ops_control_plane/templates/login.html
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/templates/login.html
@@ -0,0 +1,14 @@
+{% extends "base.html" %}
+{% block content %}
+<section class="card narrow">
+  <p class="eyebrow">Private Surface</p>
+  <h2>Login</h2>
+  <p class="muted">Use your Desineuron operator account.</p>
+  {% if error %}<p class="error">{{ error }}</p>{% endif %}
+  <form method="post" action="/login" class="stack">
+    <label>Email or username <input type="text" name="username" required></label>
+    <label>Password <input type="password" name="password" required></label>
+    <button type="submit">Enter Ops Console</button>
+  </form>
+</section>
+{% endblock %}
--- a/infrastructure/ops_control_plane/app/ops_control_plane/worker.py
+++ b/infrastructure/ops_control_plane/app/ops_control_plane/worker.py
@@ -0,0 +1,50 @@
+from __future__ import annotations
+
+import time
+from datetime import datetime, timedelta, timezone
+
+from sqlalchemy import select
+
+from .aws_control import refresh_market_snapshots, sync_instances, upsert_session_cost
+from .database import Base, engine, session_scope
+from .models import Machine, MachineProfile, Session as RuntimeSession
+from .seed import seed_defaults
+
+
+def run_worker() -> None:
+    Base.metadata.create_all(bind=engine)
+    last_market_refresh: datetime | None = None
+    while True:
+        with session_scope() as db:
+            seed_defaults(db)
+            profiles = db.scalars(select(MachineProfile)).all()
+            sync_instances(db, {profile.region for profile in profiles})
+            running_machines = db.scalars(select(Machine).where(Machine.state == "running")).all()
+            for machine in running_machines:
+                active_session = db.scalar(
+                    select(RuntimeSession).where(RuntimeSession.machine_id == machine.id, RuntimeSession.status == "active")
+                )
+                if not active_session:
+                    db.add(
+                        RuntimeSession(
+                            machine_id=machine.id,
+                            actor="system-import",
+                            workload_name=machine.profile_name or machine.instance_type,
+                            status="active",
+                            notes="Imported from existing running machine state",
+                        )
+                    )
+            if last_market_refresh is None or datetime.now(timezone.utc) - last_market_refresh > timedelta(minutes=15):
+                refresh_market_snapshots(db, {profile.region for profile in profiles}, profiles)
+                last_market_refresh = datetime.now(timezone.utc)
+            sessions = db.scalars(select(RuntimeSession).where(RuntimeSession.status == "active")).all()
+            for session_row in sessions:
+                if session_row.machine_id:
+                    machine = db.get(Machine, session_row.machine_id)
+                    if machine:
+                        upsert_session_cost(db, session_row, machine)
+        time.sleep(60)
+
+
+if __name__ == "__main__":
+    run_worker()
--- a/infrastructure/ops_control_plane/app/requirements.txt
+++ b/infrastructure/ops_control_plane/app/requirements.txt
@@ -0,0 +1,13 @@
+fastapi==0.116.1
+uvicorn[standard]==0.35.0
+sqlalchemy==2.0.43
+psycopg[binary]==3.2.10
+jinja2==3.1.6
+python-multipart==0.0.20
+itsdangerous==2.2.0
+passlib[bcrypt]==1.7.4
+boto3==1.40.35
+httpx==0.28.1
+typer==0.16.1
+python-dateutil==2.9.0.post0
+
--- a/infrastructure/ops_control_plane/docker-compose.yml
+++ b/infrastructure/ops_control_plane/docker-compose.yml
@@ -0,0 +1,58 @@
+services:
+  ops-db:
+    image: postgres:16-alpine
+    container_name: desineuron-ops-db
+    environment:
+      POSTGRES_DB: ${OPS_DB_NAME}
+      POSTGRES_USER: ${OPS_DB_USER}
+      POSTGRES_PASSWORD: ${OPS_DB_PASSWORD}
+    ports:
+      - "5435:5432"
+    volumes:
+      - ./data/postgres:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U ${OPS_DB_USER} -d ${OPS_DB_NAME}"]
+      interval: 10s
+      timeout: 5s
+      retries: 10
+    restart: unless-stopped
+
+  ops-api:
+    build:
+      context: ./app
+    container_name: desineuron-ops-api
+    command: ["python", "-m", "ops_control_plane.main"]
+    env_file:
+      - .env
+    environment:
+      OPS_ROLE: api
+    ports:
+      - "18765:8080"
+    depends_on:
+      ops-db:
+        condition: service_healthy
+    volumes:
+      - ./exports:/app/exports
+      - ./logs:/app/logs
+      - ./state:/app/state
+      - ${OPS_MODEL_LIBRARY_HOST_PATH:-/mnt/ServerStorage/ai-models/models}:/model-library:ro
+    restart: unless-stopped
+
+  ops-worker:
+    build:
+      context: ./app
+    container_name: desineuron-ops-worker
+    command: ["python", "-m", "ops_control_plane.worker"]
+    env_file:
+      - .env
+    environment:
+      OPS_ROLE: worker
+    depends_on:
+      ops-db:
+        condition: service_healthy
+    volumes:
+      - ./exports:/app/exports
+      - ./logs:/app/logs
+      - ./state:/app/state
+      - ${OPS_MODEL_LIBRARY_HOST_PATH:-/mnt/ServerStorage/ai-models/models}:/model-library:ro
+    restart: unless-stopped
--- a/infrastructure/ops_control_plane/install_ingress_route_helper.sh
+++ b/infrastructure/ops_control_plane/install_ingress_route_helper.sh
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+sudo mkdir -p /etc/caddy/managed
+sudo install -m 0755 /tmp/manage_desineuron_routes.py /usr/local/bin/manage_desineuron_routes.py
+sudo install -m 0644 /tmp/desineuron_ingress_Caddyfile /etc/caddy/Caddyfile
+sudo python3 /usr/local/bin/manage_desineuron_routes.py list >/dev/null
+sudo caddy validate --config /etc/caddy/Caddyfile
+sudo systemctl reload caddy
--- a/infrastructure/ops_control_plane/install_linux_ops_control_plane.sh
+++ b/infrastructure/ops_control_plane/install_linux_ops_control_plane.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+TARGET_ROOT=/opt/desineuron-ops-control-plane
+SERVICE_FILE=/etc/systemd/system/desineuron-ops-control-plane.service
+
+sudo mkdir -p "$TARGET_ROOT"
+sudo mkdir -p "$TARGET_ROOT/data/postgres" "$TARGET_ROOT/exports" "$TARGET_ROOT/logs" "$TARGET_ROOT/state"
+sudo rsync -a \
+  --exclude '.env' \
+  --exclude 'data/' \
+  --exclude 'exports/' \
+  --exclude 'logs/' \
+  --exclude 'state/' \
+  /tmp/desineuron_ops_control_plane/ "$TARGET_ROOT/"
+sudo chown -R "$USER:$USER" "$TARGET_ROOT"
+
+if [[ ! -f "$TARGET_ROOT/.env" ]]; then
+  cp "$TARGET_ROOT/.env.example" "$TARGET_ROOT/.env"
+fi
+chmod 600 "$TARGET_ROOT/.env"
+
+if [[ ! -f "$TARGET_ROOT/state/desineuron-l4-node.pem" ]]; then
+  echo "Missing $TARGET_ROOT/state/desineuron-l4-node.pem" >&2
+  exit 1
+fi
+chmod 600 "$TARGET_ROOT/state/desineuron-l4-node.pem"
+
+mkdir -p "$TARGET_ROOT/data/postgres" "$TARGET_ROOT/exports" "$TARGET_ROOT/logs" "$TARGET_ROOT/state"
+sudo chown -R 999:999 "$TARGET_ROOT/data/postgres" || true
+
+sudo tee "$SERVICE_FILE" >/dev/null <<EOF
+[Unit]
+Description=Desineuron Ops Control Plane
+After=docker.service network-online.target
+Requires=docker.service
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+WorkingDirectory=$TARGET_ROOT
+ExecStart=/usr/bin/docker compose up -d --build
+ExecStop=/usr/bin/docker compose down
+TimeoutStartSec=0
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now desineuron-ops-control-plane.service
+sudo systemctl --no-pager --full status desineuron-ops-control-plane.service
--- a/infrastructure/ops_control_plane/install_linux_ops_nginx_route.sh
+++ b/infrastructure/ops_control_plane/install_linux_ops_nginx_route.sh
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+TARGET=/etc/nginx/sites-available/desineuron-ops-control-plane.conf
+LINK=/etc/nginx/sites-enabled/desineuron-ops-control-plane.conf
+
+sudo tee "$TARGET" >/dev/null <<'EOF'
+server {
+    listen 443 ssl http2;
+    listen [::]:443 ssl http2;
+    server_name ops.desineuron.in;
+
+    ssl_certificate /etc/letsencrypt/live/desineuron-infra/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/desineuron-infra/privkey.pem;
+    ssl_protocols TLSv1.2 TLSv1.3;
+    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
+
+    client_max_body_size 128m;
+
+    location / {
+        proxy_pass http://127.0.0.1:18765;
+        proxy_http_version 1.1;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+        proxy_read_timeout 3600;
+        proxy_send_timeout 3600;
+    }
+}
+EOF
+
+sudo ln -sf "$TARGET" "$LINK"
+sudo nginx -t
+sudo systemctl reload nginx
--- a/infrastructure/ops_control_plane/manage_desineuron_routes.py
+++ b/infrastructure/ops_control_plane/manage_desineuron_routes.py
@@ -0,0 +1,76 @@
+#!/usr/bin/env python3
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+
+
+STATE_FILE = Path("/etc/caddy/managed/desineuron-routes.json")
+SNIPPET_FILE = Path("/etc/caddy/managed/desineuron-routes.caddy")
+
+
+def load_routes() -> dict[str, dict]:
+    if STATE_FILE.exists():
+        return json.loads(STATE_FILE.read_text(encoding="utf-8"))
+    return {}
+
+
+def save_routes(routes: dict[str, dict]) -> None:
+    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
+    STATE_FILE.write_text(json.dumps(routes, indent=2), encoding="utf-8")
+
+
+def render_routes(routes: dict[str, dict]) -> None:
+    lines: list[str] = []
+    for hostname, route in sorted(routes.items()):
+        lines.extend(
+            [
+                f"{hostname} {{",
+                "\tlog {",
+                "\t\toutput file /var/log/caddy/access.log",
+                "\t\tformat json",
+                "\t}",
+                f"\treverse_proxy {route['scheme']}://{route['target_host']}:{route['target_port']} {{",
+                "\t\theader_up Host {host}",
+                "\t\theader_up X-Forwarded-Host {host}",
+                "\t\theader_up X-Forwarded-Proto {scheme}",
+                "\t\theader_up X-Forwarded-For {remote_host}",
+                "\t}",
+                "}",
+                "",
+            ]
+        )
+    SNIPPET_FILE.write_text("\n".join(lines).rstrip() + "\n", encoding="utf-8")
+
+
+def main() -> int:
+    if len(sys.argv) < 2 or (sys.argv[1] in {"upsert", "delete"} and len(sys.argv) < 3):
+        print("usage: manage_desineuron_routes.py <upsert|delete|list> [payload|hostname]", file=sys.stderr)
+        return 1
+    command = sys.argv[1]
+    routes = load_routes()
+    if command == "upsert":
+        payload = json.loads(sys.argv[2])
+        routes[payload["hostname"]] = payload
+        save_routes(routes)
+        render_routes(routes)
+        print(json.dumps({"status": "ok", "action": "upsert", "hostname": payload["hostname"]}))
+        return 0
+    if command == "delete":
+        hostname = sys.argv[2]
+        routes.pop(hostname, None)
+        save_routes(routes)
+        render_routes(routes)
+        print(json.dumps({"status": "ok", "action": "delete", "hostname": hostname}))
+        return 0
+    if command == "list":
+        print(json.dumps(routes, indent=2))
+        return 0
+    print(f"unknown command: {command}", file=sys.stderr)
+    return 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
+