forked from sagnik/Project_Velocity
Co-authored-by: Sagnik <sagnik7896@gmail.com> Reviewed-on: sagnik/Project_Velocity#19
592 lines
21 KiB
Markdown
## Desineuron Stable Ingress Handoff

Date: 2026-04-08

### Chapters

1. Outcome
2. Final Architecture
3. AWS Resources
4. Linux Origin State
5. Migration Changes Applied
6. Validation Results
7. ComfyUI Recovery and GPU Route
8. Files and Config Artifacts
9. Dynamic Home IP Sync
10. Dynamic Comfy Route Sync
11. Operational Commands
12. Future Service Mapping Runbook
13. Security Notes
14. Remaining Improvement Ideas
15. Rollback
16. Team Summary
17. Current Status Snapshot - 2026-04-12
18. Linux Ops Control Plane

### Outcome

The Cloudflare Tunnel dependency for the six public `desineuron.in` services has been replaced with a self-hosted AWS ingress layer:

- Public edge: AWS EC2 `t4g.micro`
- Stable public IP: `98.87.120.120`
- TLS termination: `Caddy` on the ingress node
- Private backend relay: `rathole`
- Origin: Linux box at `192.168.1.4`
- DNS: Cloudflare, `DNS only`

Public hostnames now route through AWS instead of Cloudflare Tunnel:

- `office.desineuron.in`
- `git.desineuron.in`
- `cloud.desineuron.in`
- `projects.desineuron.in`
- `talk.desineuron.in`
- `vpn.desineuron.in`
- `comfy.desineuron.in` (ingress route created for AWS GPU ComfyUI)
- `ops.desineuron.in` (private operator control surface on the Linux box)

### Final Architecture

```text
Internet
  -> Cloudflare DNS
    -> 98.87.120.120
      -> EC2 ingress: desineuron-ingress-01
        -> Caddy :443
          -> rathole server (control on 2333, local relay on 127.0.0.1:8443)
            -> Linux origin tunnel client
              -> Linux nginx :443
                -> per-host upstream routing
                  -> Gitea
                  -> Nextcloud
                  -> Taiga
                  -> OnlyOffice
                  -> NetBird
  -> comfy.desineuron.in
    -> EC2 ingress Caddy
      -> private proxy to AWS GPU box `172.31.46.190:8188`
        -> ComfyUI endpoints on systemd-managed GPU service
```

### AWS Resources

- Instance name: `desineuron-ingress-01`
- Instance ID: `i-094df09acafb72494`
- Type: `t4g.micro`
- Region: `us-east-1`
- Subnet: `subnet-03d684ed15f327151`
- VPC: `vpc-081d2397920aad268`
- Root disk: `20 GB gp3`
- Elastic IP: `98.87.120.120`
- IAM role: `desineuron-ingress-role`
- Instance profile: `desineuron-ingress-profile`
- Security group: `sg-0721b8b48e12c531d`

Current GPU worker:

- Instance ID: `i-0e4eab5fe67cf9abe`
- Type: `g6.12xlarge`
- Region: `us-east-1`
- Private IP: `172.31.46.190`
- Current public IP: `100.31.64.121`
- Launch time: `2026-04-11T06:14:04Z`

Open ingress ports:

- `80/tcp` from internet
- `443/tcp` from internet
- `22/tcp` restricted to the current home public IP and auto-synced from the Linux origin
- `2333/tcp` from internet for `rathole` control and data relay

GPU node security posture for ComfyUI:

- public `8118/tcp` removed
- public `8188/tcp` removed
- `8188/tcp` now allowed only from ingress security group `sg-0721b8b48e12c531d`

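
The GPU posture above (no public `8188/tcp`, access only from the ingress security group) can be spot-checked programmatically. The following is a minimal sketch, not a script that exists in the repo: it assumes rule data in the shape of the `IpPermissions` list returned by EC2 `DescribeSecurityGroups`, and `is_port_world_open` and the sample rules are hypothetical names and values for illustration.

```python
# Sketch: detect whether a TCP port is reachable from the whole internet.
# Input follows the EC2 DescribeSecurityGroups "IpPermissions" shape.
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def is_port_world_open(ip_permissions, port):
    """Return True if any rule exposes `port` (TCP) to 0.0.0.0/0 or ::/0."""
    for perm in ip_permissions:
        cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
        if not cidrs & WORLD_CIDRS:
            continue  # rule is scoped to specific CIDRs or security groups
        if perm.get("IpProtocol") == "-1":
            return True  # "all traffic" rule covers every port
        if perm.get("IpProtocol") != "tcp":
            continue
        if perm["FromPort"] <= port <= perm["ToPort"]:
            return True
    return False


# Hypothetical sample mirroring the intended GPU posture: 8188 is allowed
# only from the ingress security group, with no CIDR-based rule.
GPU_RULES_EXAMPLE = [
    {
        "IpProtocol": "tcp",
        "FromPort": 8188,
        "ToPort": 8188,
        "IpRanges": [],
        "Ipv6Ranges": [],
        "UserIdGroupPairs": [{"GroupId": "sg-0721b8b48e12c531d"}],
    }
]
```

With real data, the list would come from the GPU node's security group; a `True` result for `8188` would mean the posture described above has regressed.
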
### Linux Origin State

Services exposed to local nginx:

- `git.desineuron.in` -> `127.0.0.1:3000` (`gitea`)
- `cloud.desineuron.in` -> `127.0.0.1:11000` (`nextcloud_app`)
- `talk.desineuron.in` -> `127.0.0.1:11000` (`nextcloud_app`, Talk-focused hostname)
- `projects.desineuron.in` -> `127.0.0.1:9100` (`taiga-gateway`)
- `office.desineuron.in` -> `127.0.0.1:9980` (`nextcloud_onlyoffice`)
- `vpn.desineuron.in` -> `127.0.0.1:8080` / `127.0.0.1:8081` (`netbird`)

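
The hostname-to-upstream table above can also be written down as data, which may help with quick lookups or doc generation. This is an illustrative sketch only: the authoritative routing lives in `/etc/nginx/sites-enabled/desineuron.conf`, and `upstreams_for` and `hosts_by_upstream` are hypothetical helper names.

```python
# Hostname -> local upstream(s), transcribed from the list above.
# Illustrative only: nginx on the Linux origin remains the source of truth.
ORIGIN_UPSTREAMS = {
    "git.desineuron.in": ["127.0.0.1:3000"],
    "cloud.desineuron.in": ["127.0.0.1:11000"],
    "talk.desineuron.in": ["127.0.0.1:11000"],
    "projects.desineuron.in": ["127.0.0.1:9100"],
    "office.desineuron.in": ["127.0.0.1:9980"],
    "vpn.desineuron.in": ["127.0.0.1:8080", "127.0.0.1:8081"],
}


def upstreams_for(host):
    """Return the local upstream(s) for a public hostname."""
    try:
        return ORIGIN_UPSTREAMS[host.lower()]
    except KeyError:
        raise LookupError(f"no Linux-origin route for {host!r}") from None


def hosts_by_upstream():
    """Group hostnames by upstream to spot backends served under several names."""
    groups = {}
    for host, upstreams in ORIGIN_UPSTREAMS.items():
        for upstream in upstreams:
            groups.setdefault(upstream, []).append(host)
    return groups
```

Grouping by upstream makes the shared Nextcloud backend visible: `cloud.desineuron.in` and `talk.desineuron.in` both resolve to `127.0.0.1:11000`.
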
Tunnel state:

- `rathole-client.service` active on Linux
- `rathole-server.service` active on AWS
- `cloudflared` inactive on Linux

### Migration Changes Applied

#### Cloudflare

Old CNAME tunnel records were removed for the six public hostnames.

New records were created:

- Type: `A`
- Value: `98.87.120.120`
- Proxy status: `DNS only`
- TTL: `300`

#### AWS Ingress

Installed and configured:

- `Caddy`
- `rathole`
- `amazon-ssm-agent`
- Linux-driven SSH allowlist sync for the ingress node

TLS:

- Existing valid certificate/key pair from the Linux origin was copied to the ingress node.
- Caddy now terminates HTTPS at the edge.

#### Linux Origin

nginx was already routing by hostname and remains the origin router.

Nextcloud was adjusted so `talk.desineuron.in` no longer canonicalizes to `cloud.desineuron.in`:

- removed `overwritehost` pin
- added `talk.desineuron.in` to trusted domains
- restarted `nextcloud_app`

### Validation Results

Public hostname checks through the new ingress:

- `office.desineuron.in` -> `200 /welcome/`
- `git.desineuron.in` -> `200`
- `cloud.desineuron.in` -> `200 /login`
- `projects.desineuron.in` -> `200`
- `talk.desineuron.in` -> `200 /login` on `talk.desineuron.in`
- `vpn.desineuron.in` -> `200`
- `ops.desineuron.in/login` -> `200`
- `comfy.desineuron.in` -> `200`

Important note:

- `talk.desineuron.in` now stays on the `talk` hostname.
- It is still backed by the same Nextcloud origin and presents the Nextcloud login flow, which is expected given the current Linux-side app layout.

### ComfyUI Recovery and GPU Route

Root cause of the earlier `502`:

- ingress route and TLS were correct
- the GPU spot node had lost the actual `/opt/dlami/nvme/ComfyUI` app tree
- nothing was listening on `172.31.46.190:8188`

Permanent fix applied:

- restored `/opt/dlami/nvme/ComfyUI` from upstream source control
- installed ComfyUI Python requirements on the GPU node
- created `systemd` unit `comfyui.service`
- enabled `comfyui.service` at boot with automatic restart
- kept `comfy.desineuron.in` mapped through ingress Caddy
- removed direct public access to `8118` and `8188`
- allowed `8188` only from ingress security group

Current live path:

- `https://comfy.desineuron.in`
  -> ingress `98.87.120.120`
  -> Caddy reverse proxy
  -> GPU private IP `172.31.46.190:8188`
  -> `comfyui.service`

Current public result:

- `comfy.desineuron.in` currently returns `200 OK`
- ingress route is now managed dynamically instead of hardcoded to one GPU private IP

Current GPU service:

- `comfyui.service`
- app path: `/opt/dlami/nvme/ComfyUI`
- log path: `/var/log/comfyui/service.log`
- port: `8188/tcp`

Current backend state on `2026-04-12`:

- `comfyui.service` is `active`
- `main.py` is present under `/opt/dlami/nvme/ComfyUI`
- the process is listening on `0.0.0.0:8188`
- the public ingress path is healthy again

Auto-healing fix applied:

- ComfyUI `systemd` service now runs an `ExecStartPre` recovery script at `/usr/local/bin/desineuron-ensure-comfyui.sh`
- that script reclones/repairs `/opt/dlami/nvme/ComfyUI` if the tree is missing or damaged
- Linux now runs `desineuron-comfy-route-sync.timer`
- the timer updates the managed Caddy route for `comfy.desineuron.in` to the current private IP of the AWS instance tagged `DesineuronRole=comfyui`
- this protects the public route from GPU instance IP drift without manual Caddy edits

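
The "missing or damaged" decision made by the recovery script can be illustrated in a few lines. The real check lives in the shell script `/usr/local/bin/desineuron-ensure-comfyui.sh`, whose contents are not reproduced here; the sketch below is a Python illustration under the assumption that "damaged" means the directory exists but `main.py` is absent. `comfyui_tree_needs_repair` and `REQUIRED_FILES` are hypothetical names.

```python
import tempfile
from pathlib import Path

# Assumption: a healthy tree is one that still contains main.py.
# The actual script may check more (for example, the .git directory).
REQUIRED_FILES = ("main.py",)


def comfyui_tree_needs_repair(app_dir, required=REQUIRED_FILES):
    """Return True if the app directory is missing or lacks a required file."""
    root = Path(app_dir)
    if not root.is_dir():
        return True  # tree is gone entirely, e.g. after spot-instance storage loss
    return any(not (root / name).is_file() for name in required)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(comfyui_tree_needs_repair(tmp))  # True: main.py is missing
        Path(tmp, "main.py").write_text("# placeholder\n")
        print(comfyui_tree_needs_repair(tmp))  # False
```

Running the check in `ExecStartPre` means a lost app tree is repaired before every service start, rather than surfacing later as a `502` at the ingress.
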
Expected endpoints:

- `https://comfy.desineuron.in/`
- `https://comfy.desineuron.in/prompt`
- `https://comfy.desineuron.in/history/{prompt_id}`
- `https://comfy.desineuron.in/queue`
- `https://comfy.desineuron.in/upload/image`

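
For team members scripting against these endpoints, the sketch below shows one way to build a `/prompt` submission and the matching `/history/{prompt_id}` URL. It assumes the upstream ComfyUI HTTP API behavior (a JSON body of the form `{"prompt": <workflow>}` and a response containing `prompt_id`); the helper names are hypothetical, and any authentication later placed in front of the route is not handled.

```python
import json
import urllib.request

BASE_URL = "https://comfy.desineuron.in"


def build_prompt_request(workflow, base_url=BASE_URL):
    """Build (but do not send) a POST /prompt request for a workflow graph."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_url(prompt_id, base_url=BASE_URL):
    """URL to poll for the result of a previously submitted prompt."""
    return f"{base_url}/history/{prompt_id}"


def submit(workflow, timeout=30):
    """Send the workflow and return the prompt_id reported by the server."""
    request = build_prompt_request(workflow)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["prompt_id"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything reaches the GPU node.
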
### Files and Config Artifacts

Infrastructure artifacts in repo:

- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/README.md)
- [Caddyfile](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/Caddyfile)
- [rathole-server.toml](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/rathole-server.toml)
- [rathole-client.toml](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/rathole-client.toml)
- [install_linux_rathole_client.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_rathole_client.sh)
- [user_data.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/user_data.sh)
- [install_gpu_comfyui_service.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_gpu_comfyui_service.sh)
- [map_gpu_comfy_security.ps1](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/map_gpu_comfy_security.ps1)
- [sync_ingress_home_ip.py](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/sync_ingress_home_ip.py)
- [desineuron-ingress-home-ip-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.service)
- [desineuron-ingress-home-ip-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.timer)
- [install_linux_ingress_ip_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_ingress_ip_sync.sh)
- [sync_comfy_route.py](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/sync_comfy_route.py)
- [desineuron-comfy-route-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-comfy-route-sync.service)
- [desineuron-comfy-route-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-comfy-route-sync.timer)
- [install_linux_comfy_route_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_comfy_route_sync.sh)
- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)

Linux origin files touched:

- `/etc/nginx/sites-enabled/desineuron.conf`
- `/mnt/ServerStorage/docker_apps/nextcloud/.env`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/config.php`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php`

Backups created on Linux:

- `/mnt/ServerStorage/docker_apps/nextcloud/.env.pre_ingress_backup_2026-04-08`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php.pre_ingress_backup_2026-04-08`

### Dynamic Home IP Sync

Purpose:

- Keep ingress `22/tcp` restricted to the current Airtel public IP even when the ISP changes it
- Prevent SSH fallback outages caused by stale home-IP security-group rules that would otherwise need manual correction

Design:

- Linux origin runs `desineuron-ingress-home-ip-sync.timer`
- Timer fires on boot and every 5 minutes
- Service resolves the current home public IP via `https://api.ipify.org`
- Service updates only the ingress security group `sg-0721b8b48e12c531d`
- Only the SSH fallback rule is mutated
- `rathole` is no longer dependent on the Airtel IP because `2333/tcp` remains open on the ingress

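
The core of the design above is a small reconciliation step: compare the CIDRs currently on the SSH rule with the freshly resolved home IP and change only what differs. The sketch below illustrates that step in isolation; it is not the contents of `sync_ingress_home_ip.py`, and `plan_ssh_rule_update` is a hypothetical name. Applying the plan is assumed to happen elsewhere, for example through the EC2 `RevokeSecurityGroupIngress` and `AuthorizeSecurityGroupIngress` API calls.

```python
import ipaddress


def plan_ssh_rule_update(existing_cidrs, home_ip):
    """Return (cidrs_to_revoke, cidrs_to_authorize) for the 22/tcp rule.

    `existing_cidrs` are the CIDRs currently allowed on 22/tcp;
    `home_ip` is the raw text returned by the public-IP lookup.
    """
    # IPv4Address raises ValueError on garbage, so a bad lookup response
    # can never be turned into a security-group rule.
    desired = f"{ipaddress.IPv4Address(home_ip.strip())}/32"
    existing = set(existing_cidrs)
    to_revoke = sorted(existing - {desired})
    to_authorize = [] if desired in existing else [desired]
    return to_revoke, to_authorize
```

When the home IP has not changed, both lists are empty and no API call is needed, which keeps a 5-minute timer cheap and idempotent.
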
Installed Linux paths:

- `/usr/local/bin/sync_ingress_home_ip.py`
- `/etc/systemd/system/desineuron-ingress-home-ip-sync.service`
- `/etc/systemd/system/desineuron-ingress-home-ip-sync.timer`
- `/etc/desineuron-ingress-home-ip-sync.env`
- `/opt/desineuron-ingress-ip-sync/.venv`
- `/var/lib/desineuron-ingress-ip-sync/current_ip.txt`

Current state:

- Timer: enabled and active
- Last recorded home public IP: `223.185.28.89`
- Ingress SSH rule CIDR: `223.185.28.89/32`

### Dynamic Comfy Route Sync

Purpose:

- keep `comfy.desineuron.in` mapped to the correct AWS GPU private IP even if the GPU instance public/private IP changes
- remove the need to hand-edit `/etc/caddy/Caddyfile` for ComfyUI moves

Design:

- Linux runs `desineuron-comfy-route-sync.timer`
- timer fires on boot and every 2 minutes
- service looks for the newest running EC2 instance tagged `DesineuronRole=comfyui`
- service reads its current private IP
- service connects to the ingress node and updates the managed Caddy route with `/usr/local/bin/manage_desineuron_routes.py`
- Caddy is validated and reloaded only after a successful route update

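
The "newest running instance" selection in the design above can be sketched as a pure function over an EC2 `DescribeInstances` response. This is an illustration, not the contents of `sync_comfy_route.py`; it assumes the tag filter (`tag:DesineuronRole=comfyui`) was already applied in the API call and that `LaunchTime` values are `datetime` objects, as boto3 returns them. `newest_running_private_ip` is a hypothetical name.

```python
from datetime import datetime, timezone


def newest_running_private_ip(describe_instances_response):
    """Return the private IP of the most recently launched running instance.

    Returns None when no running instance with a private IP is present,
    so the caller can leave the existing Caddy route untouched.
    """
    candidates = [
        inst
        for reservation in describe_instances_response.get("Reservations", [])
        for inst in reservation.get("Instances", [])
        if inst.get("State", {}).get("Name") == "running"
        and inst.get("PrivateIpAddress")
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda inst: inst["LaunchTime"])
    return newest["PrivateIpAddress"]


# Hypothetical response fragment; the running entry reuses the GPU worker
# values recorded in this document, the terminated one is made up.
EXAMPLE_RESPONSE = {
    "Reservations": [
        {"Instances": [{
            "State": {"Name": "running"},
            "PrivateIpAddress": "172.31.46.190",
            "LaunchTime": datetime(2026, 4, 11, 6, 14, 4, tzinfo=timezone.utc),
        }]},
        {"Instances": [{
            "State": {"Name": "terminated"},
            "LaunchTime": datetime(2026, 4, 12, 0, 0, 0, tzinfo=timezone.utc),
        }]},
    ]
}
```

Returning `None` instead of raising lets the sync job treat "no GPU worker right now" as a no-op rather than an error, which matters for spot capacity that comes and goes.
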
Installed Linux paths:

- `/usr/local/bin/sync_comfy_route.py`
- `/etc/systemd/system/desineuron-comfy-route-sync.service`
- `/etc/systemd/system/desineuron-comfy-route-sync.timer`
- `/etc/desineuron-comfy-route-sync.env`
- `/opt/desineuron-comfy-route-sync/.venv`
- `/var/lib/desineuron-comfy-route-sync/current_target.txt`

Current state:

- Timer: enabled and active
- Current synced target: `172.31.46.190`
- Current target instance tag: `DesineuronRole=comfyui`

### Operational Commands

Check AWS ingress status:

```powershell
aws ec2 describe-instances --instance-ids i-094df09acafb72494 --region us-east-1
aws ec2 describe-addresses --allocation-ids eipalloc-0d54fc0f827450e7b --region us-east-1
```

Check ingress services:

```powershell
aws ssm send-command --region us-east-1 --instance-ids i-094df09acafb72494 --document-name AWS-RunShellScript --parameters commands="sudo systemctl status caddy rathole-server --no-pager"
```

Check GPU ComfyUI service:

```powershell
aws ssm send-command --region us-east-1 --instance-ids i-0e4eab5fe67cf9abe --document-name AWS-RunShellScript --parameters commands="sudo systemctl status comfyui --no-pager","ss -ltnp | grep 8188 || true","tail -n 40 /var/log/comfyui/service.log || true"
```

Check Linux origin services:

```powershell
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status rathole-client nginx"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ingress-home-ip-sync.service desineuron-ingress-home-ip-sync.timer"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S journalctl -u desineuron-ingress-home-ip-sync -n 50 --no-pager"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ops-control-plane.service --no-pager"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S docker compose -f /opt/desineuron-ops-control-plane/docker-compose.yml ps"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-comfy-route-sync.service desineuron-comfy-route-sync.timer --no-pager"
```

Public endpoint validation:

```powershell
curl.exe -I https://office.desineuron.in
curl.exe -I https://git.desineuron.in
curl.exe -I https://cloud.desineuron.in
curl.exe -I https://projects.desineuron.in
curl.exe -I https://talk.desineuron.in
curl.exe -I https://vpn.desineuron.in
curl.exe -I https://comfy.desineuron.in
curl.exe -I https://ops.desineuron.in/login
```

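
The public endpoint checks can also be run as one scripted pass. The sketch below is a hypothetical helper, not a script in the repo. One behavioral difference to keep in mind: `urllib` follows redirects, while `curl.exe -I` reports the first response only, so a hostname that redirects to a login page will show the final status here.

```python
import urllib.error
import urllib.request

PUBLIC_URLS = [
    "https://office.desineuron.in",
    "https://git.desineuron.in",
    "https://cloud.desineuron.in",
    "https://projects.desineuron.in",
    "https://talk.desineuron.in",
    "https://vpn.desineuron.in",
    "https://comfy.desineuron.in",
    "https://ops.desineuron.in/login",
]


def head_status(url, timeout=10):
    """Return the HTTP status for a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx still carry a useful status, e.g. 502
    except (urllib.error.URLError, TimeoutError):
        return None  # DNS failure, refused connection, TLS error, timeout


def check_all(urls, fetch=head_status):
    """Map each URL to its status; `fetch` is injectable for offline testing."""
    return {url: fetch(url) for url in urls}


def failing(results, ok=(200,)):
    """Return the URLs whose status is not in the accepted set."""
    return sorted(url for url, status in results.items() if status not in ok)


if __name__ == "__main__":
    bad = failing(check_all(PUBLIC_URLS))
    print("all endpoints healthy" if not bad else f"failing: {bad}")
```

A `502` from `comfy.desineuron.in` in this output would point at the GPU backend rather than the ingress, matching the root cause described in the ComfyUI recovery section.
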
### Future Service Mapping Runbook

Use this pattern for any future public service behind the stable ingress layer.

1. Decide the backend location.

- Linux origin behind `rathole`
- AWS GPU/private EC2 node
- another private backend later

2. Decide whether the service should terminate TLS at ingress.

- default: yes
- Caddy on ingress should own the public hostname and certificate

3. Create the DNS record in Cloudflare.

- type: `A`
- value: `98.87.120.120`
- proxy mode: `DNS only`
- low TTL during rollout

4. Add the ingress route in [`Caddyfile`](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/Caddyfile).

Patterns:

- Linux-origin service:
  - proxy to `https://127.0.0.1:8443`
  - preserve `Host`
- private AWS backend service:
  - proxy to `http://<private-ip>:<port>` or `https://<private-ip>:<port>`

5. Restrict backend network access.

- never leave backend app ports open to `0.0.0.0/0` unless absolutely necessary
- prefer security-group rule allowing traffic only from ingress security group
- for home-origin services, keep them private behind `rathole`

6. Reload ingress.

```powershell
ssh -i "F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\desineuron-l4-node.pem" ec2-user@98.87.120.120 "sudo caddy validate --config /etc/caddy/Caddyfile && sudo systemctl reload caddy"
```

7. Validate TLS and app response.

- check certificate subject matches hostname
- check `curl -I https://<host>`
- check login page or health endpoint
- check browser behavior

8. If the backend is stateful, create a persistent service.

- prefer `systemd`
- enable restart on failure
- log to a stable path
- record service name, working directory, ports, and restart policy in this handoff doc

9. Update team docs immediately.

- hostname
- DNS record type
- ingress route target
- backend service owner
- service name
- health check command
- rollback step

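
The two route patterns from step 4 can be illustrated by generating a Caddy site block for each. This is a hedged sketch: `render_site_block` is a hypothetical helper, the Caddyfile in the repo remains authoritative, and the output deliberately omits certificate configuration because the exact `tls` setup used on the ingress node is not shown in this document. The `tls_server_name` line is an assumption about how an HTTPS upstream on `127.0.0.1:8443` could be verified against the origin certificate. Caddy v2 passes the incoming `Host` header to the upstream unchanged by default, so the "preserve `Host`" requirement needs no extra directive.

```python
def render_site_block(host, upstream, origin_tls_name=None):
    """Render a minimal Caddyfile site block proxying `host` to `upstream`.

    When `origin_tls_name` is given, a transport block is added so the
    upstream certificate is verified against that name (useful when the
    upstream address is an IP such as 127.0.0.1).
    """
    lines = [f"{host} {{"]
    if origin_tls_name is None:
        lines.append(f"\treverse_proxy {upstream}")
    else:
        lines += [
            f"\treverse_proxy {upstream} {{",
            "\t\ttransport http {",
            f"\t\t\ttls_server_name {origin_tls_name}",
            "\t\t}",
            "\t}",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Private AWS backend pattern (plain HTTP inside the VPC).
    print(render_site_block("comfy.desineuron.in", "http://172.31.46.190:8188"))
    # Linux-origin pattern through the local rathole relay.
    print(render_site_block("git.desineuron.in", "https://127.0.0.1:8443",
                            origin_tls_name="git.desineuron.in"))
```

Any generated block should still go through `caddy validate` as in step 6 before a reload.
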
### Security Notes

- Public traffic terminates only at the AWS edge.
- The Linux box no longer needs Cloudflare Tunnel for these six routes.
- The Linux origin is reached through an outbound tunnel, not by directly exposing the home machine to the public for app traffic.
- SSH on the Linux box remains key-only.
- The AWS ingress IAM role is limited to SSM core.
- ComfyUI is no longer directly exposed on the GPU public IP; only the ingress layer can reach `8188`.
- Ingress `22/tcp` stays restricted and is now auto-synced from the Linux origin.
- Ingress `2333/tcp` is intentionally open so `rathole` survives Airtel IP changes without operator action.

### Remaining Improvement Ideas

- Move the Linux nginx certificate issuance/renewal model to the AWS edge permanently instead of copying an existing certificate.
- Clean up nginx warnings about duplicated protocol options.
- Separate `talk.desineuron.in` more fully from general Nextcloud if a distinct Talk-only UX is desired.
- Add authentication (for example Basic Auth or an allowlist) in front of `comfy.desineuron.in` before broader team rollout; internet scanners started hitting the route immediately after it went live.
- Add monitoring and alerting on:
  - `caddy`
  - `rathole-server`
  - `rathole-client`
  - public HTTPS checks
- Add infrastructure-as-code for the EC2 ingress node if this should be reproducible by the team without manual AWS CLI steps.

### Rollback

If rollback is needed:

1. Recreate Cloudflare CNAME/tunnel routes or repoint the DNS records away from `98.87.120.120`.
2. Stop `caddy` and `rathole-server` on AWS.
3. Stop `rathole-client` on Linux.
4. Restore Nextcloud files from:
   - `.env.pre_ingress_backup_2026-04-08`
   - `reverse-proxy.config.php.pre_ingress_backup_2026-04-08`
5. Restart `nextcloud_app` and nginx.

### Team Summary

This migration is complete.

Cloudflare Tunnel is no longer the production path for the six public service hostnames. The stable production ingress is now the AWS `t4g.micro` node with Elastic IP `98.87.120.120`, and the Linux machine remains the private origin behind `rathole`.

Additional mapped route:

- `comfy.desineuron.in` now terminates on the same stable ingress and forwards to the GPU node's private address `172.31.46.190:8188`.
- No further DNS change is needed for ComfyUI.
- The backend is supervised by `systemd` and currently healthy.
- The route is now auto-synced from Linux based on the tagged AWS ComfyUI worker, so future IP changes do not require manual ingress edits.
- The team can use:
  - `https://comfy.desineuron.in/prompt`
  - `https://comfy.desineuron.in/history/{prompt_id}`
  - `https://comfy.desineuron.in/queue`
  - `https://comfy.desineuron.in/upload/image`

### Current Status Snapshot - 2026-04-12

Live public service state:

- `office.desineuron.in` -> `200`
- `git.desineuron.in` -> `200`
- `cloud.desineuron.in` -> `200`
- `projects.desineuron.in` -> `200`
- `talk.desineuron.in` -> `200`
- `vpn.desineuron.in` -> `200`
- `ops.desineuron.in/login` -> `200`
- `comfy.desineuron.in` -> `200`

Linux-origin health:

- `nginx.service` -> `active`
- `rathole-client.service` -> `active`
- `desineuron-ingress-home-ip-sync.timer` -> `active`
- `desineuron-ops-control-plane.service` -> `active`

Linux ops stack containers:

- `desineuron-ops-api` -> `Up`
- `desineuron-ops-db` -> `Up (healthy)`
- `desineuron-ops-worker` -> `Up`

Ingress health:

- `caddy` -> `active`
- `rathole-server` -> `active`
- `comfy.desineuron.in` Caddy route is present in `/etc/caddy/Caddyfile`

GPU ComfyUI state:

- `comfyui.service` -> `active`
- `main.py` present under `/opt/dlami/nvme/ComfyUI`
- listener present on `0.0.0.0:8188`
- public ingress path is healthy

Comfy auto-heal state:

- `desineuron-comfy-route-sync.timer` -> `active`
- synced target file -> `/var/lib/desineuron-comfy-route-sync/current_target.txt`
- current synced target -> `172.31.46.190`

### Linux Ops Control Plane

The Linux box now also hosts the private AWS control surface for the team.

Public operator URL:

- `https://ops.desineuron.in/login`

Purpose:

- launch/stop/terminate AWS machines
- view spot/on-demand market data
- track runtime and estimated cost
- ingest model directories from the Linux box into S3
- hydrate models from S3 to AWS GPU nodes
- manage ingress routes through the `t4g.micro`
- export session/cost CSVs

Linux runtime paths:

- stack root: `/opt/desineuron-ops-control-plane`
- env file: `/opt/desineuron-ops-control-plane/.env`
- exports: `/opt/desineuron-ops-control-plane/exports`
- state: `/opt/desineuron-ops-control-plane/state`

Canonical S3 bucket:

- `desineuron-ops-control-plane-819079556187-us-east-1`

Model library source on Linux:

- `/mnt/ServerStorage/ai-models/models`

Current operator accounts:

- `sagnik@desineuron.in`
- `sayan@desineuron.in`
- `sourik@desineuron.in`

Reference docs:

- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)