WebOS completion
591
.Agent Context/Bibels/Desineuron Stable Ingress Handoff.md
Normal file
@@ -0,0 +1,591 @@
## Desineuron Stable Ingress Handoff

Date: 2026-04-08

### Chapters

1. Outcome
2. Final Architecture
3. AWS Resources
4. Linux Origin State
5. Migration Changes Applied
6. Validation Results
7. ComfyUI Recovery and GPU Route
8. Files and Config Artifacts
9. Dynamic Home IP Sync
10. Dynamic Comfy Route Sync
11. Operational Commands
12. Future Service Mapping Runbook
13. Security Notes
14. Remaining Improvement Ideas
15. Rollback
16. Team Summary
17. Current Status Snapshot - 2026-04-12
18. Linux Ops Control Plane

### Outcome

The Cloudflare Tunnel dependency for the six public `desineuron.in` services has been replaced with a self-hosted AWS ingress layer:

- Public edge: AWS EC2 `t4g.micro`
- Stable public IP: `98.87.120.120`
- TLS termination: `Caddy` on the ingress node
- Private backend relay: `rathole`
- Origin: Linux box at `192.168.1.4`
- DNS: Cloudflare, `DNS only`

Public hostnames now served through the AWS ingress: the six migrated from Cloudflare Tunnel, plus `comfy` and `ops`:

- `office.desineuron.in`
- `git.desineuron.in`
- `cloud.desineuron.in`
- `projects.desineuron.in`
- `talk.desineuron.in`
- `vpn.desineuron.in`
- `comfy.desineuron.in` (ingress route created for AWS GPU ComfyUI)
- `ops.desineuron.in` (private operator control surface on the Linux box)

### Final Architecture

```text
Internet
  -> Cloudflare DNS
  -> 98.87.120.120
  -> EC2 ingress: desineuron-ingress-01
  -> Caddy :443
  -> rathole server (control on 2333, local relay on 127.0.0.1:8443)
  -> Linux origin tunnel client
  -> Linux nginx :443
  -> per-host upstream routing
     -> Gitea
     -> Nextcloud
     -> Taiga
     -> OnlyOffice
     -> NetBird
  -> comfy.desineuron.in
     -> EC2 ingress Caddy
     -> private proxy to AWS GPU box `172.31.46.190:8188`
     -> ComfyUI endpoints on systemd-managed GPU service
```

### AWS Resources

- Instance name: `desineuron-ingress-01`
- Instance ID: `i-094df09acafb72494`
- Type: `t4g.micro`
- Region: `us-east-1`
- Subnet: `subnet-03d684ed15f327151`
- VPC: `vpc-081d2397920aad268`
- Root disk: `20 GB gp3`
- Elastic IP: `98.87.120.120`
- IAM role: `desineuron-ingress-role`
- Instance profile: `desineuron-ingress-profile`
- Security group: `sg-0721b8b48e12c531d`

Current GPU worker:

- Instance ID: `i-0e4eab5fe67cf9abe`
- Type: `g6.12xlarge`
- Region: `us-east-1`
- Private IP: `172.31.46.190`
- Current public IP: `100.31.64.121`
- Launch time: `2026-04-11T06:14:04Z`

Open ingress ports:

- `80/tcp` from internet
- `443/tcp` from internet
- `22/tcp` restricted to the current home public IP and auto-synced from the Linux origin
- `2333/tcp` from internet for `rathole` control and data relay

GPU node security posture for ComfyUI:

- public `8118/tcp` removed
- public `8188/tcp` removed
- `8188/tcp` now allowed only from ingress security group `sg-0721b8b48e12c531d`

### Linux Origin State

Services exposed to local nginx:

- `git.desineuron.in` -> `127.0.0.1:3000` (`gitea`)
- `cloud.desineuron.in` -> `127.0.0.1:11000` (`nextcloud_app`)
- `talk.desineuron.in` -> `127.0.0.1:11000` (`nextcloud_app`, Talk-focused hostname)
- `projects.desineuron.in` -> `127.0.0.1:9100` (`taiga-gateway`)
- `office.desineuron.in` -> `127.0.0.1:9980` (`nextcloud_onlyoffice`)
- `vpn.desineuron.in` -> `127.0.0.1:8080` / `127.0.0.1:8081` (`netbird`)
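
The hostname-to-upstream table above can be mirrored as a small lookup, which may be handy for generating health checks or cross-checking the nginx config. This is only a sketch in Python: `ORIGIN_ROUTES` and `upstreams_for` are illustrative names that do not exist in the repo, and the real routing remains in nginx.

```python
# Hypothetical mirror of the per-host routing listed above.
# The source of truth is the nginx config on the Linux origin, not this table.
ORIGIN_ROUTES: dict[str, list[str]] = {
    "git.desineuron.in": ["127.0.0.1:3000"],        # gitea
    "cloud.desineuron.in": ["127.0.0.1:11000"],     # nextcloud_app
    "talk.desineuron.in": ["127.0.0.1:11000"],      # nextcloud_app (Talk hostname)
    "projects.desineuron.in": ["127.0.0.1:9100"],   # taiga-gateway
    "office.desineuron.in": ["127.0.0.1:9980"],     # nextcloud_onlyoffice
    "vpn.desineuron.in": ["127.0.0.1:8080", "127.0.0.1:8081"],  # netbird
}


def upstreams_for(host: str) -> list[str]:
    """Return the local upstream addresses for a public hostname.

    Matching is case-insensitive and tolerates a trailing dot.
    Raises LookupError for hostnames that are not served by the Linux origin.
    """
    key = host.strip().lower().rstrip(".")
    try:
        return ORIGIN_ROUTES[key]
    except KeyError:
        raise LookupError(f"no origin route for {host!r}") from None
```

`comfy.desineuron.in` is deliberately absent here because it is proxied from the ingress node straight to the GPU worker, not through the Linux origin.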

Tunnel state:

- `rathole-client.service` active on Linux
- `rathole-server.service` active on AWS
- `cloudflared` inactive on Linux

### Migration Changes Applied

#### Cloudflare

Old CNAME tunnel records were removed for the six public hostnames.

New records were created:

- Type: `A`
- Value: `98.87.120.120`
- Proxy status: `DNS only`
- TTL: `300`

#### AWS Ingress

Installed and configured:

- `Caddy`
- `rathole`
- `amazon-ssm-agent`
- Linux-driven SSH allowlist sync for the ingress node

TLS:

- Existing valid certificate/key pair from the Linux origin was copied to the ingress node.
- Caddy now terminates HTTPS at the edge.

#### Linux Origin

nginx was already routing by hostname and remains the origin router.

Nextcloud was adjusted so `talk.desineuron.in` no longer canonicalizes to `cloud.desineuron.in`:

- removed `overwritehost` pin
- added `talk.desineuron.in` to trusted domains
- restarted `nextcloud_app`

### Validation Results

Public hostname checks through the new ingress:

- `office.desineuron.in` -> `200 /welcome/`
- `git.desineuron.in` -> `200`
- `cloud.desineuron.in` -> `200 /login`
- `projects.desineuron.in` -> `200`
- `talk.desineuron.in` -> `200 /login` on `talk.desineuron.in`
- `vpn.desineuron.in` -> `200`
- `ops.desineuron.in/login` -> `200`
- `comfy.desineuron.in` -> `200`

Important note:

- `talk.desineuron.in` now stays on the `talk` hostname.
- It is still backed by the same Nextcloud origin and presents the Nextcloud login flow, which is expected given the current Linux-side app layout.

### ComfyUI Recovery and GPU Route

Root cause of the earlier `502`:

- ingress route and TLS were correct
- the GPU spot node had lost the actual `/opt/dlami/nvme/ComfyUI` app tree
- nothing was listening on `172.31.46.190:8188`

Permanent fix applied:

- restored `/opt/dlami/nvme/ComfyUI` from upstream source control
- installed ComfyUI Python requirements on the GPU node
- created `systemd` unit `comfyui.service`
- enabled `comfyui.service` at boot with automatic restart
- kept `comfy.desineuron.in` mapped through ingress Caddy
- removed direct public access to `8118` and `8188`
- allowed `8188` only from ingress security group

Current live path:

- `https://comfy.desineuron.in`
  -> ingress `98.87.120.120`
  -> Caddy reverse proxy
  -> GPU private IP `172.31.46.190:8188`
  -> `comfyui.service`

Current public result:

- `comfy.desineuron.in` currently returns `200 OK`
- ingress route is now managed dynamically instead of hardcoded to one GPU private IP

Current GPU service:

- `comfyui.service`
- app path: `/opt/dlami/nvme/ComfyUI`
- log path: `/var/log/comfyui/service.log`
- port: `8188/tcp`

Current backend state on `2026-04-12`:

- `comfyui.service` is `active`
- `main.py` is present under `/opt/dlami/nvme/ComfyUI`
- the process is listening on `0.0.0.0:8188`
- the public ingress path is healthy again

Auto-healing fix applied:

- ComfyUI `systemd` service now runs an `ExecStartPre` recovery script at `/usr/local/bin/desineuron-ensure-comfyui.sh`
- that script re-clones or repairs `/opt/dlami/nvme/ComfyUI` if the tree is missing or damaged
- Linux now runs `desineuron-comfy-route-sync.timer`
- the timer updates the managed Caddy route for `comfy.desineuron.in` to the current private IP of the AWS instance tagged `DesineuronRole=comfyui`
- this protects the public route from GPU instance IP drift without manual Caddy edits

Expected endpoints:

- `https://comfy.desineuron.in/`
- `https://comfy.desineuron.in/prompt`
- `https://comfy.desineuron.in/history/{prompt_id}`
- `https://comfy.desineuron.in/queue`
- `https://comfy.desineuron.in/upload/image`
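
For callers scripting against the endpoints above, a request to `/prompt` could be built roughly as follows. This is a hedged sketch in Python, not a client shipped in the repo: the helper names are hypothetical, and the JSON body (`prompt` plus `client_id`) is assumed to follow the shape used by ComfyUI's bundled API examples. The workflow dict must be an API-format export from ComfyUI.

```python
import json
import urllib.parse
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://comfy.desineuron.in"


def build_prompt_request(workflow: dict, client_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /prompt.

    Assumption: the server accepts {"prompt": <workflow>, "client_id": <str>}.
    """
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        f"{BASE_URL}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_url(prompt_id: str) -> str:
    """URL for polling the result of a previously queued prompt."""
    return f"{BASE_URL}/history/{urllib.parse.quote(prompt_id, safe='')}"
```

Sending is then a matter of passing the request to `urllib.request.urlopen(...)` and polling `history_url(...)` with the prompt ID from the response. No authentication header is set here because the route currently has none.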

### Files and Config Artifacts

Infrastructure artifacts in repo:

- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/README.md)
- [Caddyfile](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/Caddyfile)
- [rathole-server.toml](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/rathole-server.toml)
- [rathole-client.toml](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/rathole-client.toml)
- [install_linux_rathole_client.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_rathole_client.sh)
- [user_data.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/user_data.sh)
- [install_gpu_comfyui_service.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_gpu_comfyui_service.sh)
- [map_gpu_comfy_security.ps1](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/map_gpu_comfy_security.ps1)
- [sync_ingress_home_ip.py](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/sync_ingress_home_ip.py)
- [desineuron-ingress-home-ip-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.service)
- [desineuron-ingress-home-ip-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.timer)
- [install_linux_ingress_ip_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_ingress_ip_sync.sh)
- [sync_comfy_route.py](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/sync_comfy_route.py)
- [desineuron-comfy-route-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-comfy-route-sync.service)
- [desineuron-comfy-route-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-comfy-route-sync.timer)
- [install_linux_comfy_route_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_comfy_route_sync.sh)
- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)

Linux origin files touched:

- `/etc/nginx/sites-enabled/desineuron.conf`
- `/mnt/ServerStorage/docker_apps/nextcloud/.env`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/config.php`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php`

Backups created on Linux:

- `/mnt/ServerStorage/docker_apps/nextcloud/.env.pre_ingress_backup_2026-04-08`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php.pre_ingress_backup_2026-04-08`

### Dynamic Home IP Sync

Purpose:

- Keep ingress `22/tcp` restricted to the current Airtel public IP even when the ISP changes it
- Prevent SSH fallback outages caused by stale home-IP security-group rules, which previously had to be fixed by hand

Design:

- Linux origin runs `desineuron-ingress-home-ip-sync.timer`
- Timer fires on boot and every 5 minutes
- Service resolves the current home public IP via `https://api.ipify.org`
- Service updates only the ingress security group `sg-0721b8b48e12c531d`
- Only the SSH fallback rule is mutated
- `rathole` no longer depends on the Airtel IP, because `2333/tcp` stays open to the internet on the ingress
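
The core of that design is a small reconciliation step: compare the SSH CIDRs currently on the security group with the freshly resolved home IP, then authorize and revoke as needed. The sketch below is one way to express it in Python and is not the contents of `sync_ingress_home_ip.py`. The function names are hypothetical, and `ec2` is assumed to be a boto3 EC2 client (only `authorize_security_group_ingress` and `revoke_security_group_ingress` are used).

```python
import ipaddress

SSH_PORT = 22


def plan_ssh_rule_update(existing_cidrs, home_ip):
    """Return (to_revoke, to_authorize) so that only <home_ip>/32 remains on 22/tcp.

    Raises ValueError if home_ip is not a valid IPv4 address, which guards
    against writing a garbage response from the IP lookup into the rule.
    """
    wanted = f"{ipaddress.IPv4Address(home_ip.strip())}/32"
    to_revoke = sorted(c for c in set(existing_cidrs) if c != wanted)
    to_authorize = [] if wanted in existing_cidrs else [wanted]
    return to_revoke, to_authorize


def _ssh_permission(cidr):
    # IpPermissions entry describing "22/tcp from <cidr>".
    return {
        "IpProtocol": "tcp",
        "FromPort": SSH_PORT,
        "ToPort": SSH_PORT,
        "IpRanges": [{"CidrIp": cidr}],
    }


def apply_plan(ec2, group_id, to_revoke, to_authorize):
    """Apply the plan; authorize first so SSH is never left without a rule."""
    for cidr in to_authorize:
        ec2.authorize_security_group_ingress(
            GroupId=group_id, IpPermissions=[_ssh_permission(cidr)]
        )
    for cidr in to_revoke:
        ec2.revoke_security_group_ingress(
            GroupId=group_id, IpPermissions=[_ssh_permission(cidr)]
        )
```

Keeping the planning step free of network calls makes it easy to dry-run: when the resolved IP already matches the rule, both lists are empty and nothing is touched.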

Installed Linux paths:

- `/usr/local/bin/sync_ingress_home_ip.py`
- `/etc/systemd/system/desineuron-ingress-home-ip-sync.service`
- `/etc/systemd/system/desineuron-ingress-home-ip-sync.timer`
- `/etc/desineuron-ingress-home-ip-sync.env`
- `/opt/desineuron-ingress-ip-sync/.venv`
- `/var/lib/desineuron-ingress-ip-sync/current_ip.txt`

Current state:

- Timer: enabled and active
- Last recorded home public IP: `223.185.28.89`
- Ingress SSH rule CIDR: `223.185.28.89/32`

### Dynamic Comfy Route Sync

Purpose:

- keep `comfy.desineuron.in` mapped to the correct AWS GPU private IP even if the GPU instance's public or private IP changes
- remove the need to hand-edit `/etc/caddy/Caddyfile` for ComfyUI moves

Design:

- Linux runs `desineuron-comfy-route-sync.timer`
- timer fires on boot and every 2 minutes
- service looks for the newest running EC2 instance tagged `DesineuronRole=comfyui`
- service reads its current private IP
- service connects to the ingress node and updates the managed Caddy route with `/usr/local/bin/manage_desineuron_routes.py`
- Caddy is validated and reloaded only after a successful route update
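
The instance-selection step in that design can be sketched as a pure function over the `Reservations` list returned by boto3's `describe_instances`. This is an illustration, not the code in `sync_comfy_route.py`: `COMFY_FILTERS` and `newest_private_ip` are hypothetical names, and the filter list is what one would plausibly pass as `Filters=` for the tag named above.

```python
# Assumed filters for ec2.describe_instances(Filters=COMFY_FILTERS).
COMFY_FILTERS = [
    {"Name": "tag:DesineuronRole", "Values": ["comfyui"]},
    {"Name": "instance-state-name", "Values": ["running"]},
]


def newest_private_ip(reservations):
    """Return the private IP of the most recently launched running instance.

    `reservations` is the "Reservations" list from a describe_instances
    response. Returns None when no running instance with a private IP exists,
    so the caller can leave the current route untouched.
    """
    candidates = [
        inst
        for res in reservations
        for inst in res.get("Instances", [])
        if inst.get("State", {}).get("Name") == "running"
        and inst.get("PrivateIpAddress")
    ]
    if not candidates:
        return None
    # LaunchTime is a timezone-aware datetime in boto3 responses.
    newest = max(candidates, key=lambda inst: inst["LaunchTime"])
    return newest["PrivateIpAddress"]
```

Re-checking the instance state inside the function is redundant with the filter, but it keeps the selection safe if the function is ever fed an unfiltered response.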

Installed Linux paths:

- `/usr/local/bin/sync_comfy_route.py`
- `/etc/systemd/system/desineuron-comfy-route-sync.service`
- `/etc/systemd/system/desineuron-comfy-route-sync.timer`
- `/etc/desineuron-comfy-route-sync.env`
- `/opt/desineuron-comfy-route-sync/.venv`
- `/var/lib/desineuron-comfy-route-sync/current_target.txt`

Current state:

- Timer: enabled and active
- Current synced target: `172.31.46.190`
- Current target instance tag: `DesineuronRole=comfyui`

### Operational Commands

Check AWS ingress status:

```powershell
aws ec2 describe-instances --instance-ids i-094df09acafb72494 --region us-east-1
aws ec2 describe-addresses --allocation-ids eipalloc-0d54fc0f827450e7b --region us-east-1
```

Check ingress services:

```powershell
aws ssm send-command --region us-east-1 --instance-ids i-094df09acafb72494 --document-name AWS-RunShellScript --parameters commands="sudo systemctl status caddy rathole-server --no-pager"
```

Check GPU ComfyUI service:

```powershell
aws ssm send-command --region us-east-1 --instance-ids i-0e4eab5fe67cf9abe --document-name AWS-RunShellScript --parameters commands="sudo systemctl status comfyui --no-pager","ss -ltnp | grep 8188 || true","tail -n 40 /var/log/comfyui/service.log || true"
```

Check Linux origin services:

```powershell
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status rathole-client nginx"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ingress-home-ip-sync.service desineuron-ingress-home-ip-sync.timer"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S journalctl -u desineuron-ingress-home-ip-sync -n 50 --no-pager"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ops-control-plane.service --no-pager"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S docker compose -f /opt/desineuron-ops-control-plane/docker-compose.yml ps"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-comfy-route-sync.service desineuron-comfy-route-sync.timer --no-pager"
```

Public endpoint validation:

```powershell
curl.exe -I https://office.desineuron.in
curl.exe -I https://git.desineuron.in
curl.exe -I https://cloud.desineuron.in
curl.exe -I https://projects.desineuron.in
curl.exe -I https://talk.desineuron.in
curl.exe -I https://vpn.desineuron.in
curl.exe -I https://comfy.desineuron.in
curl.exe -I https://ops.desineuron.in/login
```
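
The public endpoint checks above could also be run as one scripted pass, for example from a timer or a CI job. The following is a minimal sketch using only the Python standard library; the function names are hypothetical. Unlike `curl -I`, `urllib` follows redirects, so the status reported is that of the final response.

```python
import urllib.error
import urllib.request

ENDPOINTS = [
    "https://office.desineuron.in",
    "https://git.desineuron.in",
    "https://cloud.desineuron.in",
    "https://projects.desineuron.in",
    "https://talk.desineuron.in",
    "https://vpn.desineuron.in",
    "https://comfy.desineuron.in",
    "https://ops.desineuron.in/login",
]


def head_status(url, timeout=10.0):
    """Send a HEAD request and return the HTTP status, or None if unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a useful status code.
        return exc.code
    except OSError:
        # DNS failure, refused connection, TLS error, or timeout.
        return None


def check_all(urls, fetch=head_status):
    """Map each URL to its status; `fetch` is injectable for offline testing."""
    return {url: fetch(url) for url in urls}


def failing(results, ok=(200,)):
    """Return the sorted URLs whose status is not in `ok`."""
    return sorted(url for url, status in results.items() if status not in ok)
```

A typical call would be `failing(check_all(ENDPOINTS))`, which returns an empty list when every hostname answers `200`.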

### Future Service Mapping Runbook

Use this pattern for any future public service behind the stable ingress layer.

1. Decide the backend location.

- Linux origin behind `rathole`
- AWS GPU/private EC2 node
- another private backend later

2. Decide whether the service should terminate TLS at ingress.

- default: yes
- Caddy on ingress should own the public hostname and certificate

3. Create the DNS record in Cloudflare.

- type: `A`
- value: `98.87.120.120`
- proxy mode: `DNS only`
- low TTL during rollout

4. Add the ingress route in [`Caddyfile`](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/Caddyfile).

Patterns:

- Linux-origin service:
  - proxy to `https://127.0.0.1:8443`
  - preserve `Host`
- private AWS backend service:
  - proxy to `http://<private-ip>:<port>` or `https://<private-ip>:<port>`

5. Restrict backend network access.

- never leave backend app ports open to `0.0.0.0/0` unless absolutely necessary
- prefer security-group rule allowing traffic only from ingress security group
- for home-origin services, keep them private behind `rathole`

6. Reload ingress.

```powershell
ssh -i "F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\desineuron-l4-node.pem" ec2-user@98.87.120.120 "sudo caddy validate --config /etc/caddy/Caddyfile && sudo systemctl reload caddy"
```

7. Validate TLS and app response.

- check certificate subject matches hostname
- check `curl -I https://<host>`
- check login page or health endpoint
- check browser behavior

8. If the backend is stateful, create a persistent service.

- prefer `systemd`
- enable restart on failure
- log to a stable path
- record service name, working directory, ports, and restart policy in this handoff doc

9. Update team docs immediately.

- hostname
- DNS record type
- ingress route target
- backend service owner
- service name
- health check command
- rollback step

### Security Notes

- Public traffic terminates only at the AWS edge.
- The Linux box no longer needs Cloudflare Tunnel for these six routes.
- The Linux origin is reached through an outbound tunnel, not by directly exposing the home machine to the public for app traffic.
- SSH on the Linux box remains key-only.
- The AWS ingress IAM role is limited to SSM core.
- ComfyUI is no longer directly exposed on the GPU public IP; only the ingress layer can reach `8188`.
- Ingress `22/tcp` stays restricted and is now auto-synced from the Linux origin.
- Ingress `2333/tcp` is intentionally open so `rathole` survives Airtel IP changes without operator action.

### Remaining Improvement Ideas

- Move certificate issuance and renewal to the AWS edge permanently instead of copying an existing certificate from the Linux nginx origin.
- Clean up nginx warnings about duplicated protocol options.
- Separate `talk.desineuron.in` more fully from general Nextcloud if a distinct Talk-only UX is desired.
- Add authentication (for example Basic Auth or an allowlist) in front of `comfy.desineuron.in` before broader team rollout; internet scanners started hitting the route immediately after it went live.
- Add monitoring and alerting on:
  - `caddy`
  - `rathole-server`
  - `rathole-client`
  - public HTTPS checks
- Add infrastructure-as-code for the EC2 ingress node if this should be reproducible by the team without manual AWS CLI steps.

### Rollback

If rollback is needed:

1. Recreate Cloudflare CNAME/tunnel routes or repoint the DNS records away from `98.87.120.120`.
2. Stop `caddy` and `rathole-server` on AWS.
3. Stop `rathole-client` on Linux.
4. Restore Nextcloud files from:
   - `.env.pre_ingress_backup_2026-04-08`
   - `reverse-proxy.config.php.pre_ingress_backup_2026-04-08`
5. Restart `nextcloud_app` and nginx.

### Team Summary

This migration is complete.

Cloudflare Tunnel is no longer the production path for the six public service hostnames. The stable production ingress is now the AWS `t4g.micro` node with Elastic IP `98.87.120.120`, and the Linux machine remains the private origin behind `rathole`.

Additional mapped route:

- `comfy.desineuron.in` now terminates on the same stable ingress and forwards to the GPU node's private address `172.31.46.190:8188`.
- No further DNS change is needed for ComfyUI.
- The backend is supervised by `systemd` and currently healthy.
- The route is now auto-synced from Linux based on the tagged AWS ComfyUI worker, so future IP changes do not require manual ingress edits.
- The team can use:
  - `https://comfy.desineuron.in/prompt`
  - `https://comfy.desineuron.in/history/{prompt_id}`
  - `https://comfy.desineuron.in/queue`
  - `https://comfy.desineuron.in/upload/image`

### Current Status Snapshot - 2026-04-12

Live public service state:

- `office.desineuron.in` -> `200`
- `git.desineuron.in` -> `200`
- `cloud.desineuron.in` -> `200`
- `projects.desineuron.in` -> `200`
- `talk.desineuron.in` -> `200`
- `vpn.desineuron.in` -> `200`
- `ops.desineuron.in/login` -> `200`
- `comfy.desineuron.in` -> `200`

Linux-origin health:

- `nginx.service` -> `active`
- `rathole-client.service` -> `active`
- `desineuron-ingress-home-ip-sync.timer` -> `active`
- `desineuron-ops-control-plane.service` -> `active`

Linux ops stack containers:

- `desineuron-ops-api` -> `Up`
- `desineuron-ops-db` -> `Up (healthy)`
- `desineuron-ops-worker` -> `Up`

Ingress health:

- `caddy` -> `active`
- `rathole-server` -> `active`
- `comfy.desineuron.in` Caddy route is present in `/etc/caddy/Caddyfile`

GPU ComfyUI state:

- `comfyui.service` -> `active`
- `main.py` present under `/opt/dlami/nvme/ComfyUI`
- listener present on `0.0.0.0:8188`
- public ingress path is healthy

Comfy auto-heal state:

- `desineuron-comfy-route-sync.timer` -> `active`
- synced target file -> `/var/lib/desineuron-comfy-route-sync/current_target.txt`
- current synced target -> `172.31.46.190`

### Linux Ops Control Plane

The Linux box now also hosts the private AWS control surface for the team.

Public operator URL:

- `https://ops.desineuron.in/login`

Purpose:

- launch/stop/terminate AWS machines
- view spot/on-demand market data
- track runtime and estimated cost
- ingest model directories from the Linux box into S3
- hydrate models from S3 to AWS GPU nodes
- manage ingress routes through the `t4g.micro`
- export session/cost CSVs

Linux runtime paths:

- stack root: `/opt/desineuron-ops-control-plane`
- env file: `/opt/desineuron-ops-control-plane/.env`
- exports: `/opt/desineuron-ops-control-plane/exports`
- state: `/opt/desineuron-ops-control-plane/state`

Canonical S3 bucket:

- `desineuron-ops-control-plane-819079556187-us-east-1`

Model library source on Linux:

- `/mnt/ServerStorage/ai-models/models`

Current operator accounts:

- `sagnik@desineuron.in`
- `sayan@desineuron.in`
- `sourik@desineuron.in`

Reference docs:

- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)