Missed files
@@ -19,7 +19,7 @@ Date: 2026-04-08
 13. Remaining Improvement Ideas
 14. Rollback
 15. Team Summary
-16. Current Status Snapshot - 2026-04-11
+16. Current Status Snapshot - 2026-04-12
 17. Linux Ops Control Plane
 
 ### Outcome
@@ -87,7 +87,7 @@ Current GPU worker:
 - Type: `g6.12xlarge`
 - Region: `us-east-1`
 - Private IP: `172.31.46.190`
-- Current public IP: `18.208.176.121`
+- Current public IP: `100.31.64.121`
 - Launch time: `2026-04-11T06:14:04Z`
 
 Open ingress ports:
@@ -168,7 +168,7 @@ Public hostname checks through the new ingress:
 - `talk.desineuron.in` -> `200 /login` on `talk.desineuron.in`
 - `vpn.desineuron.in` -> `200`
 - `ops.desineuron.in/login` -> `200`
-- `comfy.desineuron.in` -> `502`
+- `comfy.desineuron.in` -> `200`
 
 Important note:
 
@@ -203,9 +203,8 @@ Current live path:
 
 Current public result:
 
-- `comfy.desineuron.in` currently returns `502 Bad Gateway`
-- ingress route is present and Caddy is healthy
-- the current GPU backend is not yet listening on `172.31.46.190:8188`, so this is a backend readiness issue, not a DNS or edge-TLS issue
+- `comfy.desineuron.in` currently returns `200 OK`
+- ingress route is now managed dynamically instead of hardcoded to one GPU private IP
 
 Current GPU service:
 
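The hunk above attributes the earlier `502` to backend readiness, meaning nothing was listening on `172.31.46.190:8188`, rather than to DNS or edge TLS. A plain TCP connect from the ingress side is one quick way to tell those cases apart. This is a minimal Python sketch that uses only the standard library. `backend_listening` and its 3-second default timeout are illustrative choices and do not come from the repo.

```python
import socket


def backend_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    If this returns False while Caddy itself is healthy, the fault is on the
    backend (e.g. ComfyUI not yet bound to 8188), not in DNS or edge TLS.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts and unreachable-host errors are
        # all OSError subclasses.
        return False
```

For example, `backend_listening("172.31.46.190", 8188)` returning `False` while Caddy is up matches the earlier `502` state. The private address must be reachable from wherever the check runs, such as the ingress node.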
@@ -214,11 +213,20 @@ Current GPU service:
 - log path: `/var/log/comfyui/service.log`
 - port: `8188/tcp`
 
-Current backend state on `2026-04-11`:
+Current backend state on `2026-04-12`:
 
-- `comfyui.service` is `activating`
-- latest log shows ComfyUI startup and `Starting server`
-- the process is still not binding `8188`, so ingress sees the backend as unavailable
+- `comfyui.service` is `active`
+- `main.py` is present under `/opt/dlami/nvme/ComfyUI`
+- the process is listening on `0.0.0.0:8188`
+- the public ingress path is healthy again
+
+Auto-healing fix applied:
+
+- ComfyUI `systemd` service now runs an `ExecStartPre` recovery script at `/usr/local/bin/desineuron-ensure-comfyui.sh`
+- that script reclones/repairs `/opt/dlami/nvme/ComfyUI` if the tree is missing or damaged
+- Linux now runs `desineuron-comfy-route-sync.timer`
+- the timer updates the managed Caddy route for `comfy.desineuron.in` to the current private IP of the AWS instance tagged `DesineuronRole=comfyui`
+- this protects the public route from GPU instance IP drift without manual Caddy edits
 
 Expected endpoints:
 
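The hunk above says only that `desineuron-ensure-comfyui.sh` reclones or repairs `/opt/dlami/nvme/ComfyUI` when the tree is missing or damaged. Its actual checks are not shown. The Python sketch below illustrates the idea under these assumptions:

- A tree without `main.py` counts as damaged, mirroring the "`main.py` is present" health check.
- A damaged tree is moved aside rather than deleted.
- The upstream ComfyUI GitHub repository is the clone source.

`comfyui_tree_state`, `ensure_comfyui` and the backup naming are hypothetical.

```python
import shutil
import subprocess
import time
from pathlib import Path
from typing import Callable

# Illustrative defaults; the real script's repo URL and criteria are not
# documented in these notes.
COMFY_ROOT = Path("/opt/dlami/nvme/ComfyUI")
REPO_URL = "https://github.com/comfyanonymous/ComfyUI.git"


def comfyui_tree_state(root: Path) -> str:
    """Classify the install tree as 'missing', 'damaged' or 'ok'."""
    if not root.exists():
        return "missing"
    # Assumption: a tree without main.py is treated as damaged.
    if not (root / "main.py").is_file():
        return "damaged"
    return "ok"


def ensure_comfyui(
    root: Path = COMFY_ROOT,
    repo_url: str = REPO_URL,
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Repair the tree if needed; return the state found before any repair."""
    state = comfyui_tree_state(root)
    if state == "damaged":
        # Keep the broken tree next to the original for later inspection.
        backup = root.with_name(f"{root.name}.broken-{int(time.time())}")
        shutil.move(str(root), str(backup))
    if state != "ok":
        # check=True raises on a failed clone, giving a non-zero exit.
        run(["git", "clone", "--depth", "1", repo_url, str(root)], check=True)
    return state
```

`systemd` does not run `ExecStart=` when an `ExecStartPre=` command exits non-zero, unless that command is prefixed with `-`. A failed clone in a script like this therefore stops the unit from launching against a broken tree.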
@@ -244,6 +252,10 @@ Infrastructure artifacts in repo:
 - [desineuron-ingress-home-ip-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.service)
 - [desineuron-ingress-home-ip-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.timer)
 - [install_linux_ingress_ip_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_ingress_ip_sync.sh)
+- [sync_comfy_route.py](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/sync_comfy_route.py)
+- [desineuron-comfy-route-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-comfy-route-sync.service)
+- [desineuron-comfy-route-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-comfy-route-sync.timer)
+- [install_linux_comfy_route_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_comfy_route_sync.sh)
 - [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
 - [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)
 
@@ -290,6 +302,37 @@ Current state:
 - Last recorded home public IP: `223.185.28.89`
 - Ingress SSH rule CIDR: `223.185.28.89/32`
 
+### Dynamic Comfy Route Sync
+
+Purpose:
+
+- keep `comfy.desineuron.in` mapped to the correct AWS GPU private IP even if the GPU instance public/private IP changes
+- remove the need to hand-edit `/etc/caddy/Caddyfile` for ComfyUI moves
+
+Design:
+
+- Linux runs `desineuron-comfy-route-sync.timer`
+- timer fires on boot and every 2 minutes
+- service looks for the newest running EC2 instance tagged `DesineuronRole=comfyui`
+- service reads its current private IP
+- service connects to the ingress node and updates the managed Caddy route with `/usr/local/bin/manage_desineuron_routes.py`
+- Caddy is validated and reloaded only after a successful route update
+
+Installed Linux paths:
+
+- `/usr/local/bin/sync_comfy_route.py`
+- `/etc/systemd/system/desineuron-comfy-route-sync.service`
+- `/etc/systemd/system/desineuron-comfy-route-sync.timer`
+- `/etc/desineuron-comfy-route-sync.env`
+- `/opt/desineuron-comfy-route-sync/.venv`
+- `/var/lib/desineuron-comfy-route-sync/current_target.txt`
+
+Current state:
+
+- Timer: enabled and active
+- Current synced target: `172.31.46.190`
+- Current target instance tag: `DesineuronRole=comfyui`
+
 ### Operational Commands
 
 Check AWS ingress status:
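The Design list in the Dynamic Comfy Route Sync hunk selects the newest running EC2 instance tagged `DesineuronRole=comfyui` and reads its private IP. Below is a minimal sketch of that selection step. It assumes `sync_comfy_route.py` uses boto3's `describe_instances`; the real script is not shown in this commit. The function takes the response dictionary rather than calling AWS, so it can be exercised without credentials. `newest_comfy_private_ip`, `_is_comfy_worker` and `COMFY_FILTERS` are illustrative names.

```python
from typing import Any, Optional

# Tag key/value from the notes. Passing these as
# ec2.describe_instances(Filters=COMFY_FILTERS) on a boto3 EC2 client is an
# assumption about how sync_comfy_route.py queries AWS.
COMFY_FILTERS = [
    {"Name": "tag:DesineuronRole", "Values": ["comfyui"]},
    {"Name": "instance-state-name", "Values": ["running"]},
]


def _is_comfy_worker(inst: dict[str, Any]) -> bool:
    """Re-check role tag, state and IP in case the filters were not applied."""
    tags = {t.get("Key"): t.get("Value") for t in inst.get("Tags", [])}
    return (
        tags.get("DesineuronRole") == "comfyui"
        and inst.get("State", {}).get("Name") == "running"
        and bool(inst.get("PrivateIpAddress"))
    )


def newest_comfy_private_ip(response: dict[str, Any]) -> Optional[str]:
    """Return the private IP of the most recently launched running worker.

    `response` has the DescribeInstances shape:
    {"Reservations": [{"Instances": [{...}, ...]}, ...]}.
    """
    workers = [
        inst
        for reservation in response.get("Reservations", [])
        for inst in reservation.get("Instances", [])
        if _is_comfy_worker(inst)
    ]
    if not workers:
        return None  # no running worker; the caller can keep the current route
    newest = max(workers, key=lambda inst: inst["LaunchTime"])
    return newest["PrivateIpAddress"]
```

With boto3, this could be called as `newest_comfy_private_ip(boto3.client("ec2", region_name="us-east-1").describe_instances(Filters=COMFY_FILTERS))`. Accounts with many instances would need `ec2.get_paginator("describe_instances")` instead of a single call. Picking the newest `LaunchTime` means a replacement worker takes over the route as soon as it is running, even before the old one is terminated.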
@@ -319,6 +362,7 @@ ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.
 ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S journalctl -u desineuron-ingress-home-ip-sync -n 50 --no-pager"
 ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ops-control-plane.service --no-pager"
 ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S docker compose -f /opt/desineuron-ops-control-plane/docker-compose.yml ps"
+ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-comfy-route-sync.service desineuron-comfy-route-sync.timer --no-pager"
 ```
 
 Public endpoint validation:
@@ -449,14 +493,15 @@ Additional mapped route:
 
 - `comfy.desineuron.in` now terminates on the same stable ingress and forwards to the GPU node's private address `172.31.46.190:8188`.
 - No further DNS change is needed for ComfyUI.
-- The backend is supervised by `systemd`, but the current worker is not yet binding `8188`, so public access is currently degraded with `502`.
+- The backend is supervised by `systemd` and currently healthy.
+- The route is now auto-synced from Linux based on the tagged AWS ComfyUI worker, so future IP changes do not require manual ingress edits.
 - The team can use:
 - `https://comfy.desineuron.in/prompt`
 - `https://comfy.desineuron.in/history/{prompt_id}`
 - `https://comfy.desineuron.in/queue`
 - `https://comfy.desineuron.in/upload/image`
 
-### Current Status Snapshot - 2026-04-11
+### Current Status Snapshot - 2026-04-12
 
 Live public service state:
 
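The endpoint list in the hunk above exposes ComfyUI's HTTP API through the ingress. Below is a minimal client sketch based on the upstream ComfyUI server, with these assumptions to verify against the deployed version:

- `POST /prompt` takes a JSON body `{"prompt": <API-format workflow>, "client_id": ...}` and replies with a `prompt_id`.
- The result can then be read from `/history/{prompt_id}`.

`build_prompt_request`, `history_url`, `submit` and `fetch_history` are hypothetical helpers.

```python
import json
import urllib.request
import uuid
from typing import Any, Optional

# Public base URL from the notes; the helper names are illustrative.
BASE_URL = "https://comfy.desineuron.in"


def build_prompt_request(
    workflow: dict[str, Any],
    client_id: Optional[str] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST /prompt request carrying an API-format workflow graph."""
    body = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_url(prompt_id: str, base_url: str = BASE_URL) -> str:
    """URL of GET /history/{prompt_id}, where outputs appear once the job ran."""
    return f"{base_url}/history/{prompt_id}"


def submit(workflow: dict[str, Any], timeout: float = 30.0) -> str:
    """Queue a workflow and return the prompt_id reported by ComfyUI."""
    with urllib.request.urlopen(build_prompt_request(workflow), timeout=timeout) as resp:
        return json.load(resp)["prompt_id"]


def fetch_history(prompt_id: str, timeout: float = 30.0) -> dict[str, Any]:
    """Fetch the history entry for prompt_id.

    Assumption: upstream returns an empty object until the prompt finishes.
    """
    with urllib.request.urlopen(history_url(prompt_id), timeout=timeout) as resp:
        return json.load(resp)
```

The workflow dictionary is the graph exported from the ComfyUI editor in API format. `/queue` can be fetched the same way to inspect running and pending jobs.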
@@ -467,7 +512,7 @@ Live public service state:
 - `talk.desineuron.in` -> `200`
 - `vpn.desineuron.in` -> `200`
 - `ops.desineuron.in/login` -> `200`
-- `comfy.desineuron.in` -> `502`
+- `comfy.desineuron.in` -> `200`
 
 Linux-origin health:
 
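The snapshot above records an expected status code for each public hostname, and a small probe can compare live results against that table. This Python sketch relies on two behaviours of `urllib`:

- It follows redirects, so a redirect to `/login` resolves to the final status.
- It raises `HTTPError` for 4xx/5xx responses, so a `502` from Caddy is reported as a status code.

Most DNS, TLS or connection failures still surface as `URLError` and are deliberately not caught. `http_status`, `mismatches` and the `EXPECTED` table layout are illustrative.

```python
import urllib.error
import urllib.request

# Expected results taken from the status snapshot above.
EXPECTED = {
    "https://talk.desineuron.in": 200,
    "https://vpn.desineuron.in": 200,
    "https://ops.desineuron.in/login": 200,
    "https://comfy.desineuron.in": 200,
}


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status of a GET, following redirects.

    4xx/5xx responses (e.g. Caddy's 502 when the backend is down) are
    returned as codes; edge failures such as DNS or TLS errors propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def mismatches(expected: dict[str, int]) -> dict[str, int]:
    """Return {url: observed_status} for every URL that differs from expected."""
    observed = {url: http_status(url) for url in expected}
    return {url: code for url, code in observed.items() if code != expected[url]}
```

`mismatches(EXPECTED)` should return an empty dict while the snapshot still holds. A `{"https://comfy.desineuron.in": 502}` result would point back at backend readiness rather than at the edge.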
@@ -490,10 +535,16 @@ Ingress health:
 
 GPU ComfyUI state:
 
-- `comfyui.service` -> `activating`
-- latest logs show ComfyUI startup sequence completing toward `Starting server`
-- no active listener on `8188` yet
-- ingress cannot connect to `172.31.46.190:8188`, which is why the public result is `502`
+- `comfyui.service` -> `active`
+- `main.py` present under `/opt/dlami/nvme/ComfyUI`
+- listener present on `0.0.0.0:8188`
+- public ingress path is healthy
+
+Comfy auto-heal state:
+
+- `desineuron-comfy-route-sync.timer` -> `active`
+- synced target file -> `/var/lib/desineuron-comfy-route-sync/current_target.txt`
+- current synced target -> `172.31.46.190`
 
 ### Linux Ops Control Plane
 
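The auto-heal state above records the synced target in `/var/lib/desineuron-comfy-route-sync/current_target.txt`. The following use of that file is assumed here, not taken from the repo:

- Skip the ingress update when the tagged worker's IP has not changed.
- Record a new IP only after the route update, including Caddy validation and reload, has succeeded.

In the sketch, `apply_route` is a stand-in for the real step that runs `manage_desineuron_routes.py` on the ingress node, and `sync_target` is a hypothetical name.

```python
from pathlib import Path
from typing import Callable

STATE_FILE = Path("/var/lib/desineuron-comfy-route-sync/current_target.txt")


def sync_target(
    new_ip: str,
    apply_route: Callable[[str], bool],
    state_file: Path = STATE_FILE,
) -> bool:
    """Push new_ip to the ingress only if it differs from the recorded target.

    apply_route stands in for the real update (route edit, Caddy validate
    and reload) and must return True only if all of it succeeded.
    Returns True when a route change was applied.
    """
    current = state_file.read_text().strip() if state_file.exists() else ""
    if new_ip == current:
        return False  # unchanged: no ingress call, no Caddy reload
    if not apply_route(new_ip):
        # Leave the state file untouched so the next timer run retries.
        raise RuntimeError(f"route update to {new_ip} failed")
    state_file.parent.mkdir(parents=True, exist_ok=True)
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(new_ip + "\n")
    tmp.replace(state_file)  # atomic rename within the same filesystem
    return True
```

Writing through a temporary file and `Path.replace` means a killed service never leaves a half-written target behind. Leaving the file unchanged on failure means the next 2-minute timer run simply tries again.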