Missed files

Sagnik
2026-04-12 19:25:43 +05:30
parent 4645ff737b
commit d77373900a
69 changed files with 4375 additions and 2469 deletions


@@ -19,7 +19,7 @@ Date: 2026-04-08
13. Remaining Improvement Ideas
14. Rollback
15. Team Summary
16. Current Status Snapshot - 2026-04-11
16. Current Status Snapshot - 2026-04-12
17. Linux Ops Control Plane
### Outcome
@@ -87,7 +87,7 @@ Current GPU worker:
- Type: `g6.12xlarge`
- Region: `us-east-1`
- Private IP: `172.31.46.190`
- Current public IP: `18.208.176.121`
- Current public IP: `100.31.64.121`
- Launch time: `2026-04-11T06:14:04Z`
Open ingress ports:
@@ -168,7 +168,7 @@ Public hostname checks through the new ingress:
- `talk.desineuron.in` -> `200 /login` on `talk.desineuron.in`
- `vpn.desineuron.in` -> `200`
- `ops.desineuron.in/login` -> `200`
- `comfy.desineuron.in` -> `502`
- `comfy.desineuron.in` -> `200`
Important note:
@@ -203,9 +203,8 @@ Current live path:
Current public result:
- `comfy.desineuron.in` currently returns `502 Bad Gateway`
- ingress route is present and Caddy is healthy
- the current GPU backend is not yet listening on `172.31.46.190:8188`, so this is a backend readiness issue, not a DNS or edge-TLS issue
- `comfy.desineuron.in` currently returns `200 OK`
- ingress route is now managed dynamically instead of hardcoded to one GPU private IP
Current GPU service:
@@ -214,11 +213,20 @@ Current GPU service:
- log path: `/var/log/comfyui/service.log`
- port: `8188/tcp`
Current backend state on `2026-04-11`:
Current backend state on `2026-04-12`:
- `comfyui.service` is `activating`
- latest log shows ComfyUI startup and `Starting server`
- the process is still not binding `8188`, so ingress sees the backend as unavailable
- `comfyui.service` is `active`
- `main.py` is present under `/opt/dlami/nvme/ComfyUI`
- the process is listening on `0.0.0.0:8188`
- the public ingress path is healthy again
Auto-healing fix applied:
- ComfyUI `systemd` service now runs an `ExecStartPre` recovery script at `/usr/local/bin/desineuron-ensure-comfyui.sh`
- that script reclones/repairs `/opt/dlami/nvme/ComfyUI` if the tree is missing or damaged
- Linux now runs `desineuron-comfy-route-sync.timer`
- the timer updates the managed Caddy route for `comfy.desineuron.in` to the current private IP of the AWS instance tagged `DesineuronRole=comfyui`
- this protects the public route from GPU instance IP drift without manual Caddy edits
Expected endpoints:
@@ -244,6 +252,10 @@ Infrastructure artifacts in repo:
- [desineuron-ingress-home-ip-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.service)
- [desineuron-ingress-home-ip-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.timer)
- [install_linux_ingress_ip_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_ingress_ip_sync.sh)
- [sync_comfy_route.py](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/sync_comfy_route.py)
- [desineuron-comfy-route-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-comfy-route-sync.service)
- [desineuron-comfy-route-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-comfy-route-sync.timer)
- [install_linux_comfy_route_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_comfy_route_sync.sh)
- [README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)
@@ -290,6 +302,37 @@ Current state:
- Last recorded home public IP: `223.185.28.89`
- Ingress SSH rule CIDR: `223.185.28.89/32`
### Dynamic Comfy Route Sync
Purpose:
- keep `comfy.desineuron.in` mapped to the correct AWS GPU private IP even if the GPU instance public/private IP changes
- remove the need to hand-edit `/etc/caddy/Caddyfile` for ComfyUI moves
Design:
- Linux runs `desineuron-comfy-route-sync.timer`
- timer fires on boot and every 2 minutes
- service looks for the newest running EC2 instance tagged `DesineuronRole=comfyui`
- service reads its current private IP
- service connects to the ingress node and updates the managed Caddy route with `/usr/local/bin/manage_desineuron_routes.py`
- Caddy is validated and reloaded only after a successful route update
Installed Linux paths:
- `/usr/local/bin/sync_comfy_route.py`
- `/etc/systemd/system/desineuron-comfy-route-sync.service`
- `/etc/systemd/system/desineuron-comfy-route-sync.timer`
- `/etc/desineuron-comfy-route-sync.env`
- `/opt/desineuron-comfy-route-sync/.venv`
- `/var/lib/desineuron-comfy-route-sync/current_target.txt`
Current state:
- Timer: enabled and active
- Current synced target: `172.31.46.190`
- Current target instance tag: `DesineuronRole=comfyui`
### Operational Commands
Check AWS ingress status:
@@ -319,6 +362,7 @@ ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S journalctl -u desineuron-ingress-home-ip-sync -n 50 --no-pager"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ops-control-plane.service --no-pager"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S docker compose -f /opt/desineuron-ops-control-plane/docker-compose.yml ps"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-comfy-route-sync.service desineuron-comfy-route-sync.timer --no-pager"
```
Public endpoint validation:
@@ -449,14 +493,15 @@ Additional mapped route:
- `comfy.desineuron.in` now terminates on the same stable ingress and forwards to the GPU node's private address `172.31.46.190:8188`.
- No further DNS change is needed for ComfyUI.
- The backend is supervised by `systemd`, but the current worker is not yet binding `8188`, so public access is currently degraded with `502`.
- The backend is supervised by `systemd` and currently healthy.
- The route is now auto-synced from Linux based on the tagged AWS ComfyUI worker, so future IP changes do not require manual ingress edits.
- The team can use:
- `https://comfy.desineuron.in/prompt`
- `https://comfy.desineuron.in/history/{prompt_id}`
- `https://comfy.desineuron.in/queue`
- `https://comfy.desineuron.in/upload/image`
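A minimal client sketch for the first two endpoints listed above, using only the Python standard library. The request body shape (`prompt` plus `client_id`) and the `prompt_id` field in the reply follow the upstream ComfyUI HTTP API as commonly used, and should be treated as assumptions here. Whether this ingress requires authentication is not stated in these notes. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://comfy.desineuron.in"


def build_prompt_request(workflow: dict, client_id: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /prompt request for a workflow graph."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_url(prompt_id: str, base_url: str = BASE_URL) -> str:
    """Return the URL to poll for the result of a submitted prompt."""
    return f"{base_url}/history/{urllib.parse.quote(prompt_id, safe='')}"


def submit(workflow: dict, client_id: str) -> str:
    """Send the workflow and return the prompt_id from the JSON reply.

    Assumption: the reply contains a "prompt_id" key, as in upstream ComfyUI.
    """
    request = build_prompt_request(workflow, client_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["prompt_id"]
```

A caller would pass an exported API-format workflow to `submit`, then poll `history_url(prompt_id)` until the entry for that prompt appears.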
### Current Status Snapshot - 2026-04-11
### Current Status Snapshot - 2026-04-12
Live public service state:
@@ -467,7 +512,7 @@ Live public service state:
- `talk.desineuron.in` -> `200`
- `vpn.desineuron.in` -> `200`
- `ops.desineuron.in/login` -> `200`
- `comfy.desineuron.in` -> `502`
- `comfy.desineuron.in` -> `200`
Linux-origin health:
@@ -490,10 +535,16 @@ Ingress health:
GPU ComfyUI state:
- `comfyui.service` -> `activating`
- latest logs show ComfyUI startup sequence completing toward `Starting server`
- no active listener on `8188` yet
- ingress cannot connect to `172.31.46.190:8188`, which is why the public result is `502`
- `comfyui.service` -> `active`
- `main.py` present under `/opt/dlami/nvme/ComfyUI`
- listener present on `0.0.0.0:8188`
- public ingress path is healthy
Comfy auto-heal state:
- `desineuron-comfy-route-sync.timer` -> `active`
- synced target file -> `/var/lib/desineuron-comfy-route-sync/current_target.txt`
- current synced target -> `172.31.46.190`
### Linux Ops Control Plane