# ComfyUI Setup Truth

**Date:** 2026-04-15

**Purpose:** Capture the current ComfyUI operating truth, team access path, model hydration path, and the exact repo and infra artifacts that matter.

## 1. Current Production Truth

ComfyUI is exposed through the stable ingress, not through the GPU box's public IP.

Current live path:

- public hostname: `https://comfy.desineuron.in`
- ingress elastic IP: `98.87.120.120`
- ingress target: `172.31.46.190:8188`
- GPU instance: `i-0e4eab5fe67cf9abe`
- GPU type: `g6.12xlarge`

As of `2026-04-15`, the public path is healthy again and returns `200 OK`.

## 2. What Failed

The recent outage was not an ingress TLS problem. The GPU box had lost its ComfyUI working tree, and the recovery path that the systemd service expects was missing.

Observed failure state:

- `/opt/dlami/nvme/ComfyUI` missing
- `/usr/local/bin/desineuron-ensure-comfyui.sh` missing
- `comfyui.service` entered restart loops
- ingress returned `502`

## 3. What Was Restored

The GPU node was restored to the intended service shape:

- `comfyui.service` is active
- `/opt/dlami/nvme/ComfyUI` exists again
- ComfyUI is listening on `0.0.0.0:8188`
- ingress can reach `172.31.46.190:8188`
- public `https://comfy.desineuron.in` returns `200`

## 4. Team Usability Contract

All team members should use the stable hostname only:

- `https://comfy.desineuron.in/`
- `https://comfy.desineuron.in/prompt`
- `https://comfy.desineuron.in/history/{prompt_id}`
- `https://comfy.desineuron.in/queue`
- `https://comfy.desineuron.in/upload/image`

Do not use the GPU public IP directly. Do not expose `8188` publicly again.

## 5. Storage Truth

Model and staging work should land on NVMe, not on the root volume.

Canonical GPU storage roots:

- ComfyUI app: `/opt/dlami/nvme/ComfyUI`
- HF cache: `/opt/dlami/nvme/hf`
- model staging: `/opt/dlami/nvme/model-staging`
- model logs: `/opt/dlami/nvme/model-logs`

## 6. S3 Model Hydration Truth

Existing S3 bucket used for Project Velocity model storage:

- `s3://project-velocity/models/`

Model prefixes were already present under this bucket before this pass, so it is the current working hydration bucket and prefix family.

Wan 2.2 target prefix:

- `s3://project-velocity/models/Wan2.2-Animate-14B/`

## 7. Wan 2.2 Animate 14B Download Path

Tooling installed on the GPU box:

- `hf` (Hugging Face CLI)
- `huggingface_hub` with `hf_transfer`
- `s5cmd`

The download is staged to NVMe under:

- `/opt/dlami/nvme/model-staging/Wan2.2-Animate-14B`

Support scripts created on the GPU node:

- `/usr/local/bin/desineuron-download-wan22.sh`
- `/usr/local/bin/desineuron-sync-wan22-to-s3.sh`

The intended flow is:

1. download from Hugging Face to NVMe
2. sync from NVMe to `s3://project-velocity/models/Wan2.2-Animate-14B/`
3. use S3 as the hydration source for future GPU or Linux-side restoration workflows

## 8. Current Wan State

The Wan 2.2 Animate 14B download was started on the GPU box and is writing into the NVMe staging directory.

This is a long-running asset download. Treat it as resumable model hydration work, not as a short command.

## 9. Repo Artifacts That Matter

Relevant repo files:

- [install_gpu_comfyui_service.sh](<F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\infrastructure\desineuron_ingress\install_gpu_comfyui_service.sh>)
- [sync_comfy_route.py](<F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\infrastructure\desineuron_ingress\sync_comfy_route.py>)
- [Caddyfile](<F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\infrastructure\desineuron_ingress\Caddyfile>)
- [Desineuron Stable Ingress Handoff.md](<F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\.Agent Context\Sprint 1\Desineuron Stable Ingress Handoff.md>)

## 10. Operational Guidance

If ComfyUI breaks again, check in this order:

1. public `https://comfy.desineuron.in`
2. the ingress's managed route target
3. the GPU listener on `8188`
4. existence of `/opt/dlami/nvme/ComfyUI`
5. existence of `/usr/local/bin/desineuron-ensure-comfyui.sh`
6. the `comfyui.service` journal

## 11. Bottom Line

ComfyUI is now a stable-ingress service, not a direct GPU-IP service. Team usage should go through the ingress hostname, model storage should go to NVMe first, and S3 should act as the hydration source of truth for large model recovery and replication.
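For the team usability contract in section 4, the following is a minimal Python sketch of queueing a workflow through the stable hostname and waiting for its history entry. It assumes that the ingress passes ComfyUI's standard JSON API through unchanged, that no auth header is required, and that the workflow is in ComfyUI's API (not UI) format. The helper names are illustrative only, not existing team tooling.

```python
import json
import time
import urllib.parse
import urllib.request

# Stable ingress hostname from section 4; never the GPU box IP.
BASE_URL = "https://comfy.desineuron.in"


def build_prompt_request(workflow: dict, client_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /prompt request carrying an API-format workflow graph."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_url(prompt_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /history/{prompt_id}; the id is percent-encoded defensively."""
    return f"{base_url}/history/{urllib.parse.quote(prompt_id, safe='')}"


def queue_prompt(workflow: dict, client_id: str, timeout: float = 30.0) -> str:
    """Submit the workflow and return the prompt_id that ComfyUI assigns."""
    with urllib.request.urlopen(build_prompt_request(workflow, client_id), timeout=timeout) as resp:
        return json.load(resp)["prompt_id"]


def wait_for_history(prompt_id: str, poll_seconds: float = 2.0, deadline_seconds: float = 600.0) -> dict:
    """Poll /history/{prompt_id} until the entry appears (empty until the run finishes)."""
    deadline = time.monotonic() + deadline_seconds
    while time.monotonic() < deadline:
        with urllib.request.urlopen(history_url(prompt_id), timeout=30.0) as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(poll_seconds)
    raise TimeoutError(f"no history for {prompt_id} after {deadline_seconds}s")
```

Usage would look like `wait_for_history(queue_prompt(workflow, str(uuid.uuid4())))`. Request construction is kept in pure functions so that it can be checked without touching the live service.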
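The two-step hydration flow in section 7 (Hugging Face to NVMe, then NVMe to S3) can be sketched in Python as follows. This sketch does not reproduce `desineuron-download-wan22.sh` or `desineuron-sync-wan22-to-s3.sh`, whose contents are not shown here. The Hugging Face repo id is an assumption. The `dir/*` source form for `s5cmd sync` is also an assumption and should be checked against the installed s5cmd version's `sync` help.

```python
import os
import subprocess

HF_REPO_ID = "Wan-AI/Wan2.2-Animate-14B"  # ASSUMED repo id; confirm before use
STAGING_DIR = "/opt/dlami/nvme/model-staging/Wan2.2-Animate-14B"
HF_CACHE_DIR = "/opt/dlami/nvme/hf"
S3_PREFIX = "s3://project-velocity/models/Wan2.2-Animate-14B/"


def build_sync_command(local_dir: str = STAGING_DIR, s3_prefix: str = S3_PREFIX) -> list[str]:
    """s5cmd sync from the NVMe staging dir to the S3 hydration prefix.

    ASSUMPTION: "<dir>/*" syncs the directory's contents under the prefix.
    s5cmd expands the wildcard itself, so no shell is involved.
    """
    return ["s5cmd", "sync", local_dir.rstrip("/") + "/*", s3_prefix.rstrip("/") + "/"]


def download_to_nvme() -> str:
    """Step 1: snapshot the repo into the NVMe staging dir.

    Files already complete in local_dir are skipped on re-run (recent huggingface_hub versions).
    """
    # hf_transfer and HF_HOME are read when huggingface_hub is imported,
    # so set them first (this only works if nothing imported it earlier).
    os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")
    os.environ.setdefault("HF_HOME", HF_CACHE_DIR)
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=HF_REPO_ID, local_dir=STAGING_DIR)


def sync_to_s3() -> None:
    """Step 2: push the staged snapshot to S3; raises CalledProcessError on failure."""
    subprocess.run(build_sync_command(), check=True)
```

On the GPU node the flow would be `download_to_nvme()` followed by `sync_to_s3()`. Keeping the sync as a separate step means that an interrupted download can be re-run without re-uploading anything.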
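The check order in section 10 can be scripted as an ordered list of checks that stops at the first failure. In this minimal Python sketch, step 2 (the ingress's managed route target) has no scripted check and is left as a manual inspection of the Caddyfile and `sync_comfy_route.py`. Step 3 assumes that the script runs somewhere that can reach `172.31.46.190`. Steps 4 and 5 only make sense on the GPU node itself.

```python
import os
import socket
import urllib.request
from typing import Callable, Optional


def public_url_ok(url: str = "https://comfy.desineuron.in", timeout: float = 10.0) -> bool:
    """Step 1: the public hostname answers 200 (HTTP errors and timeouts count as failure)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def tcp_port_open(host: str = "172.31.46.190", port: int = 8188, timeout: float = 5.0) -> bool:
    """Step 3: something is listening on the ComfyUI port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run checks in order; return the label of the first failing one, or None."""
    for label, check in checks:
        if not check():
            return label
    return None


CHECKS: list[tuple[str, Callable[[], bool]]] = [
    ("1. public https://comfy.desineuron.in", public_url_ok),
    # 2. ingress managed route target: inspect manually, not scripted here.
    ("3. GPU listener on 8188", tcp_port_open),
    ("4. /opt/dlami/nvme/ComfyUI exists", lambda: os.path.isdir("/opt/dlami/nvme/ComfyUI")),
    ("5. desineuron-ensure-comfyui.sh exists",
     lambda: os.path.isfile("/usr/local/bin/desineuron-ensure-comfyui.sh")),
    # 6. then read the service journal: journalctl -u comfyui.service
]
```

Calling `run_checks(CHECKS)` returns the first failing step's label, or `None` if every scripted check passes. The early return mirrors the intent of section 10: fix the first broken layer before you look further down the stack.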