ComfyUI Setup Truth
Date: 2026-04-15
Purpose: Capture the current ComfyUI operating truth, team access path, model hydration path, and the exact repo and infra artifacts that matter.
1. Current Production Truth
ComfyUI is exposed through the stable ingress, not through the GPU box public IP.
Current live path:
- public hostname: https://comfy.desineuron.in
- ingress elastic IP: 98.87.120.120
- ingress target: 172.31.46.190:8188
- GPU instance: i-0e4eab5fe67cf9abe
- GPU type: g6.12xlarge
As of 2026-04-15, the public path is healthy again and returns 200 OK.
2. What Failed
The recent outage was not an ingress TLS problem. The GPU box had lost its ComfyUI working tree, and the recovery script that the systemd service expects was missing.
Observed failure state:
- /opt/dlami/nvme/ComfyUI missing
- /usr/local/bin/desineuron-ensure-comfyui.sh missing
- comfyui.service entered restart loops
- ingress returned 502
3. What Was Restored
The GPU node was restored to the intended service shape:
- comfyui.service is active
- /opt/dlami/nvme/ComfyUI exists again
- ComfyUI is listening on 0.0.0.0:8188
- ingress can reach 172.31.46.190:8188
- public https://comfy.desineuron.in returns 200
4. Team Usability Contract
All team members should use the stable hostname only:
- https://comfy.desineuron.in/
- https://comfy.desineuron.in/prompt
- https://comfy.desineuron.in/history/{prompt_id}
- https://comfy.desineuron.in/queue
- https://comfy.desineuron.in/upload/image
Do not use the GPU public IP directly.
Do not expose 8188 publicly again.
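For scripted access, the contract above can be sketched as a small client that only ever builds URLs from the stable hostname. This is a minimal illustration, not an official team client: the helper names are hypothetical, it assumes the ingress forwards the stock ComfyUI routes unchanged, that no extra auth header is required, and that the POST /prompt response carries a prompt_id field as stock ComfyUI does.

```python
import json
import urllib.request

BASE_URL = "https://comfy.desineuron.in"  # stable ingress hostname from this document


def endpoint(path: str) -> str:
    """Join a route onto the stable hostname (never onto a GPU IP)."""
    return f"{BASE_URL}/{path.lstrip('/')}"


def build_prompt_request(workflow: dict, client_id: str) -> urllib.request.Request:
    """Build, but do not send, a POST /prompt request for an API-format workflow."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        endpoint("/prompt"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_prompt(workflow: dict, client_id: str = "team-client", timeout: float = 30.0) -> str:
    """Send the workflow and return the prompt_id (assumed response field)."""
    request = build_prompt_request(workflow, client_id)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["prompt_id"]


def fetch_history(prompt_id: str, timeout: float = 30.0) -> dict:
    """Fetch /history/{prompt_id} for a previously submitted prompt."""
    with urllib.request.urlopen(endpoint(f"/history/{prompt_id}"), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping the hostname in a single constant means a script cannot silently drift back to a GPU IP or to port 8188. The /upload/image route takes a multipart form upload and is not covered by this sketch.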
5. Storage Truth
Model and staging work should land on NVMe, not on the root volume.
Canonical GPU storage roots:
- ComfyUI app: /opt/dlami/nvme/ComfyUI
- HF cache: /opt/dlami/nvme/hf
- model staging: /opt/dlami/nvme/model-staging
- model logs: /opt/dlami/nvme/model-logs
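A script that writes model data could guard against landing on the root volume with a check like the one below. It is a sketch only: the dictionary keys and the is_on_nvme helper are hypothetical names, and the check is purely lexical, so it does not follow symlinks or confirm that the NVMe volume is actually mounted.

```python
import posixpath
from pathlib import PurePosixPath

NVME_ROOT = PurePosixPath("/opt/dlami/nvme")

# Canonical roots from this document; the keys are illustrative labels.
STORAGE_ROOTS = {
    "comfyui_app": NVME_ROOT / "ComfyUI",
    "hf_cache": NVME_ROOT / "hf",
    "model_staging": NVME_ROOT / "model-staging",
    "model_logs": NVME_ROOT / "model-logs",
}


def is_on_nvme(path: str) -> bool:
    """True when path is the NVMe root or lies under it (lexical check only)."""
    # normpath collapses ".." segments so "/opt/dlami/nvme/../x" is rejected.
    normalized = PurePosixPath(posixpath.normpath(path))
    return normalized == NVME_ROOT or NVME_ROOT in normalized.parents
```

For example, is_on_nvme("/opt/dlami/nvme/model-staging/Wan2.2-Animate-14B") holds, while a path under a home directory on the root volume does not.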
6. S3 Model Hydration Truth
Existing S3 bucket used for Project Velocity model storage:
s3://project-velocity/models/
Model prefixes already existed under this path before this pass, so it is treated as the current working hydration bucket and prefix family.
Wan 2.2 target prefix:
s3://project-velocity/models/Wan2.2-Animate-14B/
7. Wan 2.2 Animate 14B Download Path
Tooling installed on the GPU box:
- hf
- huggingface_hub with hf_transfer
- s5cmd
Download is staged to NVMe under:
/opt/dlami/nvme/model-staging/Wan2.2-Animate-14B
Support scripts created on the GPU node:
- /usr/local/bin/desineuron-download-wan22.sh
- /usr/local/bin/desineuron-sync-wan22-to-s3.sh
The intended flow is:
- download from Hugging Face to NVMe
- sync from NVMe to s3://project-velocity/models/Wan2.2-Animate-14B/
- use S3 as the hydration source for future GPU or Linux-side restoration workflows
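The first two steps of this flow can be sketched as command builders. This is not the content of the two support scripts, which is not reproduced in this document. The Hugging Face repo ID is an assumption, as are the function names. The sketch also assumes an hf CLI that provides `hf download --local-dir`, a huggingface_hub version that still honours HF_HUB_ENABLE_HF_TRANSFER, and that s5cmd expands the local wildcard itself; verify the source-path behaviour against the installed s5cmd before relying on it.

```python
import os

STAGING_DIR = "/opt/dlami/nvme/model-staging/Wan2.2-Animate-14B"
S3_PREFIX = "s3://project-velocity/models/Wan2.2-Animate-14B/"
HF_REPO_ID = "Wan-AI/Wan2.2-Animate-14B"  # ASSUMPTION: repo ID is not stated in this document


def download_command(repo_id: str = HF_REPO_ID, local_dir: str = STAGING_DIR) -> list:
    """Argument list for staging the model from Hugging Face onto NVMe."""
    return ["hf", "download", repo_id, "--local-dir", local_dir]


def download_env(base=None) -> dict:
    """Copy of the environment with hf_transfer switched on."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return env


def sync_command(local_dir: str = STAGING_DIR, s3_prefix: str = S3_PREFIX) -> list:
    """Argument list for syncing the staged files from NVMe to the S3 prefix."""
    # The "*" is passed literally; s5cmd (not the shell) is assumed to expand it.
    return ["s5cmd", "sync", local_dir.rstrip("/") + "/*", s3_prefix]
```

On the GPU box these would be run with something like subprocess.run(download_command(), env=download_env(), check=True) followed by subprocess.run(sync_command(), check=True). Re-running the download after an interruption is the expected way to resume it.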
8. Current Wan State
The Wan 2.2 Animate 14B download was started on the GPU box and is writing into the NVMe staging directory.
This is a long-running asset download and should be treated as resumable model hydration work, not a short command.
9. Repo Artifacts That Matter
Relevant repo files:
- [install_gpu_comfyui_service.sh](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\infrastructure\desineuron_ingress\install_gpu_comfyui_service.sh)
- [sync_comfy_route.py](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\infrastructure\desineuron_ingress\sync_comfy_route.py)
- [Caddyfile](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\infrastructure\desineuron_ingress\Caddyfile)
- [Desineuron Stable Ingress Handoff.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity.Agent Context\Sprint 1\Desineuron Stable Ingress Handoff.md)
10. Operational Guidance
If Comfy breaks again, check in this order:
- public https://comfy.desineuron.in
- ingress managed route target
- GPU listener on 8188
- existence of /opt/dlami/nvme/ComfyUI
- existence of /usr/local/bin/desineuron-ensure-comfyui.sh
- comfyui.service journal
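The check order above can be partly automated. The sketch below runs probes in sequence and reports the first one that fails; it is an illustration under assumptions, not an existing repo tool. The function names are hypothetical, the probes are meant to run on the GPU box itself, and the ingress route target and the comfyui.service journal are left as manual steps.

```python
import os
import socket
import urllib.request


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """True when the URL answers with HTTP 200 (a 502 raises and counts as failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True when a TCP connection to host:port succeeds."""
    with socket.create_connection((host, port), timeout=timeout):
        return True


def first_failure(checks):
    """Run (name, probe) pairs in order; return the first failing name, or None."""
    for name, probe in checks:
        try:
            ok = bool(probe())
        except Exception:
            ok = False  # any error (HTTP 502, refused connection) is a failed check
        if not ok:
            return name
    return None


# Subset of the documented order that can be probed from the GPU box.
GPU_BOX_CHECKS = [
    ("public https://comfy.desineuron.in", lambda: http_ok("https://comfy.desineuron.in")),
    ("GPU listener on 8188", lambda: port_open("127.0.0.1", 8188)),
    ("/opt/dlami/nvme/ComfyUI exists", lambda: os.path.isdir("/opt/dlami/nvme/ComfyUI")),
    (
        "/usr/local/bin/desineuron-ensure-comfyui.sh exists",
        lambda: os.path.isfile("/usr/local/bin/desineuron-ensure-comfyui.sh"),
    ),
]
```

Calling first_failure(GPU_BOX_CHECKS) returns the name of the first broken layer, which indicates where to start looking before reading the service journal.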
11. Bottom Line
ComfyUI is a stable-ingress service now, not a direct GPU-IP service. Team usage should go through the ingress hostname, model storage should go to NVMe first, and S3 should act as the hydration source of truth for large model recovery and replication.