Built the Sentinel Tab

Sagnik
2026-04-12 02:02:58 +05:30
parent fb656d1443
commit 075ab280ad
526 changed files with 17646 additions and 70931 deletions

@@ -0,0 +1,59 @@
{
	email admin@desineuron.in
	log {
		output file /var/log/caddy/admin.log
		format json
	}
}

office.desineuron.in, git.desineuron.in, cloud.desineuron.in, projects.desineuron.in, talk.desineuron.in, vpn.desineuron.in {
	tls /etc/caddy/tls/fullchain.pem /etc/caddy/tls/privkey.pem
	log {
		output file /var/log/caddy/access.log
		format json
	}
	reverse_proxy https://127.0.0.1:8443 {
		header_up Host {host}
		header_up X-Forwarded-Host {host}
		header_up X-Forwarded-Proto {scheme}
		header_up X-Forwarded-For {remote_host}
		transport http {
			tls_insecure_skip_verify
		}
	}
}

ops.desineuron.in {
	log {
		output file /var/log/caddy/access.log
		format json
	}
	reverse_proxy https://127.0.0.1:8443 {
		header_up Host {host}
		header_up X-Forwarded-Host {host}
		header_up X-Forwarded-Proto {scheme}
		header_up X-Forwarded-For {remote_host}
		transport http {
			tls_insecure_skip_verify
		}
	}
}

comfy.desineuron.in {
	log {
		output file /var/log/caddy/access.log
		format json
	}
	reverse_proxy http://172.31.46.190:8188 {
		header_up Host {host}
		header_up X-Forwarded-Host {host}
		header_up X-Forwarded-Proto {scheme}
		header_up X-Forwarded-For {remote_host}
	}
}

import /etc/caddy/managed/*.caddy

@@ -0,0 +1,38 @@
# Desineuron Ingress
This directory contains the reproducible bootstrap artifacts for the
`desineuron-ingress-01` EC2 node.
Architecture:
- EC2 `t4g.micro` on-demand in `us-east-1`
- Amazon Linux 2023 ARM64
- `20 GB` gp3 root volume
- `Caddy` as the public HTTPS edge
- `rathole` as the reverse TCP relay from the Linux origin box
Traffic model:
- Public DNS stays in Cloudflare
- Public HTTPS terminates on EC2
- All six public hostnames proxy through EC2 to one local relay socket
- Linux origin continues to serve the actual apps on `https://localhost:443`
Key files:
- `user_data.sh`: first-boot provisioning for the EC2 ingress node
- `Caddyfile`: public edge routing
- `rathole-server.toml`: EC2-side relay config
- `rathole-client.toml`: Linux-side relay config template
- `install_linux_rathole_client.sh`: Linux-side installer/service script
- `sync_ingress_home_ip.py`: detects current home public IP and updates the ingress SSH allowlist rule
- `desineuron-ingress-home-ip-sync.service`: systemd oneshot service for the IP sync
- `desineuron-ingress-home-ip-sync.timer`: timer that runs the sync 45 seconds after boot and every 5 minutes thereafter
- `install_linux_ingress_ip_sync.sh`: Linux-side installer for the IP sync service
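`rathole-client.toml` and `rathole-server.toml` are templates with `__NAME__` placeholders (`__INGRESS_HOST__`, `__RATHOLE_TOKEN__`, `__RATHOLE_SERVER_PUBLIC_KEY__`, `__RATHOLE_SERVER_PRIVATE_KEY__`). A minimal Python sketch of filling one in before it is copied to `/tmp/rathole-client.toml` for `install_linux_rathole_client.sh`, assuming plain string substitution; `render_template` is a hypothetical helper, not a file in this directory, and the token value is a placeholder.

```python
import re

# Template placeholders look like __INGRESS_HOST__ or __RATHOLE_TOKEN__.
PLACEHOLDER = re.compile(r"__[A-Z0-9_]+__")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace each __NAME__ token and refuse to emit a half-filled config."""
    rendered = template
    for token, value in values.items():
        rendered = rendered.replace(token, value)
    leftover = sorted(set(PLACEHOLDER.findall(rendered)))
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return rendered


if __name__ == "__main__":
    client_template = (
        "[client]\n"
        'remote_addr = "__INGRESS_HOST__:2333"\n'
        'default_token = "__RATHOLE_TOKEN__"\n'
    )
    print(render_template(client_template, {
        "__INGRESS_HOST__": "98.87.120.120",
        "__RATHOLE_TOKEN__": "example-token",  # hypothetical; never commit the real token
    }))
```

Failing on leftover tokens is the point of the helper: otherwise a half-rendered file would have `rathole` trying to reach a literal `__INGRESS_HOST__`.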
Manual Cloudflare work still required unless API credentials are provided:
- set the six hostnames to DNS-only
- point them to the ingress Elastic IP
- retire the Cloudflare Tunnel routes once public validation passes
Dynamic home IP handling:
- `rathole` control port `2333/tcp` is intentionally open on the ingress so public services do not break when the ISP IP changes
- SSH fallback on the ingress remains restricted to the current home public IP on `22/tcp`
- the Linux-side IP sync service keeps that SSH fallback rule current after ISP churn or reboot

@@ -0,0 +1,540 @@
## Desineuron Stable Ingress Handoff
Date: 2026-04-08
### Chapters
1. Outcome
2. Final Architecture
3. AWS Resources
4. Linux Origin State
5. Migration Changes Applied
6. Validation Results
7. ComfyUI Recovery and GPU Route
8. Files and Config Artifacts
9. Dynamic Home IP Sync
10. Operational Commands
11. Future Service Mapping Runbook
12. Security Notes
13. Remaining Improvement Ideas
14. Rollback
15. Team Summary
16. Current Status Snapshot - 2026-04-11
17. Linux Ops Control Plane
### Outcome
The Cloudflare Tunnel dependency for the six public `desineuron.in` services has been replaced with a self-hosted AWS ingress layer:
- Public edge: AWS EC2 `t4g.micro`
- Stable public IP: `98.87.120.120`
- TLS termination: `Caddy` on the ingress node
- Private backend relay: `rathole`
- Origin: Linux box at `192.168.1.4`
- DNS: Cloudflare, `DNS only`
Public hostnames now routed through AWS (the six service hostnames previously went through Cloudflare Tunnel):
- `office.desineuron.in`
- `git.desineuron.in`
- `cloud.desineuron.in`
- `projects.desineuron.in`
- `talk.desineuron.in`
- `vpn.desineuron.in`
- `comfy.desineuron.in` (ingress route created for AWS GPU ComfyUI)
- `ops.desineuron.in` (private operator control surface on the Linux box)
### Final Architecture
```text
Internet
  -> Cloudflare DNS
  -> 98.87.120.120
  -> EC2 ingress: desineuron-ingress-01
     -> Caddy :443
        -> rathole server (control on 2333, local relay on 127.0.0.1:8443)
           -> Linux origin tunnel client
              -> Linux nginx :443
                 -> per-host upstream routing
                    -> Gitea
                    -> Nextcloud
                    -> Taiga
                    -> OnlyOffice
                    -> NetBird

comfy.desineuron.in
  -> EC2 ingress Caddy
     -> private proxy to AWS GPU box 172.31.46.190:8188
        -> ComfyUI endpoints on systemd-managed GPU service
```
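The diagram can also be read as a lookup table. A small Python sketch that transcribes the routes documented in this handoff (the edge hops above plus the upstream ports listed under Linux Origin State) into data; it is illustrative only, and the names `EDGE`, `LINUX_ORIGIN`, `PRIVATE_AWS`, and `trace` are hypothetical. The real routing lives in the ingress Caddyfile and in `/etc/nginx/sites-enabled/desineuron.conf`.

```python
# Hostname -> backend, transcribed from this document (illustrative only).
EDGE = ["Cloudflare DNS (DNS only)", "98.87.120.120", "Caddy :443"]
LINUX_ORIGIN = {
    "git.desineuron.in": "127.0.0.1:3000",       # gitea
    "cloud.desineuron.in": "127.0.0.1:11000",    # nextcloud_app
    "talk.desineuron.in": "127.0.0.1:11000",     # nextcloud_app, Talk hostname
    "projects.desineuron.in": "127.0.0.1:9100",  # taiga-gateway
    "office.desineuron.in": "127.0.0.1:9980",    # nextcloud_onlyoffice
    "vpn.desineuron.in": "127.0.0.1:8080",       # netbird (also uses 8081)
    "ops.desineuron.in": None,                   # origin upstream not listed in this doc
}
PRIVATE_AWS = {"comfy.desineuron.in": "172.31.46.190:8188"}  # comfyui.service


def trace(host: str) -> list[str]:
    """Return the hops a request for `host` passes through."""
    if host in PRIVATE_AWS:
        return EDGE + [f"GPU node {PRIVATE_AWS[host]}"]
    if host in LINUX_ORIGIN:
        upstream = LINUX_ORIGIN[host] or "(undocumented upstream)"
        return EDGE + [
            "rathole relay 127.0.0.1:8443",
            "rathole client -> Linux nginx :443",
            f"upstream {upstream}",
        ]
    raise KeyError(f"no ingress route for {host}")


for name in ("git.desineuron.in", "comfy.desineuron.in"):
    print(name, ":", " -> ".join(trace(name)))
```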
### AWS Resources
- Instance name: `desineuron-ingress-01`
- Instance ID: `i-094df09acafb72494`
- Type: `t4g.micro`
- Region: `us-east-1`
- Subnet: `subnet-03d684ed15f327151`
- VPC: `vpc-081d2397920aad268`
- Root disk: `20 GB gp3`
- Elastic IP: `98.87.120.120`
- IAM role: `desineuron-ingress-role`
- Instance profile: `desineuron-ingress-profile`
- Security group: `sg-0721b8b48e12c531d`
Current GPU worker:
- Instance ID: `i-0e4eab5fe67cf9abe`
- Type: `g6.12xlarge`
- Region: `us-east-1`
- Private IP: `172.31.46.190`
- Current public IP: `18.208.176.121`
- Launch time: `2026-04-11T06:14:04Z`
Open ingress ports:
- `80/tcp` from internet
- `443/tcp` from internet
- `22/tcp` restricted to the current home public IP and auto-synced from the Linux origin
- `2333/tcp` from internet for `rathole` control and data relay
GPU node security posture for ComfyUI:
- public `8118/tcp` removed
- public `8188/tcp` removed
- `8188/tcp` now allowed only from ingress security group `sg-0721b8b48e12c531d`
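The port lists above can be checked mechanically. A Python sketch of an audit over the `IpPermissions` list that EC2 `describe_security_groups` returns, flagging rules open to the whole internet on ports outside an allowlist; `world_open_ports` is a hypothetical helper, and the default allowlist simply mirrors the ingress ports documented here.

```python
def world_open_ports(ip_permissions, allowed=frozenset({80, 443, 2333})):
    """List internet-open rules on ports outside `allowed`.

    `ip_permissions` has the shape of describe_security_groups()["SecurityGroups"][i]["IpPermissions"].
    Returns (protocol, from_port, to_port) tuples; ("-1", None, None) means all traffic.
    """
    findings = []
    for perm in ip_permissions:
        open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        open_v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
        if not (open_v4 or open_v6):
            continue  # restricted to CIDRs or security groups, e.g. 8188 from the ingress SG
        if perm.get("IpProtocol") == "-1":
            findings.append(("-1", None, None))  # all protocols and ports
            continue
        from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
        if from_port == to_port and from_port in allowed:
            continue
        findings.append((perm["IpProtocol"], from_port, to_port))
    return findings
```

To audit live groups, pass it the `IpPermissions` of `ec2.describe_security_groups(GroupIds=[...])` from a boto3 EC2 client. The `22/tcp` home-IP rule is not world-open, so it never shows up; for the GPU groups (`sg-0b144c17b1b89f4c6`, `sg-05e4de3fe94ad6558`) an empty `allowed` set is the stricter check.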
### Linux Origin State
Services exposed to local nginx:
- `git.desineuron.in` -> `127.0.0.1:3000` (`gitea`)
- `cloud.desineuron.in` -> `127.0.0.1:11000` (`nextcloud_app`)
- `talk.desineuron.in` -> `127.0.0.1:11000` (`nextcloud_app`, Talk-focused hostname)
- `projects.desineuron.in` -> `127.0.0.1:9100` (`taiga-gateway`)
- `office.desineuron.in` -> `127.0.0.1:9980` (`nextcloud_onlyoffice`)
- `vpn.desineuron.in` -> `127.0.0.1:8080` / `127.0.0.1:8081` (`netbird`)
Tunnel state:
- `rathole-client.service` active on Linux
- `rathole-server.service` active on AWS
- `cloudflared` inactive on Linux
### Migration Changes Applied
#### Cloudflare
Old CNAME tunnel records were removed for the six public hostnames.
New records were created:
- Type: `A`
- Value: `98.87.120.120`
- Proxy status: `DNS only`
- TTL: `300`
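The ingress README notes that this Cloudflare work stays manual unless API credentials are provided. A Python sketch of the same record through Cloudflare's API v4, assuming an API token with DNS edit permission and the zone ID (neither appears in this document); `dns_only_a_record` and `create_record` are hypothetical helper names. A name cannot hold both a CNAME and an A record, so the old tunnel CNAME has to be deleted first, as was done here.

```python
import json
import urllib.request

CF_API = "https://api.cloudflare.com/client/v4"
INGRESS_IP = "98.87.120.120"


def dns_only_a_record(hostname: str, ip: str = INGRESS_IP, ttl: int = 300) -> dict:
    """Request body for creating a DNS record; proxied=False is what the dashboard calls "DNS only"."""
    return {"type": "A", "name": hostname, "content": ip, "ttl": ttl, "proxied": False}


def create_record(zone_id: str, api_token: str, record: dict) -> dict:
    """POST the record to the zone's dns_records endpoint and return the JSON reply."""
    request = urllib.request.Request(
        f"{CF_API}/zones/{zone_id}/dns_records",
        data=json.dumps(record).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)
```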
#### AWS Ingress
Installed and configured:
- `Caddy`
- `rathole`
- `amazon-ssm-agent`
- Linux-driven SSH allowlist sync for the ingress node
TLS:
- Existing valid certificate/key pair from the Linux origin was copied to the ingress node.
- Caddy now terminates HTTPS at the edge.
#### Linux Origin
nginx was already routing by hostname and remains the origin router.
Nextcloud was adjusted so `talk.desineuron.in` no longer canonicalizes to `cloud.desineuron.in`:
- removed `overwritehost` pin
- added `talk.desineuron.in` to trusted domains
- restarted `nextcloud_app`
### Validation Results
Public hostname checks through the new ingress:
- `office.desineuron.in` -> `200 /welcome/`
- `git.desineuron.in` -> `200`
- `cloud.desineuron.in` -> `200 /login`
- `projects.desineuron.in` -> `200`
- `talk.desineuron.in` -> `200 /login` on `talk.desineuron.in`
- `vpn.desineuron.in` -> `200`
- `ops.desineuron.in/login` -> `200`
- `comfy.desineuron.in` -> `502`
Important note:
- `talk.desineuron.in` now stays on the `talk` hostname.
- It is still backed by the same Nextcloud origin and presents the Nextcloud login flow, which is expected given the current Linux-side app layout.
### ComfyUI Recovery and GPU Route
Root cause of the earlier `502`:
- ingress route and TLS were correct
- the GPU spot node had lost the actual `/opt/dlami/nvme/ComfyUI` app tree
- nothing was listening on `172.31.46.190:8188`
Permanent fix applied:
- restored `/opt/dlami/nvme/ComfyUI` from upstream source control
- installed ComfyUI Python requirements on the GPU node
- created `systemd` unit `comfyui.service`
- enabled `comfyui.service` at boot with automatic restart
- kept `comfy.desineuron.in` mapped through ingress Caddy
- removed direct public access to `8118` and `8188`
- allowed `8188` only from ingress security group
Current live path:
- `https://comfy.desineuron.in`
-> ingress `98.87.120.120`
-> Caddy reverse proxy
-> GPU private IP `172.31.46.190:8188`
-> `comfyui.service`
Current public result:
- `comfy.desineuron.in` currently returns `502 Bad Gateway`
- ingress route is present and Caddy is healthy
- the current GPU backend is not yet listening on `172.31.46.190:8188`, so this is a backend readiness issue, not a DNS or edge-TLS issue
Current GPU service:
- `comfyui.service`
- app path: `/opt/dlami/nvme/ComfyUI`
- log path: `/var/log/comfyui/service.log`
- port: `8188/tcp`
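Current backend state on `2026-04-11`:
- `comfyui.service` is `activating`; since the unit is `Type=simple`, a lingering `activating` state most likely means `activating (auto-restart)`, i.e. the process keeps exiting and systemd restarts it every 5 seconds (`RestartSec=5`)
- latest log shows ComfyUI startup reaching `Starting server`
- the process still does not bind `8188`, so ingress sees the backend as unavailable; the lines after `Starting server` in `/var/log/comfyui/service.log` should show why it exits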
Expected endpoints:
- `https://comfy.desineuron.in/`
- `https://comfy.desineuron.in/prompt`
- `https://comfy.desineuron.in/history/{prompt_id}`
- `https://comfy.desineuron.in/queue`
- `https://comfy.desineuron.in/upload/image`
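Once the backend answers, a client would typically queue a workflow on `/prompt` and poll `/history/{prompt_id}`. A Python sketch under these assumptions: the request body is `{"prompt": <API-format workflow>, "client_id": ...}`, `/prompt` replies with a `prompt_id`, and a history entry exposes `outputs` with per-node `images`, which is how ComfyUI's built-in HTTP API behaves as far as we know; verify against the version on the GPU node. The helper names and the default `client_id` are hypothetical, and the route currently has no authentication in front of it.

```python
import json
import time
import urllib.request

BASE_URL = "https://comfy.desineuron.in"


def build_prompt_request(workflow: dict, client_id: str) -> urllib.request.Request:
    """POST /prompt with an API-format workflow graph (node id -> node dict)."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def output_images(history: dict, prompt_id: str):
    """Image filenames from a /history/{prompt_id} reply, or None if the prompt has not finished."""
    entry = history.get(prompt_id)
    if entry is None:
        return None  # no history entry until the prompt has run
    return [
        image["filename"]
        for node_output in entry.get("outputs", {}).values()
        for image in node_output.get("images", [])
    ]


def run_workflow(workflow: dict, client_id: str = "ops-sketch",
                 poll_seconds: float = 2.0, timeout: float = 600.0):
    """Queue a workflow, then poll history until it finishes or `timeout` elapses."""
    with urllib.request.urlopen(build_prompt_request(workflow, client_id), timeout=30) as response:
        prompt_id = json.load(response)["prompt_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{BASE_URL}/history/{prompt_id}", timeout=30) as response:
            files = output_images(json.load(response), prompt_id)
        if files is not None:
            return prompt_id, files
        time.sleep(poll_seconds)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout} s")
```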
### Files and Config Artifacts
Infrastructure artifacts in repo:
- [desineuron_ingress/README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/README.md)
- [Caddyfile](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/Caddyfile)
- [rathole-server.toml](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/rathole-server.toml)
- [rathole-client.toml](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/rathole-client.toml)
- [install_linux_rathole_client.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_rathole_client.sh)
- [user_data.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/user_data.sh)
- [install_gpu_comfyui_service.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_gpu_comfyui_service.sh)
- [map_gpu_comfy_security.ps1](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/map_gpu_comfy_security.ps1)
- [sync_ingress_home_ip.py](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/sync_ingress_home_ip.py)
- [desineuron-ingress-home-ip-sync.service](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.service)
- [desineuron-ingress-home-ip-sync.timer](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/desineuron-ingress-home-ip-sync.timer)
- [install_linux_ingress_ip_sync.sh](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/install_linux_ingress_ip_sync.sh)
- [ops_control_plane/README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)
Linux origin files touched:
- `/etc/nginx/sites-enabled/desineuron.conf`
- `/mnt/ServerStorage/docker_apps/nextcloud/.env`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/config.php`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php`
Backups created on Linux:
- `/mnt/ServerStorage/docker_apps/nextcloud/.env.pre_ingress_backup_2026-04-08`
- `/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php.pre_ingress_backup_2026-04-08`
### Dynamic Home IP Sync
Purpose:
- Keep ingress `22/tcp` restricted to the current Airtel public IP even when the ISP changes it
- Prevent future manual outages for SSH fallback caused by stale home-IP security-group rules
Design:
- Linux origin runs `desineuron-ingress-home-ip-sync.timer`
- Timer fires on boot and every 5 minutes
- Service resolves the current home public IP via `https://api.ipify.org`
- Service updates only the ingress security group `sg-0721b8b48e12c531d`
- Only the SSH fallback rule is mutated
- `rathole` is no longer dependent on the Airtel IP because `2333/tcp` remains open on the ingress
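The design above is a small reconcile loop. A Python sketch of just the decision step, written as a hypothetical variant of the logic in `sync_ingress_home_ip.py`: the shipped script revokes stale rules before authorizing the new one, whereas this version validates the IP string first and is meant to be applied authorize-then-revoke, so a failed API call never leaves the group without an SSH rule. `plan_ssh_rule_update` is a hypothetical name.

```python
import ipaddress


def plan_ssh_rule_update(existing_cidrs: list[str], public_ip: str) -> dict[str, list[str]]:
    """Return the changes needed to pin SSH to `public_ip`.

    Apply "authorize" before "revoke": if the revoke call fails, the new home IP
    already has access, and a rule that is already correct is never removed.
    """
    # Reject anything that is not a bare IPv4 address (e.g. an error page body).
    address = ipaddress.IPv4Address(public_ip.strip())
    desired = f"{address}/32"
    authorize = [] if desired in existing_cidrs else [desired]
    revoke = sorted({cidr for cidr in existing_cidrs if cidr != desired})
    return {"authorize": authorize, "revoke": revoke}
```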
Installed Linux paths:
- `/usr/local/bin/sync_ingress_home_ip.py`
- `/etc/systemd/system/desineuron-ingress-home-ip-sync.service`
- `/etc/systemd/system/desineuron-ingress-home-ip-sync.timer`
- `/etc/desineuron-ingress-home-ip-sync.env`
- `/opt/desineuron-ingress-ip-sync/.venv`
- `/var/lib/desineuron-ingress-ip-sync/current_ip.txt`
Current state:
- Timer: enabled and active
- Last recorded home public IP: `223.185.28.89`
- Ingress SSH rule CIDR: `223.185.28.89/32`
### Operational Commands
Check AWS ingress status:
```powershell
aws ec2 describe-instances --instance-ids i-094df09acafb72494 --region us-east-1
aws ec2 describe-addresses --allocation-ids eipalloc-0d54fc0f827450e7b --region us-east-1
```
Check ingress services:
```powershell
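$cmdId = aws ssm send-command --region us-east-1 --instance-ids i-094df09acafb72494 --document-name AWS-RunShellScript --parameters commands="sudo systemctl status caddy rathole-server --no-pager" --query "Command.CommandId" --output text
# send-command only queues the command; wait briefly, then read its output
Start-Sleep -Seconds 3
aws ssm get-command-invocation --region us-east-1 --command-id $cmdId --instance-id i-094df09acafb72494 --query "StandardOutputContent" --output text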
```
Check GPU ComfyUI service:
```powershell
aws ssm send-command --region us-east-1 --instance-ids i-0e4eab5fe67cf9abe --document-name AWS-RunShellScript --parameters commands="sudo systemctl status comfyui --no-pager","ss -ltnp | grep 8188 || true","tail -n 40 /var/log/comfyui/service.log || true"
```
Check Linux origin services:
```powershell
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status rathole-client nginx"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ingress-home-ip-sync.service desineuron-ingress-home-ip-sync.timer"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S journalctl -u desineuron-ingress-home-ip-sync -n 50 --no-pager"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S systemctl status desineuron-ops-control-plane.service --no-pager"
ssh -i "$env:USERPROFILE\.ssh\id_ed25519_desineuron_lan" desineuron-node-01@192.168.1.4 "echo '***' | sudo -S docker compose -f /opt/desineuron-ops-control-plane/docker-compose.yml ps"
```
Public endpoint validation:
```powershell
curl.exe -I https://office.desineuron.in
curl.exe -I https://git.desineuron.in
curl.exe -I https://cloud.desineuron.in
curl.exe -I https://projects.desineuron.in
curl.exe -I https://talk.desineuron.in
curl.exe -I https://vpn.desineuron.in
curl.exe -I https://comfy.desineuron.in
curl.exe -I https://ops.desineuron.in/login
```
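The `curl.exe -I` checks above can be turned into a pass/fail report. A Python sketch, assuming every route should end at `200`; `EXPECTED`, `fetch_status`, and `mismatches` are hypothetical names. Unlike `curl -I`, `urllib` follows redirects, so the status compared is the final one, matching results like `200 /welcome/` in Validation Results.

```python
import urllib.error
import urllib.request

# Expected final status per URL. comfy should reach 200 once comfyui.service binds
# 8188; the 2026-04-11 snapshot shows 502 there.
EXPECTED = {
    "https://office.desineuron.in": 200,
    "https://git.desineuron.in": 200,
    "https://cloud.desineuron.in": 200,
    "https://projects.desineuron.in": 200,
    "https://talk.desineuron.in": 200,
    "https://vpn.desineuron.in": 200,
    "https://ops.desineuron.in/login": 200,
    "https://comfy.desineuron.in": 200,
}


def fetch_status(url: str, timeout: float = 15.0):
    """GET `url`, following redirects; return the final HTTP status, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status code
        return err.code
    except OSError:  # DNS, TLS, timeout, or connection failure
        return None


def mismatches(observed: dict, expected: dict = EXPECTED) -> dict:
    """Map each URL whose status differs from the expectation to (expected, observed)."""
    return {url: (want, observed.get(url)) for url, want in expected.items() if observed.get(url) != want}
```

Usage: `print(mismatches({url: fetch_status(url) for url in EXPECTED}))` prints an empty dict when every route is healthy.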
### Future Service Mapping Runbook
Use this pattern for any future public service behind the stable ingress layer.
1. Decide the backend location.
   - Linux origin behind `rathole`
   - AWS GPU/private EC2 node
   - another private backend later
2. Decide whether the service should terminate TLS at ingress.
   - default: yes
   - Caddy on ingress should own the public hostname and certificate
3. Create the DNS record in Cloudflare.
   - type: `A`
   - value: `98.87.120.120`
   - proxy mode: `DNS only`
   - low TTL during rollout
4. Add the ingress route in [`Caddyfile`](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/desineuron_ingress/Caddyfile).
   Patterns:
   - Linux-origin service:
     - proxy to `https://127.0.0.1:8443`
     - preserve `Host`
     - skip upstream certificate verification on the relay hop (`transport http { tls_insecure_skip_verify }`), as the existing Linux-origin blocks do
     - add the hostname to the origin nginx config (`/etc/nginx/sites-enabled/desineuron.conf`), since nginx routes by hostname
   - private AWS backend service:
     - proxy to `http://<private-ip>:<port>` or `https://<private-ip>:<port>`
5. Restrict backend network access.
   - never leave backend app ports open to `0.0.0.0/0` unless absolutely necessary
   - prefer a security-group rule that allows traffic only from the ingress security group
   - keep home-origin services private behind `rathole`
6. Reload ingress.
   ```powershell
   ssh -i "F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\desineuron-l4-node.pem" ec2-user@98.87.120.120 "sudo caddy validate --config /etc/caddy/Caddyfile && sudo systemctl reload caddy"
   ```
7. Validate TLS and the app response.
   - check that the certificate subject matches the hostname
   - check `curl -I https://<host>`
   - check the login page or health endpoint
   - check browser behavior
8. If the backend is stateful, create a persistent service.
   - prefer `systemd`
   - enable restart on failure
   - log to a stable path
   - record the service name, working directory, ports, and restart policy in this handoff doc
9. Update team docs immediately.
   - hostname
   - DNS record type
   - ingress route target
   - backend service owner
   - service name
   - health check command
   - rollback step
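Step 4 can be scripted. The ingress Caddyfile ends with `import /etc/caddy/managed/*.caddy`, so a new route could live in its own file there rather than in the main Caddyfile. A Python sketch that renders a site block in the style of the existing ones; `caddy_site_block` and the example hostnames are hypothetical. With no `tls` line, Caddy obtains a certificate automatically, as it does for the existing `ops.desineuron.in` block.

```python
LINUX_ORIGIN_RELAY = "https://127.0.0.1:8443"  # rathole's local relay on the ingress


def caddy_site_block(hostname: str, upstream: str = LINUX_ORIGIN_RELAY) -> str:
    """Render a Caddyfile site block in the style of the existing ingress config."""
    lines = [
        f"{hostname} {{",
        "\tlog {",
        "\t\toutput file /var/log/caddy/access.log",
        "\t\tformat json",
        "\t}",
        f"\treverse_proxy {upstream} {{",
        "\t\theader_up Host {host}",  # keep the public hostname for per-host routing upstream
    ]
    if upstream == LINUX_ORIGIN_RELAY:
        # The relay hop is HTTPS to 127.0.0.1 with a certificate presumably issued for the
        # public hostnames, so the existing Linux-origin blocks skip upstream verification.
        lines += ["\t\ttransport http {", "\t\t\ttls_insecure_skip_verify", "\t\t}"]
    lines += ["\t}", "}"]
    return "\n".join(lines) + "\n"


print(caddy_site_block("new.desineuron.in"))
```

After writing the file, run the validate-and-reload command from step 6.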
### Security Notes
- Public traffic terminates only at the AWS edge.
- The Linux box no longer needs Cloudflare Tunnel for these six routes.
- The Linux origin is reached through an outbound tunnel, not by directly exposing the home machine to the public for app traffic.
- SSH on the Linux box remains key-only.
- The AWS ingress IAM role is limited to SSM core.
- ComfyUI is no longer directly exposed on the GPU public IP; only the ingress layer can reach `8188`.
- Ingress `22/tcp` stays restricted and is now auto-synced from the Linux origin.
- Ingress `2333/tcp` is intentionally open so `rathole` survives Airtel IP changes without operator action.
### Remaining Improvement Ideas
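- Move certificate issuance and renewal to the AWS edge permanently instead of serving a certificate pair copied from the Linux origin; Caddy does not renew certificates loaded from files with the `tls` directive, so the copied pair will expire unless it is refreshed.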
- Clean up nginx warnings about duplicated protocol options.
- Separate `talk.desineuron.in` more fully from general Nextcloud if a distinct Talk-only UX is desired.
- Add authentication (Basic Auth or an allowlist) in front of `comfy.desineuron.in` before broader team rollout; internet scanners started hitting the route immediately after it went live.
- Add monitoring and alerting on:
  - `caddy`
  - `rathole-server`
  - `rathole-client`
  - public HTTPS checks
- Add infrastructure-as-code for the EC2 ingress node if this should be reproducible by the team without manual AWS CLI steps.
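For the unit checks in the monitoring item, a Python sketch of the probe side only (alert delivery is out of scope): run `systemctl is-active` for the relevant units on each host (`caddy` and `rathole-server` on the ingress, `rathole-client` on the Linux origin) and report anything not `active`. The function names are hypothetical.

```python
import subprocess

INGRESS_UNITS = ("caddy", "rathole-server")
ORIGIN_UNITS = ("rathole-client",)


def parse_is_active(units, stdout: str) -> dict:
    """Pair each unit with the state line `systemctl is-active` printed for it."""
    states = stdout.split()
    states += ["unknown"] * (len(units) - len(states))  # missing lines count as unknown
    return dict(zip(units, states))


def unhealthy(states: dict) -> list:
    return sorted(unit for unit, state in states.items() if state != "active")


def check_units(units) -> list:
    # Parse stdout instead of trusting the exit code: with several units,
    # `systemctl is-active` exits 0 as long as at least one of them is active.
    proc = subprocess.run(["systemctl", "is-active", *units], capture_output=True, text=True)
    return unhealthy(parse_is_active(units, proc.stdout))
```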
### Rollback
If rollback is needed:
1. Recreate Cloudflare CNAME/tunnel routes or repoint the DNS records away from `98.87.120.120`.
2. Stop `caddy` and `rathole-server` on AWS.
3. Stop `rathole-client` on Linux.
4. Restore Nextcloud files from:
   - `.env.pre_ingress_backup_2026-04-08`
   - `reverse-proxy.config.php.pre_ingress_backup_2026-04-08`
5. Restart `nextcloud_app` and nginx.
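Step 4 amounts to copying each backup over the file it was taken from. A Python sketch, assuming it runs as root on the Linux origin before step 5 and that the backups sit next to their originals at the paths recorded under Files and Config Artifacts; `original_path` and `restore` are hypothetical names, and `dry_run` defaults to `True` so the first call only prints the plan.

```python
import shutil
from pathlib import Path

BACKUP_SUFFIX = ".pre_ingress_backup_2026-04-08"
BACKUPS = [
    "/mnt/ServerStorage/docker_apps/nextcloud/.env" + BACKUP_SUFFIX,
    "/mnt/ServerStorage/docker_apps/nextcloud/data/config/reverse-proxy.config.php" + BACKUP_SUFFIX,
]


def original_path(backup: Path) -> Path:
    """Strip the backup suffix to get the file the backup was taken from."""
    if not backup.name.endswith(BACKUP_SUFFIX):
        raise ValueError(f"not a pre-ingress backup: {backup}")
    return backup.with_name(backup.name[: -len(BACKUP_SUFFIX)])


def restore(backups=BACKUPS, dry_run: bool = True) -> list:
    """Copy each backup over its original; with dry_run=True only return the plan."""
    plan = [(Path(b), original_path(Path(b))) for b in backups]
    if not dry_run:
        for source, target in plan:
            shutil.copy2(source, target)  # copies contents plus mode and timestamps
    return plan


for source, target in restore():
    print(f"{source} -> {target}")
```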
### Team Summary
This migration is complete.
Cloudflare Tunnel is no longer the production path for the six public service hostnames. The stable production ingress is now the AWS `t4g.micro` node with Elastic IP `98.87.120.120`, and the Linux machine remains the private origin behind `rathole`.
Additional mapped route:
- `comfy.desineuron.in` now terminates on the same stable ingress and forwards to the GPU node's private address `172.31.46.190:8188`.
- No further DNS change is needed for ComfyUI.
- The backend is supervised by `systemd`, but the current worker is not yet binding `8188`, so public access is currently degraded with `502`.
- Once the backend is listening on `8188`, the team can use:
  - `https://comfy.desineuron.in/prompt`
  - `https://comfy.desineuron.in/history/{prompt_id}`
  - `https://comfy.desineuron.in/queue`
  - `https://comfy.desineuron.in/upload/image`
### Current Status Snapshot - 2026-04-11
Live public service state:
- `office.desineuron.in` -> `200`
- `git.desineuron.in` -> `200`
- `cloud.desineuron.in` -> `200`
- `projects.desineuron.in` -> `200`
- `talk.desineuron.in` -> `200`
- `vpn.desineuron.in` -> `200`
- `ops.desineuron.in/login` -> `200`
- `comfy.desineuron.in` -> `502`
Linux-origin health:
- `nginx.service` -> `active`
- `rathole-client.service` -> `active`
- `desineuron-ingress-home-ip-sync.timer` -> `active`
- `desineuron-ops-control-plane.service` -> `active`
Linux ops stack containers:
- `desineuron-ops-api` -> `Up`
- `desineuron-ops-db` -> `Up (healthy)`
- `desineuron-ops-worker` -> `Up`
Ingress health:
- `caddy` -> `active`
- `rathole-server` -> `active`
- `comfy.desineuron.in` Caddy route is present in `/etc/caddy/Caddyfile`
GPU ComfyUI state:
- `comfyui.service` -> `activating`
- latest logs show the ComfyUI startup sequence reaching `Starting server`
- no active listener on `8188` yet
- ingress cannot connect to `172.31.46.190:8188`, which is why the public result is `502`
### Linux Ops Control Plane
The Linux box now also hosts the private AWS control surface for the team.
Public operator URL:
- `https://ops.desineuron.in/login`
Purpose:
- launch/stop/terminate AWS machines
- view spot/on-demand market data
- track runtime and estimated cost
- ingest model directories from the Linux box into S3
- hydrate models from S3 to AWS GPU nodes
- manage ingress routes through the `t4g.micro`
- export session/cost CSVs
Linux runtime paths:
- stack root: `/opt/desineuron-ops-control-plane`
- env file: `/opt/desineuron-ops-control-plane/.env`
- exports: `/opt/desineuron-ops-control-plane/exports`
- state: `/opt/desineuron-ops-control-plane/state`
Canonical S3 bucket:
- `desineuron-ops-control-plane-819079556187-us-east-1`
Model library source on Linux:
- `/mnt/ServerStorage/ai-models/models`
Current operator accounts:
- `sagnik@desineuron.in`
- `sayan@desineuron.in`
- `sourik@desineuron.in`
Reference docs:
- [ops_control_plane/README.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/infrastructure/ops_control_plane/README.md)
- [Desineuron Ops Control Plane Bibel.md](/F:/Workin%20In%20Progress/DESINEURON/GITLAB/Project_Velocity/.Agent%20Context/Bibels/Desineuron%20Ops%20Control%20Plane%20Bibel.md)

@@ -0,0 +1,12 @@
[Unit]
Description=Update ingress SSH allowlist to current home public IP
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
EnvironmentFile=/etc/desineuron-ingress-home-ip-sync.env
ExecStart=/opt/desineuron-ingress-ip-sync/.venv/bin/python /usr/local/bin/sync_ingress_home_ip.py
WorkingDirectory=/var/lib/desineuron-ingress-ip-sync
User=root
Group=root

@@ -0,0 +1,11 @@
[Unit]
Description=Run ingress home IP sync on boot and every 5 minutes
[Timer]
OnBootSec=45s
OnUnitActiveSec=5min
Unit=desineuron-ingress-home-ip-sync.service
Persistent=true
[Install]
WantedBy=timers.target

@@ -0,0 +1,52 @@
#!/usr/bin/env bash
set -euo pipefail
COMFY_DIR="/opt/dlami/nvme/ComfyUI"
SERVICE_NAME="comfyui"
LOG_DIR="/var/log/comfyui"
if ! command -v git >/dev/null 2>&1; then
sudo apt-get update
sudo apt-get install -y git
fi
if [ ! -d "${COMFY_DIR}/.git" ]; then
sudo mkdir -p /opt/dlami/nvme
sudo chown -R ubuntu:ubuntu /opt/dlami/nvme
git clone https://github.com/comfyanonymous/ComfyUI.git "${COMFY_DIR}"
else
git -C "${COMFY_DIR}" pull --ff-only
fi
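# Install requirements with the same interpreter that comfyui.service runs (ExecStart=/usr/bin/python3).
/usr/bin/python3 -m pip install -r "${COMFY_DIR}/requirements.txt"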
sudo mkdir -p "${LOG_DIR}"
sudo chown -R ubuntu:ubuntu "${LOG_DIR}"
sudo tee /etc/systemd/system/${SERVICE_NAME}.service >/dev/null <<'EOF'
[Unit]
Description=ComfyUI GPU Service
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=ubuntu
Group=ubuntu
WorkingDirectory=/opt/dlami/nvme/ComfyUI
Environment=HOME=/home/ubuntu
Environment=PYTHONUNBUFFERED=1
ExecStart=/usr/bin/python3 /opt/dlami/nvme/ComfyUI/main.py --listen 0.0.0.0 --port 8188 --disable-auto-launch
Restart=always
RestartSec=5
StandardOutput=append:/var/log/comfyui/service.log
StandardError=append:/var/log/comfyui/service.log
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now "${SERVICE_NAME}.service"
sleep 5
sudo systemctl --no-pager --full status "${SERVICE_NAME}.service"

@@ -0,0 +1,40 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ $# -ne 2 ]]; then
echo "Usage: $0 <aws_access_key_id> <aws_secret_access_key>" >&2
exit 1
fi
AWS_ACCESS_KEY_ID="$1"
AWS_SECRET_ACCESS_KEY="$2"
INSTALL_ROOT="/opt/desineuron-ingress-ip-sync"
VENV_PATH="${INSTALL_ROOT}/.venv"
sudo apt-get update
sudo apt-get install -y python3-venv
sudo mkdir -p "${INSTALL_ROOT}"
sudo python3 -m venv "${VENV_PATH}"
sudo "${VENV_PATH}/bin/pip" install --upgrade pip boto3
sudo install -m 0755 /tmp/sync_ingress_home_ip.py /usr/local/bin/sync_ingress_home_ip.py
sudo install -m 0644 /tmp/desineuron-ingress-home-ip-sync.service /etc/systemd/system/desineuron-ingress-home-ip-sync.service
sudo install -m 0644 /tmp/desineuron-ingress-home-ip-sync.timer /etc/systemd/system/desineuron-ingress-home-ip-sync.timer
sudo mkdir -p /var/lib/desineuron-ingress-ip-sync
sudo tee /etc/desineuron-ingress-home-ip-sync.env >/dev/null <<EOF
AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
AWS_REGION=us-east-1
INGRESS_SECURITY_GROUP_ID=sg-0721b8b48e12c531d
INGRESS_SSH_PORT=22
INGRESS_SSH_RULE_DESCRIPTION=SSH fallback from origin network
INGRESS_IP_STATE_FILE=/var/lib/desineuron-ingress-ip-sync/current_ip.txt
EOF
sudo chmod 600 /etc/desineuron-ingress-home-ip-sync.env
sudo systemctl daemon-reload
sudo systemctl enable --now desineuron-ingress-home-ip-sync.timer
sudo systemctl start desineuron-ingress-home-ip-sync.service
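# A oneshot unit is inactive once it finishes, so `systemctl status` exits 3; keep set -e from aborting here.
sudo systemctl --no-pager --full status desineuron-ingress-home-ip-sync.service || true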
sudo systemctl --no-pager --full status desineuron-ingress-home-ip-sync.timer

@@ -0,0 +1,44 @@
#!/usr/bin/env bash
set -euo pipefail
RATHOLE_VERSION="${RATHOLE_VERSION:-v0.4.3}"
RATHOLE_URL="${RATHOLE_URL:-https://github.com/rapiz1/rathole/releases/download/${RATHOLE_VERSION}/rathole-x86_64-unknown-linux-gnu.zip}"
CONFIG_SOURCE="${CONFIG_SOURCE:-/tmp/rathole-client.toml}"
sudo install -d -m 0755 /etc/rathole
sudo install -d -m 0755 /opt/rathole
tmp_dir="$(mktemp -d)"
trap 'rm -rf "$tmp_dir"' EXIT
cd "$tmp_dir"
curl -fL "$RATHOLE_URL" -o rathole.zip
python3 - <<'PY'
import zipfile
z = zipfile.ZipFile("rathole.zip")
z.extractall(".")
PY
sudo install -m 0755 rathole /usr/local/bin/rathole
sudo install -m 0600 "$CONFIG_SOURCE" /etc/rathole/client.toml
cat <<'EOF' | sudo tee /etc/systemd/system/rathole-client.service >/dev/null
[Unit]
Description=Desineuron Rathole Client
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/bin/rathole /etc/rathole/client.toml
Restart=always
RestartSec=5
User=root
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now rathole-client.service
sudo systemctl status --no-pager rathole-client.service || true

@@ -0,0 +1,33 @@
$ErrorActionPreference = "Stop"
$gpuGroups = @(
"sg-0b144c17b1b89f4c6",
"sg-05e4de3fe94ad6558"
)
$ingressGroup = "sg-0721b8b48e12c531d"
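# Pass the permission JSON through a file: PowerShell does not treat \" as an escape inside
# double-quoted strings, so inline JSON is split into several arguments before aws.exe sees it.
$permFile = Join-Path $env:TEMP "comfy-ingress-permission.json"
$permJson = '[{"IpProtocol":"tcp","FromPort":8188,"ToPort":8188,"UserIdGroupPairs":[{"GroupId":"' + $ingressGroup + '","Description":"Allow ComfyUI from ingress"}]}]'
Set-Content -Path $permFile -Value $permJson -Encoding ascii -NoNewline
try {
  # Fails harmlessly if the rule already exists; aws.exe errors are printed, and the script continues.
  aws ec2 authorize-security-group-ingress `
    --group-id "sg-0b144c17b1b89f4c6" `
    --ip-permissions "file://$permFile" | Out-Null
} catch {
}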
foreach ($group in $gpuGroups) {
foreach ($port in 8118, 8188) {
try {
aws ec2 revoke-security-group-ingress `
--group-id $group `
--protocol tcp `
--port $port `
--cidr 0.0.0.0/0 | Out-Null
} catch {
}
}
}
aws ec2 describe-security-groups `
--group-ids $gpuGroups `
--query "SecurityGroups[].{GroupId:GroupId,GroupName:GroupName,Ingress:IpPermissions}" `
--output json

@@ -0,0 +1,12 @@
[client]
remote_addr = "__INGRESS_HOST__:2333"
default_token = "__RATHOLE_TOKEN__"
[client.transport]
type = "noise"
[client.transport.noise]
remote_public_key = "__RATHOLE_SERVER_PUBLIC_KEY__"
[client.services.https_origin]
local_addr = "127.0.0.1:443"

@@ -0,0 +1,12 @@
[server]
bind_addr = "0.0.0.0:2333"
default_token = "__RATHOLE_TOKEN__"
[server.transport]
type = "noise"
[server.transport.noise]
local_private_key = "__RATHOLE_SERVER_PRIVATE_KEY__"
[server.services.https_origin]
bind_addr = "127.0.0.1:8443"

@@ -0,0 +1,110 @@
#!/usr/bin/env python3
"""Pin the ingress SSH fallback rule to this host's current public IP.

Configuration comes from the environment (/etc/desineuron-ingress-home-ip-sync.env,
loaded by the systemd unit); the result is printed as one line of JSON.
"""
import json
import os
import sys
import urllib.request
from pathlib import Path

import boto3

SECURITY_GROUP_ID = os.environ["INGRESS_SECURITY_GROUP_ID"]
RULE_DESCRIPTION = os.environ.get("INGRESS_SSH_RULE_DESCRIPTION", "SSH fallback from origin network")
PORT = int(os.environ.get("INGRESS_SSH_PORT", "22"))
STATE_FILE = Path(os.environ.get("INGRESS_IP_STATE_FILE", "/var/lib/desineuron-ingress-ip-sync/current_ip.txt"))


def get_public_ip() -> str:
    """Ask api.ipify.org for the public IPv4 address this host egresses from."""
    with urllib.request.urlopen("https://api.ipify.org", timeout=15) as response:
        return response.read().decode("utf-8").strip()


def get_security_group():
    """Return an EC2 client and the description of the ingress security group."""
    ec2 = boto3.client("ec2", region_name=os.environ.get("AWS_REGION", "us-east-1"))
    response = ec2.describe_security_groups(GroupIds=[SECURITY_GROUP_ID])
    return ec2, response["SecurityGroups"][0]


def find_existing_ssh_rules(ip_permissions):
    """Return CIDRs of tcp/PORT rules tagged with RULE_DESCRIPTION.

    Only rules carrying that description are managed here; other rules on the
    same port are left alone.
    """
    matches = []
    for permission in ip_permissions:
        if permission.get("IpProtocol") != "tcp":
            continue
        if permission.get("FromPort") != PORT or permission.get("ToPort") != PORT:
            continue
        for ip_range in permission.get("IpRanges", []):
            if ip_range.get("Description") == RULE_DESCRIPTION:
                matches.append(ip_range["CidrIp"])
    return matches


def revoke_old_rules(ec2, cidrs):
    """Remove the given tcp/PORT CIDRs from the security group."""
    for cidr in cidrs:
        ec2.revoke_security_group_ingress(
            GroupId=SECURITY_GROUP_ID,
            IpPermissions=[
                {
                    "IpProtocol": "tcp",
                    "FromPort": PORT,
                    "ToPort": PORT,
                    "IpRanges": [{"CidrIp": cidr}],
                }
            ],
        )


def authorize_new_rule(ec2, cidr):
    """Allow tcp/PORT from `cidr`, tagged with RULE_DESCRIPTION so later runs can find it."""
    ec2.authorize_security_group_ingress(
        GroupId=SECURITY_GROUP_ID,
        IpPermissions=[
            {
                "IpProtocol": "tcp",
                "FromPort": PORT,
                "ToPort": PORT,
                "IpRanges": [{"CidrIp": cidr, "Description": RULE_DESCRIPTION}],
            }
        ],
    )


def write_state(ip: str):
    """Record the last synced IP for operators (this script does not read it back)."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(ip + "\n", encoding="utf-8")


def main() -> int:
    public_ip = get_public_ip()
    desired_cidr = f"{public_ip}/32"
    ec2, group = get_security_group()
    existing_rules = find_existing_ssh_rules(group["IpPermissions"])
    # Exactly one managed rule for the current IP: nothing to change.
    if existing_rules == [desired_cidr]:
        write_state(public_ip)
        print(json.dumps({"status": "noop", "public_ip": public_ip, "cidr": desired_cidr}))
        return 0
    # Stale rules are revoked before the new one is added, so SSH stays closed if the
    # authorize call fails; the next timer run (within 5 minutes) retries it.
    if existing_rules:
        revoke_old_rules(ec2, existing_rules)
    authorize_new_rule(ec2, desired_cidr)
    write_state(public_ip)
    print(
        json.dumps(
            {
                "status": "updated",
                "public_ip": public_ip,
                "cidr": desired_cidr,
                "replaced": existing_rules,
            }
        )
    )
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except Exception as exc:
        # Emit a JSON error line for journalctl, then re-raise so the unit is marked failed.
        print(json.dumps({"status": "error", "error": str(exc)}), file=sys.stderr)
        raise

@@ -0,0 +1,102 @@
#!/bin/bash
set -euxo pipefail
exec > >(tee /var/log/desineuron-ingress-bootstrap.log | logger -t user-data -s 2>/dev/console) 2>&1
dnf update -y
dnf install -y curl tar gzip unzip jq policycoreutils-python-utils
systemctl enable amazon-ssm-agent
systemctl restart amazon-ssm-agent
useradd --system --home /var/lib/caddy --shell /sbin/nologin caddy || true
install -d -o caddy -g caddy -m 0755 /etc/caddy /var/lib/caddy /var/log/caddy
install -d -m 0755 /etc/rathole /opt/rathole
cat >/etc/ssh/sshd_config.d/10-desineuron-hardening.conf <<'EOF'
PasswordAuthentication no
KbdInteractiveAuthentication no
PermitRootLogin no
PubkeyAuthentication yes
EOF
systemctl restart sshd
CADDY_VERSION="v2.10.2"
CADDY_URL="https://github.com/caddyserver/caddy/releases/download/${CADDY_VERSION}/caddy_2.10.2_linux_arm64.tar.gz"
RATHOLE_VERSION="v0.4.3"
RATHOLE_URL="https://github.com/rapiz1/rathole/releases/download/${RATHOLE_VERSION}/rathole-aarch64-unknown-linux-musl.zip"
tmp_dir="$(mktemp -d)"
cd "$tmp_dir"
curl -fL "$CADDY_URL" -o caddy.tar.gz
tar -xzf caddy.tar.gz
install -m 0755 caddy /usr/local/bin/caddy
setcap cap_net_bind_service=+ep /usr/local/bin/caddy || true
curl -fL "$RATHOLE_URL" -o rathole.zip
python3 - <<'PY'
import zipfile
z = zipfile.ZipFile("rathole.zip")
z.extractall(".")
PY
install -m 0755 rathole /usr/local/bin/rathole
rm -rf "$tmp_dir"
cat >/etc/systemd/system/caddy.service <<'EOF'
[Unit]
Description=Caddy
After=network-online.target
Wants=network-online.target
[Service]
User=caddy
Group=caddy
ExecStart=/usr/local/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/local/bin/caddy reload --config /etc/caddy/Caddyfile
TimeoutStopSec=5s
LimitNOFILE=1048576
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
NoNewPrivileges=true
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
cat >/etc/systemd/system/rathole-server.service <<'EOF'
[Unit]
Description=Desineuron Rathole Server
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/bin/rathole /etc/rathole/server.toml
Restart=always
RestartSec=5
User=root
[Install]
WantedBy=multi-user.target
EOF
cat >/etc/logrotate.d/caddy <<'EOF'
/var/log/caddy/*.log {
daily
rotate 14
compress
missingok
notifempty
copytruncate
}
EOF
touch /etc/caddy/Caddyfile
touch /etc/rathole/server.toml
systemctl daemon-reload
systemctl enable caddy.service
systemctl enable rathole-server.service