# **Part 1: Software Requirements Specification (SRS)**

## **1\. Project Overview**

* **Project Name:** Project Astral (Ethical Celebrity AI Platform)
* **Client:** Cine Bahini Studios
* **Developer:** Desineuron Labs
* **Objective:** To build a locally hosted, air-gapped AI production suite that generates high-fidelity, consent-driven commercial videos using **LTX-2** (Audio+Video) and **LiDAR** data.

## **2\. System Architecture**

The system follows a **Hybrid-Local Architecture** to satisfy the "Air-Gapped Security" requirement while maintaining ease of use.

* **Frontend (The Studio):** Next.js \+ Tailwind CSS (Glassmorphism UI).
* **Middleware (The Agent):** **OpenClaw** running as a local API Gateway to orchestrate tasks between the UI and the GPU.
* **Backend (The Engine):** **ComfyUI** (Headless Mode) executing hidden JSON workflows.
* **AI Models:**
    * **Video:** LTX-2 (19B Param Asymmetric Dual-Stream) for joint Audio-Video generation.
    * **Image/Refinement:** Flux or SDXL for the initial frame; FaceDetailer for restoration.
    * **Geometry:** ControlNet driven by LiDAR Depth Maps.
* **Storage:** Synology NAS (Local 10GbE Mount) for saving final assets.

## **3\. Functional Requirements (FR)**

### **FR-01: Identity Ingestion (The "Astral Capture")**

* **Input:** System must accept .obj or .usdz files from iPhone Pro LiDAR \+ 48MP reference photos.
* **Processing:** System must convert LiDAR point clouds into **Grayscale Depth Maps** for ControlNet usage.
* **Security:** Raw biometric data must be encrypted and stored in the "Astral Vault" (isolated directory structure).

### **FR-02: The "Hidden" Orchestrator**

* **Logic:** The user interacts with a simple text prompt (e.g., "Drinking coffee in rain"). The backend must inject a "System Prompt" (e.g., "Arri Alexa, 8k, highly detailed") automatically.
* **Routing:** The OpenClaw agent must route requests to the available GPU (RTX 6000 \#1 vs \#2) to balance load between Training and Inference.

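
The two FR-02 behaviours can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_prompt`, `Gpu`, and `pick_gpu` are hypothetical names, the style suffix is copied from the example above, and the routing rule (avoid a card that is training, then pick the shortest queue) is one plausible reading of "balance load", not a confirmed design.

```python
from dataclasses import dataclass

# Assumption: the hidden style suffix is the example quoted in FR-02.
SYSTEM_PROMPT = "Arri Alexa, 8k, highly detailed"


def build_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Append the hidden system prompt to the text typed by the user."""
    cleaned = user_prompt.strip().rstrip(".,; ")
    if not cleaned:
        raise ValueError("user prompt is empty")
    return f"{cleaned}, {system_prompt}"


@dataclass
class Gpu:
    index: int        # CUDA device index (0 = RTX 6000 #1, 1 = RTX 6000 #2)
    queued_jobs: int  # jobs currently waiting on this device
    training: bool    # True while a LoRA training run occupies the card


def pick_gpu(gpus: list[Gpu]) -> Gpu:
    """Prefer a card that is not training, then the shortest queue."""
    if not gpus:
        raise ValueError("no GPUs registered")
    # Tuples sort False before True, so idle-for-inference cards win first.
    return min(gpus, key=lambda g: (g.training, g.queued_jobs, g.index))
```

For example, `pick_gpu([Gpu(0, 0, True), Gpu(1, 3, False)])` selects device 1, because device 0 is busy training even though its queue is empty.
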
### **FR-03: Audio-Visual Synchronization**

* **Mechanism:** System must use LTX-2’s **Asymmetric Dual-Stream** architecture to generate foley (background sound) and video simultaneously to ensure synchronization (e.g., cup hitting table \= "clink" sound).

### **FR-04: Ethics & Compliance**

* **The Kill-Switch:** An admin toggle that instantly unloads a celebrity's LoRA from VRAM and locks their dataset if a contract expires.
* **Watermarking:** Every generated frame must include an invisible watermark denoting it as "Synthetic".


---

# **Part 2: The 14-Day "Skunkworks" Sprint**

This roadmap is strictly derived from your **Handwritten Note** and the **Pitch Deck Timeline**.

### **Phase 1: The Spine (Infrastructure & Workflow)**

**Goal:** A working "Ugly" Prototype that generates video.

* **Day 1: Hardware & Environment**
    * **Task:** Mount RTX 6000s. Install Ubuntu \+ CUDA 12.x.
    * **Task:** Install **ComfyUI** and the **LTX-2 Nodes**.
    * **Task:** Verify NAS connectivity (/mnt/nas/output).
* **Day 2: The "Hidden" Workflow**
    * **Task (Handwritten):** *Plan the initial ComfyUI Workflow.*
    * **Execution:** Build a Comfy JSON that takes Image\_Input \+ Depth\_Map \-\> LTX-2 Img2Vid.
    * **Test:** Generate a generic "Man holding bottle" video to prove LTX-2 audio-video sync works.
* **Day 3: OpenClaw Integration**
    * **Task (Handwritten):** *Setup Claw Bot.*
    * **Execution:** Configure OpenClaw to listen on a local port. Write a custom "Skill" (comfy\_skill.py) that allows OpenClaw to send POST requests to ComfyUI's /prompt endpoint.

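
The core of the Day 3 skill is a single HTTP call. The sketch below shows what the body of comfy\_skill.py might contain, using only the Python standard library. It assumes ComfyUI is running on its default local address (`127.0.0.1:8188`) and that the workflow was exported in ComfyUI's API format. The function names are hypothetical, and how OpenClaw registers or invokes a skill is not shown because the source does not specify it.

```python
import json
import urllib.request

# Assumption: ComfyUI listens on its default local address and port.
COMFY_URL = "http://127.0.0.1:8188"


def build_prompt_request(workflow: dict, client_id: str,
                         base_url: str = COMFY_URL) -> urllib.request.Request:
    """Wrap an API-format workflow in the JSON body that /prompt expects."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict, client_id: str, timeout: float = 10.0) -> str:
    """Send the workflow to ComfyUI and return the prompt_id it assigns."""
    request = build_prompt_request(workflow, client_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["prompt_id"]
```

Keeping request construction separate from the network call makes the skill easy to unit-test without a running ComfyUI instance.
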
### **Phase 2: The Identity (Fine-Tuning & Ingestion)**

**Goal:** Putting the "Celebrity" inside the machine.

* **Day 4: LiDAR Pipeline**
    * **Task:** Write a Python script to convert iPhone LiDAR .obj dumps into normalized Depth Maps (White \= Near, Black \= Far) for ControlNet.
* **Day 5: Training (LoRA)**
    * **Task (Handwritten):** *Fine Tune Model.*
    * **Execution:** Train a test LoRA on a Cine Bahini actor (or yourself) using the captured photos. Target 2000 steps on the RTX 6000\.
* **Day 6: Verification**
    * **Task:** Run the LoRA through the Workflow. Tweak the "System Prompt" to ensure the actor doesn't look plastic.

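
A minimal sketch of the Day 4 conversion, in pure Python so it runs without extra packages. It makes several simplifying assumptions that a production script would need to revisit: the camera looks down the Z axis with larger Z nearer to the viewer, the projection is orthographic, and only mesh vertices are splatted onto the grid (faces are not rasterized, so sparse meshes will leave holes). The function names are hypothetical.

```python
def parse_obj_vertices(obj_text: str) -> list[tuple[float, float, float]]:
    """Collect the 'v x y z' records from Wavefront OBJ text."""
    vertices = []
    for line in obj_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "v":
            vertices.append((float(parts[1]), float(parts[2]), float(parts[3])))
    return vertices


def depth_map(vertices, width: int = 64, height: int = 64) -> list[list[int]]:
    """Project vertices along Z into an 8-bit grid: 255 = near, 0 = far."""
    if not vertices:
        raise ValueError("no vertices found")
    xs, ys, zs = zip(*vertices)
    x_min, y_min, z_min = min(xs), min(ys), min(zs)
    # Guard against flat meshes so the normalization never divides by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    z_span = (max(zs) - z_min) or 1.0

    # Empty pixels stay 0, so the farthest points merge with the background.
    grid = [[0] * width for _ in range(height)]
    for x, y, z in vertices:
        col = min(int((x - x_min) / x_span * (width - 1) + 0.5), width - 1)
        # Flip Y so that +Y in the mesh points up in the image.
        y_idx = min(int((y - y_min) / y_span * (height - 1) + 0.5), height - 1)
        row = height - 1 - y_idx
        value = round((z - z_min) / z_span * 255)
        # Keep the nearest surface when several vertices share a pixel.
        grid[row][col] = max(grid[row][col], value)
    return grid
```

The resulting grid can be written out as a grayscale image with whatever imaging library the pipeline already uses before it is handed to ControlNet.
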
### **Phase 3: The Skin (Frontend & Dashboard)**

**Goal:** Making it look like a "Studio," not a "Lab."

* **Day 7: UI Skeleton**
    * **Task (Handwritten):** *Plan the Dashboard and all components.*
    * **Execution:** Sketch the "Glassmorphism" layout. Sidebar (Nav), Center (Drop Zone), Bottom (Task Strip).
* **Day 8: Frontend Build**
    * **Task (Handwritten):** *Make the initial Dashboard.*
    * **Execution:** Initialize Next.js project. Build the DragAndDrop component that accepts files and sends them to the OpenClaw API.
* **Day 9: Real-Time Feedback**
    * **Task:** Implement WebSockets to show the "Green/Red pulse" for API health and the Progress Bar.

### **Phase 4: The Brain (Auth & Logic)**

**Goal:** Security and "Productization."

* **Day 10: Authentication**
    * **Task (Handwritten):** *Setup Authentication.*
    * **Execution:** Integrate Firebase Auth. Create the "Admin" vs "Creative" roles. Ensure "Creative" users cannot access the "Model Forge" page.
* **Day 11: The "Kill Switch"**
    * **Task:** Code the logic where the OpenClaw agent checks Firebase for contract\_status: active before loading any LoRA.

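
The Day 11 check can be expressed as a small guard that fails closed. The sketch below is an assumption-laden illustration: `fetch_record` stands in for whatever Firebase lookup the agent ends up using, `load_lora` stands in for the real loader, and all names are hypothetical. The one behaviour taken from the plan is that a LoRA loads only when `contract_status` equals `active`.

```python
from typing import Callable, Mapping, Optional


class ContractInactiveError(PermissionError):
    """Raised when a LoRA is requested without an active contract."""


def ensure_contract_active(record: Optional[Mapping[str, object]],
                           talent_id: str) -> None:
    """Fail closed: a missing record or any other status blocks the load."""
    status = record.get("contract_status") if record else None
    if status != "active":
        raise ContractInactiveError(f"{talent_id}: contract_status={status!r}")


def load_lora_guarded(talent_id: str,
                      fetch_record: Callable[[str], Optional[Mapping[str, object]]],
                      load_lora: Callable[[str], object]) -> object:
    """Check the contract first; only then hand off to the real loader."""
    # fetch_record is a hypothetical wrapper around the contract lookup.
    ensure_contract_active(fetch_record(talent_id), talent_id)
    return load_lora(talent_id)
```

Treating a missing record the same as an expired contract means a lookup failure cannot accidentally allow a load.
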
### **Phase 5: Final Polish & "The Shot"**

**Goal:** The Demo Asset.

* **Day 12: Dashboard Polish**
    * **Task (Handwritten):** *Finalise Dashboard.*
    * **Execution:** Apply the "Midnight Black" theme. Ensure LiDAR 3D previews render using Three.js.
* **Day 13: Stress Test**
    * **Task:** Queue 20 videos. Watch GPU thermals and VRAM usage.
* **Day 14: The Demo**
    * **Task (Handwritten):** *Generate 5 Sec Demo Video with custom product.*
    * **Execution:** Generate the "Astral Shot" (e.g., the actor holding a specific Cine Bahini product, perfectly lit, speaking a line). This is your deliverable.

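
For the Day 13 stress test, thermals and VRAM can be sampled by polling `nvidia-smi` in its CSV query mode. The following is a rough sketch: the 85 °C and 95% thresholds are placeholder assumptions rather than figures from the plan, the function names are hypothetical, and the parser assumes every queried field comes back numeric.

```python
import subprocess

# nvidia-smi query: one CSV row per GPU, no header, no unit suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> list[dict]:
    """Turn rows like '0, 61, 20000, 40000' into per-GPU dictionaries."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, temp_c, used_mib, total_mib = (
            int(field.strip()) for field in line.split(",")
        )
        stats.append({
            "index": index,
            "temp_c": temp_c,
            "vram_used_frac": used_mib / total_mib,
        })
    return stats


def over_limits(stats: list[dict], max_temp_c: int = 85,
                max_vram_frac: float = 0.95) -> list[int]:
    """Return the indices of GPUs at or above either (assumed) threshold."""
    return [
        s["index"] for s in stats
        if s["temp_c"] >= max_temp_c or s["vram_used_frac"] >= max_vram_frac
    ]


def sample() -> list[dict]:
    """Run nvidia-smi once and parse its output (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `sample()` every few seconds while the 20 queued videos render, and logging `over_limits(...)`, gives a simple record of when either card runs hot or close to full.
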

---

# **Part 3: Project Folder Structure**

To keep this organized for you and Sayan, use this structure:

    /Project_Astral
    ├── /docs                  # Pitch decks, RFP, Architecture diagrams
    ├── /infrastructure        # Docker compose files for OpenClaw, Redis, NAS mount scripts
    ├── /models
    │   ├── /checkpoints       # LTX-2, Flux, SDXL weights
    │   ├── /loras             # The "Astral Vault" (Client LoRAs)
    │   └── /controlnet        # Depth/Canny models
    ├── /backend-agent         # OpenClaw Custom Skills
    │   ├── /skills            # Python scripts for ComfyUI interaction
    │   └── agent_config.yaml  # Routing logic
    ├── /comfy-workflows       # JSON files for the "Hidden" workflows
    │   ├── workflow_dev.json
    │   └── workflow_prod.json
    └── /frontend-studio       # Next.js Application
        ├── /components        # DragDrop, LiDARPreview, ProgressBar
        ├── /pages             # ProductionHub, AssetLibrary, Admin
        └── /lib               # Firebase config, WebSocket hooks