Project_Astral/.agent Context/Project Astral_ SRS & Sprint Plan.md
2026-02-25 00:50:23 +05:30

# **Part 1: Software Requirements Specification (SRS)**
## **1\. Project Overview**
* **Project Name:** Project Astral (Ethical Celebrity AI Platform)
* **Client:** Cine Bahini Studios
* **Developer:** Desineuron Labs
* **Objective:** To build a locally hosted, air-gapped AI production suite that generates high-fidelity, consent-driven commercial videos using **LTX-2** (Audio+Video) and **LiDAR** data.
## **2\. System Architecture**
The system follows a **Hybrid-Local Architecture** to satisfy the "Air-Gapped Security" requirement while maintaining ease of use.
* **Frontend (The Studio):** Next.js \+ Tailwind CSS (Glassmorphism UI).
* **Middleware (The Agent):** **OpenClaw** running as a local API Gateway to orchestrate tasks between the UI and the GPU.
* **Backend (The Engine):** **ComfyUI** (Headless Mode) executing hidden JSON workflows.
* **AI Models:**
  * **Video:** LTX-2 (19B Param Asymmetric Dual-Stream) for joint Audio-Video generation.
  * **Image/Refinement:** Flux or SDXL for the initial frame; FaceDetailer for restoration.
  * **Geometry:** ControlNet driven by LiDAR Depth Maps.
* **Storage:** Synology NAS (Local 10GbE Mount) for saving final assets.
## **3\. Functional Requirements (FR)**
### **FR-01: Identity Ingestion (The "Astral Capture")**
* **Input:** System must accept .obj or .usdz files from iPhone Pro LiDAR \+ 48MP reference photos.
* **Processing:** System must convert LiDAR point clouds into **Grayscale Depth Maps** for ControlNet usage.
* **Security:** Raw biometric data must be encrypted and stored in the "Astral Vault" (isolated directory structure).
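
The point-cloud-to-depth-map step in FR-01 could look roughly like the sketch below. It is only an illustration under stated assumptions: the projection is orthographic along the z axis, smaller `z` is treated as closer to the camera, and the white-is-near convention is taken from the Day 4 task in the sprint plan. The function names `points_to_depth_map` and `to_pgm` are hypothetical, and empty pixels share the value 0 with the farthest point.

```python
def points_to_depth_map(points, width, height):
    """Project (x, y, z) points onto a width x height grid of 0-255 values.

    255 = nearest point, 0 = farthest point or empty pixel.
    Assumption: smaller z means closer to the camera.
    """
    if not points:
        return [[0] * width for _ in range(height)]

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [p[2] for p in points]
    min_x, max_y = min(xs), max(ys)
    min_z, max_z = min(zs), max(zs)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero on degenerate clouds
    span_y = (max_y - min(ys)) or 1.0
    span_z = max_z - min_z

    # Keep only the nearest z value that lands in each pixel.
    nearest = [[None] * width for _ in range(height)]
    for x, y, z in points:
        col = min(int((x - min_x) / span_x * width), width - 1)
        row = min(int((max_y - y) / span_y * height), height - 1)  # image rows grow downward
        if nearest[row][col] is None or z < nearest[row][col]:
            nearest[row][col] = z

    def shade(z):
        if z is None:
            return 0
        if span_z == 0:
            return 255  # every point sits at the same depth
        return round(255 * (max_z - z) / span_z)

    return [[shade(z) for z in row] for row in nearest]


def to_pgm(depth):
    """Serialize the grid as an ASCII PGM (P2) grayscale image."""
    height, width = len(depth), len(depth[0])
    rows = "\n".join(" ".join(str(v) for v in row) for row in depth)
    return f"P2\n{width} {height}\n255\n{rows}\n"
```

A production version would more likely use the camera intrinsics from the capture and a perspective projection; the normalization step stays the same.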
### **FR-02: The "Hidden" Orchestrator**
* **Logic:** The user interacts with a simple text prompt (e.g., "Drinking coffee in rain"). The backend must inject a "System Prompt" (e.g., "Arri Alexa, 8k, highly detailed") automatically.
* **Routing:** The OpenClaw agent must route requests to the available GPU (RTX 6000 \#1 vs \#2) to balance load between Training and Inference.
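
A minimal sketch of the two behaviours in FR-02, assuming the system prompt is appended as a comma-separated suffix and that routing simply picks the GPU with the fewest running jobs. Both policies are assumptions for illustration, as are the names `build_prompt` and `pick_gpu`; the actual OpenClaw routing logic is not specified in this document.

```python
SYSTEM_PROMPT = "Arri Alexa, 8k, highly detailed"  # example suffix quoted in FR-02


def build_prompt(user_prompt, system_prompt=SYSTEM_PROMPT):
    """Append the hidden system prompt to the user's text prompt."""
    user_prompt = user_prompt.strip()
    if not user_prompt:
        raise ValueError("user prompt must not be empty")
    return f"{user_prompt}, {system_prompt}"


def pick_gpu(active_jobs):
    """Return the index of the least-loaded GPU.

    active_jobs maps GPU index -> number of running jobs (hypothetical
    bookkeeping kept by the agent). Ties go to the lowest index.
    """
    if not active_jobs:
        raise ValueError("no GPUs registered")
    return min(sorted(active_jobs), key=lambda gpu: active_jobs[gpu])
```

For example, `build_prompt("Drinking coffee in rain")` yields the user text followed by the camera keywords, and `pick_gpu({0: 2, 1: 0})` selects GPU 1. Reserving one card for training, as the requirement hints, would replace the least-loaded rule with a fixed mapping.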
### **FR-03: Audio-Visual Synchronization**
* **Mechanism:** System must use LTX-2's **Asymmetric Dual-Stream** architecture to generate foley (background sound) and video in the same pass so that the two stay synchronized (e.g., cup hitting table \= "clink" sound).
### **FR-04: Ethics & Compliance**
* **The Kill-Switch:** An admin toggle that instantly unloads a celebrity's LoRA from VRAM and locks their dataset if a contract expires.
* **Watermarking:** Every generated frame must include an invisible watermark denoting it as "Synthetic".

---

# **Part 2: The 14-Day "Skunkworks" Sprint**
This roadmap is strictly derived from your **Handwritten Note** and the **Pitch Deck Timeline**.
### **Phase 1: The Spine (Infrastructure & Workflow)**
**Goal:** A working "Ugly" Prototype that generates video.
* **Day 1: Hardware & Environment**
  * **Task:** Mount RTX 6000s. Install Ubuntu \+ CUDA 12.x.
  * **Task:** Install **ComfyUI** and the **LTX-2 Nodes**.
  * **Task:** Verify NAS connectivity (/mnt/nas/output).
* **Day 2: The "Hidden" Workflow**
  * **Task (Handwritten):** *Plan the initial ComfyUI Workflow.*
  * **Execution:** Build a Comfy JSON that takes Image\_Input \+ Depth\_Map \-\> LTX-2 Img2Vid.
  * **Test:** Generate a generic "Man holding bottle" video to prove LTX-2 audio-video sync works.
* **Day 3: OpenClaw Integration**
  * **Task (Handwritten):** *Setup Claw Bot.*
  * **Execution:** Configure OpenClaw to listen on a local port. Write a custom "Skill" (comfy\_skill.py) that allows OpenClaw to send POST requests to ComfyUI's /prompt endpoint.
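
The HTTP call at the heart of the Day 3 skill could be sketched as follows. This shows only the request to ComfyUI's /prompt endpoint, which accepts a JSON body with the workflow graph under the `prompt` key; how the function is registered as an OpenClaw skill is not covered here. The base URL assumes ComfyUI's default port 8188, and the `client_id` value and function names are hypothetical.

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # assumption: ComfyUI on its default local port


def build_prompt_request(workflow, client_id="openclaw", base_url=COMFY_URL):
    """Build the POST request that queues a workflow graph in ComfyUI."""
    payload = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow, **kwargs):
    """Send the request and return ComfyUI's parsed JSON response."""
    request = build_prompt_request(workflow, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes the skill testable without a running ComfyUI instance. The `workflow` argument is the API-format JSON exported from ComfyUI, loaded as a Python dict.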
### **Phase 2: The Identity (Fine-Tuning & Ingestion)**
**Goal:** Putting the "Celebrity" inside the machine.
* **Day 4: LiDAR Pipeline**
  * **Task:** Write a Python script to convert iPhone LiDAR .obj dumps into normalized Depth Maps (White \= Near, Black \= Far) for ControlNet.
* **Day 5: Training (LoRA)**
  * **Task (Handwritten):** *Fine Tune Model.*
  * **Execution:** Train a test LoRA on a Cine Bahini actor (or yourself) using the captured photos. Target 2000 steps on the RTX 6000\.
* **Day 6: Verification**
  * **Task:** Run the LoRA through the Workflow. Tweak the "System Prompt" to ensure the actor doesn't look plastic.
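
For the Day 4 script, the input side might start with something like the sketch below, which pulls vertex positions out of a Wavefront .obj dump. It relies only on the standard OBJ convention that geometric vertices are written as `v x y z` lines; texture coordinates (`vt`), normals (`vn`), and faces are skipped. The function name is hypothetical, and this is one possible starting point rather than the planned implementation.

```python
def parse_obj_vertices(obj_text):
    """Return a list of (x, y, z) floats from the 'v' lines of an OBJ file."""
    vertices = []
    for line in obj_text.splitlines():
        parts = line.split()
        # 'v' lines carry at least three coordinates; an optional fourth (w) is ignored.
        if len(parts) >= 4 and parts[0] == "v":
            vertices.append((float(parts[1]), float(parts[2]), float(parts[3])))
    return vertices
```

The resulting points can then be projected and normalized into the grayscale map that ControlNet consumes.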
### **Phase 3: The Skin (Frontend & Dashboard)**
**Goal:** Making it look like a "Studio," not a "Lab."
* **Day 7: UI Skeleton**
  * **Task (Handwritten):** *Plan the Dashboard and all components.*
  * **Execution:** Sketch the "Glassmorphism" layout. Sidebar (Nav), Center (Drop Zone), Bottom (Task Strip).
* **Day 8: Frontend Build**
  * **Task (Handwritten):** *Make the initial Dashboard.*
  * **Execution:** Initialize Next.js project. Build the DragAndDrop component that accepts files and sends them to the OpenClaw API.
* **Day 9: Real-Time Feedback**
  * **Task:** Implement WebSockets to show the "Green/Red pulse" for API health and the Progress Bar.
### **Phase 4: The Brain (Auth & Logic)**
**Goal:** Security and "Productization."
* **Day 10: Authentication**
  * **Task (Handwritten):** *Setup Authentication.*
  * **Execution:** Integrate Firebase Auth. Create the "Admin" vs "Creative" roles. Ensure "Creative" users cannot access the "Model Forge" page.
* **Day 11: The "Kill Switch"**
  * **Task:** Code the logic where the OpenClaw agent checks Firebase for contract\_status: active before loading any LoRA.
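
The Day 11 check could be structured along these lines. The Firebase lookup is deliberately left as an injected callable, since the data model is not defined in this document; `fetch_contract_status`, `load_lora`, and `ContractInactiveError` are hypothetical names. The sketch fails closed, so a lookup error or a missing record blocks the load.

```python
class ContractInactiveError(Exception):
    """Raised when a LoRA load is refused because the contract is not active."""


def load_lora_if_permitted(actor_id, fetch_contract_status, load_lora):
    """Load an actor's LoRA only if their contract status is 'active'.

    fetch_contract_status(actor_id) stands in for the Firebase query and
    load_lora(actor_id) for the real loader; both are supplied by the caller.
    """
    try:
        status = fetch_contract_status(actor_id)
    except Exception as exc:
        # Fail closed: if the status cannot be verified, do not load.
        raise ContractInactiveError(f"could not verify contract for {actor_id}") from exc
    if status != "active":
        raise ContractInactiveError(f"contract for {actor_id} is {status!r}, not 'active'")
    return load_lora(actor_id)
```

Passing the lookup in as a parameter lets the gate be tested with a plain dictionary, for example `load_lora_if_permitted("actor_a", statuses.get, loader)`.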
### **Phase 5: Final Polish & "The Shot"**
**Goal:** The Demo Asset.
* **Day 12: Dashboard Polish**
  * **Task (Handwritten):** *Finalise Dashboard.*
  * **Execution:** Apply the "Midnight Black" theme. Ensure LiDAR 3D previews render using Three.js.
* **Day 13: Stress Test**
  * **Task:** Queue 20 videos. Watch GPU thermals and VRAM usage.
* **Day 14: The Demo**
  * **Task (Handwritten):** *Generate 5 Sec Demo Video with custom product.*
  * **Execution:** Generate the "Astral Shot" (e.g., the actor holding a specific Cine Bahini product, perfectly lit, speaking a line). This is your deliverable.
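
For the Day 13 stress test, GPU thermals and VRAM can be polled with `nvidia-smi` in CSV query mode. The sketch below parses that output and flags cards that cross a limit. The thresholds (85 °C, 95% VRAM) and the function names are assumptions chosen for illustration, not values from the project.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse lines such as '0, 61, 20480, 49140' into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, temp_c, used_mib, total_mib = (int(f.strip()) for f in line.split(","))
        stats.append(
            {"index": index, "temp_c": temp_c, "used_mib": used_mib, "total_mib": total_mib}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


def over_limit(stats, max_temp_c=85, max_vram_fraction=0.95):
    """Return the indices of GPUs that exceed either threshold (assumed limits)."""
    flagged = []
    for gpu in stats:
        too_hot = gpu["temp_c"] > max_temp_c
        too_full = gpu["used_mib"] > max_vram_fraction * gpu["total_mib"]
        if too_hot or too_full:
            flagged.append(gpu["index"])
    return flagged
```

Calling `over_limit(read_gpu_stats())` every few seconds while the 20 videos are queued gives a simple log of which card, if any, is running hot or close to out-of-memory.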

---

# **Part 3: Project Folder Structure**
To keep this organized for you and Sayan, use this structure:

    /Project_Astral
    ├── /docs                  # Pitch decks, RFP, Architecture diagrams
    ├── /infrastructure        # Docker compose files for OpenClaw, Redis, NAS mount scripts
    ├── /models
    │   ├── /checkpoints       # LTX-2, Flux, SDXL weights
    │   ├── /loras             # The "Astral Vault" (Client LoRAs)
    │   └── /controlnet        # Depth/Canny models
    ├── /backend-agent         # OpenClaw Custom Skills
    │   ├── /skills            # Python scripts for ComfyUI interaction
    │   └── agent_config.yaml  # Routing logic
    ├── /comfy-workflows       # JSON files for the "Hidden" workflows
    │   ├── workflow_dev.json
    │   └── workflow_prod.json
    └── /frontend-studio       # Next.js Application
        ├── /components        # DragDrop, LiDARPreview, ProgressBar
        ├── /pages             # ProductionHub, AssetLibrary, Admin
        └── /lib               # Firebase config, WebSocket hooks