# Animatrix Monolithic SRS - Wan 2.2 Flow Studio
Date: 2026-04-15
Authoring context: This document defines the first production-ready Animatrix system built on top of the existing Desineuron ingress, the current ComfyUI GPU service, and the Wan 2.2 model family.
## 1. Purpose
Animatrix is a focused product for guided character video generation. It is not a general-purpose node editor. It is a constrained, operator-safe application that exposes two production workflows behind one simple frontend:
1. Character Animation and Replacement using `Wan2.2-Animate-14B`
2. Audio-Driven Character Performance using `Wan2.2-S2V-14B`
The frontend interaction model is inspired by the simplicity and compositional feel of Google Flow, but the execution runtime is ComfyUI-backed and Desineuron-hosted.
The objective is to give users a minimal interface:
- prompt box
- ground-truth starting image upload
- optional reference images and pose sheet uploads
- optional audio upload
- simple mode selection
- one-click generation
while the backend handles:
- asset ingestion
- workflow selection
- parameter validation
- ComfyUI prompt orchestration
- queueing
- status tracking
- result persistence
- streaming-ready delivery
## 2. Executive Product Truth
Animatrix v1 must be built around the actual Wan 2.2 model split, not a blended assumption.
Capability mapping:
- `Wan2.2-Animate-14B` is for character animation and character replacement.
- `Wan2.2-S2V-14B` is for audio-driven video generation with dialogue, singing, and performance.
- `Wan2.2 Fun Inp` is the Wan family workflow for strict first-frame and last-frame control.
- `Wan2.2 Fun Control` is the Wan family workflow for stronger control-video inputs such as OpenPose, depth, canny, and trajectory control.
Therefore the first release must not falsely claim that one single model covers all of the following natively:
- character replacement
- motion transfer
- audio lip-sync
- exact first/last-frame constraints
It does not.
The correct v1 product line is:
- Workflow A: `Animate Studio` on `Wan2.2-Animate-14B`
- Workflow B: `Audio Performance Studio` on `Wan2.2-S2V-14B`
The correct v1.1 or v2 expansion is:
- Workflow C: `Start/End Frame Studio` on `Wan2.2 Fun Inp`
- Workflow D: `Pose/Trajectory Control Studio` on `Wan2.2 Fun Control`
This distinction is mandatory because it affects UI truthfulness, node graphs, validation rules, asset requirements, and customer expectations.
## 3. Source Truth and Rationale
This SRS is grounded in the following current sources:
- official Wan 2.2 GitHub repository: `https://github.com/Wan-Video/Wan2.2`
- official Wan 2.2 Animate model page: `https://huggingface.co/Wan-AI/Wan2.2-Animate-14B`
- official ComfyUI Wan 2.2 docs:
  - `https://docs.comfy.org/tutorials/video/wan/wan2_2`
  - `https://docs.comfy.org/tutorials/video/wan/wan2-2-animate`
  - `https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v`
  - `https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-inp`
  - `https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-control`
- current Desineuron infrastructure truth:
  - [comfyui_setup_truth.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\.Agent Context\Sprint 1\comfyui_setup_truth.md)
  - [Desineuron Stable Ingress Handoff.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\.Agent Context\Sprint 1\Desineuron Stable Ingress Handoff.md)
Critical source-backed facts that drive the design:
- ComfyUI is already exposed safely through `https://comfy.desineuron.in`
- the GPU service already runs behind stable ingress
- `Wan2.2-Animate-14B` supports two operating modes in ComfyUI docs: `Mix` and `Move`
- `Wan2.2-S2V-14B` is the audio-driven workflow with image plus audio inputs
- ComfyUI's official Animate docs require additional custom nodes for the full direct workflow
- exact start/end-frame control is documented under `Wan2.2 Fun Inp`, not Animate
## 4. Product Vision
Animatrix should behave like a focused video creation surface, not like a research sandbox.
The product promise is:
"Upload a hero frame, optionally attach references, pose guidance, or audio, write a prompt, and generate a directed character video without touching ComfyUI nodes."
The UI must feel lightweight, but the execution system behind it must be opinionated and rigid enough to be supportable.
That means:
- limited number of modes
- strict validation
- controlled presets
- reproducible workflow JSON
- consistent output formats
- no raw-node exposure in the customer-facing frontend
## 5. Scope
### 5.1 In Scope for v1
- one frontend
- one backend API
- two ComfyUI production workflows
- status and result tracking
- stable ingress compatibility
- persistent storage for uploads and outputs
- preview and download experience
- operator-oriented logging and troubleshooting
- support for team usage through the existing Desineuron architecture
### 5.2 Out of Scope for v1
- arbitrary node editing by end users
- live collaborative editing
- in-browser timeline editing
- multi-scene stitching
- automatic sound effects design
- full NLE replacement
- customer-facing batch farms
- fine-tuning or LoRA training
### 5.3 Deferred but Planned
- first/last-frame exact control as a third workflow using `Wan2.2 Fun Inp`
- stronger pose or trajectory control using `Wan2.2 Fun Control`
- style packs and prompt presets
- branded credits or quota system
- user libraries and reusable character packs
## 6. User Personas
### 6.1 Internal Creative Operator
This user understands creative direction but should not need to edit node graphs. They need:
- fast iteration
- predictable inputs
- reliable outputs
- access to previous runs
### 6.2 Sales Demo Operator
This user needs a polished experience that can be shown live. They need:
- simple UX
- low operator error
- dependable queue feedback
- visible result cards
### 6.3 Technical Media Designer
This user understands reference material quality and wants more control without dropping into raw ComfyUI. They need:
- reference images
- pose sheet upload
- clear mode distinctions
- optional advanced settings
## 7. Functional Overview
Animatrix v1 will contain one shell product with two generation modes.
### 7.1 Mode A: Animate Studio
Underlying engine:
- `Wan2.2-Animate-14B`
Primary purpose:
- animate a character from a source image using the motion and expression from a source video
- replace the subject in a video with a new character image
Sub-modes:
- `Move`
- `Mix`
User inputs:
- prompt
- ground-truth character image, required
- source motion video, required
- optional reference images
- optional pose sheet image set
- optional aspect preset
- optional duration target
### 7.2 Mode B: Audio Performance Studio
Underlying engine:
- `Wan2.2-S2V-14B`
Primary purpose:
- generate a character video from a static image and audio input
- support dialogue, singing, and audio-driven performance
User inputs:
- prompt
- ground-truth character image, required
- source audio, required
- optional reference images
- optional pose sheet image set
- optional full-body / half-body framing preset
- optional duration target, inferred from audio length by default
## 8. Frontend Vision
The frontend must preserve the interaction language shown in the reference screenshots:
- one large prompt composer
- image chips at the top-left of the composer
- plus button for additional attachments
- compact right-aligned mode selector
- advanced settings revealed through a controlled panel, not always visible
The frontend should feel immediate, not enterprise-heavy.
### 8.1 Core Layout
Top-level zones:
1. Attachment rail
2. Prompt composer
3. Optional advanced drawer
4. Generate action and mode switch
5. Run history / output gallery below
### 8.2 Attachment Types
Attachment chips in v1:
- `Ground Truth`
- `Reference`
- `Pose Sheet`
- `Audio`
- `Motion Video`
Visibility rules:
- `Ground Truth` always available
- `Motion Video` visible only in Animate Studio
- `Audio` visible only in Audio Performance Studio
- `Pose Sheet` optional in both modes
- `Reference` optional in both modes
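The visibility rules above are small enough to keep as a single lookup table that the backend can also use when checking which attachments a request may carry. The following is a minimal sketch; the `Mode` enum and `visible_chips` helper are hypothetical names, not part of any existing codebase.

```python
from enum import Enum


class Mode(str, Enum):
    ANIMATE = "animate"
    AUDIO = "audio"


# Chips available in both modes, per the visibility rules.
SHARED_CHIPS = ("Ground Truth", "Reference", "Pose Sheet")

# Chips that appear only in one mode.
MODE_CHIPS = {
    Mode.ANIMATE: ("Motion Video",),
    Mode.AUDIO: ("Audio",),
}


def visible_chips(mode: Mode) -> tuple[str, ...]:
    """Return the attachment chips that should be shown for a mode."""
    return SHARED_CHIPS + MODE_CHIPS[mode]
```

Keeping the table in one place means the frontend and the request validator cannot drift apart on which attachments belong to which mode.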
### 8.3 Frontend Controls
Base controls:
- prompt text area
- optional keyword helper line
- mode toggle: `Animate` / `Audio`
- output aspect toggle: `9:16`, `16:9`, later `1:1`
- quality profile: `Draft`, `Standard`, `High`
- generate button
Advanced controls:
- Animate sub-mode: `Move` / `Mix`
- target duration
- seed
- negative prompt
- extension segments
- background preservation flag
- relighting flag
- lip-sync intensity or audio adherence preset
### 8.4 UX Rules
- do not expose raw model names to standard users
- use user language like `Animate`, `Replace Character`, `Audio Performance`
- surface warnings before submission if required inputs are missing
- show asset previews as compact rounded chips
- keep advanced panel collapsed by default
## 9. Exact Capability Mapping by Workflow
### 9.1 Workflow A: Animate Studio
Supported in v1:
- character animation from image plus motion video
- character replacement from image plus source video
- prompt conditioning
- optional pose preprocessing
- iterative video extension
Not truly supported by Animate Studio itself:
- direct audio-driven lip sync
- strict start/end-frame guarantees
### 9.2 Workflow B: Audio Performance Studio
Supported in v1:
- image plus audio driven generation
- prompt-conditioned motion/environment
- dialogue and singing style use cases
- long-form generation by extension chunks
Not truly supported by S2V itself:
- guaranteed subject replacement from an existing motion video
- exact last-frame lock
### 9.3 Pose Sheet Truth
The user-requested pose sheet can be supported in two ways:
1. Soft support in v1
   - pose sheet stored as reference asset
   - backend uses it for prompt augmentation and optional preprocessing assistance
   - operator can map selected sheet frames to manual key pose hints
2. Hard support in later release
   - migrate pose guidance to a dedicated `Wan2.2 Fun Control` or equivalent control-video workflow
The v1 document must state this honestly. A static pose sheet is not the same as a control video. It helps guide generation but does not become full deterministic motion control without an additional preprocessing and control pipeline.
## 10. Ground Truth Asset Model
The user's "ground truth" image is the canonical identity anchor.
In both workflows it must serve as:
- the primary subject identity reference
- the default starting visual state
- the basis for preview thumbnails
Rules:
- exactly one primary ground-truth image per run
- image must pass minimum size and aspect checks
- a clean background is preferred but not mandatory
- user may crop or center the character before submission
Optional extension:
- future support for multiple identity references per character pack
## 11. Workflow Architecture
### 11.1 System Shape
```text
Browser
  -> Animatrix frontend
    -> Animatrix API
      -> job store
      -> asset store
      -> workflow composer
      -> ComfyUI client
        -> https://comfy.desineuron.in
          -> GPU ComfyUI service
            -> Wan2.2 workflow execution
      -> result collector
      -> output persistence
      -> result CDN / static delivery
```
### 11.2 Architectural Rule
The frontend must never submit raw prompts directly to ComfyUI.
The backend must always mediate:
- asset upload
- workflow selection
- workflow JSON parameter binding
- run metadata persistence
- output tracking
This is required for observability, rate control, product safety, and sales-readiness.
## 12. Ingress and Deployment Compatibility
Animatrix must be designed around the current Desineuron ingress truth.
Current infrastructure constraints:
- ComfyUI is already live at `https://comfy.desineuron.in`
- ComfyUI runs behind AWS ingress and stable TLS
- GPU private IP is not a stable application contract
- Linux origin is currently `192.168.1.2`
### 12.1 Mandatory Integration Rule
Animatrix backend must integrate with ComfyUI through the stable hostname or through a controlled internal service abstraction that resolves to the same managed route.
Do not bind Animatrix to:
- the GPU public IP
- direct `8188` public traffic
- hardcoded current private IP
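One way to enforce this rule is a startup check on the configured ComfyUI base URL. The sketch below assumes the backend reads a single base URL from configuration and that the stable route is always HTTPS; `validate_comfy_base_url` is a hypothetical helper, and a controlled internal service abstraction might need a looser scheme check.

```python
import ipaddress
from urllib.parse import urlsplit


def validate_comfy_base_url(url: str) -> str:
    """Reject base URLs that bypass the stable hostname (sketch only)."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("ComfyUI base URL must use https")
    host = parts.hostname or ""
    if not host:
        raise ValueError("ComfyUI base URL has no hostname")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # Not an IP literal, which is what we want.
    else:
        raise ValueError("ComfyUI base URL must be a hostname, not an IP address")
    if parts.port == 8188:
        raise ValueError("direct traffic to port 8188 is not allowed")
    return url.rstrip("/")
```

For example, `validate_comfy_base_url("https://comfy.desineuron.in")` passes, while a private IP literal or an explicit `:8188` port fails at startup instead of at first job submission.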
### 12.2 Recommended Host Layout
Recommended public routing:
- `animatrix.desineuron.in` -> frontend and public product shell
- `api.animatrix.desineuron.in` or `animatrix.desineuron.in/api` -> backend API
- `comfy.desineuron.in` -> internal execution dependency only, not user-facing
If separate subdomains are not created immediately, the fallback deployment pattern may mirror the current Velocity site pattern:
- frontend served from Linux origin through ingress
- backend served from Linux origin through ingress
- backend calls ComfyUI through `https://comfy.desineuron.in`
## 13. Runtime Components
### 13.1 Frontend Application
Responsibilities:
- render simplified generation interface
- manage uploads
- validate user fields before submit
- create job requests
- poll or subscribe to job progress
- render previews and outputs
Suggested stack:
- Next.js or Vite React app
- Tailwind or CSS modules
- upload components with image/audio/video preview
### 13.2 Animatrix Backend API
Responsibilities:
- receive upload metadata
- store files
- generate canonical run record
- choose workflow template
- bind node inputs
- submit prompt payload to ComfyUI
- track prompt ID and history
- collect generated outputs
- persist result artifacts
Suggested stack:
- FastAPI if aligned with existing Python-heavy operations
- or Node/TypeScript only if the team wants one frontend-backend language
Recommendation:
- use Python FastAPI for v1 if reusing current Desineuron operational style and image/media tooling
### 13.3 Workflow Composer
Responsibilities:
- keep frozen template JSON files in version control
- inject prompt text, model selections, size, length, and asset paths
- enforce mode-specific constraints
This component must be deterministic. It is not a prompt improviser.
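A deterministic composer can be as small as a pure function that copies a frozen template and writes values into mapped node inputs. The sketch below assumes templates are stored in ComfyUI's API format (a dict of node ID to `class_type` and `inputs`); `bind_workflow`, the node map shape, and the node IDs in the usage note are hypothetical.

```python
import copy


def bind_workflow(
    template: dict,
    node_map: dict[str, tuple[str, str]],
    params: dict,
) -> dict:
    """Return a bound copy of a frozen template; the template is never mutated.

    node_map maps a logical parameter name to (node_id, input_name).
    """
    unknown = sorted(set(params) - set(node_map))
    if unknown:
        raise KeyError(f"parameters without a node mapping: {unknown}")
    bound = copy.deepcopy(template)
    for name in sorted(params):  # Fixed order keeps binding reproducible.
        node_id, input_name = node_map[name]
        if node_id not in bound:
            raise KeyError(f"node {node_id!r} missing from template")
        bound[node_id]["inputs"][input_name] = params[name]
    return bound
```

With a template such as `{"6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}}}` and a node map `{"prompt": ("6", "text")}`, calling `bind_workflow(template, node_map, {"prompt": "a dancer"})` yields a new dict with the prompt filled in. Unknown parameters fail loudly rather than being silently dropped.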
### 13.4 ComfyUI Execution Layer
Responsibilities:
- execute pre-approved workflow JSON
- expose queue, prompt, history, upload endpoints
- return output metadata
### 13.5 Asset Store
Responsibilities:
- raw upload persistence
- normalized derivative generation
- final output video persistence
- preview image generation
Recommended storage split:
- hot local cache on Linux origin
- durable object storage in S3 for long-term retention
## 13A. Current Infrastructure Contract
Animatrix v1 must be compatible with the currently operating Desineuron media stack as it exists today.
Live execution truth:
- public ComfyUI hostname: `https://comfy.desineuron.in`
- ingress elastic IP: `98.87.120.120`
- GPU private target currently managed behind ingress
- Linux origin currently: `192.168.1.2`
Current GPU-side storage truth:
- ComfyUI app root: `/opt/dlami/nvme/ComfyUI`
- HF cache: `/opt/dlami/nvme/hf`
- model staging root: `/opt/dlami/nvme/model-staging`
- model logs: `/opt/dlami/nvme/model-logs`
Current model hydration truth:
- durable bucket family already in use: `s3://project-velocity/models/`
- existing Wan hydration prefix: `s3://project-velocity/models/Wan2.2-Animate-14B/`
Animatrix must not introduce a second contradictory deployment path for ComfyUI. It must reuse this stable route and storage discipline.
## 13B. ComfyUI API Contract
The backend integration layer must be implemented against the current ComfyUI HTTP contract.
Required endpoints:
- `GET /`
- `POST /prompt`
- `GET /history/{prompt_id}`
- `GET /queue`
- `POST /upload/image`
Recommended extension checks:
- health probe against `/`
- prompt submission response validation
- history polling with bounded backoff
- queue introspection for operator dashboards
The backend must wrap these endpoints in a typed client and must not scatter raw HTTP calls throughout business logic.
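A typed client along these lines might look like the following sketch. It assumes the response shapes commonly seen from ComfyUI (`/prompt` returning a `prompt_id`, `/history/{prompt_id}` returning a dict keyed by prompt ID that stays empty until the run finishes); these should be verified against the deployed ComfyUI version. `ComfyClient`, `PromptReceipt`, and `urllib_transport` are hypothetical names, and the transport is injectable so business logic and tests never issue raw HTTP calls.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable, Optional

# transport(method, path, json_body) -> decoded JSON response
Transport = Callable[[str, str, Optional[dict]], Any]


@dataclass(frozen=True)
class PromptReceipt:
    prompt_id: str


class ComfyClient:
    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def submit_prompt(self, workflow: dict, client_id: str) -> PromptReceipt:
        body = {"prompt": workflow, "client_id": client_id}
        data = self._transport("POST", "/prompt", body)
        prompt_id = data.get("prompt_id") if isinstance(data, dict) else None
        if not prompt_id:
            raise RuntimeError(f"unexpected /prompt response: {data!r}")
        return PromptReceipt(prompt_id=prompt_id)

    def history(self, prompt_id: str) -> Optional[dict]:
        """Return the history entry, or None while the run is unfinished."""
        data = self._transport("GET", f"/history/{prompt_id}", None)
        return data.get(prompt_id)

    def queue(self) -> dict:
        return self._transport("GET", "/queue", None)


def urllib_transport(base_url: str, timeout: float = 30.0) -> Transport:
    """Standard-library transport; base_url has no trailing slash."""

    def send(method: str, path: str, body: Optional[dict]) -> Any:
        payload = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url + path,
            data=payload,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))

    return send
```

In production the client would be built as `ComfyClient(urllib_transport("https://comfy.desineuron.in"))`; file uploads through `POST /upload/image` need multipart handling and are left out of this sketch.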
## 13C. Model and Node Manifest
### Workflow A: Animate Studio Required Assets
Required model family:
- `Wan2.2-Animate-14B`
- `clip_vision_h.safetensors`
- `wan_2.1_vae.safetensors`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors`
Required custom nodes:
- `ComfyUI-KJNodes`
- `ComfyUI-comfyui_controlnet_aux`
Suggested placement contract:
- diffusion model files under `ComfyUI/models/diffusion_models/`
- text encoder under `ComfyUI/models/text_encoders/`
- VAE under `ComfyUI/models/vae/`
- CLIP Vision under `ComfyUI/models/clip_vision/`
### Workflow B: Audio Performance Studio Required Assets
Required model family:
- `wan2.2_s2v_14B_fp8_scaled.safetensors` or `wan2.2_s2v_14B_bf16.safetensors`
- `wav2vec2_large_english_fp16.safetensors`
- `wan_2.1_vae.safetensors`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors`
Suggested placement contract:
- diffusion model under `ComfyUI/models/diffusion_models/`
- text encoder under `ComfyUI/models/text_encoders/`
- audio encoder under `ComfyUI/models/audio_encoders/`
- VAE under `ComfyUI/models/vae/`
### Deferred Workflow Assets
For future strict start/end-frame control:
- `Wan2.2 Fun Inp` models and optional associated LoRAs
For future stronger pose control:
- `Wan2.2 Fun Control`
The frontend and API must be written so these workflows can be added later without reworking the entire product shell.
## 14. File and Repository Blueprint
Animatrix should be structured as an application repository or top-level product directory with explicit separation between app, API, and workflow assets.
Recommended layout:
```text
Animatrix/
  docs/
    Animatrix Monolithic SRS - Wan 2.2 Flow Studio.md
  frontend/
    src/
      app/
      components/
      features/
      lib/
      styles/
  backend/
    app/
      api/
      services/
      models/
      repositories/
      workers/
  workflows/
    animate/
      wan22_animate_mix.json
      wan22_animate_move.json
    s2v/
      wan22_s2v_base.json
    shared/
      prompt_profiles/
      node_maps/
  scripts/
    deploy/
    media/
    sync/
  infra/
    systemd/
    nginx/
    caddy/
  tests/
    api/
    workflows/
    ui/
```
## 15. Workflow A Detailed Design: Animate Studio
### 15.1 Objective
Deliver a workflow that supports:
- character replacement from a source video
- character animation from a performer video
- prompt-guided visual refinement
### 15.2 Input Contract
Required:
- `prompt`
- `ground_truth_image`
- `motion_video`
- `mode`: `move` or `mix`
Optional:
- `reference_images[]`
- `pose_sheet_images[]`
- `negative_prompt`
- `duration_override_seconds`
- `aspect_ratio`
- `quality_profile`
- `seed`
### 15.3 Output Contract
Primary outputs:
- `video_mp4`
- `poster_frame_jpg`
- `job_manifest.json`
- `debug_metadata.json`
Secondary outputs:
- pose preview if preprocessing is enabled
- first-frame snapshot
### 15.4 Internal Workflow Stages
1. Ingest image and video
2. Normalize formats and dimensions
3. Extract first frame and thumbnail
4. Run optional DWPose or auxiliary preprocessing
5. Bind workflow JSON for `move` or `mix`
6. Upload normalized assets to ComfyUI
7. Submit workflow
8. Poll queue and history
9. Collect result paths
10. Persist final outputs and metadata
### 15.5 ComfyUI Notes
The official Animate workflow requires:
- `clip_vision_h.safetensors`
- `wan_2.1_vae.safetensors`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors`
- Animate diffusion model
- optional Lightning LoRA
- custom nodes:
  - `ComfyUI-KJNodes`
  - `ComfyUI-comfyui_controlnet_aux`
### 15.6 Product-Level Rule
Animatrix v1 must hide these internals from the standard UI, but the backend and operator docs must track them exactly.
## 16. Workflow B Detailed Design: Audio Performance Studio
### 16.1 Objective
Deliver a workflow that supports:
- talking-head and half-body performance
- singing and dialogue use cases
- audio-driven facial and motion synthesis
### 16.2 Input Contract
Required:
- `prompt`
- `ground_truth_image`
- `audio_file`
Optional:
- `reference_images[]`
- `pose_sheet_images[]`
- `negative_prompt`
- `framing_mode`: `portrait`, `half_body`, `full_body`
- `quality_profile`
- `seed`
### 16.3 Output Contract
Primary outputs:
- `video_mp4`
- `poster_frame_jpg`
- `job_manifest.json`
- `debug_metadata.json`
### 16.4 Internal Workflow Stages
1. Ingest image and audio
2. Normalize sample rate and file format
3. Infer required frame count from audio duration
4. Determine required S2V extension chunks
5. Bind workflow JSON
6. Upload image and audio to ComfyUI
7. Submit workflow
8. Poll queue and history
9. Collect output video
10. Persist artifacts
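Stages 3 and 4 above come down to simple arithmetic once the frame rate and per-chunk frame count are known. The sketch below assumes 16 frames per second and 77 frames per chunk purely as illustrative defaults; the real values must be read from the frozen S2V workflow template, and `s2v_plan` is a hypothetical helper.

```python
import math


def s2v_plan(
    audio_seconds: float,
    fps: int = 16,           # Assumption: taken from the workflow template.
    chunk_frames: int = 77,  # Assumption: frames generated per chunk.
) -> tuple[int, int]:
    """Return (total_frames, extension_chunks) for an audio clip."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    total_frames = math.ceil(audio_seconds * fps)
    chunks = math.ceil(total_frames / chunk_frames)
    # The first chunk is the base generation; the rest are extensions.
    return total_frames, chunks - 1
```

Under these assumed defaults, a 10 second clip needs 160 frames, which is three chunks and therefore two extension segments, while a 4 second clip fits in the base chunk with no extension.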
### 16.5 ComfyUI Notes
The official S2V workflow requires:
- `wan2.2_s2v_14B_fp8_scaled.safetensors` or bf16 variant
- `wav2vec2_large_english_fp16.safetensors`
- `wan_2.1_vae.safetensors`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors`
The ComfyUI docs note that:
- fp8 uses less VRAM
- bf16 may reduce quality degradation
- Lightning LoRA can reduce generation time but can also significantly reduce quality and dynamics
Therefore Animatrix must default to:
- `Standard`: fp8 without aggressive LoRA by default for customer-facing quality stability
- `Draft`: fp8 with acceleration options
- `High`: bf16 where hardware allows
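These defaults can be expressed as a small lookup table in the backend. This is a sketch only: the `S2VProfile` type and `resolve_profile` helper are hypothetical, and falling back from `High` to `Standard` when bf16 does not fit the hardware is an assumption about how "where hardware allows" would be handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S2VProfile:
    diffusion_model: str
    use_lightning_lora: bool


S2V_PROFILES = {
    "draft": S2VProfile("wan2.2_s2v_14B_fp8_scaled.safetensors", True),
    "standard": S2VProfile("wan2.2_s2v_14B_fp8_scaled.safetensors", False),
    "high": S2VProfile("wan2.2_s2v_14B_bf16.safetensors", False),
}


def resolve_profile(name: str, bf16_available: bool) -> S2VProfile:
    """Map a UI quality profile to model settings."""
    profile = S2V_PROFILES[name.lower()]
    if profile.diffusion_model.endswith("bf16.safetensors") and not bf16_available:
        # Assumed fallback: keep quality-stable fp8 rather than fail the job.
        return S2V_PROFILES["standard"]
    return profile
```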
## 17. UI-to-Workflow Mapping
The UI must map cleanly to backend request objects.
### 17.1 Shared Fields
- `mode`
- `prompt`
- `negative_prompt`
- `ground_truth_asset_id`
- `reference_asset_ids[]`
- `pose_sheet_asset_ids[]`
- `aspect_ratio`
- `quality_profile`
- `seed`
### 17.2 Animate-Specific Fields
- `motion_video_asset_id`
- `animate_submode`
- `background_preservation`
- `relighting`
- `extension_segments`
### 17.3 Audio-Specific Fields
- `audio_asset_id`
- `framing_mode`
- `audio_adherence_profile`
- `extension_segments`
## 18. Suggested Backend API
### 18.1 Asset Endpoints
- `POST /api/assets/image`
- `POST /api/assets/video`
- `POST /api/assets/audio`
- `GET /api/assets/{asset_id}`
### 18.2 Job Endpoints
- `POST /api/jobs/animate`
- `POST /api/jobs/audio-performance`
- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs/{job_id}/outputs`
- `POST /api/jobs/{job_id}/cancel`
### 18.3 Admin Endpoints
- `GET /api/admin/workflows`
- `GET /api/admin/health`
- `GET /api/admin/queue`
- `POST /api/admin/retry/{job_id}`
### 18.4 Websocket or SSE Progress Channel
Recommended:
- `GET /api/jobs/{job_id}/stream`
This should emit:
- accepted
- uploaded
- queued
- executing
- collecting_outputs
- completed
- failed
The frontend should use this channel if available and fall back to polling if the connection drops.
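The event names above can double as the persisted job status values, with a guard that rejects out-of-order updates. The sketch below assumes strictly sequential progress on the happy path and that a job may fail from any non-terminal state; `JobStatus` and `can_transition` are hypothetical names.

```python
from enum import Enum


class JobStatus(str, Enum):
    ACCEPTED = "accepted"
    UPLOADED = "uploaded"
    QUEUED = "queued"
    EXECUTING = "executing"
    COLLECTING_OUTPUTS = "collecting_outputs"
    COMPLETED = "completed"
    FAILED = "failed"


HAPPY_PATH = [
    JobStatus.ACCEPTED,
    JobStatus.UPLOADED,
    JobStatus.QUEUED,
    JobStatus.EXECUTING,
    JobStatus.COLLECTING_OUTPUTS,
    JobStatus.COMPLETED,
]
TERMINAL = {JobStatus.COMPLETED, JobStatus.FAILED}


def can_transition(current: JobStatus, nxt: JobStatus) -> bool:
    """Allow one step forward on the happy path, or failure from any live state."""
    if current in TERMINAL:
        return False
    if nxt is JobStatus.FAILED:
        return True
    return HAPPY_PATH.index(nxt) == HAPPY_PATH.index(current) + 1
```

Because the same values are stored on the job record, a client that loses the stream can poll `GET /api/jobs/{job_id}` and resume from the last known status.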
## 19. Data Model
### 19.1 Asset
Fields:
- `asset_id`
- `asset_type`
- `mime_type`
- `original_filename`
- `storage_url`
- `thumbnail_url`
- `width`
- `height`
- `duration_seconds`
- `size_bytes`
- `created_at`
### 19.2 Job
Fields:
- `job_id`
- `mode`
- `workflow_template`
- `status`
- `submitted_by`
- `prompt`
- `negative_prompt`
- `settings_json`
- `comfy_prompt_id`
- `created_at`
- `updated_at`
### 19.3 JobOutput
Fields:
- `output_id`
- `job_id`
- `video_url`
- `poster_url`
- `manifest_url`
- `duration_seconds`
- `resolution`
- `fps`
- `created_at`
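As an illustration, the `Job` and `JobOutput` records could be modeled with standard-library dataclasses as below. The field types, defaults, and the `mark` helper are assumptions made for this sketch; `Asset` would follow the same pattern and is omitted for brevity.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


def _new_id() -> str:
    return uuid.uuid4().hex


@dataclass
class Job:
    mode: str
    workflow_template: str
    submitted_by: str
    prompt: str
    negative_prompt: str = ""
    settings_json: dict = field(default_factory=dict)
    status: str = "accepted"
    comfy_prompt_id: Optional[str] = None
    job_id: str = field(default_factory=_new_id)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)

    def mark(self, status: str, comfy_prompt_id: Optional[str] = None) -> None:
        """Update status and keep the ComfyUI correlation ID once known."""
        self.status = status
        if comfy_prompt_id is not None:
            self.comfy_prompt_id = comfy_prompt_id
        self.updated_at = _now()


@dataclass
class JobOutput:
    job_id: str
    video_url: str
    poster_url: str
    manifest_url: str
    duration_seconds: float
    resolution: str
    fps: float
    output_id: str = field(default_factory=_new_id)
    created_at: datetime = field(default_factory=_now)
```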
## 20. Workflow Template Governance
Workflow JSON must be treated as versioned product assets.
Rules:
- each production workflow JSON must have an immutable version identifier
- node IDs must be mapped in a dedicated config file
- backend parameter injection must never depend on informal manual node lookup
- each workflow change must pass snapshot regression checks
Required metadata for every workflow:
- `workflow_name`
- `workflow_version`
- `model_family`
- `required_assets`
- `required_models`
- `custom_nodes`
- `compatible_backend_version`
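A lightweight check for these rules could run in CI before a workflow is published. In the sketch below, `missing_manifest_keys` verifies the required metadata, and `workflow_fingerprint` derives a content hash that can back the immutable version identifier; using a SHA-256 of canonical JSON is an assumption of this sketch, not a stated requirement.

```python
import hashlib
import json

REQUIRED_MANIFEST_KEYS = (
    "workflow_name",
    "workflow_version",
    "model_family",
    "required_assets",
    "required_models",
    "custom_nodes",
    "compatible_backend_version",
)


def missing_manifest_keys(manifest: dict) -> list[str]:
    """Return required metadata keys that the manifest does not define."""
    return [key for key in REQUIRED_MANIFEST_KEYS if key not in manifest]


def workflow_fingerprint(workflow: dict) -> str:
    """Hash the workflow JSON independent of key order."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing the fingerprint next to `workflow_version` makes a hand-edited production JSON detectable, since any change to the file changes the hash.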
## 21. Storage and Delivery Design
### 21.1 Inputs
Store raw uploads in durable storage with stable references.
Recommended:
- object storage in S3
- local temporary cache for preprocessing
### 21.2 Outputs
Store:
- mp4 output
- poster image
- optional animated preview
- manifest json
### 21.3 Delivery
Outputs must be streamable from a public HTTPS origin via ingress.
If using Linux origin:
- serve final assets through nginx under Animatrix public domain
If using S3-backed storage:
- use signed or public-read delivery depending on account mode
## 22. Quality Profiles
Animatrix must expose productized quality profiles rather than raw step counts to users.
### 22.1 Draft
Purpose:
- internal ideation
- faster previews
Behavior:
- lower resolution
- lower steps
- acceleration LoRA allowed
### 22.2 Standard
Purpose:
- most normal production runs
Behavior:
- balanced speed and quality
- conservative defaults
- no quality-destructive shortcuts unless explicitly enabled
### 22.3 High
Purpose:
- demo and delivery quality
Behavior:
- higher quality model variant when available
- larger resolution
- longer runtime accepted
## 23. Error Handling
Failure classes:
- missing asset
- invalid asset format
- unsupported aspect ratio
- workflow binding failure
- ComfyUI upload failure
- ComfyUI queue failure
- generation timeout
- result collection failure
User-facing errors must be simplified.
Operator-facing logs must preserve exact failure cause.
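One way to keep the two audiences separate is to log the exact cause for operators and return only a mapped message to the user. The sketch below covers a subset of the failure classes; the `FailureClass` enum, the message wording, and `report_failure` are hypothetical.

```python
import logging
from enum import Enum


class FailureClass(str, Enum):
    MISSING_ASSET = "missing_asset"
    INVALID_ASSET_FORMAT = "invalid_asset_format"
    WORKFLOW_BINDING = "workflow_binding_failure"
    COMFY_QUEUE = "comfyui_queue_failure"
    GENERATION_TIMEOUT = "generation_timeout"


USER_MESSAGES = {
    FailureClass.MISSING_ASSET: "A required file is missing. Please re-upload it.",
    FailureClass.INVALID_ASSET_FORMAT: "This file type is not supported.",
    FailureClass.GENERATION_TIMEOUT: "Generation took too long. Please try again.",
}
DEFAULT_MESSAGE = "Generation failed. Please try again or contact an operator."


def report_failure(
    failure: FailureClass,
    detail: str,
    job_id: str,
    logger: logging.Logger,
) -> str:
    """Log the exact cause for operators; return a simplified user message."""
    logger.error("job %s failed: %s: %s", job_id, failure.value, detail)
    return USER_MESSAGES.get(failure, DEFAULT_MESSAGE)
```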
## 23A. Validation Rules
### Shared Validation
- reject empty prompt if prompt is required by the selected workflow profile
- reject missing ground-truth image
- reject unsupported file extensions
- reject files above configured upload limit
### Animate Studio Validation
- reject missing motion video
- reject unsupported source video codecs that cannot be normalized
- reject conflicting `move` and `mix` settings
### Audio Performance Studio Validation
- reject missing audio
- reject audio longer than configured maximum duration for the selected profile
- normalize sample rate before workflow submission
### Pose Sheet Validation
- accept only supported image formats
- cap pose sheet image count in v1
- mark pose sheet as "soft guidance" in job metadata unless a later hard-control pipeline is introduced
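The shared, Animate, and pose sheet rules above might be collected into a single request check like the following sketch. The request is shown as a plain dict using the field names from the UI-to-workflow mapping; the pose sheet cap of 8 and the helper names are assumptions made for illustration.

```python
MAX_POSE_SHEET_IMAGES = 8  # Assumption: the real v1 cap is a config value.


def validate_animate_request(req: dict, prompt_required: bool = True) -> list[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = []
    if prompt_required and not str(req.get("prompt", "")).strip():
        errors.append("prompt is required")
    if not req.get("ground_truth_asset_id"):
        errors.append("ground-truth image is required")
    if not req.get("motion_video_asset_id"):
        errors.append("motion video is required")
    if req.get("animate_submode") not in ("move", "mix"):
        errors.append("animate_submode must be exactly one of: move, mix")
    if len(req.get("pose_sheet_asset_ids", [])) > MAX_POSE_SHEET_IMAGES:
        errors.append(f"at most {MAX_POSE_SHEET_IMAGES} pose sheet images are allowed")
    return errors


def pose_sheet_metadata(req: dict) -> dict:
    """Record that v1 pose sheets are soft guidance only."""
    if req.get("pose_sheet_asset_ids"):
        return {"pose_sheet_guidance": "soft"}
    return {}
```

Returning every error at once, instead of raising on the first, lets the frontend surface all missing inputs before submission.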
## 24. Observability
Minimum operational telemetry:
- job creation rate
- queue depth
- mean wait time
- mean generation time by workflow
- failure rate by workflow version
- storage growth
- top asset sizes
Required correlation identifiers:
- `job_id`
- `asset_id`
- `comfy_prompt_id`
## 25. Security and Access Control
Rules:
- do not expose raw ComfyUI publicly to end users as the product surface
- backend owns ComfyUI credentials and workflow orchestration
- validate file size and MIME type on upload
- reject executable uploads
- limit accepted formats
- preserve audit trail for every run
## 26. Team and Operator UX
The system must support:
- internal team usage through the stable ingress
- supportable operator triage
- easy workflow version rollback
- safe demo usage during sales calls
Operators need:
- admin queue view
- job replay
- access to input and output manifests
- workflow version annotation
## 27. Non-Functional Requirements
### 27.1 Reliability
- no direct dependency on ephemeral GPU public IP
- graceful retry around ComfyUI upload and history polling
- job state persisted outside memory
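Graceful retry around history polling can be sketched as a capped exponential backoff schedule plus a polling loop. The base delay, growth factor, cap, and attempt count below are placeholder values, and `backoff_delays` and `poll_until` are hypothetical helpers; the `sleep` function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 30.0,
    max_attempts: int = 8,
) -> Iterator[float]:
    """Yield a bounded sequence of delays: exponential growth up to a cap."""
    delay = base
    for _ in range(max_attempts):
        yield min(delay, cap)
        delay *= factor


def poll_until(
    fetch: Callable[[], Optional[T]],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch until it returns a non-None result or the delays run out."""
    for delay in delays:
        result = fetch()
        if result is not None:
            return result
        sleep(delay)
    raise TimeoutError("polling attempts exhausted")
```

A worker would call something like `poll_until(lambda: client.history(prompt_id), backoff_delays())`, where `client` is whatever ComfyUI wrapper the backend uses, and persist the job as failed with a generation timeout if `TimeoutError` is raised.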
### 27.2 Performance
- fast upload validation
- async polling and result collection
- cached thumbnails
### 27.3 Scalability
- workflow templates stateless
- API horizontally scalable
- storage externalized
### 27.4 Maintainability
- one source-of-truth workflow config per mode
- explicit model manifest
- no hidden hand-edited production JSON
### 27.5 Sales Readiness
- stable hostname
- reliable queue messaging
- polished success and failure states
- deterministic demo inputs
## 27A. Demo and Commercial Readiness Requirements
Animatrix will be used in live demos and pre-sales conversations. That changes the bar.
Required product behavior:
- first meaningful UI paint fast enough for live sales use
- one-click sample project loading for demo mode
- clear progress messaging during long generations
- shareable output URL or operator download path
- no raw ComfyUI terminology in the customer-facing layer unless explicitly in admin mode
Required operator support behavior:
- known-good demo assets packaged and versioned
- visible warning when GPU queue is saturated
- ability to retry a failed job without recreating all metadata manually
## 28. MVP Acceptance Criteria
Animatrix v1 is only considered complete when all of the following are true:
1. A user can upload a ground-truth image, type a prompt, attach a motion video, select `Move` or `Mix`, and receive a finished video output.
2. A user can upload a ground-truth image, type a prompt, attach audio, and receive an audio-driven character video.
3. Both flows work through the stable Desineuron ingress model and do not depend on hardcoded GPU IPs.
4. Every run produces a persisted job record and output manifest.
5. Generated videos are streamable over HTTPS.
6. Operators can inspect job state and correlate product job ID to ComfyUI prompt ID.
7. The UI remains simple enough for a non-technical demo operator.
## 29. Explicit Product Decisions
### 29.1 What v1 Must Say No To
Animatrix v1 must not claim:
- perfect deterministic pose-sheet control
- exact first and last frame locking
- full timeline editing
- full audio mastering
### 29.2 What v1 Must Say Yes To
Animatrix v1 can truthfully claim:
- guided character animation
- guided character replacement
- audio-driven talking or performance video
- reference-assisted generation
- production-safe simplified UI on top of ComfyUI
## 30. Recommended Delivery Phases
### Phase 1
- backend skeleton
- asset model
- one frozen Animate workflow
- one frozen S2V workflow
- barebones frontend
### Phase 2
- quality profiles
- operator dashboard
- output gallery
- S3 persistence
### Phase 3
- first/last-frame workflow
- stronger pose control
- reusable character libraries
## 31. Final Architecture Recommendation
Build Animatrix as a thin product layer over stable infrastructure that already exists:
- keep ComfyUI where it is
- keep ingress where it is
- add a dedicated Animatrix backend
- keep the frontend intentionally minimal
- treat workflow JSON as versioned software artifacts
Do not begin by building a large generic creative suite.
Build the narrowest saleable product first:
- `Animate Studio`
- `Audio Performance Studio`
Then expand to:
- `Start/End Frame Studio`
- `Pose Control Studio`
## 32. Bottom Line
Animatrix v1 should be a Flow-like creative surface backed by two real Wan 2.2 workflows, not one imaginary super-workflow.
The correct implementation target is:
- one frontend
- one orchestration backend
- two workflow families
- one stable ingress-compatible execution path
- one durable output system
If the team follows this document strictly, the result will be productizable, supportable, and compatible with the current Desineuron infrastructure without lying about model capabilities.