# Animatrix Monolithic SRS - Wan 2.2 Flow Studio

Date: 2026-04-15

Authoring context: This document defines the first production-ready Animatrix system built on top of the existing Desineuron ingress, the current ComfyUI GPU service, and the Wan 2.2 model family.

## 1. Purpose

Animatrix is a focused product for guided character video generation. It is not a general-purpose node editor. It is a constrained, operator-safe application that exposes two production workflows behind one simple frontend:

1. Character Animation and Replacement using `Wan2.2-Animate-14B`
2. Audio-Driven Character Performance using `Wan2.2-S2V-14B`

The frontend interaction model is inspired by the simplicity and compositional feel of Google Flow, but the execution runtime is ComfyUI-backed and Desineuron-hosted.

The objective is to give users a minimal interface:

- prompt box
- ground-truth starting image upload
- optional reference images and pose sheet uploads
- optional audio upload
- simple mode selection
- one-click generation

while the backend handles:

- asset ingestion
- workflow selection
- parameter validation
- ComfyUI prompt orchestration
- queueing
- status tracking
- result persistence
- streaming-ready delivery

## 2. Executive Product Truth

Animatrix v1 must be built around the actual Wan 2.2 model split, not a blended assumption.

Capability mapping:

- `Wan2.2-Animate-14B` is for character animation and character replacement.
- `Wan2.2-S2V-14B` is for audio-driven video generation with dialogue, singing, and performance.
- `Wan2.2 Fun Inp` is the Wan family workflow for strict first-frame and last-frame control.
- `Wan2.2 Fun Control` is the Wan family workflow for stronger control-video inputs such as OpenPose, depth, canny, and trajectory control.

Therefore, the first release must not claim that a single model natively covers all of the following:

- character replacement
- motion transfer
- audio lip-sync
- exact first/last-frame constraints

No single Wan 2.2 model does.

The correct v1 product line is:

- Workflow A: `Animate Studio` on `Wan2.2-Animate-14B`
- Workflow B: `Audio Performance Studio` on `Wan2.2-S2V-14B`

The correct v1.1 or v2 expansion is:

- Workflow C: `Start/End Frame Studio` on `Wan2.2 Fun Inp`
- Workflow D: `Pose/Trajectory Control Studio` on `Wan2.2 Fun Control`

This distinction is mandatory because it affects UI truthfulness, node graphs, validation rules, asset requirements, and customer expectations.
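
The split above can be held in one small registry so that UI labels, validation, and template selection all read from the same place. The following is a minimal sketch, not a prescribed implementation: the `WorkflowSpec` type, the `release` labels, and the `enabled_workflows` helper are illustrative names that do not appear elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowSpec:
    studio_name: str  # product-facing name used in this section
    model: str        # Wan 2.2 model or workflow family behind it
    release: str      # "v1", or "later" for the v1.1 / v2 expansion


# Keys "A".."D" follow the workflow letters used in this section.
WORKFLOWS = {
    "A": WorkflowSpec("Animate Studio", "Wan2.2-Animate-14B", "v1"),
    "B": WorkflowSpec("Audio Performance Studio", "Wan2.2-S2V-14B", "v1"),
    "C": WorkflowSpec("Start/End Frame Studio", "Wan2.2 Fun Inp", "later"),
    "D": WorkflowSpec("Pose/Trajectory Control Studio", "Wan2.2 Fun Control", "later"),
}


def enabled_workflows(release: str = "v1") -> list[str]:
    """Return the workflow letters that ship in the given release."""
    return [key for key, spec in WORKFLOWS.items() if spec.release == release]
```

With this shape, enabling Workflow C or D later is a data change rather than a new code path.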

## 3. Source Truth and Rationale

This SRS is grounded in the following current sources:

- official Wan 2.2 GitHub repository: `https://github.com/Wan-Video/Wan2.2`
- official Wan 2.2 Animate model page: `https://huggingface.co/Wan-AI/Wan2.2-Animate-14B`
- official ComfyUI Wan 2.2 docs:
  - `https://docs.comfy.org/tutorials/video/wan/wan2_2`
  - `https://docs.comfy.org/tutorials/video/wan/wan2-2-animate`
  - `https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v`
  - `https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-inp`
  - `https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-control`
- current Desineuron infrastructure truth:
  - [comfyui_setup_truth.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\.Agent Context\Sprint 1\comfyui_setup_truth.md)
  - [Desineuron Stable Ingress Handoff.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity\.Agent Context\Sprint 1\Desineuron Stable Ingress Handoff.md)

Critical source-backed facts that drive the design:

- ComfyUI is already exposed safely through `https://comfy.desineuron.in`
- the GPU service already runs behind stable ingress
- `Wan2.2-Animate-14B` supports two operating modes in ComfyUI docs: `Mix` and `Move`
- `Wan2.2-S2V-14B` is the audio-driven workflow with image plus audio inputs
- ComfyUI’s official Animate docs require additional custom nodes for the full direct workflow
- exact start/end-frame control is documented under `Wan2.2 Fun Inp`, not Animate

## 4. Product Vision

Animatrix should behave like a focused video creation surface, not like a research sandbox.

The product promise is:

"Upload a hero frame, optionally attach references, pose guidance, or audio, write a prompt, and generate a directed character video without touching ComfyUI nodes."

The UI must feel lightweight, but the execution system behind it must be opinionated and rigid enough to be supportable.

That means:

- limited number of modes
- strict validation
- controlled presets
- reproducible workflow JSON
- consistent output formats
- no raw-node exposure in the customer-facing frontend

## 5. Scope

### 5.1 In Scope for v1

- one frontend
- one backend API
- two ComfyUI production workflows
- status and result tracking
- stable ingress compatibility
- persistent storage for uploads and outputs
- preview and download experience
- operator-oriented logging and troubleshooting
- support for team usage through the existing Desineuron architecture

### 5.2 Out of Scope for v1

- arbitrary node editing by end users
- live collaborative editing
- in-browser timeline editing
- multi-scene stitching
- automatic sound effects design
- full NLE replacement
- customer-facing batch farms
- fine-tuning or LoRA training

### 5.3 Deferred but Planned

- first/last-frame exact control as a third workflow using `Wan2.2 Fun Inp`
- stronger pose or trajectory control using `Wan2.2 Fun Control`
- style packs and prompt presets
- branded credits or quota system
- user libraries and reusable character packs

## 6. User Personas

### 6.1 Internal Creative Operator

This user understands creative direction but should not need to edit node graphs. They need:

- fast iteration
- predictable inputs
- reliable outputs
- access to previous runs

### 6.2 Sales Demo Operator

This user needs a polished experience that can be shown live. They need:

- simple UX
- low operator error
- dependable queue feedback
- visible result cards

### 6.3 Technical Media Designer

This user understands reference material quality and wants more control without dropping into raw ComfyUI. They need:

- reference images
- pose sheet upload
- clear mode distinctions
- optional advanced settings

## 7. Functional Overview

Animatrix v1 will contain one shell product with two generation modes.

### 7.1 Mode A: Animate Studio

Underlying engine:

- `Wan2.2-Animate-14B`

Primary purpose:

- animate a character from a source image using the motion and expression from a source video
- replace the subject in a video with a new character image

Sub-modes:

- `Move`
- `Mix`

User inputs:

- prompt
- ground-truth character image, required
- source motion video, required
- optional reference images
- optional pose sheet image set
- optional aspect preset
- optional duration target

### 7.2 Mode B: Audio Performance Studio

Underlying engine:

- `Wan2.2-S2V-14B`

Primary purpose:

- generate a character video from a static image and audio input
- support dialogue, singing, and audio-driven performance

User inputs:

- prompt
- ground-truth character image, required
- source audio, required
- optional reference images
- optional pose sheet image set
- optional full-body / half-body framing preset
- optional duration target, inferred from audio length when not set

## 8. Frontend Vision

The frontend must preserve the interaction language shown in the reference screenshots:

- one large prompt composer
- image chips at the top-left of the composer
- plus button for additional attachments
- compact right-aligned mode selector
- advanced settings revealed through a controlled panel, not always visible

The frontend should feel immediate, not enterprise-heavy.

### 8.1 Core Layout

Top-level zones:

1. Attachment rail
2. Prompt composer
3. Optional advanced drawer
4. Generate action and mode switch
5. Run history / output gallery below

### 8.2 Attachment Types

Attachment chips in v1:

- `Ground Truth`
- `Reference`
- `Pose Sheet`
- `Audio`
- `Motion Video`

Visibility rules:

- `Ground Truth` always available
- `Motion Video` visible only in Animate Studio
- `Audio` visible only in Audio Performance Studio
- `Pose Sheet` optional in both modes
- `Reference` optional in both modes
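
The visibility rules can be expressed as a single lookup, shown here in Python for consistency with the backend examples even though the frontend would implement it in its own language. This is a sketch only: the mode keys `animate` and `audio` mirror the mode toggle, and `visible_chips` is a hypothetical helper name.

```python
# Chips shown in every mode.
ALWAYS_VISIBLE = ("Ground Truth", "Reference", "Pose Sheet")

# Chips that appear only in one mode.
MODE_ONLY = {
    "animate": ("Motion Video",),
    "audio": ("Audio",),
}


def visible_chips(mode: str) -> tuple[str, ...]:
    """Return the attachment chips the composer should offer for a mode."""
    if mode not in MODE_ONLY:
        raise ValueError(f"unknown mode: {mode!r}")
    return ALWAYS_VISIBLE + MODE_ONLY[mode]
```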

### 8.3 Frontend Controls

Base controls:

- prompt text area
- optional keyword helper line
- mode toggle: `Animate` / `Audio`
- output aspect toggle: `9:16`, `16:9`, later `1:1`
- quality profile: `Draft`, `Standard`, `High`
- generate button

Advanced controls:

- Animate sub-mode: `Move` / `Mix`
- target duration
- seed
- negative prompt
- extension segments
- background preservation flag
- relighting flag
- lip-sync intensity or audio adherence preset

### 8.4 UX Rules

- do not expose raw model names to standard users
- use user language like `Animate`, `Replace Character`, `Audio Performance`
- surface warnings before submission if required inputs are missing
- show asset previews as compact rounded chips
- keep advanced panel collapsed by default

## 9. Exact Capability Mapping by Workflow

### 9.1 Workflow A: Animate Studio

Supported in v1:

- character animation from image plus motion video
- character replacement from image plus source video
- prompt conditioning
- optional pose preprocessing
- iterative video extension

Not truly supported by Animate Studio itself:

- direct audio-driven lip sync
- strict start/end-frame guarantees

### 9.2 Workflow B: Audio Performance Studio

Supported in v1:

- image plus audio driven generation
- prompt-conditioned motion/environment
- dialogue and singing style use cases
- long-form generation by extension chunks

Not truly supported by S2V itself:

- guaranteed subject replacement from an existing motion video
- exact last-frame lock

### 9.3 Pose Sheet Truth

The user-requested pose sheet can be supported in two ways:

1. Soft support in v1
   - pose sheet stored as reference asset
   - backend uses it for prompt augmentation and optional preprocessing assistance
   - operator can map selected sheet frames to manual key pose hints

2. Hard support in later release
   - migrate pose guidance to a dedicated `Wan2.2 Fun Control` or equivalent control-video workflow

The v1 document must state this honestly. A static pose sheet is not the same as a control video. It helps guide generation but does not become full deterministic motion control without an additional preprocessing and control pipeline.

## 10. Ground Truth Asset Model

The user’s "ground truth" image is the canonical identity anchor.

In both workflows it must serve as:

- the primary subject identity reference
- the default starting visual state
- the basis for preview thumbnails

Rules:

- exactly one primary ground-truth image per run
- image must pass minimum size and aspect checks
- a clean background is preferred but not mandatory
- user may crop or center the character before submission

Optional extension:

- future support for multiple identity references per character pack

## 11. Workflow Architecture

### 11.1 System Shape

```text
Browser
  -> Animatrix frontend
  -> Animatrix API
     -> job store
     -> asset store
     -> workflow composer
     -> ComfyUI client
        -> https://comfy.desineuron.in
           -> GPU ComfyUI service
              -> Wan2.2 workflow execution
     -> result collector
     -> output persistence
  -> result CDN / static delivery
```

### 11.2 Architectural Rule

The frontend must never submit raw prompts directly to ComfyUI.

The backend must always mediate:

- asset upload
- workflow selection
- workflow JSON parameter binding
- run metadata persistence
- output tracking

This is required for observability, rate control, product safety, and sales-readiness.

## 12. Ingress and Deployment Compatibility

Animatrix must be designed around the current Desineuron ingress truth.

Current infrastructure constraints:

- ComfyUI is already live at `https://comfy.desineuron.in`
- ComfyUI runs behind AWS ingress and stable TLS
- GPU private IP is not a stable application contract
- Linux origin is currently `192.168.1.2`

### 12.1 Mandatory Integration Rule

Animatrix backend must integrate with ComfyUI through the stable hostname or through a controlled internal service abstraction that resolves to the same managed route.

Do not bind Animatrix to:

- the GPU public IP
- direct `8188` public traffic
- hardcoded current private IP
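
One way to enforce this rule is to check the configured ComfyUI base URL at backend startup. The sketch below assumes the URL arrives as a plain string from configuration; `check_comfy_base_url` is a hypothetical helper, and it only covers the cases that can be detected from the URL itself (raw IP literals and port `8188`).

```python
import ipaddress
from urllib.parse import urlsplit


def check_comfy_base_url(url: str) -> str:
    """Reject base URLs that bind to a raw IP or to port 8188; return the URL without a trailing slash."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        raise ValueError("ComfyUI base URL must include a scheme and a hostname")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal, which is what we want
    else:
        raise ValueError("ComfyUI base URL must be a managed hostname, not an IP address")
    if parts.port == 8188:
        raise ValueError("ComfyUI base URL must not target port 8188 directly")
    return url.rstrip("/")
```

Failing fast at startup keeps a misconfigured IP from surfacing later as an intermittent job failure.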

### 12.2 Recommended Host Layout

Recommended public routing:

- `animatrix.desineuron.in` -> frontend and public product shell
- `api.animatrix.desineuron.in` or `animatrix.desineuron.in/api` -> backend API
- `comfy.desineuron.in` -> internal execution dependency only, not user-facing

If separate subdomains are not created immediately, the fallback deployment pattern may mirror the current Velocity site pattern:

- frontend served from Linux origin through ingress
- backend served from Linux origin through ingress
- backend calls ComfyUI through `https://comfy.desineuron.in`

## 13. Runtime Components

### 13.1 Frontend Application

Responsibilities:

- render simplified generation interface
- manage uploads
- validate user fields before submit
- create job requests
- poll or subscribe to job progress
- render previews and outputs

Suggested stack:

- Next.js or Vite React app
- Tailwind or CSS modules
- upload components with image/audio/video preview

### 13.2 Animatrix Backend API

Responsibilities:

- receive upload metadata
- store files
- generate canonical run record
- choose workflow template
- bind node inputs
- submit prompt payload to ComfyUI
- track prompt ID and history
- collect generated outputs
- persist result artifacts

Suggested stack:

- FastAPI if aligned with existing Python-heavy operations
- Node/TypeScript, only if the team wants a single language across frontend and backend

Recommendation:

- use Python FastAPI for v1 if reusing current Desineuron operational style and image/media tooling

### 13.3 Workflow Composer

Responsibilities:

- keep frozen template JSON files in version control
- inject prompt text, model selections, size, length, and asset paths
- enforce mode-specific constraints

This component must be deterministic. It is not a prompt improviser.
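
A minimal sketch of deterministic binding follows. It assumes templates are stored in ComfyUI's API-format JSON (a mapping of node ID to `class_type` and `inputs`) and that a node map records which node input each product parameter controls. The `bind_workflow` name, the node map shape, and the parameter names are assumptions made for illustration.

```python
import copy
from typing import Any

# Maps a product parameter name to (node_id, input_name) in the template.
NodeMap = dict[str, tuple[str, str]]


def bind_workflow(template: dict, node_map: NodeMap, params: dict[str, Any]) -> dict:
    """Return a copy of the template with params written into the mapped node inputs."""
    graph = copy.deepcopy(template)  # never mutate the frozen template
    for name, value in params.items():
        if name not in node_map:
            raise KeyError(f"parameter {name!r} has no node mapping")
        node_id, input_name = node_map[name]
        if node_id not in graph:
            raise KeyError(f"node {node_id!r} is missing from the template")
        graph[node_id]["inputs"][input_name] = value
    return graph
```

Because the function only copies and assigns, the same template, node map, and parameters always produce the same graph, which is what makes snapshot regression checks possible.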

### 13.4 ComfyUI Execution Layer

Responsibilities:

- execute pre-approved workflow JSON
- expose queue, prompt, history, and upload endpoints
- return output metadata

### 13.5 Asset Store

Responsibilities:

- raw upload persistence
- normalized derivative generation
- final output video persistence
- preview image generation

Recommended storage split:

- hot local cache on Linux origin
- durable object storage in S3 for long-term retention

## 13A. Current Infrastructure Contract

Animatrix v1 must be compatible with the currently operating Desineuron media stack as it exists today.

Live execution truth:

- public ComfyUI hostname: `https://comfy.desineuron.in`
- ingress elastic IP: `98.87.120.120`
- GPU private target currently managed behind ingress
- Linux origin currently: `192.168.1.2`

Current GPU-side storage truth:

- ComfyUI app root: `/opt/dlami/nvme/ComfyUI`
- HF cache: `/opt/dlami/nvme/hf`
- model staging root: `/opt/dlami/nvme/model-staging`
- model logs: `/opt/dlami/nvme/model-logs`

Current model hydration truth:

- durable bucket family already in use: `s3://project-velocity/models/`
- existing Wan hydration prefix: `s3://project-velocity/models/Wan2.2-Animate-14B/`

Animatrix must not introduce a second contradictory deployment path for ComfyUI. It must reuse this stable route and storage discipline.

## 13B. ComfyUI API Contract

The backend integration layer must be implemented against the current ComfyUI HTTP contract.

Required endpoints:

- `GET /`
- `POST /prompt`
- `GET /history/{prompt_id}`
- `GET /queue`
- `POST /upload/image`

Recommended extension checks:

- health probe against `/`
- prompt submission response validation
- history polling with bounded backoff
- queue introspection for operator dashboards

The backend must wrap these endpoints in a typed client and must not scatter raw HTTP calls throughout business logic.
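
As an illustration of such a wrapper, the sketch below covers `POST /prompt`, `GET /history/{prompt_id}`, and `GET /queue` using only the standard library. The `ComfyClient` class and the injectable `transport` callable are assumptions of this example; uploads and authentication are left out. It relies on the ComfyUI convention that `POST /prompt` accepts a body with `prompt` and `client_id` and replies with a `prompt_id`.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# (method, url, body) -> raw response bytes; injectable so tests need no network.
Transport = Callable[[str, str, Optional[bytes]], bytes]


def http_transport(method: str, url: str, body: Optional[bytes]) -> bytes:
    """Default transport: one blocking HTTP request via the standard library."""
    headers = {"Content-Type": "application/json"} if body is not None else {}
    request = urllib.request.Request(url, data=body, method=method, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


class ComfyClient:
    def __init__(self, base_url: str, transport: Transport = http_transport) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def _json(self, method: str, path: str, payload: Any = None) -> Any:
        body = None if payload is None else json.dumps(payload).encode("utf-8")
        return json.loads(self.transport(method, self.base_url + path, body))

    def submit_prompt(self, graph: dict, client_id: str) -> str:
        """Queue a workflow graph and return the ComfyUI prompt ID."""
        reply = self._json("POST", "/prompt", {"prompt": graph, "client_id": client_id})
        prompt_id = reply.get("prompt_id")
        if not prompt_id:
            raise RuntimeError(f"ComfyUI did not return a prompt_id: {reply}")
        return prompt_id

    def history(self, prompt_id: str) -> Optional[dict]:
        """Return the history entry for a prompt, or None if it is not recorded yet."""
        return self._json("GET", f"/history/{prompt_id}").get(prompt_id)

    def queue(self) -> dict:
        """Return the raw queue snapshot for operator dashboards."""
        return self._json("GET", "/queue")
```

Keeping the transport injectable lets workflow tests run against canned responses instead of the live GPU service.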

## 13C. Model and Node Manifest

### Workflow A: Animate Studio Required Assets

Required model family:

- `Wan2.2-Animate-14B`
- `clip_vision_h.safetensors`
- `wan_2.1_vae.safetensors`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors`

Required custom nodes:

- `ComfyUI-KJNodes`
- `ComfyUI-comfyui_controlnet_aux`

Suggested placement contract:

- diffusion model files under `ComfyUI/models/diffusion_models/`
- text encoder under `ComfyUI/models/text_encoders/`
- VAE under `ComfyUI/models/vae/`
- CLIP Vision under `ComfyUI/models/clip_vision/`

### Workflow B: Audio Performance Studio Required Assets

Required model family:

- `wan2.2_s2v_14B_fp8_scaled.safetensors` or `wan2.2_s2v_14B_bf16.safetensors`
- `wav2vec2_large_english_fp16.safetensors`
- `wan_2.1_vae.safetensors`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors`

Suggested placement contract:

- diffusion model under `ComfyUI/models/diffusion_models/`
- text encoder under `ComfyUI/models/text_encoders/`
- audio encoder under `ComfyUI/models/audio_encoders/`
- VAE under `ComfyUI/models/vae/`
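
The placement contracts above can be checked mechanically before a workflow is enabled. The following sketch lists only the filenames this document names; the Animate diffusion model filename is not specified here, so it is deliberately left out, and the S2V entry assumes the fp8 variant. `MANIFESTS` and `missing_models` are illustrative names.

```python
from pathlib import Path

# Folder names are relative to ComfyUI/models/.
MANIFESTS = {
    "animate": {
        "clip_vision": ["clip_vision_h.safetensors"],
        "vae": ["wan_2.1_vae.safetensors"],
        "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    },
    "s2v": {
        "diffusion_models": ["wan2.2_s2v_14B_fp8_scaled.safetensors"],
        "audio_encoders": ["wav2vec2_large_english_fp16.safetensors"],
        "vae": ["wan_2.1_vae.safetensors"],
        "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    },
}


def missing_models(models_root: Path, workflow: str) -> list[str]:
    """Return the manifest entries that are not present under models_root."""
    missing = []
    for folder, names in MANIFESTS[workflow].items():
        for name in names:
            if not (models_root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return sorted(missing)
```

An operator health check could call `missing_models(Path("/opt/dlami/nvme/ComfyUI/models"), "s2v")` and refuse to enable the workflow while the list is non-empty.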

### Deferred Workflow Assets

For future strict start/end-frame control:

- `Wan2.2 Fun Inp` models and optional associated LoRAs

For future stronger pose control:

- `Wan2.2 Fun Control`

The frontend and API must be written so these workflows can be added later without reworking the entire product shell.

## 14. File and Repository Blueprint

Animatrix should be structured as an application repository or top-level product directory with explicit separation between app, API, and workflow assets.

Recommended layout:

```text
Animatrix/
  docs/
    Animatrix Monolithic SRS - Wan 2.2 Flow Studio.md
  frontend/
    src/
      app/
      components/
      features/
      lib/
      styles/
  backend/
    app/
      api/
      services/
      models/
      repositories/
      workers/
  workflows/
    animate/
      wan22_animate_mix.json
      wan22_animate_move.json
    s2v/
      wan22_s2v_base.json
    shared/
      prompt_profiles/
      node_maps/
  scripts/
    deploy/
    media/
    sync/
  infra/
    systemd/
    nginx/
    caddy/
  tests/
    api/
    workflows/
    ui/
```

## 15. Workflow A Detailed Design: Animate Studio

### 15.1 Objective

Deliver a workflow that supports:

- character replacement from a source video
- character animation from a performer video
- prompt-guided visual refinement

### 15.2 Input Contract

Required:

- `prompt`
- `ground_truth_image`
- `motion_video`
- `animate_submode`: `move` or `mix`

Optional:

- `reference_images[]`
- `pose_sheet_images[]`
- `negative_prompt`
- `duration_override_seconds`
- `aspect_ratio`
- `quality_profile`
- `seed`

### 15.3 Output Contract

Primary outputs:

- `video_mp4`
- `poster_frame_jpg`
- `job_manifest.json`
- `debug_metadata.json`

Secondary outputs:

- pose preview if preprocessing is enabled
- first-frame snapshot

### 15.4 Internal Workflow Stages

1. Ingest image and video
2. Normalize formats and dimensions
3. Extract first frame and thumbnail
4. Run optional DWPose or auxiliary preprocessing
5. Bind workflow JSON for `move` or `mix`
6. Upload normalized assets to ComfyUI
7. Submit workflow
8. Poll queue and history
9. Collect result paths
10. Persist final outputs and metadata

### 15.5 ComfyUI Notes

The official Animate workflow requires:

- `clip_vision_h.safetensors`
- `wan_2.1_vae.safetensors`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors`
- Animate diffusion model
- optional Lightning LoRA
- custom nodes:
  - `ComfyUI-KJNodes`
  - `ComfyUI-comfyui_controlnet_aux`

### 15.6 Product-Level Rule

Animatrix v1 must hide these internals from the standard UI, but the backend and operator docs must track them exactly.

## 16. Workflow B Detailed Design: Audio Performance Studio

### 16.1 Objective

Deliver a workflow that supports:

- talking-head and half-body performance
- singing and dialogue use cases
- audio-driven facial and motion synthesis

### 16.2 Input Contract

Required:

- `prompt`
- `ground_truth_image`
- `audio_file`

Optional:

- `reference_images[]`
- `pose_sheet_images[]`
- `negative_prompt`
- `framing_mode`: `portrait`, `half_body`, `full_body`
- `quality_profile`
- `seed`

### 16.3 Output Contract

Primary outputs:

- `video_mp4`
- `poster_frame_jpg`
- `job_manifest.json`
- `debug_metadata.json`

### 16.4 Internal Workflow Stages

1. Ingest image and audio
2. Normalize sample rate and file format
3. Infer required frame count from audio duration
4. Determine required S2V extension chunks
5. Bind workflow JSON
6. Upload image and audio to ComfyUI
7. Submit workflow
8. Poll queue and history
9. Collect output video
10. Persist artifacts
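
Stages 3 and 4 reduce to simple arithmetic once the frame rate and chunk length are fixed. The sketch below uses 16 fps and 77 frames per chunk purely as placeholder defaults; neither figure is stated in this document, and the real values must come from the frozen S2V template. `plan_s2v_chunks` is a hypothetical helper.

```python
import math


def plan_s2v_chunks(
    audio_seconds: float,
    fps: int = 16,               # placeholder, take from the workflow template
    frames_per_chunk: int = 77,  # placeholder, take from the workflow template
) -> tuple[int, int]:
    """Return (total_frames, chunk_count) needed to cover the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    total_frames = math.ceil(audio_seconds * fps)
    chunk_count = math.ceil(total_frames / frames_per_chunk)
    return total_frames, chunk_count
```

Under these placeholder values, a 10-second clip needs 160 frames and therefore 3 chunks, meaning the base generation plus 2 extension segments.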

### 16.5 ComfyUI Notes

The official S2V workflow requires:

- `wan2.2_s2v_14B_fp8_scaled.safetensors` or bf16 variant
- `wav2vec2_large_english_fp16.safetensors`
- `wan_2.1_vae.safetensors`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors`

The ComfyUI docs note that:

- fp8 uses less VRAM
- bf16 may reduce quality degradation
- Lightning LoRA can reduce generation time but can also significantly reduce quality and dynamics

Therefore Animatrix must default to:

- `Standard`: fp8 without aggressive LoRA by default for customer-facing quality stability
- `Draft`: fp8 with acceleration options
- `High`: bf16 where hardware allows
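
These defaults can be captured in one selection function. This is a sketch under stated assumptions: the `lightning_lora` flag and the `bf16_supported` argument are illustrative names, and the fallback from bf16 to fp8 when hardware does not allow it is one reasonable reading of "where hardware allows".

```python
S2V_FP8 = "wan2.2_s2v_14B_fp8_scaled.safetensors"
S2V_BF16 = "wan2.2_s2v_14B_bf16.safetensors"


def s2v_settings(profile: str, bf16_supported: bool) -> dict:
    """Map a product quality profile to an S2V model variant and LoRA flag."""
    if profile == "Draft":
        return {"model": S2V_FP8, "lightning_lora": True}
    if profile == "Standard":
        return {"model": S2V_FP8, "lightning_lora": False}
    if profile == "High":
        # Fall back to fp8 when the GPU cannot hold the bf16 variant.
        model = S2V_BF16 if bf16_supported else S2V_FP8
        return {"model": model, "lightning_lora": False}
    raise ValueError(f"unknown quality profile: {profile!r}")
```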

## 17. UI-to-Workflow Mapping

The UI must map cleanly to backend request objects.

### 17.1 Shared Fields

- `mode`
- `prompt`
- `negative_prompt`
- `ground_truth_asset_id`
- `reference_asset_ids[]`
- `pose_sheet_asset_ids[]`
- `aspect_ratio`
- `quality_profile`
- `seed`

### 17.2 Animate-Specific Fields

- `motion_video_asset_id`
- `animate_submode`
- `background_preservation`
- `relighting`
- `extension_segments`

### 17.3 Audio-Specific Fields

- `audio_asset_id`
- `framing_mode`
- `audio_adherence_profile`
- `extension_segments`

## 18. Suggested Backend API

### 18.1 Asset Endpoints

- `POST /api/assets/image`
- `POST /api/assets/video`
- `POST /api/assets/audio`
- `GET /api/assets/{asset_id}`

### 18.2 Job Endpoints

- `POST /api/jobs/animate`
- `POST /api/jobs/audio-performance`
- `GET /api/jobs/{job_id}`
- `GET /api/jobs/{job_id}/events`
- `GET /api/jobs/{job_id}/outputs`
- `POST /api/jobs/{job_id}/cancel`

### 18.3 Admin Endpoints

- `GET /api/admin/workflows`
- `GET /api/admin/health`
- `GET /api/admin/queue`
- `POST /api/admin/retry/{job_id}`

### 18.4 WebSocket or SSE Progress Channel

Recommended:

- `GET /api/jobs/{job_id}/stream`

This should emit:

- accepted
- uploaded
- queued
- executing
- collecting_outputs
- completed
- failed

The frontend should use this channel if available and fall back to polling if the connection drops.
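
If the channel is implemented as Server-Sent Events, each status change becomes one text frame. The sketch below formats frames only and is independent of any web framework; the event name `status` and the JSON payload shape are assumptions of this example.

```python
import json
from typing import Iterable, Iterator

JOB_STATES = (
    "accepted", "uploaded", "queued", "executing",
    "collecting_outputs", "completed", "failed",
)
TERMINAL_STATES = {"completed", "failed"}


def sse_events(job_id: str, states: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per state and stop after a terminal state."""
    for state in states:
        if state not in JOB_STATES:
            raise ValueError(f"unknown job state: {state!r}")
        data = json.dumps({"job_id": job_id, "status": state})
        yield f"event: status\ndata: {data}\n\n"
        if state in TERMINAL_STATES:
            return
```

Stopping after `completed` or `failed` gives the frontend a clear signal to close the stream and fetch outputs.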

## 19. Data Model

### 19.1 Asset

Fields:

- `asset_id`
- `asset_type`
- `mime_type`
- `original_filename`
- `storage_url`
- `thumbnail_url`
- `width`
- `height`
- `duration_seconds`
- `size_bytes`
- `created_at`

### 19.2 Job

Fields:

- `job_id`
- `mode`
- `workflow_template`
- `status`
- `submitted_by`
- `prompt`
- `negative_prompt`
- `settings_json`
- `comfy_prompt_id`
- `created_at`
- `updated_at`

### 19.3 JobOutput

Fields:

- `output_id`
- `job_id`
- `video_url`
- `poster_url`
- `manifest_url`
- `duration_seconds`
- `resolution`
- `fps`
- `created_at`
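
As one possible persistence shape for the `Job` record, the sketch below uses SQLite from the Python standard library. The choice of SQLite, the column types, and the helper names are assumptions made for illustration; this document does not mandate a database engine.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id            TEXT PRIMARY KEY,
    mode              TEXT NOT NULL,
    workflow_template TEXT NOT NULL,
    status            TEXT NOT NULL,
    submitted_by      TEXT,
    prompt            TEXT,
    negative_prompt   TEXT,
    settings_json     TEXT,
    comfy_prompt_id   TEXT,
    created_at        TEXT NOT NULL,
    updated_at        TEXT NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def create_job(conn: sqlite3.Connection, job_id: str, mode: str,
               workflow_template: str, prompt: str) -> None:
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO jobs (job_id, mode, workflow_template, status, prompt,"
        " created_at, updated_at) VALUES (?, ?, ?, 'accepted', ?, ?, ?)",
        (job_id, mode, workflow_template, prompt, now, now),
    )
    conn.commit()


def set_status(conn: sqlite3.Connection, job_id: str, status: str,
               comfy_prompt_id: Optional[str] = None) -> None:
    """Update status; keep the stored prompt ID unless a new one is given."""
    now = datetime.now(timezone.utc).isoformat()
    cursor = conn.execute(
        "UPDATE jobs SET status = ?, comfy_prompt_id = COALESCE(?, comfy_prompt_id),"
        " updated_at = ? WHERE job_id = ?",
        (status, comfy_prompt_id, now, job_id),
    )
    conn.commit()
    if cursor.rowcount != 1:
        raise KeyError(job_id)
```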

## 20. Workflow Template Governance

Workflow JSON must be treated as versioned product assets.

Rules:

- each production workflow JSON must have an immutable version identifier
- node IDs must be mapped in a dedicated config file
- backend parameter injection must never depend on informal manual node lookup
- each workflow change must pass snapshot regression checks

Required metadata for every workflow:

- `workflow_name`
- `workflow_version`
- `model_family`
- `required_assets`
- `required_models`
- `custom_nodes`
- `compatible_backend_version`
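
Two small checks support these rules: one confirms that a workflow's metadata carries every required key, and one derives a content fingerprint that changes whenever the template JSON changes. Using a SHA-256 digest of canonicalized JSON as the fingerprint is an assumption of this sketch, offered as one way to back an immutable version identifier.

```python
import hashlib
import json

REQUIRED_METADATA = (
    "workflow_name", "workflow_version", "model_family", "required_assets",
    "required_models", "custom_nodes", "compatible_backend_version",
)


def missing_metadata(meta: dict) -> list[str]:
    """Return required metadata keys that are absent."""
    return [key for key in REQUIRED_METADATA if key not in meta]


def template_fingerprint(graph: dict) -> str:
    """Hash the template in a key-order-independent form."""
    canonical = json.dumps(graph, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing the fingerprint next to `workflow_version` makes a hand-edited production JSON detectable in snapshot regression checks.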

## 21. Storage and Delivery Design

### 21.1 Inputs

Store raw uploads in durable storage with stable references.

Recommended:

- object storage in S3
- local temporary cache for preprocessing

### 21.2 Outputs

Store:

- mp4 output
- poster image
- optional animated preview
- manifest json

### 21.3 Delivery

Outputs must be streamable from a public HTTPS origin via ingress.

If using Linux origin:

- serve final assets through nginx under Animatrix public domain

If using S3-backed storage:

- use signed or public-read delivery depending on account mode

## 22. Quality Profiles

Animatrix must expose productized quality profiles rather than raw step counts to users.

### 22.1 Draft

Purpose:

- internal ideation
- faster previews

Behavior:

- lower resolution
- lower steps
- acceleration LoRA allowed

### 22.2 Standard

Purpose:

- most normal production runs

Behavior:

- balanced speed and quality
- conservative defaults
- no quality-destructive shortcuts unless explicitly enabled

### 22.3 High

Purpose:

- demo and delivery quality

Behavior:

- higher quality model variant when available
- larger resolution
- longer runtime accepted

## 23. Error Handling

Failure classes:

- missing asset
- invalid asset format
- unsupported aspect ratio
- workflow binding failure
- ComfyUI upload failure
- ComfyUI queue failure
- generation timeout
- result collection failure

User-facing errors must be simplified.

Operator-facing logs must preserve exact failure cause.
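
The split between simplified user messages and exact operator logs can be sketched as follows. The message wording, the `Failure` enum, and the `describe_failure` helper are illustrative assumptions; only the failure classes themselves come from the list above.

```python
from enum import Enum


class Failure(Enum):
    MISSING_ASSET = "missing asset"
    INVALID_FORMAT = "invalid asset format"
    UNSUPPORTED_ASPECT = "unsupported aspect ratio"
    BINDING = "workflow binding failure"
    COMFY_UPLOAD = "ComfyUI upload failure"
    COMFY_QUEUE = "ComfyUI queue failure"
    TIMEOUT = "generation timeout"
    COLLECTION = "result collection failure"


# Failures a user can fix get a specific hint; everything else shares one line.
USER_FIXABLE = {
    Failure.MISSING_ASSET: "A required file is missing. Attach it and try again.",
    Failure.INVALID_FORMAT: "One of the files is in a format we cannot use.",
    Failure.UNSUPPORTED_ASPECT: "This aspect ratio is not supported.",
}
GENERIC_MESSAGE = "Generation failed. Please retry or contact an operator."


def describe_failure(failure: Failure, cause: str, job_id: str) -> tuple[str, str]:
    """Return (user_text, operator_log_line) for one failure."""
    user_text = USER_FIXABLE.get(failure, GENERIC_MESSAGE)
    operator_line = f"job_id={job_id} failure={failure.name} cause={cause!r}"
    return user_text, operator_line
```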

## 23A. Validation Rules

### Shared Validation

- reject empty prompt if prompt is required by the selected workflow profile
- reject missing ground-truth image
- reject unsupported file extensions
- reject files above configured upload limit

### Animate Studio Validation

- reject missing motion video
- reject unsupported source video codecs that cannot be normalized
- reject requests whose `animate_submode` is not exactly one of `move` or `mix`

### Audio Performance Studio Validation

- reject missing audio
- reject audio longer than configured maximum duration for the selected profile
- normalize sample rate before workflow submission

### Pose Sheet Validation

- accept only supported image formats
- cap pose sheet image count in v1
- mark pose sheet as "soft guidance" in job metadata unless a later hard-control pipeline is introduced
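
A compact sketch of the request-level checks is shown below. It treats the prompt as always required, and the limits `max_audio_seconds` and `max_pose_sheets`, the `audio_duration_seconds` field, and the error codes are all hypothetical values chosen for illustration. File-level checks such as extension, size, and codec are assumed to run at upload time and are not repeated here.

```python
def validate_job(
    mode: str,
    fields: dict,
    max_audio_seconds: float = 60.0,  # hypothetical limit
    max_pose_sheets: int = 8,         # hypothetical limit
) -> list[str]:
    """Return a list of error codes; an empty list means the request is valid."""
    errors = []
    if not str(fields.get("prompt", "")).strip():
        errors.append("prompt_required")
    if not fields.get("ground_truth_asset_id"):
        errors.append("ground_truth_required")
    if len(fields.get("pose_sheet_asset_ids", [])) > max_pose_sheets:
        errors.append("too_many_pose_sheets")

    if mode == "animate":
        if not fields.get("motion_video_asset_id"):
            errors.append("motion_video_required")
        if fields.get("animate_submode") not in ("move", "mix"):
            errors.append("invalid_animate_submode")
    elif mode == "audio":
        if not fields.get("audio_asset_id"):
            errors.append("audio_required")
        if fields.get("audio_duration_seconds", 0) > max_audio_seconds:
            errors.append("audio_too_long")
    else:
        errors.append("unknown_mode")
    return errors
```

Returning every error at once, rather than raising on the first, lets the frontend surface all missing inputs before submission.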

## 24. Observability

Minimum operational telemetry:

- job creation rate
- queue depth
- mean wait time
- mean generation time by workflow
- failure rate by workflow version
- storage growth
- top asset sizes

Required correlation identifiers:

- `job_id`
- `asset_id`
- `comfy_prompt_id`
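
One simple way to carry these identifiers is a structured JSON log line emitted at every state change. The sketch below uses only the standard `logging` and `json` modules; the `log_job_event` helper and the `event` field are illustrative names.

```python
import json
import logging
from typing import Any, Optional


def log_job_event(
    logger: logging.Logger,
    event: str,
    job_id: str,
    asset_id: Optional[str] = None,
    comfy_prompt_id: Optional[str] = None,
    **extra: Any,
) -> str:
    """Log one JSON line carrying all correlation identifiers and return it."""
    record = {
        "event": event,
        "job_id": job_id,
        "asset_id": asset_id,
        "comfy_prompt_id": comfy_prompt_id,
        **extra,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```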

## 25. Security and Access Control

Rules:

- do not expose raw ComfyUI publicly to end users as the product surface
- backend owns ComfyUI credentials and workflow orchestration
- validate file size and MIME type on upload
- reject executable uploads
- limit accepted formats
- preserve audit trail for every run

## 26. Team and Operator UX

The system must support:

- internal team usage through the stable ingress
- supportable operator triage
- easy workflow version rollback
- safe demo usage during sales calls

Operators need:

- admin queue view
- job replay
- access to input and output manifests
- workflow version annotation

## 27. Non-Functional Requirements

### 27.1 Reliability

- no direct dependency on ephemeral GPU public IP
- graceful retry around ComfyUI upload and history polling
- job state persisted outside memory
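
Graceful retry around history polling can be sketched as a capped exponential backoff with an overall deadline. The initial delay, growth factor, and ceiling below are placeholder values, and `poll_until` is a hypothetical helper; `fetch` stands for any callable that returns `None` until the result is ready.

```python
import time
from typing import Any, Callable, Iterator, Optional


def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   ceiling: float = 15.0) -> Iterator[float]:
    """Yield delays that grow geometrically and never exceed the ceiling."""
    delay = initial
    while True:
        yield min(delay, ceiling)
        delay = min(delay * factor, ceiling)


def poll_until(
    fetch: Callable[[], Optional[Any]],
    timeout_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Call fetch until it returns a value, or raise TimeoutError at the deadline."""
    deadline = clock() + timeout_seconds
    for delay in backoff_delays():
        result = fetch()
        if result is not None:
            return result
        if clock() + delay > deadline:
            raise TimeoutError("result was not available before the deadline")
        sleep(delay)
```

Injecting `sleep` and `clock` keeps the retry logic testable without real waiting.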

### 27.2 Performance

- fast upload validation
- async polling and result collection
- cached thumbnails

### 27.3 Scalability

- workflow templates stateless
- API horizontally scalable
- storage externalized

### 27.4 Maintainability

- one source-of-truth workflow config per mode
- explicit model manifest
- no hidden hand-edited production JSON

### 27.5 Sales Readiness

- stable hostname
- reliable queue messaging
- polished success and failure states
- deterministic demo inputs

## 27A. Demo and Commercial Readiness Requirements

Animatrix will be used in live demos and pre-sales conversations. That changes the bar.

Required product behavior:

- first meaningful UI paint fast enough for live sales use
- one-click sample project loading for demo mode
- clear progress messaging during long generations
- shareable output URL or operator download path
- no raw ComfyUI terminology in the customer-facing layer unless explicitly in admin mode

Required operator support behavior:

- known-good demo assets packaged and versioned
- visible warning when GPU queue is saturated
- ability to retry a failed job without recreating all metadata manually

## 28. MVP Acceptance Criteria

Animatrix v1 is considered complete only when all of the following are true:

1. A user can upload a ground-truth image, type a prompt, attach a motion video, select `Move` or `Mix`, and receive a finished video output.
2. A user can upload a ground-truth image, type a prompt, attach audio, and receive an audio-driven character video.
3. Both flows work through the stable Desineuron ingress model and do not depend on hardcoded GPU IPs.
4. Every run produces a persisted job record and output manifest.
5. Generated videos are streamable over HTTPS.
6. Operators can inspect job state and correlate product job ID to ComfyUI prompt ID.
7. The UI remains simple enough for a non-technical demo operator.

## 29. Explicit Product Decisions

### 29.1 What v1 Must Say No To

Animatrix v1 must not claim:

- perfect deterministic pose-sheet control
- exact first and last frame locking
- full timeline editing
- full audio mastering

### 29.2 What v1 Must Say Yes To

Animatrix v1 can truthfully claim:

- guided character animation
- guided character replacement
- audio-driven talking or performance video
- reference-assisted generation
- production-safe simplified UI on top of ComfyUI

## 30. Recommended Delivery Phases

### Phase 1

- backend skeleton
- asset model
- one frozen Animate workflow
- one frozen S2V workflow
- barebones frontend

### Phase 2

- quality profiles
- operator dashboard
- output gallery
- S3 persistence

### Phase 3

- first/last-frame workflow
- stronger pose control
- reusable character libraries

## 31. Final Architecture Recommendation

Build Animatrix as a thin product layer over stable infrastructure that already exists:

- keep ComfyUI where it is
- keep ingress where it is
- add a dedicated Animatrix backend
- keep the frontend intentionally minimal
- treat workflow JSON as versioned software artifacts

Do not begin by building a large generic creative suite.

Build the narrowest saleable product first:

- `Animate Studio`
- `Audio Performance Studio`

Then expand to:

- `Start/End Frame Studio`
- `Pose Control Studio`

## 32. Bottom Line

Animatrix v1 should be a Flow-like creative surface backed by two real Wan 2.2 workflows, not one imaginary super-workflow.

The correct implementation target is:

- one frontend
- one orchestration backend
- two workflow families
- one stable ingress-compatible execution path
- one durable output system

If the team follows this document strictly, the result will be productizable, supportable, and compatible with the current Desineuron infrastructure without lying about model capabilities.