Sagnik c7994d17a9 Initial Animatrix import

2026-04-17 19:11:57 +05:30

Animatrix Monolithic SRS - Wan 2.2 Flow Studio

Date: 2026-04-15

Authoring context: This document defines the first production-ready Animatrix system built on top of the existing Desineuron ingress, the current ComfyUI GPU service, and the Wan 2.2 model family.

1. Purpose

Animatrix is a focused product for guided character video generation. It is not a general-purpose node editor. It is a constrained, operator-safe application that exposes two production workflows behind one simple frontend:

Character Animation and Replacement using Wan2.2-Animate-14B
Audio-Driven Character Performance using Wan2.2-S2V-14B

The frontend interaction model is inspired by the simplicity and compositional feel of Google Flow, but the execution runtime is ComfyUI-backed and Desineuron-hosted.

The objective is to give users a minimal interface:

prompt box
ground-truth starting image upload
optional reference images and pose sheet uploads
optional audio upload
simple mode selection
one-click generation

while the backend handles:

asset ingestion
workflow selection
parameter validation
ComfyUI prompt orchestration
queueing
status tracking
result persistence
streaming-ready delivery

2. Executive Product Truth

Animatrix v1 must be built around the actual Wan 2.2 model split, not the assumption that one blended model covers every capability.

Capability mapping:

Wan2.2-Animate-14B is for character animation and character replacement.
Wan2.2-S2V-14B is for audio-driven video generation with dialogue, singing, and performance.
Wan2.2 Fun Inp is the Wan family workflow for strict first-frame and last-frame control.
Wan2.2 Fun Control is the Wan family workflow for stronger control-video inputs such as OpenPose, depth, canny, and trajectory control.

Therefore the first release must not falsely claim that one single model covers all of the following natively:

character replacement
motion transfer
audio lip-sync
exact first/last-frame constraints

It does not.

The correct v1 product line is:

Workflow A: Animate Studio on Wan2.2-Animate-14B
Workflow B: Audio Performance Studio on Wan2.2-S2V-14B

The correct v1.1 or v2 expansion is:

Workflow C: Start/End Frame Studio on Wan2.2 Fun Inp
Workflow D: Pose/Trajectory Control Studio on Wan2.2 Fun Control

This distinction is mandatory because it affects UI truthfulness, node graphs, validation rules, asset requirements, and customer expectations.

3. Source Truth and Rationale

This SRS is grounded in the following current sources:

official Wan 2.2 GitHub repository: https://github.com/Wan-Video/Wan2.2
official Wan 2.2 Animate model page: https://huggingface.co/Wan-AI/Wan2.2-Animate-14B
official ComfyUI Wan 2.2 docs:
- https://docs.comfy.org/tutorials/video/wan/wan2_2
- https://docs.comfy.org/tutorials/video/wan/wan2-2-animate
- https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v
- https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-inp
- https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-control
current Desineuron infrastructure truth:
- [comfyui_setup_truth.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity.Agent Context\Sprint 1\comfyui_setup_truth.md)
- [Desineuron Stable Ingress Handoff.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity.Agent Context\Sprint 1\Desineuron Stable Ingress Handoff.md)

Critical source-backed facts that drive the design:

ComfyUI is already exposed safely through https://comfy.desineuron.in
the GPU service already runs behind stable ingress
Wan2.2-Animate-14B supports two operating modes in ComfyUI docs: Mix and Move
Wan2.2-S2V-14B is the audio-driven workflow with image plus audio inputs
ComfyUI’s official Animate docs require additional custom nodes for the full direct workflow
exact start/end-frame control is documented under Wan2.2 Fun Inp, not Animate

4. Product Vision

Animatrix should behave like a focused video creation surface, not like a research sandbox.

The product promise is:

"Upload a hero frame, optionally attach references, pose guidance, or audio, write a prompt, and generate a directed character video without touching ComfyUI nodes."

The UI must feel lightweight, but the execution system behind it must be opinionated and rigid enough to be supportable.

That means:

limited number of modes
strict validation
controlled presets
reproducible workflow JSON
consistent output formats
no raw-node exposure in the customer-facing frontend

5. Scope

5.1 In Scope for v1

one frontend
one backend API
two ComfyUI production workflows
status and result tracking
stable ingress compatibility
persistent storage for uploads and outputs
preview and download experience
operator-oriented logging and troubleshooting
support for team usage through the existing Desineuron architecture

5.2 Out of Scope for v1

arbitrary node editing by end users
live collaborative editing
in-browser timeline editing
multi-scene stitching
automatic sound effects design
full NLE replacement
customer-facing batch farms
fine-tuning or LoRA training

5.3 Deferred but Planned

first/last-frame exact control as a third workflow using Wan2.2 Fun Inp
stronger pose or trajectory control using Wan2.2 Fun Control
style packs and prompt presets
branded credits or quota system
user libraries and reusable character packs

6. User Personas

6.1 Internal Creative Operator

This user understands creative direction but should not need to edit node graphs. They need:

fast iteration
predictable inputs
reliable outputs
access to previous runs

6.2 Sales Demo Operator

This user needs a polished experience that can be shown live. They need:

simple UX
low operator error
dependable queue feedback
visible result cards

6.3 Technical Media Designer

This user understands reference material quality and wants more control without dropping into raw ComfyUI. They need:

reference images
pose sheet upload
clear mode distinctions
optional advanced settings

7. Functional Overview

Animatrix v1 will contain one shell product with two generation modes.

7.1 Mode A: Animate Studio

Underlying engine:

Wan2.2-Animate-14B

Primary purpose:

animate a character from a source image using the motion and expression from a source video
replace the subject in a video with a new character image

Sub-modes:

Move
Mix

User inputs:

prompt
ground-truth character image, required
source motion video, required
optional reference images
optional pose sheet image set
optional aspect preset
optional duration target

7.2 Mode B: Audio Performance Studio

Underlying engine:

Wan2.2-S2V-14B

Primary purpose:

generate a character video from a static image and audio input
support dialogue, singing, and audio-driven performance

User inputs:

prompt
ground-truth character image, required
source audio, required
optional reference images
optional pose sheet image set
optional full-body / half-body framing preset
optional duration target (defaults to the length of the source audio)

8. Frontend Vision

The frontend must preserve the interaction language shown in the reference screenshots:

one large prompt composer
image chips at the top-left of the composer
plus button for additional attachments
compact right-aligned mode selector
advanced settings revealed through a controlled panel, not always visible

The frontend should feel immediate, not enterprise-heavy.

8.1 Core Layout

Top-level zones:

Attachment rail
Prompt composer
Optional advanced drawer
Generate action and mode switch
Run history / output gallery below

8.2 Attachment Types

Attachment chips in v1:

Ground Truth
Reference
Pose Sheet
Audio
Motion Video

Visibility rules:

Ground Truth always available
Motion Video visible only in Animate Studio
Audio visible only in Audio Performance Studio
Pose Sheet optional in both modes
Reference optional in both modes
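The same visibility rules can be enforced server-side, so the API rejects attachment combinations the UI would never show. A minimal sketch follows; the `Mode` and `Attachment` enums and their string values are illustrative assumptions, not a defined contract.

```python
from __future__ import annotations

from enum import Enum


class Mode(str, Enum):
    ANIMATE = "animate"
    AUDIO = "audio"


class Attachment(str, Enum):
    GROUND_TRUTH = "ground_truth"
    REFERENCE = "reference"
    POSE_SHEET = "pose_sheet"
    AUDIO = "audio"
    MOTION_VIDEO = "motion_video"


# Attachment chips allowed per mode, mirroring the visibility rules in 8.2.
VISIBLE_ATTACHMENTS: dict[Mode, frozenset[Attachment]] = {
    Mode.ANIMATE: frozenset({Attachment.GROUND_TRUTH, Attachment.REFERENCE,
                             Attachment.POSE_SHEET, Attachment.MOTION_VIDEO}),
    Mode.AUDIO: frozenset({Attachment.GROUND_TRUTH, Attachment.REFERENCE,
                           Attachment.POSE_SHEET, Attachment.AUDIO}),
}


def disallowed_attachments(mode: Mode, attached: set[Attachment]) -> set[Attachment]:
    """Return the attachments that are not valid for the selected mode."""
    return set(attached) - VISIBLE_ATTACHMENTS[mode]
```

Keeping the rule in one table lets the frontend and backend share a single definition instead of duplicating mode checks.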

8.3 Frontend Controls

Base controls:

prompt text area
optional keyword helper line
mode toggle: Animate / Audio
output aspect toggle: 9:16, 16:9, later 1:1
quality profile: Draft, Standard, High
generate button

Advanced controls:

Animate sub-mode: Move / Mix
target duration
seed
negative prompt
extension segments
background preservation flag
relighting flag
lip-sync intensity or audio adherence preset

8.4 UX Rules

do not expose raw model names to standard users
use user language like Animate, Replace Character, Audio Performance
surface warnings before submission if required inputs are missing
show asset previews as compact rounded chips
keep advanced panel collapsed by default

9. Exact Capability Mapping by Workflow

9.1 Workflow A: Animate Studio

Supported in v1:

character animation from image plus motion video
character replacement from image plus source video
prompt conditioning
optional pose preprocessing
iterative video extension

Not truly supported by Animate Studio itself:

direct audio-driven lip sync
strict start/end-frame guarantees

9.2 Workflow B: Audio Performance Studio

Supported in v1:

image plus audio driven generation
prompt-conditioned motion/environment
dialogue and singing style use cases
long-form generation by extension chunks

Not truly supported by S2V itself:

guaranteed subject replacement from an existing motion video
exact last-frame lock

9.3 Pose Sheet Truth

The user-requested pose sheet can be supported in two ways:

Soft support in v1
- pose sheet stored as reference asset
- backend uses it for prompt augmentation and optional preprocessing assistance
- operator can map selected sheet frames to manual key pose hints
Hard support in later release
- migrate pose guidance to a dedicated Wan2.2 Fun Control or equivalent control-video workflow

The v1 document must state this honestly. A static pose sheet is not the same as a control video. It helps guide generation but does not become full deterministic motion control without an additional preprocessing and control pipeline.

10. Ground Truth Asset Model

The user’s "ground truth" image is the canonical identity anchor.

In both workflows it must serve as:

the primary subject identity reference
the default starting visual state
the basis for preview thumbnails

Rules:

exactly one primary ground-truth image per run
image must pass minimum size and aspect checks
a clean background is preferred but not mandatory
user may crop or center the character before submission

Optional extension:

future support for multiple identity references per character pack

11. Workflow Architecture

11.1 System Shape

Browser
  -> Animatrix frontend
  -> Animatrix API
     -> job store
     -> asset store
     -> workflow composer
     -> ComfyUI client
        -> https://comfy.desineuron.in
           -> GPU ComfyUI service
              -> Wan2.2 workflow execution
     -> result collector
     -> output persistence
  -> result CDN / static delivery

11.2 Architectural Rule

The frontend must never submit raw prompts directly to ComfyUI.

The backend must always mediate:

asset upload
workflow selection
workflow JSON parameter binding
run metadata persistence
output tracking

This is required for observability, rate control, product safety, and sales-readiness.

12. Ingress and Deployment Compatibility

Animatrix must be designed around the current Desineuron ingress truth.

Current infrastructure constraints:

ComfyUI is already live at https://comfy.desineuron.in
ComfyUI runs behind AWS ingress and stable TLS
GPU private IP is not a stable application contract
Linux origin is currently 192.168.1.2

12.1 Mandatory Integration Rule

Animatrix backend must integrate with ComfyUI through the stable hostname or through a controlled internal service abstraction that resolves to the same managed route.

Do not bind Animatrix to:

the GPU public IP
direct public traffic to port 8188
hardcoded current private IP
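One way to make this rule fail fast is to validate the configured ComfyUI base URL at backend startup. The sketch below assumes the allowlist holds only the stable hostname; a controlled internal alias for the same managed route would be added to `ALLOWED_COMFY_HOSTS`, which is a hypothetical config name.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist; extend with an internal alias only if it resolves to the managed route.
ALLOWED_COMFY_HOSTS = {"comfy.desineuron.in"}


def validate_comfy_base_url(url: str) -> str:
    """Reject ComfyUI base URLs that bypass the stable ingress route."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # Not an IP literal; continue with the hostname checks below.
    else:
        raise ValueError(f"raw IP address not allowed: {host}")
    if parts.port == 8188:
        raise ValueError("direct access to port 8188 is not allowed")
    if parts.scheme != "https" or host not in ALLOWED_COMFY_HOSTS:
        raise ValueError(f"ComfyUI must be reached via an approved HTTPS host, got {url}")
    return url.rstrip("/")
```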

12.2 Recommended Host Layout

Recommended public routing:

animatrix.desineuron.in -> frontend and public product shell
api.animatrix.desineuron.in or animatrix.desineuron.in/api -> backend API
comfy.desineuron.in -> internal execution dependency only, not user-facing

If separate subdomains are not created immediately, the fallback deployment pattern may mirror the current Velocity site pattern:

frontend served from Linux origin through ingress
backend served from Linux origin through ingress
backend calls ComfyUI through https://comfy.desineuron.in

13. Runtime Components

13.1 Frontend Application

Responsibilities:

render simplified generation interface
manage uploads
validate user fields before submit
create job requests
poll or subscribe to job progress
render previews and outputs

Suggested stack:

Next.js or Vite React app
Tailwind or CSS modules
upload components with image/audio/video preview

13.2 Animatrix Backend API

Responsibilities:

receive upload metadata
store files
generate canonical run record
choose workflow template
bind node inputs
submit prompt payload to ComfyUI
track prompt ID and history
collect generated outputs
persist result artifacts

Suggested stack:

FastAPI if aligned with existing Python-heavy operations
or Node/TypeScript if the team prefers a single language across frontend and backend

Recommendation:

use Python FastAPI for v1 if reusing current Desineuron operational style and image/media tooling

13.3 Workflow Composer

Responsibilities:

keep frozen template JSON files in version control
inject prompt text, model selections, size, length, and asset paths
enforce mode-specific constraints

This component must be deterministic. It is not a prompt improviser.
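Deterministic binding can be sketched as a pure function: copy the frozen API-format workflow (ComfyUI's `{node_id: {"class_type": ..., "inputs": {...}}}` shape), then write each product field into the node input named by a node map. The `NODE_MAP` entries below are hypothetical; real node IDs come from the frozen Wan templates.

```python
import copy

# Hypothetical node map for one frozen template: product field -> (node_id, input_name).
NODE_MAP = {
    "prompt": ("6", "text"),
    "negative_prompt": ("7", "text"),
    "seed": ("3", "seed"),
}


def bind_workflow(template: dict, node_map: dict, params: dict) -> dict:
    """Return a new workflow with params injected; the template itself is never mutated."""
    unknown = set(params) - set(node_map)
    if unknown:
        raise KeyError(f"no node mapping for: {sorted(unknown)}")
    bound = copy.deepcopy(template)
    for field_name, value in params.items():
        node_id, input_name = node_map[field_name]
        node = bound.get(node_id)
        # Only overwrite inputs that already exist, so a template change cannot be silently masked.
        if node is None or input_name not in node.get("inputs", {}):
            raise KeyError(f"template has no input {node_id}.{input_name}")
        node["inputs"][input_name] = value
    return bound
```

Refusing unknown fields and missing inputs keeps binding failures explicit, which maps onto the "workflow binding failure" class rather than a silently wrong render.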

13.4 ComfyUI Execution Layer

Responsibilities:

execute pre-approved workflow JSON
expose queue, prompt, history, upload endpoints
return output metadata

13.5 Asset Store

Responsibilities:

raw upload persistence
normalized derivative generation
final output video persistence
preview image generation

Recommended storage split:

hot local cache on Linux origin
durable object storage in S3 for long-term retention

13A. Current Infrastructure Contract

Animatrix v1 must be compatible with the currently operating Desineuron media stack as it exists today.

Live execution truth:

public ComfyUI hostname: https://comfy.desineuron.in
ingress elastic IP: 98.87.120.120
GPU private target currently managed behind ingress
Linux origin currently: 192.168.1.2

Current GPU-side storage truth:

ComfyUI app root: /opt/dlami/nvme/ComfyUI
HF cache: /opt/dlami/nvme/hf
model staging root: /opt/dlami/nvme/model-staging
model logs: /opt/dlami/nvme/model-logs

Current model hydration truth:

durable bucket family already in use: s3://project-velocity/models/
existing Wan hydration prefix: s3://project-velocity/models/Wan2.2-Animate-14B/

Animatrix must not introduce a second contradictory deployment path for ComfyUI. It must reuse this stable route and storage discipline.

13B. ComfyUI API Contract

The backend integration layer must be implemented against the current ComfyUI HTTP contract.

Required endpoints:

GET /
POST /prompt
GET /history/{prompt_id}
GET /queue
POST /upload/image

Recommended extension checks:

health probe against /
prompt submission response validation
history polling with bounded backoff
queue introspection for operator dashboards

The backend must wrap these endpoints in a typed client and must not scatter raw HTTP calls throughout business logic.
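A minimal sketch of such a typed client, using only the standard library and an injectable transport so it can be tested without a live GPU. `ComfyClient`, `Transport`, and the backoff defaults are assumptions of this sketch. The `/prompt` body (`prompt`, `client_id`) and the `prompt_id` response field follow ComfyUI's HTTP API; a prompt ID missing from the `/history/{prompt_id}` response is treated as "not finished yet". `/upload/image` and `/queue` are omitted for brevity.

```python
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional

# Transport signature: (method, url, json_body_or_None) -> parsed JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


def urllib_transport(method: str, url: str, body: Optional[dict]) -> dict:
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


@dataclass
class ComfyClient:
    base_url: str
    transport: Transport = urllib_transport

    def submit_prompt(self, workflow: dict, client_id: str) -> str:
        resp = self.transport("POST", f"{self.base_url}/prompt",
                              {"prompt": workflow, "client_id": client_id})
        if "prompt_id" not in resp:
            raise RuntimeError(f"unexpected /prompt response: {resp}")
        return resp["prompt_id"]

    def wait_for_history(self, prompt_id: str, max_wait: float = 1800.0,
                         initial_delay: float = 2.0, max_delay: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> dict:
        """Poll GET /history/{prompt_id} with capped exponential backoff."""
        delay, waited = initial_delay, 0.0
        while waited < max_wait:
            history = self.transport("GET", f"{self.base_url}/history/{prompt_id}", None)
            if prompt_id in history:
                return history[prompt_id]
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay)
        raise TimeoutError(f"prompt {prompt_id} not finished after {max_wait}s")
```

Injecting `transport` and `sleep` keeps business logic free of raw HTTP calls and makes the backoff schedule unit-testable.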

13C. Model and Node Manifest

Workflow A: Animate Studio Required Assets

Required model family:

Wan2.2-Animate-14B
clip_vision_h.safetensors
wan_2.1_vae.safetensors
umt5_xxl_fp8_e4m3fn_scaled.safetensors

Required custom nodes:

ComfyUI-KJNodes
ComfyUI-comfyui_controlnet_aux

Suggested placement contract:

diffusion model files under ComfyUI/models/diffusion_models/
text encoder under ComfyUI/models/text_encoders/
VAE under ComfyUI/models/vae/
CLIP Vision under ComfyUI/models/clip_vision/

Workflow B: Audio Performance Studio Required Assets

Required model family:

wan2.2_s2v_14B_fp8_scaled.safetensors or wan2.2_s2v_14B_bf16.safetensors
wav2vec2_large_english_fp16.safetensors
wan_2.1_vae.safetensors
umt5_xxl_fp8_e4m3fn_scaled.safetensors

Suggested placement contract:

diffusion model under ComfyUI/models/diffusion_models/
text encoder under ComfyUI/models/text_encoders/
audio encoder under ComfyUI/models/audio_encoders/
VAE under ComfyUI/models/vae/

Deferred Workflow Assets

For future strict start/end-frame control:

Wan2.2 Fun Inp models and optional associated LoRAs

For future stronger pose control:

Wan2.2 Fun Control

The frontend and API must be written so these workflows can be added later without reworking the entire product shell.

14. File and Repository Blueprint

Animatrix should be structured as an application repository or top-level product directory with explicit separation between app, API, and workflow assets.

Recommended layout:

Animatrix/
  docs/
    Animatrix Monolithic SRS - Wan 2.2 Flow Studio.md
  frontend/
    src/
      app/
      components/
      features/
      lib/
      styles/
  backend/
    app/
      api/
      services/
      models/
      repositories/
      workers/
  workflows/
    animate/
      wan22_animate_mix.json
      wan22_animate_move.json
    s2v/
      wan22_s2v_base.json
    shared/
      prompt_profiles/
      node_maps/
  scripts/
    deploy/
    media/
    sync/
  infra/
    systemd/
    nginx/
    caddy/
  tests/
    api/
    workflows/
    ui/

15. Workflow A Detailed Design: Animate Studio

15.1 Objective

Deliver a workflow that supports:

character replacement from a source video
character animation from a performer video
prompt-guided visual refinement

15.2 Input Contract

Required:

prompt
ground_truth_image
motion_video
mode: move or mix

Optional:

reference_images[]
pose_sheet_images[]
negative_prompt
duration_override_seconds
aspect_ratio
quality_profile
seed

15.3 Output Contract

Primary outputs:

video_mp4
poster_frame_jpg
job_manifest.json
debug_metadata.json

Secondary outputs:

pose preview if preprocessing is enabled
first-frame snapshot

15.4 Internal Workflow Stages

Ingest image and video
Normalize formats and dimensions
Extract first frame and thumbnail
Run optional DWPose or auxiliary preprocessing
Bind workflow JSON for move or mix
Upload normalized assets to ComfyUI
Submit workflow
Poll queue and history
Collect result paths
Persist final outputs and metadata

15.5 ComfyUI Notes

The official Animate workflow requires:

clip_vision_h.safetensors
wan_2.1_vae.safetensors
umt5_xxl_fp8_e4m3fn_scaled.safetensors
Animate diffusion model
optional Lightning LoRA
custom nodes:
- ComfyUI-KJNodes
- ComfyUI-comfyui_controlnet_aux

15.6 Product-Level Rule

Animatrix v1 must hide these internals from the standard UI, but the backend and operator docs must track them exactly.

16. Workflow B Detailed Design: Audio Performance Studio

16.1 Objective

Deliver a workflow that supports:

talking-head and half-body performance
singing and dialogue use cases
audio-driven facial and motion synthesis

16.2 Input Contract

Required:

prompt
ground_truth_image
audio_file

Optional:

reference_images[]
pose_sheet_images[]
negative_prompt
framing_mode: portrait, half_body, full_body
quality_profile
seed

16.3 Output Contract

Primary outputs:

video_mp4
poster_frame_jpg
job_manifest.json
debug_metadata.json

16.4 Internal Workflow Stages

Ingest image and audio
Normalize sample rate and file format
Infer required frame count from audio duration
Determine required S2V extension chunks
Bind workflow JSON
Upload image and audio to ComfyUI
Submit workflow
Poll queue and history
Collect output video
Persist artifacts
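The frame-count and extension-chunk steps reduce to ceiling arithmetic: frames = ceil(duration * fps) and chunks = ceil(frames / chunk_frames). The sketch assumes 16 fps and 77 frames per chunk purely as placeholders (read the real values from the frozen S2V template, which may also impose its own frame-count constraints) and ignores any frame overlap the extension nodes may use between chunks.

```python
from __future__ import annotations

import math

# Placeholder values; the frozen S2V template is the source of truth.
FPS = 16
CHUNK_FRAMES = 77


def plan_s2v_chunks(audio_seconds: float, fps: int = FPS,
                    chunk_frames: int = CHUNK_FRAMES) -> tuple[int, int]:
    """Return (total_frames, chunk_count) needed to cover the audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    total_frames = math.ceil(audio_seconds * fps)
    chunks = math.ceil(total_frames / chunk_frames)
    return total_frames, chunks
```

For example, 10 seconds of audio at 16 fps needs 160 frames, which takes three 77-frame chunks.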

16.5 ComfyUI Notes

The official S2V workflow requires:

wan2.2_s2v_14B_fp8_scaled.safetensors or bf16 variant
wav2vec2_large_english_fp16.safetensors
wan_2.1_vae.safetensors
umt5_xxl_fp8_e4m3fn_scaled.safetensors

The ComfyUI docs note that:

fp8 uses less VRAM
bf16 may reduce quality degradation
Lightning LoRA can reduce generation time but can also significantly reduce quality and dynamics

Therefore Animatrix must use these defaults:

Standard: fp8 with no acceleration LoRA, for customer-facing quality stability
Draft: fp8 with acceleration options such as the Lightning LoRA
High: bf16 where hardware allows

17. UI-to-Workflow Mapping

The UI must map cleanly to backend request objects.

17.1 Shared Fields

mode
prompt
negative_prompt
ground_truth_asset_id
reference_asset_ids[]
pose_sheet_asset_ids[]
aspect_ratio
quality_profile
seed

17.2 Animate-Specific Fields

motion_video_asset_id
animate_submode
background_preservation
relighting
extension_segments

17.3 Audio-Specific Fields

audio_asset_id
framing_mode
audio_adherence_profile
extension_segments

18. Suggested Backend API

18.1 Asset Endpoints

POST /api/assets/image
POST /api/assets/video
POST /api/assets/audio
GET /api/assets/{asset_id}

18.2 Job Endpoints

POST /api/jobs/animate
POST /api/jobs/audio-performance
GET /api/jobs/{job_id}
GET /api/jobs/{job_id}/events
GET /api/jobs/{job_id}/outputs
POST /api/jobs/{job_id}/cancel

18.3 Admin Endpoints

GET /api/admin/workflows
GET /api/admin/health
GET /api/admin/queue
POST /api/admin/retry/{job_id}

18.4 Websocket or SSE Progress Channel

Recommended:

GET /api/jobs/{job_id}/stream

This should emit:

accepted
uploaded
queued
executing
collecting_outputs
completed
failed

The frontend should use this channel if available and fall back to polling if the connection drops.
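The listed statuses form a simple forward-only state machine with a failure exit. The sketch below guards transitions and formats each change as a Server-Sent Events message; the `id:` line lets a browser `EventSource` send `Last-Event-ID` when it reconnects. The enum, the `status` event name, and the payload shape are assumptions.

```python
import json
from enum import Enum


class JobStatus(str, Enum):
    ACCEPTED = "accepted"
    UPLOADED = "uploaded"
    QUEUED = "queued"
    EXECUTING = "executing"
    COLLECTING_OUTPUTS = "collecting_outputs"
    COMPLETED = "completed"
    FAILED = "failed"


_ORDER = [JobStatus.ACCEPTED, JobStatus.UPLOADED, JobStatus.QUEUED,
          JobStatus.EXECUTING, JobStatus.COLLECTING_OUTPUTS, JobStatus.COMPLETED]
TERMINAL = {JobStatus.COMPLETED, JobStatus.FAILED}


def can_transition(current: JobStatus, new: JobStatus) -> bool:
    """Allow only the next forward step, or failure from any non-terminal state."""
    if current in TERMINAL:
        return False
    if new is JobStatus.FAILED:
        return True
    return _ORDER.index(new) == _ORDER.index(current) + 1


def sse_frame(job_id: str, status: JobStatus, seq: int) -> str:
    """Format one SSE message; a blank line terminates the event."""
    payload = json.dumps({"job_id": job_id, "status": status.value})
    return f"id: {seq}\nevent: status\ndata: {payload}\n\n"
```

Because the job record holds the latest status, the polling fallback can return the same payload from `GET /api/jobs/{job_id}`.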

19. Data Model

19.1 Asset

Fields:

asset_id
asset_type
mime_type
original_filename
storage_url
thumbnail_url
width
height
duration_seconds
size_bytes
created_at

19.2 Job

Fields:

job_id
mode
workflow_template
status
submitted_by
prompt
negative_prompt
settings_json
comfy_prompt_id
created_at
updated_at

19.3 JobOutput

Fields:

output_id
job_id
video_url
poster_url
manifest_url
duration_seconds
resolution
fps
created_at

20. Workflow Template Governance

Workflow JSON must be treated as versioned product assets.

Rules:

each production workflow JSON must have an immutable version identifier
node IDs must be mapped in a dedicated config file
backend parameter injection must never depend on informal manual node lookup
each workflow change must pass snapshot regression checks

Required metadata for every workflow:

workflow_name
workflow_version
model_family
required_assets
required_models
custom_nodes
compatible_backend_version
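One way to make "immutable version identifier" enforceable is to pin a content hash of each workflow JSON next to its metadata and fail the release check if either drifts. A minimal sketch; `workflow_fingerprint`, `check_workflow_release`, and the idea of pinning a SHA-256 are assumptions, not an existing Animatrix tool.

```python
import hashlib
import json

REQUIRED_METADATA = (
    "workflow_name", "workflow_version", "model_family", "required_assets",
    "required_models", "custom_nodes", "compatible_backend_version",
)


def workflow_fingerprint(workflow: dict) -> str:
    """SHA-256 of the workflow JSON with key order and whitespace normalized."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_workflow_release(metadata: dict, workflow: dict, pinned_sha256: str) -> None:
    """Fail if metadata is incomplete or the JSON no longer matches its pinned version."""
    missing = [key for key in REQUIRED_METADATA if key not in metadata]
    if missing:
        raise ValueError(f"missing workflow metadata: {missing}")
    actual = workflow_fingerprint(workflow)
    if actual != pinned_sha256:
        raise ValueError(
            f"{metadata['workflow_name']} {metadata['workflow_version']}: "
            f"content changed (expected {pinned_sha256[:12]}, got {actual[:12]})"
        )
```

Run in CI, this catches hand-edited production JSON: any edit forces a new version and a new pinned hash.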

21. Storage and Delivery Design

21.1 Inputs

Store raw uploads in durable storage with stable references.

Recommended:

object storage in S3
local temporary cache for preprocessing

21.2 Outputs

Store:

mp4 output
poster image
optional animated preview
manifest json

21.3 Delivery

Outputs must be streamable from a public HTTPS origin via ingress.

If using Linux origin:

serve final assets through nginx under Animatrix public domain

If using S3-backed storage:

use signed or public-read delivery depending on account mode

22. Quality Profiles

Animatrix must expose productized quality profiles rather than raw step counts to users.

22.1 Draft

Purpose:

internal ideation
faster previews

Behavior:

lower resolution
lower steps
acceleration LoRA allowed

22.2 Standard

Purpose:

most normal production runs

Behavior:

balanced speed and quality
conservative defaults
no quality-destructive shortcuts unless explicitly enabled

22.3 High

Purpose:

demo and delivery quality

Behavior:

higher quality model variant when available
larger resolution
longer runtime accepted
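Internally, each user-facing profile can resolve to a fixed settings bundle so users never see step counts. The sketch covers only the S2V checkpoint choice from 16.5 and degrades High to fp8 when bf16 is not supported on the available GPU; the resolution caps and step counts are placeholders, not benchmarked values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProfileSettings:
    s2v_weights: str            # S2V diffusion checkpoint filename
    allow_lightning_lora: bool  # whether an acceleration LoRA may be applied
    max_long_edge: int          # placeholder resolution cap, in pixels
    steps: int                  # placeholder sampler step count


FP8 = "wan2.2_s2v_14B_fp8_scaled.safetensors"
BF16 = "wan2.2_s2v_14B_bf16.safetensors"

PROFILES = {
    "draft": ProfileSettings(FP8, True, 480, 4),
    "standard": ProfileSettings(FP8, False, 640, 20),
    "high": ProfileSettings(BF16, False, 720, 30),
}


def resolve_profile(name: str, bf16_supported: bool) -> ProfileSettings:
    """Map a user-facing profile name to internal settings."""
    settings = PROFILES[name.lower()]
    if settings.s2v_weights == BF16 and not bf16_supported:
        return replace(settings, s2v_weights=FP8)
    return settings
```

An equivalent table for Animate Studio would point at the Animate diffusion model instead of the S2V checkpoints.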

23. Error Handling

Failure classes:

missing asset
invalid asset format
unsupported aspect ratio
workflow binding failure
ComfyUI upload failure
ComfyUI queue failure
generation timeout
result collection failure

User-facing errors must be simplified.

Operator-facing logs must preserve exact failure cause.
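The split between simplified user errors and exact operator logs can be sketched as one function that logs the full cause and returns safe copy. The enum values track the failure classes listed above; the user-facing strings are hypothetical wording.

```python
import logging
from enum import Enum

logger = logging.getLogger("animatrix.jobs")


class FailureClass(str, Enum):
    MISSING_ASSET = "missing_asset"
    INVALID_ASSET_FORMAT = "invalid_asset_format"
    UNSUPPORTED_ASPECT_RATIO = "unsupported_aspect_ratio"
    WORKFLOW_BINDING = "workflow_binding_failure"
    COMFY_UPLOAD = "comfyui_upload_failure"
    COMFY_QUEUE = "comfyui_queue_failure"
    TIMEOUT = "generation_timeout"
    RESULT_COLLECTION = "result_collection_failure"


# Hypothetical user-facing copy; internal and ComfyUI failures share one generic message.
USER_MESSAGES = {
    FailureClass.MISSING_ASSET: "A required file is missing. Please re-attach it.",
    FailureClass.INVALID_ASSET_FORMAT: "One of your files is in a format we can't use.",
    FailureClass.UNSUPPORTED_ASPECT_RATIO: "Please choose a supported aspect ratio.",
}
GENERIC_MESSAGE = "Generation failed. Please try again or contact support."


def report_failure(job_id: str, failure: FailureClass, detail: str) -> str:
    """Log the exact cause for operators and return a simplified message for users."""
    logger.error("job failed job_id=%s class=%s detail=%s", job_id, failure.value, detail)
    return USER_MESSAGES.get(failure, GENERIC_MESSAGE)
```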

23A. Validation Rules

Shared Validation

reject empty prompt if prompt is required by the selected workflow profile
reject missing ground-truth image
reject unsupported file extensions
reject files above configured upload limit

Animate Studio Validation

reject missing motion video
reject unsupported source video codecs that cannot be normalized
reject conflicting move and mix settings

Audio Performance Studio Validation

reject missing audio
reject audio longer than configured maximum duration for the selected profile
normalize sample rate before workflow submission

Pose Sheet Validation

accept only supported image formats
cap pose sheet image count in v1
mark pose sheet as "soft guidance" in job metadata unless a later hard-control pipeline is introduced
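The Audio Performance rules above can be sketched as a function that collects every problem instead of stopping at the first, so the UI can surface all warnings before submission. The extension lists, size limit, per-profile audio caps, and pose sheet cap are hypothetical configuration values. Extension checks alone do not satisfy the MIME-type rule in section 25; content sniffing is outside this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical limits; the SRS leaves the exact values to configuration.
ALLOWED_EXTENSIONS = {
    "image": {".png", ".jpg", ".jpeg", ".webp"},
    "video": {".mp4", ".mov", ".webm"},
    "audio": {".wav", ".mp3", ".flac"},
}
MAX_UPLOAD_BYTES = 500 * 1024 * 1024
MAX_AUDIO_SECONDS = {"draft": 30.0, "standard": 60.0, "high": 60.0}
MAX_POSE_SHEET_IMAGES = 8


@dataclass
class Upload:
    kind: str  # "image", "video", or "audio"
    filename: str
    size_bytes: int
    duration_seconds: float | None = None


def validate_upload(u: Upload) -> list[str]:
    errors = []
    if PurePath(u.filename).suffix.lower() not in ALLOWED_EXTENSIONS.get(u.kind, set()):
        errors.append(f"unsupported {u.kind} file: {u.filename}")
    if u.size_bytes > MAX_UPLOAD_BYTES:
        errors.append(f"{u.filename} exceeds upload limit")
    return errors


def validate_audio_job(ground_truth: Upload | None, audio: Upload | None,
                       pose_sheets: list[Upload], profile: str) -> list[str]:
    errors = []
    if ground_truth is None:
        errors.append("missing ground-truth image")
    if audio is None:
        errors.append("missing audio")
    elif audio.duration_seconds and audio.duration_seconds > MAX_AUDIO_SECONDS[profile]:
        errors.append("audio longer than allowed for this profile")
    if len(pose_sheets) > MAX_POSE_SHEET_IMAGES:
        errors.append("too many pose sheet images")
    for u in [x for x in (ground_truth, audio) if x is not None] + pose_sheets:
        errors.extend(validate_upload(u))
    return errors
```

An Animate Studio validator would follow the same pattern, with the motion video in place of audio.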

24. Observability

Minimum operational telemetry:

job creation rate
queue depth
mean wait time
mean generation time by workflow
failure rate by workflow version
storage growth
top asset sizes

Required correlation identifiers:

job_id
asset_id
comfy_prompt_id
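A lightweight way to attach these identifiers to every log line is a `logging.LoggerAdapter` that prefixes whichever IDs are known so far. A minimal sketch; `CorrelationAdapter` and `job_logger` are hypothetical names.

```python
import logging
from typing import Optional


class CorrelationAdapter(logging.LoggerAdapter):
    """Prefix each message with whichever correlation IDs are currently set."""

    def process(self, msg, kwargs):
        ids = " ".join(f"{key}={value}" for key, value in self.extra.items() if value)
        return (f"[{ids}] {msg}" if ids else msg), kwargs


def job_logger(job_id: str, asset_id: Optional[str] = None,
               comfy_prompt_id: Optional[str] = None) -> CorrelationAdapter:
    extra = {"job_id": job_id, "asset_id": asset_id, "comfy_prompt_id": comfy_prompt_id}
    return CorrelationAdapter(logging.getLogger("animatrix"), extra)
```

Creating a new adapter once ComfyUI returns a prompt ID ties every later line (polling, collection, persistence) back to both the product job and the ComfyUI run.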

25. Security and Access Control

Rules:

do not expose raw ComfyUI publicly to end users as the product surface
backend owns ComfyUI credentials and workflow orchestration
validate file size and MIME type on upload
reject executable uploads
limit accepted formats
preserve audit trail for every run

26. Team and Operator UX

The system must support:

internal team usage through the stable ingress
supportable operator triage
easy workflow version rollback
safe demo usage during sales calls

Operators need:

admin queue view
job replay
access to input and output manifests
workflow version annotation

27. Non-Functional Requirements

27.1 Reliability

no direct dependency on ephemeral GPU public IP
graceful retry around ComfyUI upload and history polling
job state persisted outside memory

27.2 Performance

fast upload validation
async polling and result collection
cached thumbnails

27.3 Scalability

workflow templates stateless
API horizontally scalable
storage externalized

27.4 Maintainability

one source-of-truth workflow config per mode
explicit model manifest
no hidden hand-edited production JSON

27.5 Sales Readiness

stable hostname
reliable queue messaging
polished success and failure states
deterministic demo inputs

27A. Demo and Commercial Readiness Requirements

Animatrix will be used in live demos and pre-sales conversations. That changes the bar.

Required product behavior:

first meaningful UI paint fast enough for live sales use
one-click sample project loading for demo mode
clear progress messaging during long generations
shareable output URL or operator download path
no raw ComfyUI terminology in the customer-facing layer unless explicitly in admin mode

Required operator support behavior:

known-good demo assets packaged and versioned
visible warning when GPU queue is saturated
ability to retry a failed job without recreating all metadata manually

28. MVP Acceptance Criteria

Animatrix v1 is only considered complete when all of the following are true:

A user can upload a ground-truth image, type a prompt, attach a motion video, select Move or Mix, and receive a finished video output.
A user can upload a ground-truth image, type a prompt, attach audio, and receive an audio-driven character video.
Both flows work through the stable Desineuron ingress model and do not depend on hardcoded GPU IPs.
Every run produces a persisted job record and output manifest.
Generated videos are streamable over HTTPS.
Operators can inspect job state and correlate product job ID to ComfyUI prompt ID.
The UI remains simple enough for a non-technical demo operator.

29. Explicit Product Decisions

29.1 What v1 Must Say No To

Animatrix v1 must not claim:

perfect deterministic pose-sheet control
exact first and last frame locking
full timeline editing
full audio mastering

29.2 What v1 Must Say Yes To

Animatrix v1 can truthfully claim:

guided character animation
guided character replacement
audio-driven talking or performance video
reference-assisted generation
production-safe simplified UI on top of ComfyUI

30. Recommended Delivery Phases

Phase 1

backend skeleton
asset model
one frozen Animate workflow
one frozen S2V workflow
barebones frontend

Phase 2

quality profiles
operator dashboard
output gallery
S3 persistence

Phase 3

first/last-frame workflow
stronger pose control
reusable character libraries

31. Final Architecture Recommendation

Build Animatrix as a thin product layer over stable infrastructure that already exists:

keep ComfyUI where it is
keep ingress where it is
add a dedicated Animatrix backend
keep the frontend intentionally minimal
treat workflow JSON as versioned software artifacts

Do not begin by building a large generic creative suite.

Build the narrowest saleable product first:

Animate Studio
Audio Performance Studio

Then expand to:

Start/End Frame Studio
Pose Control Studio

32. Bottom Line

Animatrix v1 should be a Flow-like creative surface backed by two real Wan 2.2 workflows, not one imaginary super-workflow.

The correct implementation target is:

one frontend
one orchestration backend
two workflow families
one stable ingress-compatible execution path
one durable output system

If the team follows this document strictly, the result will be productizable, supportable, and compatible with the current Desineuron infrastructure without lying about model capabilities.
