Project_Animatix/Docs/Animatrix Monolithic SRS - Wan 2.2 Flow Studio.md
2026-04-17 19:11:57 +05:30

Animatrix Monolithic SRS - Wan 2.2 Flow Studio

Date: 2026-04-15

Authoring context: This document defines the first production-ready Animatrix system built on top of the existing Desineuron ingress, the current ComfyUI GPU service, and the Wan 2.2 model family.

1. Purpose

Animatrix is a focused product for guided character video generation. It is not a general-purpose node editor. It is a constrained, operator-safe application that exposes two production workflows behind one simple frontend:

  1. Character Animation and Replacement using Wan2.2-Animate-14B
  2. Audio-Driven Character Performance using Wan2.2-S2V-14B

The frontend interaction model is inspired by the simplicity and compositional feel of Google Flow, but the execution runtime is ComfyUI-backed and Desineuron-hosted.

The objective is to give users a minimal interface:

  • prompt box
  • ground-truth starting image upload
  • optional reference images and pose sheet uploads
  • optional audio upload
  • simple mode selection
  • one-click generation

while the backend handles:

  • asset ingestion
  • workflow selection
  • parameter validation
  • ComfyUI prompt orchestration
  • queueing
  • status tracking
  • result persistence
  • streaming-ready delivery

2. Executive Product Truth

Animatrix v1 must be built around the actual Wan 2.2 model split, not a blended assumption.

Capability mapping:

  • Wan2.2-Animate-14B is for character animation and character replacement.
  • Wan2.2-S2V-14B is for audio-driven video generation with dialogue, singing, and performance.
  • Wan2.2 Fun Inp is the Wan family workflow for strict first-frame and last-frame control.
  • Wan2.2 Fun Control is the Wan family workflow for stronger control-video inputs such as OpenPose, depth, canny, and trajectory control.

Therefore the first release must not falsely claim that one single model covers all of the following natively:

  • character replacement
  • motion transfer
  • audio lip-sync
  • exact first/last-frame constraints

It does not.

The correct v1 product line is:

  • Workflow A: Animate Studio on Wan2.2-Animate-14B
  • Workflow B: Audio Performance Studio on Wan2.2-S2V-14B

The correct v1.1 or v2 expansion is:

  • Workflow C: Start/End Frame Studio on Wan2.2 Fun Inp
  • Workflow D: Pose/Trajectory Control Studio on Wan2.2 Fun Control

This distinction is mandatory because it affects UI truthfulness, node graphs, validation rules, asset requirements, and customer expectations.

3. Source Truth and Rationale

This SRS is grounded in the following current sources:

  • official Wan 2.2 GitHub repository: https://github.com/Wan-Video/Wan2.2
  • official Wan 2.2 Animate model page: https://huggingface.co/Wan-AI/Wan2.2-Animate-14B
  • official ComfyUI Wan 2.2 docs:
    • https://docs.comfy.org/tutorials/video/wan/wan2_2
    • https://docs.comfy.org/tutorials/video/wan/wan2-2-animate
    • https://docs.comfy.org/tutorials/video/wan/wan2-2-s2v
    • https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-inp
    • https://docs.comfy.org/tutorials/video/wan/wan2-2-fun-control
  • current Desineuron infrastructure truth:
    • [comfyui_setup_truth.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity.Agent Context\Sprint 1\comfyui_setup_truth.md)
    • [Desineuron Stable Ingress Handoff.md](F:\Workin In Progress\DESINEURON\GITLAB\Project_Velocity.Agent Context\Sprint 1\Desineuron Stable Ingress Handoff.md)

Critical source-backed facts that drive the design:

  • ComfyUI is already exposed safely through https://comfy.desineuron.in
  • the GPU service already runs behind stable ingress
  • Wan2.2-Animate-14B supports two operating modes in ComfyUI docs: Mix and Move
  • Wan2.2-S2V-14B is the audio-driven workflow with image plus audio inputs
  • ComfyUI's official Animate docs require additional custom nodes for the full direct workflow
  • exact start/end-frame control is documented under Wan2.2 Fun Inp, not Animate

4. Product Vision

Animatrix should behave like a focused video creation surface, not like a research sandbox.

The product promise is:

"Upload a hero frame, optionally attach references, pose guidance, or audio, write a prompt, and generate a directed character video without touching ComfyUI nodes."

The UI must feel lightweight, but the execution system behind it must be opinionated and rigid enough to be supportable.

That means:

  • limited number of modes
  • strict validation
  • controlled presets
  • reproducible workflow JSON
  • consistent output formats
  • no raw-node exposure in the customer-facing frontend

5. Scope

5.1 In Scope for v1

  • one frontend
  • one backend API
  • two ComfyUI production workflows
  • status and result tracking
  • stable ingress compatibility
  • persistent storage for uploads and outputs
  • preview and download experience
  • operator-oriented logging and troubleshooting
  • support for team usage through the existing Desineuron architecture

5.2 Out of Scope for v1

  • arbitrary node editing by end users
  • live collaborative editing
  • in-browser timeline editing
  • multi-scene stitching
  • automatic sound effects design
  • full NLE replacement
  • customer-facing batch farms
  • fine-tuning or LoRA training

5.3 Deferred but Planned

  • first/last-frame exact control as a third workflow using Wan2.2 Fun Inp
  • stronger pose or trajectory control using Wan2.2 Fun Control
  • style packs and prompt presets
  • branded credits or quota system
  • user libraries and reusable character packs

6. User Personas

6.1 Internal Creative Operator

This user understands creative direction but should not need to edit node graphs. They need:

  • fast iteration
  • predictable inputs
  • reliable outputs
  • access to previous runs

6.2 Sales Demo Operator

This user needs a polished experience that can be shown live. They need:

  • simple UX
  • low operator error
  • dependable queue feedback
  • visible result cards

6.3 Technical Media Designer

This user understands reference material quality and wants more control without dropping into raw ComfyUI. They need:

  • reference images
  • pose sheet upload
  • clear mode distinctions
  • optional advanced settings

7. Functional Overview

Animatrix v1 will contain one shell product with two generation modes.

7.1 Mode A: Animate Studio

Underlying engine:

  • Wan2.2-Animate-14B

Primary purpose:

  • animate a character from a source image using the motion and expression from a source video
  • replace the subject in a video with a new character image

Sub-modes:

  • Move
  • Mix

User inputs:

  • prompt
  • ground-truth character image, required
  • source motion video, required
  • optional reference images
  • optional pose sheet image set
  • optional aspect preset
  • optional duration target

7.2 Mode B: Audio Performance Studio

Underlying engine:

  • Wan2.2-S2V-14B

Primary purpose:

  • generate a character video from a static image and audio input
  • support dialogue, singing, and audio-driven performance

User inputs:

  • prompt
  • ground-truth character image, required
  • source audio, required
  • optional reference images
  • optional pose sheet image set
  • optional full-body / half-body framing preset
  • optional duration target inferred from audio length

8. Frontend Vision

The frontend must preserve the interaction language shown in the reference screenshots:

  • one large prompt composer
  • image chips at the top-left of the composer
  • plus button for additional attachments
  • compact right-aligned mode selector
  • advanced settings revealed through a controlled panel, not always visible

The frontend should feel immediate, not enterprise-heavy.

8.1 Core Layout

Top-level zones:

  1. Attachment rail
  2. Prompt composer
  3. Optional advanced drawer
  4. Generate action and mode switch
  5. Run history / output gallery below

8.2 Attachment Types

Attachment chips in v1:

  • Ground Truth
  • Reference
  • Pose Sheet
  • Audio
  • Motion Video

Visibility rules:

  • Ground Truth always available
  • Motion Video visible only in Animate Studio
  • Audio visible only in Audio Performance Studio
  • Pose Sheet optional in both modes
  • Reference optional in both modes
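
The visibility rules above can be sketched as a single lookup so that the UI and the API agree on which chips a mode accepts. This is a minimal illustration in Python (chosen for consistency with the recommended backend); the `Mode` enum, the chip identifiers, and `visible_chips` are hypothetical names, not part of any existing Animatrix code.

```python
from enum import Enum


class Mode(str, Enum):
    """The two v1 generation modes (identifiers are assumptions)."""
    ANIMATE = "animate"
    AUDIO = "audio"


# Chips offered in every mode, in display order.
_SHARED_CHIPS = ["ground_truth", "reference", "pose_sheet"]

# Chips that appear in exactly one mode.
_MODE_CHIPS = {
    Mode.ANIMATE: ["motion_video"],
    Mode.AUDIO: ["audio"],
}


def visible_chips(mode: Mode) -> list[str]:
    """Return the attachment chips the composer should show for a mode."""
    return _SHARED_CHIPS + _MODE_CHIPS[mode]
```

Keeping the rule in one table means that adding a later workflow (for example a start/end-frame mode) only extends `_MODE_CHIPS`.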

8.3 Frontend Controls

Base controls:

  • prompt text area
  • optional keyword helper line
  • mode toggle: Animate / Audio
  • output aspect toggle: 9:16, 16:9, later 1:1
  • quality profile: Draft, Standard, High
  • generate button

Advanced controls:

  • Animate sub-mode: Move / Mix
  • target duration
  • seed
  • negative prompt
  • extension segments
  • background preservation flag
  • relighting flag
  • lip-sync intensity or audio adherence preset

8.4 UX Rules

  • do not expose raw model names to standard users
  • use user language like Animate, Replace Character, Audio Performance
  • surface warnings before submission if required inputs are missing
  • show asset previews as compact rounded chips
  • keep advanced panel collapsed by default

9. Exact Capability Mapping by Workflow

9.1 Workflow A: Animate Studio

Supported in v1:

  • character animation from image plus motion video
  • character replacement from image plus source video
  • prompt conditioning
  • optional pose preprocessing
  • iterative video extension

Not truly supported by Animate Studio itself:

  • direct audio-driven lip sync
  • strict start/end-frame guarantees

9.2 Workflow B: Audio Performance Studio

Supported in v1:

  • image plus audio driven generation
  • prompt-conditioned motion/environment
  • dialogue and singing style use cases
  • long-form generation by extension chunks

Not truly supported by S2V itself:

  • guaranteed subject replacement from an existing motion video
  • exact last-frame lock

9.3 Pose Sheet Truth

The user-requested pose sheet can be supported in two ways:

  1. Soft support in v1

    • pose sheet stored as reference asset
    • backend uses it for prompt augmentation and optional preprocessing assistance
    • operator can map selected sheet frames to manual key pose hints
  2. Hard support in later release

    • migrate pose guidance to a dedicated Wan2.2 Fun Control or equivalent control-video workflow

The v1 document must state this honestly. A static pose sheet is not the same as a control video. It helps guide generation but does not become full deterministic motion control without an additional preprocessing and control pipeline.

10. Ground Truth Asset Model

The user's "ground truth" image is the canonical identity anchor.

In both workflows it must serve as:

  • the primary subject identity reference
  • the default starting visual state
  • the basis for preview thumbnails

Rules:

  • exactly one primary ground-truth image per run
  • image must pass minimum size and aspect checks
  • a clean background is preferred but not mandatory
  • user may crop or center the character before submission

Optional extension:

  • future support for multiple identity references per character pack

11. Workflow Architecture

11.1 System Shape

Browser
  -> Animatrix frontend
  -> Animatrix API
     -> job store
     -> asset store
     -> workflow composer
     -> ComfyUI client
        -> https://comfy.desineuron.in
           -> GPU ComfyUI service
              -> Wan2.2 workflow execution
     -> result collector
     -> output persistence
  -> result CDN / static delivery

11.2 Architectural Rule

The frontend must never submit raw prompts directly to ComfyUI.

The backend must always mediate:

  • asset upload
  • workflow selection
  • workflow JSON parameter binding
  • run metadata persistence
  • output tracking

This is required for observability, rate control, product safety, and sales-readiness.

12. Ingress and Deployment Compatibility

Animatrix must be designed around the current Desineuron ingress truth.

Current infrastructure constraints:

  • ComfyUI is already live at https://comfy.desineuron.in
  • ComfyUI runs behind AWS ingress and stable TLS
  • GPU private IP is not a stable application contract
  • Linux origin is currently 192.168.1.2

12.1 Mandatory Integration Rule

Animatrix backend must integrate with ComfyUI through the stable hostname or through a controlled internal service abstraction that resolves to the same managed route.

Do not bind Animatrix to:

  • the GPU public IP
  • direct 8188 public traffic
  • hardcoded current private IP

Recommended public routing:

  • animatrix.desineuron.in -> frontend and public product shell
  • api.animatrix.desineuron.in or animatrix.desineuron.in/api -> backend API
  • comfy.desineuron.in -> internal execution dependency only, not user-facing

If separate subdomains are not created immediately, the fallback deployment pattern may mirror the current Velocity site pattern:

  • frontend served from Linux origin through ingress
  • backend served from Linux origin through ingress
  • backend calls ComfyUI through https://comfy.desineuron.in
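
The integration rule can be enforced at backend startup by rejecting a ComfyUI base URL that points at an IP literal or at port 8188. The following is a minimal sketch; the function name `validate_comfy_base_url` is hypothetical, and the HTTPS requirement is an assumption that would need relaxing if a controlled internal service abstraction uses plain HTTP.

```python
import ipaddress
from urllib.parse import urlsplit


def validate_comfy_base_url(url: str) -> str:
    """Return the normalized base URL, or raise ValueError if it breaks the rule."""
    parts = urlsplit(url)
    if parts.scheme != "https":  # assumption: stable ingress always terminates TLS
        raise ValueError("ComfyUI base URL must use https")
    host = parts.hostname or ""
    if not host:
        raise ValueError("ComfyUI base URL has no hostname")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal, which is what we want
    else:
        raise ValueError("ComfyUI base URL must be a hostname, not an IP address")
    if parts.port == 8188:
        raise ValueError("direct port 8188 traffic is not allowed")
    return url.rstrip("/")
```

For example, `validate_comfy_base_url("https://comfy.desineuron.in/")` returns the hostname form without the trailing slash, while `https://192.168.1.2` is rejected.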

13. Runtime Components

13.1 Frontend Application

Responsibilities:

  • render simplified generation interface
  • manage uploads
  • validate user fields before submit
  • create job requests
  • poll or subscribe to job progress
  • render previews and outputs

Suggested stack:

  • Next.js or Vite React app
  • Tailwind or CSS modules
  • upload components with image/audio/video preview

13.2 Animatrix Backend API

Responsibilities:

  • receive upload metadata
  • store files
  • generate canonical run record
  • choose workflow template
  • bind node inputs
  • submit prompt payload to ComfyUI
  • track prompt ID and history
  • collect generated outputs
  • persist result artifacts

Suggested stack:

  • FastAPI if aligned with existing Python-heavy operations
  • or Node/TypeScript only if the team wants one frontend-backend language

Recommendation:

  • use Python FastAPI for v1 if reusing current Desineuron operational style and image/media tooling

13.3 Workflow Composer

Responsibilities:

  • keep frozen template JSON files in version control
  • inject prompt text, model selections, size, length, and asset paths
  • enforce mode-specific constraints

This component must be deterministic. It is not a prompt improviser.
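
A deterministic composer can be sketched as a pure function over a frozen template and a node map. The sketch assumes the template is stored in ComfyUI's API format (a mapping from node ID to `class_type` and `inputs`); the function name `bind_workflow`, the node-map shape, and any node IDs used with it are hypothetical.

```python
import copy


def bind_workflow(
    template: dict,
    node_map: dict[str, tuple[str, str]],
    params: dict,
) -> dict:
    """Inject params into a copy of the template via an explicit node map.

    node_map maps a logical parameter name to (node_id, input_name).
    The template itself is never mutated.
    """
    unknown = set(params) - set(node_map)
    if unknown:
        raise KeyError(f"parameters without a node mapping: {sorted(unknown)}")

    workflow = copy.deepcopy(template)
    for name in sorted(params):  # sorted for a reproducible binding order
        node_id, input_name = node_map[name]
        if node_id not in workflow:
            raise KeyError(f"node map is stale: node {node_id!r} not in template")
        inputs = workflow[node_id]["inputs"]
        if input_name not in inputs:
            raise KeyError(f"node {node_id!r} has no input {input_name!r}")
        inputs[input_name] = params[name]
    return workflow
```

Failing loudly on an unmapped parameter or a stale node ID is the design choice here: a silent skip would let a template change ship with unbound inputs.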

13.4 ComfyUI Execution Layer

Responsibilities:

  • execute pre-approved workflow JSON
  • expose queue, prompt, history, upload endpoints
  • return output metadata

13.5 Asset Store

Responsibilities:

  • raw upload persistence
  • normalized derivative generation
  • final output video persistence
  • preview image generation

Recommended storage split:

  • hot local cache on Linux origin
  • durable object storage in S3 for long-term retention

13A. Current Infrastructure Contract

Animatrix v1 must be compatible with the currently operating Desineuron media stack as it exists today.

Live execution truth:

  • public ComfyUI hostname: https://comfy.desineuron.in
  • ingress elastic IP: 98.87.120.120
  • GPU private target currently managed behind ingress
  • Linux origin currently: 192.168.1.2

Current GPU-side storage truth:

  • ComfyUI app root: /opt/dlami/nvme/ComfyUI
  • HF cache: /opt/dlami/nvme/hf
  • model staging root: /opt/dlami/nvme/model-staging
  • model logs: /opt/dlami/nvme/model-logs

Current model hydration truth:

  • durable bucket family already in use: s3://project-velocity/models/
  • existing Wan hydration prefix: s3://project-velocity/models/Wan2.2-Animate-14B/

Animatrix must not introduce a second contradictory deployment path for ComfyUI. It must reuse this stable route and storage discipline.

13B. ComfyUI API Contract

The backend integration layer must be implemented against the current ComfyUI HTTP contract.

Required endpoints:

  • GET /
  • POST /prompt
  • GET /history/{prompt_id}
  • GET /queue
  • POST /upload/image

Recommended extension checks:

  • health probe against /
  • prompt submission response validation
  • history polling with bounded backoff
  • queue introspection for operator dashboards

The backend must wrap these endpoints in a typed client and must not scatter raw HTTP calls throughout business logic.
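
A typed client along these lines might look like the sketch below. It covers only `POST /prompt`, `GET /history/{prompt_id}`, and `GET /queue`. The class and field names are hypothetical, and the response shapes (`prompt_id`, `number`, `queue_running`, `queue_pending`) reflect the ComfyUI server as commonly deployed; they should be verified against the running version. The transport is injectable so that the client can be tested without a network.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional

# (method, url, body) -> raw response bytes
Transport = Callable[[str, str, Optional[bytes]], bytes]


def urllib_transport(method: str, url: str, body: Optional[bytes]) -> bytes:
    """Default transport using only the standard library."""
    req = urllib.request.Request(
        url, data=body, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


@dataclass(frozen=True)
class PromptReceipt:
    prompt_id: str
    queue_number: int


class ComfyClient:
    def __init__(self, base_url: str, transport: Transport = urllib_transport):
        self._base = base_url.rstrip("/")
        self._transport = transport

    def submit_prompt(self, workflow: dict, client_id: str) -> PromptReceipt:
        body = json.dumps({"prompt": workflow, "client_id": client_id}).encode()
        data = json.loads(self._transport("POST", f"{self._base}/prompt", body))
        if "prompt_id" not in data:  # response validation before trusting it
            raise RuntimeError(f"unexpected /prompt response: {data!r}")
        return PromptReceipt(data["prompt_id"], int(data.get("number", -1)))

    def get_history(self, prompt_id: str) -> Optional[dict]:
        """Return the history entry, or None while the prompt is unfinished."""
        raw = self._transport("GET", f"{self._base}/history/{prompt_id}", None)
        return json.loads(raw).get(prompt_id)

    def queue_depth(self) -> tuple[int, int]:
        """Return (running, pending) counts for operator dashboards."""
        data = json.loads(self._transport("GET", f"{self._base}/queue", None))
        return len(data.get("queue_running", [])), len(data.get("queue_pending", []))
```

Business logic would then depend on `ComfyClient` alone, which keeps URL construction and response parsing in one place.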

13C. Model and Node Manifest

Workflow A: Animate Studio Required Assets

Required model family:

  • Wan2.2-Animate-14B
  • clip_vision_h.safetensors
  • wan_2.1_vae.safetensors
  • umt5_xxl_fp8_e4m3fn_scaled.safetensors

Required custom nodes:

  • ComfyUI-KJNodes
  • ComfyUI-comfyui_controlnet_aux

Suggested placement contract:

  • diffusion model files under ComfyUI/models/diffusion_models/
  • text encoder under ComfyUI/models/text_encoders/
  • VAE under ComfyUI/models/vae/
  • CLIP Vision under ComfyUI/models/clip_vision/

Workflow B: Audio Performance Studio Required Assets

Required model family:

  • wan2.2_s2v_14B_fp8_scaled.safetensors or wan2.2_s2v_14B_bf16.safetensors
  • wav2vec2_large_english_fp16.safetensors
  • wan_2.1_vae.safetensors
  • umt5_xxl_fp8_e4m3fn_scaled.safetensors

Suggested placement contract:

  • diffusion model under ComfyUI/models/diffusion_models/
  • text encoder under ComfyUI/models/text_encoders/
  • audio encoder under ComfyUI/models/audio_encoders/
  • VAE under ComfyUI/models/vae/

Deferred Workflow Assets

For future strict start/end-frame control:

  • Wan2.2 Fun Inp models and optional associated LoRAs

For future stronger pose control:

  • Wan2.2 Fun Control

The frontend and API must be written so these workflows can be added later without reworking the entire product shell.

14. File and Repository Blueprint

Animatrix should be structured as an application repository or top-level product directory with explicit separation between app, API, and workflow assets.

Recommended layout:

Animatrix/
  docs/
    Animatrix Monolithic SRS - Wan 2.2 Flow Studio.md
  frontend/
    src/
      app/
      components/
      features/
      lib/
      styles/
  backend/
    app/
      api/
      services/
      models/
      repositories/
      workers/
  workflows/
    animate/
      wan22_animate_mix.json
      wan22_animate_move.json
    s2v/
      wan22_s2v_base.json
    shared/
      prompt_profiles/
      node_maps/
  scripts/
    deploy/
    media/
    sync/
  infra/
    systemd/
    nginx/
    caddy/
  tests/
    api/
    workflows/
    ui/

15. Workflow A Detailed Design: Animate Studio

15.1 Objective

Deliver a workflow that supports:

  • character replacement from a source video
  • character animation from a performer video
  • prompt-guided visual refinement

15.2 Input Contract

Required:

  • prompt
  • ground_truth_image
  • motion_video
  • mode: move or mix

Optional:

  • reference_images[]
  • pose_sheet_images[]
  • negative_prompt
  • duration_override_seconds
  • aspect_ratio
  • quality_profile
  • seed

15.3 Output Contract

Primary outputs:

  • video_mp4
  • poster_frame_jpg
  • job_manifest.json
  • debug_metadata.json

Secondary outputs:

  • pose preview if preprocessing is enabled
  • first-frame snapshot

15.4 Internal Workflow Stages

  1. Ingest image and video
  2. Normalize formats and dimensions
  3. Extract first frame and thumbnail
  4. Run optional DWPose or auxiliary preprocessing
  5. Bind workflow JSON for move or mix
  6. Upload normalized assets to ComfyUI
  7. Submit workflow
  8. Poll queue and history
  9. Collect result paths
  10. Persist final outputs and metadata

15.5 ComfyUI Notes

The official Animate workflow requires:

  • clip_vision_h.safetensors
  • wan_2.1_vae.safetensors
  • umt5_xxl_fp8_e4m3fn_scaled.safetensors
  • Animate diffusion model
  • optional Lightning LoRA
  • custom nodes:
    • ComfyUI-KJNodes
    • ComfyUI-comfyui_controlnet_aux

15.6 Product-Level Rule

Animatrix v1 must hide these internals from the standard UI, but the backend and operator docs must track them exactly.

16. Workflow B Detailed Design: Audio Performance Studio

16.1 Objective

Deliver a workflow that supports:

  • talking-head and half-body performance
  • singing and dialogue use cases
  • audio-driven facial and motion synthesis

16.2 Input Contract

Required:

  • prompt
  • ground_truth_image
  • audio_file

Optional:

  • reference_images[]
  • pose_sheet_images[]
  • negative_prompt
  • framing_mode: portrait, half_body, full_body
  • quality_profile
  • seed

16.3 Output Contract

Primary outputs:

  • video_mp4
  • poster_frame_jpg
  • job_manifest.json
  • debug_metadata.json

16.4 Internal Workflow Stages

  1. Ingest image and audio
  2. Normalize sample rate and file format
  3. Infer required frame count from audio duration
  4. Determine required S2V extension chunks
  5. Bind workflow JSON
  6. Upload image and audio to ComfyUI
  7. Submit workflow
  8. Poll queue and history
  9. Collect output video
  10. Persist artifacts
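
Stages 3 and 4 reduce to simple arithmetic once a frame rate and a per-chunk frame count are fixed. The sketch below assumes 16 fps and 77 frames per chunk as defaults; both values are assumptions to be confirmed against the frozen S2V workflow template, and `plan_s2v_chunks` is a hypothetical helper name.

```python
import math


def plan_s2v_chunks(
    audio_seconds: float,
    fps: int = 16,           # assumption: output frame rate of the template
    chunk_frames: int = 77,  # assumption: frames produced per generation chunk
) -> tuple[int, int]:
    """Return (total_frames, extension_chunks) needed to cover the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    total_frames = math.ceil(audio_seconds * fps)
    chunks = math.ceil(total_frames / chunk_frames)
    # The base generation covers the first chunk; the rest are extensions.
    return total_frames, chunks - 1
```

Under these assumptions a 10-second clip needs 160 frames, which is three chunks, so two extension chunks are bound into the workflow.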

16.5 ComfyUI Notes

The official S2V workflow requires:

  • wan2.2_s2v_14B_fp8_scaled.safetensors or bf16 variant
  • wav2vec2_large_english_fp16.safetensors
  • wan_2.1_vae.safetensors
  • umt5_xxl_fp8_e4m3fn_scaled.safetensors

The ComfyUI docs note that:

  • fp8 uses less VRAM
  • bf16 may reduce quality degradation
  • Lightning LoRA can reduce generation time but can also significantly reduce quality and dynamics

Therefore Animatrix must default to:

  • Standard: fp8 without aggressive LoRA by default for customer-facing quality stability
  • Draft: fp8 with acceleration options
  • High: bf16 where hardware allows
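
These defaults can be captured as a small lookup table rather than scattered conditionals. The sketch below uses the model file names listed above; the `S2VProfile` type, the `use_lightning_lora` flag, and the fallback from High to Standard when bf16 is unavailable are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S2VProfile:
    diffusion_model: str
    use_lightning_lora: bool


S2V_PROFILES = {
    "draft": S2VProfile("wan2.2_s2v_14B_fp8_scaled.safetensors", True),
    "standard": S2VProfile("wan2.2_s2v_14B_fp8_scaled.safetensors", False),
    "high": S2VProfile("wan2.2_s2v_14B_bf16.safetensors", False),
}


def resolve_s2v_profile(name: str, bf16_available: bool = True) -> S2VProfile:
    """Map a user-facing quality profile to concrete S2V settings."""
    key = name.strip().lower()
    if key not in S2V_PROFILES:
        raise ValueError(f"unknown quality profile: {name!r}")
    if key == "high" and not bf16_available:
        # Assumed fallback: degrade to Standard instead of failing the job.
        return S2V_PROFILES["standard"]
    return S2V_PROFILES[key]
```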

17. UI-to-Workflow Mapping

The UI must map cleanly to backend request objects.

17.1 Shared Fields

  • mode
  • prompt
  • negative_prompt
  • ground_truth_asset_id
  • reference_asset_ids[]
  • pose_sheet_asset_ids[]
  • aspect_ratio
  • quality_profile
  • seed

17.2 Animate-Specific Fields

  • motion_video_asset_id
  • animate_submode
  • background_preservation
  • relighting
  • extension_segments

17.3 Audio-Specific Fields

  • audio_asset_id
  • framing_mode
  • audio_adherence_profile
  • extension_segments

18. Suggested Backend API

18.1 Asset Endpoints

  • POST /api/assets/image
  • POST /api/assets/video
  • POST /api/assets/audio
  • GET /api/assets/{asset_id}

18.2 Job Endpoints

  • POST /api/jobs/animate
  • POST /api/jobs/audio-performance
  • GET /api/jobs/{job_id}
  • GET /api/jobs/{job_id}/events
  • GET /api/jobs/{job_id}/outputs
  • POST /api/jobs/{job_id}/cancel

18.3 Admin Endpoints

  • GET /api/admin/workflows
  • GET /api/admin/health
  • GET /api/admin/queue
  • POST /api/admin/retry/{job_id}

18.4 WebSocket or SSE Progress Channel

Recommended:

  • GET /api/jobs/{job_id}/stream

This should emit:

  • accepted
  • uploaded
  • queued
  • executing
  • collecting_outputs
  • completed
  • failed

The frontend should use this channel if available and fall back to polling if the connection drops.
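
If the channel is implemented with Server-Sent Events, each status change becomes one `event:`/`data:` frame. The sketch below shows the status vocabulary and the frame formatting only; the `JobStatus` enum, the payload fields, and the helper names are hypothetical.

```python
import json
from enum import Enum
from typing import Optional


class JobStatus(str, Enum):
    ACCEPTED = "accepted"
    UPLOADED = "uploaded"
    QUEUED = "queued"
    EXECUTING = "executing"
    COLLECTING_OUTPUTS = "collecting_outputs"
    COMPLETED = "completed"
    FAILED = "failed"


_TERMINAL = {JobStatus.COMPLETED, JobStatus.FAILED}


def is_terminal(status: JobStatus) -> bool:
    """True when the stream can be closed after emitting this status."""
    return status in _TERMINAL


def format_sse(job_id: str, status: JobStatus, detail: Optional[str] = None) -> str:
    """Serialize one progress update as a Server-Sent Events frame."""
    payload = {"job_id": job_id, "status": status.value}
    if detail is not None:
        payload["detail"] = detail
    data = json.dumps(payload, sort_keys=True)
    # A blank line terminates the event, per the SSE wire format.
    return f"event: {status.value}\ndata: {data}\n\n"
```

The same `JobStatus` values can be returned by `GET /api/jobs/{job_id}`, so the polling fallback and the stream report identical states.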

19. Data Model

19.1 Asset

Fields:

  • asset_id
  • asset_type
  • mime_type
  • original_filename
  • storage_url
  • thumbnail_url
  • width
  • height
  • duration_seconds
  • size_bytes
  • created_at

19.2 Job

Fields:

  • job_id
  • mode
  • workflow_template
  • status
  • submitted_by
  • prompt
  • negative_prompt
  • settings_json
  • comfy_prompt_id
  • created_at
  • updated_at

19.3 JobOutput

Fields:

  • output_id
  • job_id
  • video_url
  • poster_url
  • manifest_url
  • duration_seconds
  • resolution
  • fps
  • created_at
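
The three records can be sketched as plain dataclasses before a persistence layer is chosen. Field names follow the lists above; the Python types, the optional/required split, and the UTC timestamp default are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Asset:
    asset_id: str
    asset_type: str            # e.g. "image", "video", "audio"
    mime_type: str
    original_filename: str
    storage_url: str
    size_bytes: int
    thumbnail_url: Optional[str] = None
    width: Optional[int] = None               # unset for audio
    height: Optional[int] = None              # unset for audio
    duration_seconds: Optional[float] = None  # unset for still images
    created_at: datetime = field(default_factory=_utcnow)


@dataclass
class Job:
    job_id: str
    mode: str
    workflow_template: str
    status: str
    submitted_by: str
    prompt: str
    negative_prompt: str = ""
    settings_json: dict = field(default_factory=dict)
    comfy_prompt_id: Optional[str] = None     # set after ComfyUI accepts the job
    created_at: datetime = field(default_factory=_utcnow)
    updated_at: datetime = field(default_factory=_utcnow)


@dataclass
class JobOutput:
    output_id: str
    job_id: str
    video_url: str
    poster_url: str
    manifest_url: str
    duration_seconds: float
    resolution: str            # e.g. "1280x720"
    fps: int
    created_at: datetime = field(default_factory=_utcnow)
```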

20. Workflow Template Governance

Workflow JSON must be treated as versioned product assets.

Rules:

  • each production workflow JSON must have an immutable version identifier
  • node IDs must be mapped in a dedicated config file
  • backend parameter injection must never depend on informal manual node lookup
  • each workflow change must pass snapshot regression checks

Required metadata for every workflow:

  • workflow_name
  • workflow_version
  • model_family
  • required_assets
  • required_models
  • custom_nodes
  • compatible_backend_version
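
Two of these rules lend themselves to mechanical checks: the metadata must be complete, and the version identifier must change whenever the workflow JSON changes. The sketch below checks the required keys and derives a content fingerprint with SHA-256 over canonical JSON; using a hash as (or alongside) the version identifier is an assumption, and the function names are hypothetical.

```python
import hashlib
import json

REQUIRED_METADATA_KEYS = frozenset({
    "workflow_name",
    "workflow_version",
    "model_family",
    "required_assets",
    "required_models",
    "custom_nodes",
    "compatible_backend_version",
})


def missing_metadata_keys(metadata: dict) -> list[str]:
    """Return the required metadata keys that are absent, sorted by name."""
    return sorted(REQUIRED_METADATA_KEYS - set(metadata))


def workflow_fingerprint(workflow: dict) -> str:
    """Hash the workflow in a key-order-independent canonical form."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A snapshot regression check can then compare the stored fingerprint with the one computed from the file in version control and fail on any mismatch.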

21. Storage and Delivery Design

21.1 Inputs

Store raw uploads in durable storage with stable references.

Recommended:

  • object storage in S3
  • local temporary cache for preprocessing

21.2 Outputs

Store:

  • mp4 output
  • poster image
  • optional animated preview
  • manifest json

21.3 Delivery

Outputs must be streamable from a public HTTPS origin via ingress.

If using Linux origin:

  • serve final assets through nginx under Animatrix public domain

If using S3-backed storage:

  • use signed or public-read delivery depending on account mode

22. Quality Profiles

Animatrix must expose productized quality profiles rather than raw step counts to users.

22.1 Draft

Purpose:

  • internal ideation
  • faster previews

Behavior:

  • lower resolution
  • lower steps
  • acceleration LoRA allowed

22.2 Standard

Purpose:

  • most normal production runs

Behavior:

  • balanced speed and quality
  • conservative defaults
  • no quality-destructive shortcuts unless explicitly enabled

22.3 High

Purpose:

  • demo and delivery quality

Behavior:

  • higher quality model variant when available
  • larger resolution
  • longer runtime accepted

23. Error Handling

Failure classes:

  • missing asset
  • invalid asset format
  • unsupported aspect ratio
  • workflow binding failure
  • ComfyUI upload failure
  • ComfyUI queue failure
  • generation timeout
  • result collection failure

User-facing errors must be simplified.

Operator-facing logs must preserve exact failure cause.
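
One way to keep the two audiences separate is to log the exact cause and return only a mapped message. The sketch below mirrors the failure classes listed above; the enum, the message wording, the logger name, and `report_failure` are illustrative assumptions.

```python
import logging
from enum import Enum


class FailureClass(str, Enum):
    MISSING_ASSET = "missing_asset"
    INVALID_ASSET_FORMAT = "invalid_asset_format"
    UNSUPPORTED_ASPECT_RATIO = "unsupported_aspect_ratio"
    WORKFLOW_BINDING_FAILURE = "workflow_binding_failure"
    COMFYUI_UPLOAD_FAILURE = "comfyui_upload_failure"
    COMFYUI_QUEUE_FAILURE = "comfyui_queue_failure"
    GENERATION_TIMEOUT = "generation_timeout"
    RESULT_COLLECTION_FAILURE = "result_collection_failure"


_RETRY = "Something went wrong while generating your video. Please try again."

# Illustrative wording only; internal failures share one generic message.
USER_MESSAGES = {
    FailureClass.MISSING_ASSET: "A required file is missing.",
    FailureClass.INVALID_ASSET_FORMAT: "One of your files is in an unsupported format.",
    FailureClass.UNSUPPORTED_ASPECT_RATIO: "This aspect ratio is not supported.",
    FailureClass.WORKFLOW_BINDING_FAILURE: _RETRY,
    FailureClass.COMFYUI_UPLOAD_FAILURE: _RETRY,
    FailureClass.COMFYUI_QUEUE_FAILURE: _RETRY,
    FailureClass.GENERATION_TIMEOUT: "Generation took too long and was stopped.",
    FailureClass.RESULT_COLLECTION_FAILURE: _RETRY,
}


def report_failure(
    job_id: str,
    failure: FailureClass,
    cause: Exception,
    logger: logging.Logger = logging.getLogger("animatrix.jobs"),
) -> str:
    """Log the exact cause for operators; return the simplified user message."""
    logger.error("job %s failed (%s): %r", job_id, failure.value, cause)
    return USER_MESSAGES[failure]
```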

23A. Validation Rules

Shared Validation

  • reject empty prompt if prompt is required by the selected workflow profile
  • reject missing ground-truth image
  • reject unsupported file extensions
  • reject files above configured upload limit

Animate Studio Validation

  • reject missing motion video
  • reject unsupported source video codecs that cannot be normalized
  • reject conflicting move and mix settings

Audio Performance Studio Validation

  • reject missing audio
  • reject audio longer than configured maximum duration for the selected profile
  • normalize sample rate before workflow submission

Pose Sheet Validation

  • accept only supported image formats
  • cap pose sheet image count in v1
  • mark pose sheet as "soft guidance" in job metadata unless a later hard-control pipeline is introduced
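
A subset of these rules can be sketched as pure functions that return a list of problems instead of raising on the first one, so the UI can surface every warning before submission. The concrete limits and extension set are placeholder assumptions, the request is modelled as a plain dict using the field names from section 17, and "conflicting move and mix settings" is interpreted here as requiring exactly one valid sub-mode.

```python
from pathlib import PurePath

# Placeholder limits; real values belong in configuration.
ALLOWED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
MAX_POSE_SHEET_IMAGES = 8


def validate_upload(filename: str, size_bytes: int, allowed_extensions: set) -> list[str]:
    """Check extension and size for one uploaded file."""
    errors = []
    if PurePath(filename).suffix.lower() not in allowed_extensions:
        errors.append(f"unsupported file extension: {filename}")
    if size_bytes > MAX_UPLOAD_BYTES:
        errors.append(f"file exceeds upload limit: {filename}")
    return errors


def validate_animate_request(req: dict) -> list[str]:
    """Apply shared, Animate Studio, and pose sheet rules to a job request."""
    errors = []
    if not (req.get("prompt") or "").strip():
        errors.append("prompt is required")
    if not req.get("ground_truth_asset_id"):
        errors.append("ground-truth image is required")
    if not req.get("motion_video_asset_id"):
        errors.append("motion video is required")
    if req.get("animate_submode") not in ("move", "mix"):
        errors.append("animate_submode must be exactly one of: move, mix")
    if len(req.get("pose_sheet_asset_ids") or []) > MAX_POSE_SHEET_IMAGES:
        errors.append(f"at most {MAX_POSE_SHEET_IMAGES} pose sheet images are allowed")
    return errors
```

An Audio Performance Studio validator would follow the same shape, swapping the motion video check for the audio and maximum-duration checks.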

24. Observability

Minimum operational telemetry:

  • job creation rate
  • queue depth
  • mean wait time
  • mean generation time by workflow
  • failure rate by workflow version
  • storage growth
  • top asset sizes

Required correlation identifiers:

  • job_id
  • asset_id
  • comfy_prompt_id

25. Security and Access Control

Rules:

  • do not expose raw ComfyUI publicly to end users as the product surface
  • backend owns ComfyUI credentials and workflow orchestration
  • validate file size and MIME type on upload
  • reject executable uploads
  • limit accepted formats
  • preserve audit trail for every run

26. Team and Operator UX

The system must support:

  • internal team usage through the stable ingress
  • supportable operator triage
  • easy workflow version rollback
  • safe demo usage during sales calls

Operators need:

  • admin queue view
  • job replay
  • access to input and output manifests
  • workflow version annotation

27. Non-Functional Requirements

27.1 Reliability

  • no direct dependency on ephemeral GPU public IP
  • graceful retry around ComfyUI upload and history polling
  • job state persisted outside memory
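
Graceful retry around history polling can be sketched as a capped exponential backoff with a fixed attempt budget. The delay values, the attempt count, and the function names below are assumptions; `fetch` stands for any call that returns `None` until the result is ready, such as a history lookup.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 15.0,
    max_attempts: int = 8,
) -> Iterator[float]:
    """Yield a bounded sequence of delays: exponential growth up to a cap."""
    delay = base
    for _ in range(max_attempts):
        yield min(delay, cap)
        delay *= factor


def poll_until(
    fetch: Callable[[], Optional[T]],
    sleep: Callable[[float], None] = time.sleep,
    **backoff_kwargs: float,
) -> T:
    """Call fetch until it returns a non-None value or the budget runs out."""
    for delay in backoff_delays(**backoff_kwargs):
        result = fetch()
        if result is not None:
            return result
        sleep(delay)
    raise TimeoutError("polling budget exhausted")
```

Because the job state is persisted outside memory, a `TimeoutError` here can mark the job as timed out without losing the ComfyUI prompt ID needed for a later retry.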

27.2 Performance

  • fast upload validation
  • async polling and result collection
  • cached thumbnails

27.3 Scalability

  • workflow templates stateless
  • API horizontally scalable
  • storage externalized

27.4 Maintainability

  • one source-of-truth workflow config per mode
  • explicit model manifest
  • no hidden hand-edited production JSON

27.5 Sales Readiness

  • stable hostname
  • reliable queue messaging
  • polished success and failure states
  • deterministic demo inputs

27A. Demo and Commercial Readiness Requirements

Animatrix will be used in live demos and pre-sales conversations. That changes the bar.

Required product behavior:

  • first meaningful UI paint fast enough for live sales use
  • one-click sample project loading for demo mode
  • clear progress messaging during long generations
  • shareable output URL or operator download path
  • no raw ComfyUI terminology in the customer-facing layer unless explicitly in admin mode

Required operator support behavior:

  • known-good demo assets packaged and versioned
  • visible warning when GPU queue is saturated
  • ability to retry a failed job without recreating all metadata manually

28. MVP Acceptance Criteria

Animatrix v1 is only considered complete when all of the following are true:

  1. A user can upload a ground-truth image, type a prompt, attach a motion video, select Move or Mix, and receive a finished video output.
  2. A user can upload a ground-truth image, type a prompt, attach audio, and receive an audio-driven character video.
  3. Both flows work through the stable Desineuron ingress model and do not depend on hardcoded GPU IPs.
  4. Every run produces a persisted job record and output manifest.
  5. Generated videos are streamable over HTTPS.
  6. Operators can inspect job state and correlate product job ID to ComfyUI prompt ID.
  7. The UI remains simple enough for a non-technical demo operator.

29. Explicit Product Decisions

29.1 What v1 Must Say No To

Animatrix v1 must not claim:

  • perfect deterministic pose-sheet control
  • exact first and last frame locking
  • full timeline editing
  • full audio mastering

29.2 What v1 Must Say Yes To

Animatrix v1 can truthfully claim:

  • guided character animation
  • guided character replacement
  • audio-driven talking or performance video
  • reference-assisted generation
  • production-safe simplified UI on top of ComfyUI

30. Delivery Phases

Phase 1

  • backend skeleton
  • asset model
  • one frozen Animate workflow
  • one frozen S2V workflow
  • barebones frontend

Phase 2

  • quality profiles
  • operator dashboard
  • output gallery
  • S3 persistence

Phase 3

  • first/last-frame workflow
  • stronger pose control
  • reusable character libraries

31. Final Architecture Recommendation

Build Animatrix as a thin product layer over stable infrastructure that already exists:

  • keep ComfyUI where it is
  • keep ingress where it is
  • add a dedicated Animatrix backend
  • keep the frontend intentionally minimal
  • treat workflow JSON as versioned software artifacts

Do not begin by building a large generic creative suite.

Build the narrowest saleable product first:

  • Animate Studio
  • Audio Performance Studio

Then expand to:

  • Start/End Frame Studio
  • Pose Control Studio

32. Bottom Line

Animatrix v1 should be a Flow-like creative surface backed by two real Wan 2.2 workflows, not one imaginary super-workflow.

The correct implementation target is:

  • one frontend
  • one orchestration backend
  • two workflow families
  • one stable ingress-compatible execution path
  • one durable output system

If the team follows this document strictly, the result will be productizable, supportable, and compatible with the current Desineuron infrastructure without lying about model capabilities.