
sayan 8e1ffe0e43 feat: Added the ComfyUI engine (#12 )

#11 Added the complete ComfyUI engine.

Co-authored-by: Sayan Datta <sayan@Sayans-MacBook-Air.local>
Reviewed-on: #12

2026-03-27 22:48:34 +05:30


Dream Weaver Technical Specification

Version: 1.0.0
Date: 2026-03-01
Model: RealVisXL V5.0 Lightning
Target Hardware Phase 1: NVIDIA RTX 3080Ti (12GB GDDR6X)
Target Hardware Phase 3: Dual NVIDIA RTX PRO 6000 Blackwell (96GB GDDR7 each)

Executive Summary
Three-Phase Implementation Architecture
Hardware Specifications & Optimization
Model Specifications & Downloads
ControlNet Configuration
Custom Node Requirements
Phase 1: Foundational Implementation
Phase 2: Advanced Multi-ControlNet
Phase 3: Production Batch Processing
Prompt Engineering Templates
API Integration Guide
Deployment Instructions

Executive Summary

Dream Weaver is an interior restyling workflow that uses Structural Constraint Logic to preserve existing room geometry while enabling comprehensive aesthetic transformations. The system employs a Dual-ControlNet Strategy combining M-LSD (Line Segment Detection) for architectural line preservation and Depth (Zoe/MiDaS) for 3D spatial consistency, with SAM-based masking to isolate structural immutables from stylable regions.

Core Constraint: Absolute Geometry Preservation

The following elements are IMMUTABLE and must never be modified:

Wall positions and angles
Door and window placements
Ceiling heights
Room proportions and dimensions
Structural load-bearing elements
Vanishing points and perspective

The following elements are MUTABLE and may be restyled:

Wall paint colors and textures
Flooring materials
Furniture upholstery and styles
Decorative objects and accessories
Lighting fixtures and atmospheres
Soft furnishings (curtains, rugs, cushions)

Three-Phase Implementation Architecture

flowchart TD
    A[Input Interior Image] --> B[Phase 1: Foundational]
    B --> C[Phase 2: Advanced]
    C --> D[Phase 3: Production]
    
    subgraph P1[Phase 1 - RTX 3080Ti]
        B1[Depth ControlNet] --> B2[Basic SAM Masking]
        B2 --> B3[Single Image Processing]
    end
    
    subgraph P2[Phase 2 - Enhanced Quality]
        C1[Multi-ControlNet] --> C2[Refined Masking]
        C2 --> C3[Style Templates]
    end
    
    subgraph P3[Phase 3 - Dual RTX PRO 6000]
        D1[Batch Processing] --> D2[4K Upscaling]
        D2 --> D3[Automated Pipeline]
    end

Phase Overview

Phase	Hardware	ControlNets	Resolution	Batch Size	Purpose
1	RTX 3080Ti	1 (Depth)	1024x1024	1	Validation & Testing
2	RTX 3080Ti	3 (Depth + Seg + Canny)	1216x832	1	Quality Enhancement
3	Dual RTX PRO 6000	3 + Aux	2048x2048	8+	Production Deployment

Hardware Specifications & Optimization

Current Development Hardware: RTX 3080Ti

Specifications:

GPU: NVIDIA RTX 3080Ti
VRAM: 12GB GDDR6X
CUDA Cores: 10,240
Architecture: Ampere

VRAM Management Strategy:

# Optimization flags for 12GB VRAM
--force-fp16  # Force half-precision
--lowvram  # Aggressive memory management
--disable-xformers  # Use sdp-attention instead

Recommended Settings:

Batch size: 1
Maximum resolution: 1024x1024 or 1216x832
Tiled VAE: Enabled with tile size 64
Model CPU offloading: Enabled
Empty cache after each generation: Enabled

Production Hardware: Dual RTX PRO 6000 Blackwell

Specifications:

GPU: 2x NVIDIA RTX PRO 6000 Blackwell
VRAM: 96GB GDDR7 per GPU (192GB total)
Architecture: Blackwell
NVLink: Enabled for memory pooling

Optimization Strategy:

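# Production flags for 192GB VRAM
--bf16-unet  # Run the diffusion model in bfloat16
--highvram  # Keep models in GPU memory
--cuda-device 0  # Pin this instance to one GPU; start a second instance with --cuda-device 1
# xformers is used automatically when installed (there is no --xformers flag)
# Batch size (8 images) is set on the latent node in the workflow, not by a launch flag
# Stock ComfyUI does not shard a single model across GPUs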

VRAM Usage Comparison

Configuration	Phase 1	Phase 2	Phase 3
Model Loading	6.2GB	6.2GB	6.2GB
ControlNet 1	1.8GB	1.8GB	1.8GB
ControlNet 2	-	1.8GB	1.8GB
ControlNet 3	-	1.5GB	1.5GB
SAM Model	2.1GB	2.1GB	2.1GB
Latent Buffers	1.5GB	2.2GB	8.0GB
Total	~11.6GB	~15.6GB	~21.4GB
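
The totals above can be checked against a card's capacity with a few lines of arithmetic. The sketch below is illustrative only: the component figures are copied from the table, while the dictionary layout and the `total_gb` and `fits` helpers are assumptions made for this example. It suggests that the Phase 2 stack does not fit in 12GB if every component is resident at once, which is where the model CPU offloading recommended for the RTX 3080Ti comes in.

```python
# Component VRAM estimates (GB) copied from the comparison table.
VRAM_GB = {
    "Phase 1": {"model": 6.2, "controlnet_1": 1.8, "sam": 2.1, "latents": 1.5},
    "Phase 2": {"model": 6.2, "controlnet_1": 1.8, "controlnet_2": 1.8,
                "controlnet_3": 1.5, "sam": 2.1, "latents": 2.2},
    "Phase 3": {"model": 6.2, "controlnet_1": 1.8, "controlnet_2": 1.8,
                "controlnet_3": 1.5, "sam": 2.1, "latents": 8.0},
}


def total_gb(phase: str) -> float:
    """Sum the per-component estimates for one phase."""
    return round(sum(VRAM_GB[phase].values()), 1)


def fits(phase: str, capacity_gb: float) -> bool:
    """True if every component could stay resident at the same time."""
    return total_gb(phase) <= capacity_gb


for phase in VRAM_GB:
    print(phase, total_gb(phase), "GB, fits in 12GB:", fits(phase, 12.0))
```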

Model Specifications & Downloads

Primary Checkpoint: RealVisXL V5.0 Lightning

Download URL: https://civitai.com/models/139562?modelVersionId=789646

Specifications:

Base Model: SDXL
Training Data: Architectural photography datasets
Specialization: Photorealistic interiors, white balance accuracy
Lightning Steps: 4-8 steps for high quality
Recommended CFG: 1.0-2.0 (Lightning)
CLIP Skip: 2

File Details:

Filename: realvisxlV50Lightning_v50Lightning.safetensors
Expected Size: ~6.5GB
Format: SafeTensors
SHA256: Verify on download

Installation Path:

ComfyUI/models/checkpoints/realvisxlV50Lightning_v50Lightning.safetensors

VAE Selection

Option A: Stability AI SDXL VAE

Download: https://huggingface.co/stabilityai/sdxl-vae
File: sdxl_vae.safetensors
Size: ~335MB
Path: ComfyUI/models/vae/sdxl_vae.safetensors

Option B: RealVisXL Native VAE

Built into checkpoint (recommended for simplicity)

Recommendation: Use the checkpoint's built-in VAE for Phases 1-2 and the Stability AI SDXL VAE (Option A) for Phase 3 production.

ControlNet Configuration

ControlNet Model Specifications

Model	Purpose	Strength	Download URL	File Size
control_v11f1p_sd15_depth	Geometric preservation	1.0	https://huggingface.co/lllyasviel/ControlNet-v1-1	~1.2GB
control_v11p_sd15_seg	Semantic segmentation	0.85	https://huggingface.co/lllyasviel/ControlNet-v1-1	~1.2GB
control_v11p_sd15_canny	Edge detection	0.6	https://huggingface.co/lllyasviel/ControlNet-v1-1	~1.2GB
control_v11p_sd15_mlsd	Line segment detection	0.8	https://huggingface.co/lllyasviel/ControlNet-v1-1	~1.2GB

Compatibility note: the control_v11 files listed above are trained for Stable Diffusion 1.5 and cannot be applied to an SDXL checkpoint such as RealVisXL V5.0. Substitute SDXL-trained equivalents of each model before building the workflows.

Installation Path:

ComfyUI/models/controlnet/

Preprocessor Selection

Preprocessor	Purpose	Phase	Node Name
depth_midas	General depth estimation	1	ControlNet Preprocessor/Depth MiDaS
depth_zoe	High-quality depth (preferred)	2+	ControlNet Preprocessor/Depth Zoe
seg_ofade20k	Semantic segmentation (OneFormer)	2	ControlNet Preprocessor/Segmentation OFADE20K
seg_ufade20k	Alternative segmentation (UniFormer)	2	ControlNet Preprocessor/Segmentation UniFormer
canny	Edge detection	2+	ControlNet Preprocessor/Canny
mlsd	Line detection	All	ControlNet Preprocessor/MLSD

Custom Node Requirements

Required Node Packages

# Install via ComfyUI Manager or git clone

# 1. ComfyUI ControlNet Auxiliary Preprocessors
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git

# 2. ComfyUI Impact Pack (for SAM and segmentation)
git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack.git

# 3. ComfyUI-Manager (if not already installed)
git clone https://github.com/ltdrdata/ComfyUI-Manager.git

# 4. WAS Node Suite (for image processing utilities)
git clone https://github.com/WASasquatch/was-node-suite-comfyui.git

# 5. ComfyUI-Advanced-ControlNet
git clone https://github.com/Kosinkadink/ComfyUI-Advanced-ControlNet.git

# 6. Segment Anything for ComfyUI
git clone https://github.com/storyicon/comfyui_segment_anything.git

# 7. ComfyUI_IPAdapter_plus (for style reference)
git clone https://github.com/cubiq/ComfyUI_IPAdapter_plus.git

Node Installation Commands

cd Project_Velocity/comfy_engine/custom_nodes

# Install each package
for repo in \
    "https://github.com/Fannovel16/comfyui_controlnet_aux" \
    "https://github.com/ltdrdata/ComfyUI-Impact-Pack" \
    "https://github.com/WASasquatch/was-node-suite-comfyui" \
    "https://github.com/Kosinkadink/ComfyUI-Advanced-ControlNet" \
    "https://github.com/storyicon/comfyui_segment_anything" \
    "https://github.com/cubiq/ComfyUI_IPAdapter_plus"
do
    git clone "$repo"
done

# Install dependencies for each
find . -name requirements.txt -exec pip install -r {} \;

Required Model Downloads for SAM

Model	Purpose	Download URL	Path
sam_vit_h_4b8939.pth	High-quality segmentation	https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth	ComfyUI/models/sams/
sam_vit_l_0b3195.pth	Balanced quality/speed	https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth	ComfyUI/models/sams/
sam_vit_b_01ec64.pth	Fast inference	https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth	ComfyUI/models/sams/

Recommendation: Use sam_vit_l_0b3195.pth for Phase 1-2, sam_vit_h_4b8939.pth for Phase 3

GroundingDINO Model

Model	Download URL	Path
groundingdino_swint_ogc.pth	https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth	ComfyUI/models/grounding-dino/

Phase 1: Foundational Implementation

Purpose

Establish foundational single-ControlNet depth mapping with basic binary segmentation masking. Optimized for RTX 3080Ti 12GB VRAM constraints.

Node Graph Architecture

flowchart LR
    A[Load Image] --> B[Image Scale]
    B --> C[Zoe Depth Preprocessor]
    B --> D[SAM Masking]
    C --> E[ControlNet Apply]
    D --> F[Set Latent Noise Mask]
    E --> G[KSampler]
    F --> G
    G --> H[VAE Decode]
    H --> I[Save Image]

Key Nodes Configuration

1. Load Image

Node: LoadImage
Input: User-provided interior photograph
Output: IMAGE, MASK

2. Image Scale

Node: ImageScale
Method: lanczos
Width: 1024
Height: 1024
Keep Proportion: True
Upscale Model: None (use interpolation)
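
With Keep Proportion enabled, the 1024x1024 target acts as a bounding box rather than an exact output size. A minimal sketch of that calculation follows, assuming the longer side is scaled to 1024 and both sides are rounded down to a multiple of 8 (the VAE downsamples by a factor of 8, so pixel dimensions should divide evenly). The function name `fit_within` and the rounding policy are assumptions for illustration, not taken from the workflow file.

```python
def fit_within(width: int, height: int, max_side: int = 1024,
               multiple: int = 8) -> tuple[int, int]:
    """Scale (width, height) so the longer side becomes max_side,
    keeping the aspect ratio and rounding down to a multiple of `multiple`."""
    longest = max(width, height)
    # Integer arithmetic avoids floating-point rounding surprises.
    scaled_w = width * max_side // longest
    scaled_h = height * max_side // longest
    new_w = max(multiple, scaled_w // multiple * multiple)
    new_h = max(multiple, scaled_h // multiple * multiple)
    return new_w, new_h


print(fit_within(4032, 3024))  # a 4:3 phone photo
print(fit_within(1000, 1500))  # a portrait image
```

Rounding down rather than to the nearest multiple keeps the result inside the bounding box at the cost of a few pixels of aspect-ratio drift.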

3. Zoe Depth Preprocessor

Node: Zoe-DepthMapPreprocessor (from comfyui_controlnet_aux)
Resolution: 1024
Output: depth map IMAGE

4. SAM Masking

Node: GroundingDinoSAMSegment (segment anything) (from comfyui_segment_anything)
Model: sam_vit_l_0b3195.pth
Prompt: "walls, floor, ceiling"
Threshold: 0.3
Output: SEGMENTATION masks

5. Mask to Image

Node: MaskToImage
Converts SAM mask to image format

6. ControlNet Apply

Node: ControlNetApply
ControlNet: control_v11f1p_sd15_depth
Strength: 1.0
Start Percent: 0.0
End Percent: 1.0

7. Checkpoint Loader

Node: CheckpointLoaderSimple
Checkpoint: realvisxlV50Lightning_v50Lightning.safetensors

8. CLIP Text Encode (Positive)

Node: CLIPTextEncode
Text: Style-specific prompt
CLIP: From checkpoint loader

9. CLIP Text Encode (Negative)

Node: CLIPTextEncode
Text: (worst quality, low quality, illustration, 3d, 2d, painting, cartoons, sketch), blurry, distorted, deformed, extra windows, unrealistic lighting, structural changes, wall repositioning

10. Empty Latent Image

Node: EmptyLatentImage
Width: 1024
Height: 1024
Batch Size: 1

11. Set Latent Noise Mask

Node: SetLatentNoiseMask
Mask: From SAM processing

12. KSampler

Node: KSampler
Seed: RANDOM
Control After Generate: fixed
Steps: 6
CFG: 1.5
Sampler: dpmpp_sde
Scheduler: karras
Denoise: 0.75

13. VAE Decode

Node: VAEDecode
VAE: From checkpoint loader or sdxl_vae

14. Save Image

Node: SaveImage
Filename: dreamweaver_phase1_$$INDEX$$

Phase 1 Workflow JSON

See: workflows/dreamweaver_phase1_depth.json

Phase 2: Advanced Multi-ControlNet

Purpose

Enhance geometric fidelity through multi-ControlNet integration (up to four stacked ControlNets) and refined masking workflows that prevent edge bleeding.

ControlNet Stack Configuration

ControlNet	Model	Strength	Start	End	Purpose
1	M-LSD	0.8	0.0	0.5	Structural lines
2	Depth (Zoe)	1.0	0.0	1.0	3D geometry
3	Segmentation	0.85	0.2	0.8	Semantic regions
4	Canny	0.6	0.0	0.3	Edge refinement
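
The Start and End columns are fractions of the sampling schedule, so different ControlNets guide different stretches of the denoising run. The sketch below shows which entries are active at a given point. The values are copied from the table; the `ControlNetSlot` class, the `active_at` helper, and the inclusive treatment of the bounds are assumptions for illustration, and the way ComfyUI maps these percentages onto sampler steps internally is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlNetSlot:
    name: str
    strength: float
    start: float  # fraction of the schedule where guidance begins
    end: float    # fraction of the schedule where guidance stops


# Values copied from the stack configuration table.
STACK = [
    ControlNetSlot("M-LSD", 0.8, 0.0, 0.5),
    ControlNetSlot("Depth (Zoe)", 1.0, 0.0, 1.0),
    ControlNetSlot("Segmentation", 0.85, 0.2, 0.8),
    ControlNetSlot("Canny", 0.6, 0.0, 0.3),
]


def active_at(progress: float) -> list[str]:
    """Names of the ControlNets guiding the sampler at `progress` in [0, 1]."""
    return [slot.name for slot in STACK if slot.start <= progress <= slot.end]


for progress in (0.1, 0.4, 0.9):
    print(progress, active_at(progress))
```

Read this way, only the depth ControlNet constrains the final 20% of the schedule, which matches its role as the primary geometry constraint.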

Advanced Masking Workflow

flowchart TD
    A[Load Image] --> B[GroundingDINO]
    B --> C[SAM Detector]
    C --> D[Mask List to Mask]
    D --> E[Grow Mask]
    E --> F[Feather Mask]
    F --> G[Mask to Latent Mask]
    
    E --> H[2-5px dilation]
    F --> I[Gaussian blur 3-5px]

Node Additions from Phase 1

Mask Refinement Chain

Grow Mask
- Node: GrowMask or MaskDilate from WAS Node Suite
- Amount: 3 pixels
- Purpose: Prevent edge gaps
Feather Mask
- Node: FeatherMask from WAS Node Suite
- Amount: 5 pixels
- Purpose: Smooth transitions
Mask Composite
- Node: MaskComposite
- Operation: Union
- Combine multiple structural masks
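
To make the grow and feather steps concrete, the sketch below applies both to a tiny mask using only the standard library. It is not the node implementation: the square dilation window, the box blur standing in for a Gaussian, and the function names `grow` and `feather` are all assumptions chosen to keep the example short.

```python
Mask = list[list[float]]


def grow(mask: Mask, radius: int) -> Mask:
    """Dilate: a pixel is set if any pixel within `radius` (square window) is set."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                mask[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return out


def feather(mask: Mask, radius: int) -> Mask:
    """Soften edges with a box blur (a cheap stand-in for a Gaussian blur)."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(window) / len(window)
    return out


# 7x7 mask with a single set pixel in the centre.
mask = [[0.0] * 7 for _ in range(7)]
mask[3][3] = 1.0
grown = grow(mask, 1)         # the single pixel becomes a 3x3 block
refined = feather(grown, 1)   # the block's edge fades out gradually
```

Growing before feathering keeps the fully opaque core at least as large as the original mask, so the soft falloff lands outside the structural boundary rather than eating into it.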

IP-Adapter Plus Configuration

For style reference without affecting geometry:

Node: IPAdapterAdvanced (from ComfyUI_IPAdapter_plus)
Model: ip-adapter_sdxl_vit-h (SDXL weights; ip-adapter_sd15 targets SD 1.5 checkpoints)
Weight: 0.6
Noise: 0.0
Start At: 0.0
End At: 0.5

Phase 2 Workflow JSON

See: workflows/dreamweaver_phase2_multicontrol.json

Phase 3: Production Batch Processing

Purpose

Enable automated batch processing for a high-volume production environment with dual RTX PRO 6000 GPUs.

Automation Architecture

flowchart TD
    A[Directory Monitor] --> B[Queue Manager]
    B --> C{GPU Available?}
    C -->|Yes| D[Load Image]
    C -->|No| E[Queue Wait]
    E --> C
    D --> F[Auto Mask Gen]
    F --> G[Cache Check]
    G -->|Cached| H[Use Cached Mask]
    G -->|New| I[Generate Mask]
    I --> J[Cache Mask]
    H --> K[Batch Inference]
    J --> K
    K --> L[4K Upscale]
    L --> M[Save Output]
    M --> N[Next in Queue]
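
The Cache Check branch in the diagram can be sketched as a lookup keyed by a hash of the image bytes. This is a minimal in-memory illustration under stated assumptions: the `MaskCache` class, its method names, and the `fake_mask` placeholder are hypothetical, and a production cache would more likely persist masks to disk.

```python
import hashlib
from typing import Callable


class MaskCache:
    """In-memory mask cache keyed by the SHA-256 of the source image bytes."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get_or_create(self, image_bytes: bytes,
                      generate: Callable[[bytes], bytes]) -> bytes:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            # "Cached" branch: reuse the stored mask.
            self.hits += 1
            return self._store[key]
        # "New" branch: generate the mask, then cache it.
        self.misses += 1
        mask = generate(image_bytes)
        self._store[key] = mask
        return mask


def fake_mask(image_bytes: bytes) -> bytes:
    """Placeholder generator; a real one would run the segmentation model."""
    return b"mask-for-" + hashlib.sha256(image_bytes).digest()[:4]


cache = MaskCache()
first = cache.get_or_create(b"room.jpg bytes", fake_mask)
second = cache.get_or_create(b"room.jpg bytes", fake_mask)
print(cache.hits, cache.misses)
```

Hashing the content rather than the filename means a re-uploaded copy of the same photo still hits the cache, while an edited file with the same name does not.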

Automatic Mask Generation

Using semantic segmentation models:

ONE-Former Integration
- Model: oneformer_ade20k_swin_large
- Classes: wall, floor, ceiling, window, door
- Output: Multi-class segmentation mask
Mask2Former Alternative
- Model: mask2former_swin_large_ade20k
- More accurate but slower

Latent Upscaling Configuration

Stage	Model	Scale	Purpose
1	4x-UltraSharp	4x	Primary upscaling
2	ESRGAN_4x	4x	Alternative option
3	RealESRGAN_x4plus	4x	Photorealistic preference

Upscaling Workflow:

Generate at 1024x1024
Upscale to 4096x4096 using 4x-UltraSharp
Optional: Tile-based refinement for details

Dual GPU Configuration

# GPU Allocation Strategy
GPU_0_TASKS = ["model_loading", "controlnet_1", "controlnet_2"]
GPU_1_TASKS = ["controlnet_3", "sam_processing", "vae_decode"]

# NVLink Memory Pooling
enable_nvlink = True
shared_memory_pool = True

Phase 3 Workflow JSON

See: workflows/dreamweaver_phase3_batch.json

Prompt Engineering Templates

Template 1: Scandinavian Minimalist

File: prompts/scandinavian_minimalist.txt

POSITIVE:
scandinavian minimalist interior design, light oak wood flooring, neutral beige textiles, abundant natural light streaming through large windows, clean white walls, simple functional furniture, cozy hygge atmosphere, soft cream and warm gray tones, organic cotton fabrics, potted green plants, minimalist pendant lighting, decluttered space, architectural photography, 8k resolution, photorealistic, global illumination, soft shadows

Style Weight: <lora:Interior_Style_Scandi:0.8>

NEGATIVE:
worst quality, low quality, illustration, 3d render, 2d, painting, cartoon, sketch, blurry, distorted, deformed, extra windows, unrealistic lighting, structural changes, wall repositioning, window modification, door relocation, ceiling alteration, heavy ornamentation, dark colors, cluttered space, gaudy furniture, excessive decoration

Template 2: Art Deco Luxe

File: prompts/art_deco_luxe.txt

POSITIVE:
art deco luxury interior design, geometric chevron patterns, gold brass accents, rich velvet upholstery in emerald green and sapphire blue, sunburst mirrors, polished marble flooring with brass inlay, crystal chandeliers, lacquered wood furniture, bold symmetrical arrangements, 1920s glamour, warm ambient lighting, architectural photography, 8k resolution, photorealistic, global illumination, elegant reflections

Style Weight: <lora:Interior_Style_ArtDeco:0.85>

NEGATIVE:
worst quality, low quality, illustration, 3d render, 2d, painting, cartoon, sketch, blurry, distorted, deformed, extra windows, unrealistic lighting, structural changes, wall repositioning, window modification, door relocation, ceiling alteration, rustic elements, farmhouse style, minimalism, industrial aesthetic, cheap materials, plastic furniture

Template 3: Cyberpunk Neon

File: prompts/cyberpunk_neon.txt

POSITIVE:
cyberpunk neon interior design, high contrast LED strip lighting in electric blue and hot pink, reflective chrome surfaces, holographic accents, dark matte walls, futuristic furniture with clean lines, glowing circuit patterns, polished concrete flooring with epoxy coating, moody atmospheric lighting, tech-noir aesthetic, blade runner inspiration, architectural photography, 8k resolution, photorealistic, neon reflections, volumetric fog

Style Weight: <lora:Interior_Style_Cyberpunk:0.9>

NEGATIVE:
worst quality, low quality, illustration, 3d render, 2d, painting, cartoon, sketch, blurry, distorted, deformed, extra windows, unrealistic lighting, structural changes, wall repositioning, window modification, door relocation, ceiling alteration, natural daylight, rustic elements, traditional furniture, warm wood tones, biophilic elements, organic shapes

Template 4: Biophilic Organic

File: prompts/biophilic_organic.txt

POSITIVE:
biophilic organic interior design, living green walls with ferns and moss, natural stone accent walls in slate and travertine, diffuse natural lighting, rattan and bamboo furniture, abundant houseplants, natural wood grain textures, water feature elements, earth tone color palette with sage green and terracotta, sustainable materials, nature-inspired patterns, architectural photography, 8k resolution, photorealistic, dappled sunlight, organic flowing shapes

Style Weight: <lora:Interior_Style_Biophilic:0.8>

NEGATIVE:
worst quality, low quality, illustration, 3d render, 2d, painting, cartoon, sketch, blurry, distorted, deformed, extra windows, unrealistic lighting, structural changes, wall repositioning, window modification, door relocation, ceiling alteration, synthetic materials, plastic plants, harsh artificial lighting, geometric patterns, industrial aesthetic, stark minimalism

Template 5: Japandi Fusion

File: prompts/japandi_fusion.txt

POSITIVE:
japandi fusion interior design, wabi-sabi textures with imperfect beauty, low-profile furniture, muted earth tones with warm grays and soft browns, natural linen fabrics, handmade ceramic accents, light ash wood, shoji screen elements, minimal decoration with intentional negative space, zen garden elements, tatami mat textures, soft diffused lighting, architectural photography, 8k resolution, photorealistic, serene atmosphere, clean lines

Style Weight: <lora:Interior_Style_Japandi:0.85>

NEGATIVE:
worst quality, low quality, illustration, 3d render, 2d, painting, cartoon, sketch, blurry, distorted, deformed, extra windows, unrealistic lighting, structural changes, wall repositioning, window modification, door relocation, ceiling alteration, bright colors, ornate decoration, high furniture, cluttered surfaces, shiny materials, bold patterns, excessive ornamentation
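
All five templates share the same layout, so a small parser can load any of them. The sketch below assumes the files on disk look exactly like the listings above (a POSITIVE: line, a Style Weight: line carrying a `<lora:name:weight>` tag, and a NEGATIVE: line); the `parse_template` function and its return keys are hypothetical.

```python
import re


def parse_template(text: str) -> dict:
    """Split a template into positive prompt, negative prompt, and LoRA tag."""
    sections = {"POSITIVE": [], "NEGATIVE": []}
    lora_name, lora_weight, current = None, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("POSITIVE:", "NEGATIVE:"):
            current = line[:-1]
        elif line.startswith("Style Weight:"):
            match = re.search(r"<lora:([^:>]+):([0-9.]+)>", line)
            if match:
                lora_name, lora_weight = match.group(1), float(match.group(2))
            current = None
        elif line and current:
            sections[current].append(line)
    return {
        "positive": " ".join(sections["POSITIVE"]),
        "negative": " ".join(sections["NEGATIVE"]),
        "lora": lora_name,
        "lora_weight": lora_weight,
    }


sample = """POSITIVE:
scandinavian minimalist interior design, light oak wood flooring

Style Weight: <lora:Interior_Style_Scandi:0.8>

NEGATIVE:
worst quality, low quality, structural changes
"""
parsed = parse_template(sample)
print(parsed["lora"], parsed["lora_weight"])
```

Extracting the tag separately matters because ComfyUI's stock CLIPTextEncode node does not interpret `<lora:...>` syntax in prompt text; the name and weight would typically be passed to a LoraLoader node instead.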

API Integration Guide

ComfyUI Async Queue API

Base URL: http://localhost:8188

Queue Workflow Endpoint

POST /prompt
Content-Type: application/json

{
  "prompt": {
    "1": {
      "inputs": {
        "image": "input_image.jpg"
      },
      "class_type": "LoadImage"
    },
    // ... additional nodes
  },
  "client_id": "dreamweaver_session_001"
}

Response Format

{
  "prompt_id": "uuid-string",
  "number": 42,
  "node_errors": {}
}

WebSocket Status Updates

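const ws = new WebSocket('ws://localhost:8188/ws?clientId=dreamweaver_session_001');

ws.onmessage = (event) => {
  // Binary frames carry preview images; only text frames are JSON status messages
  if (typeof event.data !== 'string') return;
  const data = JSON.parse(event.data);
  if (data.type === 'progress') {
    console.log(`Progress: ${data.data.value}/${data.data.max}`);
  }
  if (data.type === 'executing') {
    if (data.data.node === null) {
      // ComfyUI sends no 'completed' message; a null node marks the end of the run
      console.log('Workflow completed');
    } else {
      console.log(`Executing node: ${data.data.node}`);
    }
  }
};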
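
On the Python side, the same stream can be reduced to a single completion check. ComfyUI reports the end of a run with an `executing` message whose `node` field is null and whose `prompt_id` matches the submitted job. The helper below is a sketch of that test; the name `is_finished` is an assumption, and it only inspects a message that has already been received, leaving the socket handling to the caller.

```python
import json


def is_finished(raw_message, prompt_id: str) -> bool:
    """True when a WebSocket text frame reports that `prompt_id` has finished.

    Binary frames (for example preview images) are ignored.
    """
    if not isinstance(raw_message, str):
        return False
    message = json.loads(raw_message)
    if message.get("type") != "executing":
        return False
    data = message.get("data", {})
    return data.get("node") is None and data.get("prompt_id") == prompt_id


running = json.dumps({"type": "executing", "data": {"node": "12", "prompt_id": "abc"}})
done = json.dumps({"type": "executing", "data": {"node": None, "prompt_id": "abc"}})
print(is_finished(running, "abc"), is_finished(done, "abc"))
```

Matching on `prompt_id` keeps one client from mistaking another queued job's completion for its own.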

Python API Client Example

import json
import uuid  # needed for the client_id generated in __init__

import requests

class DreamWeaverAPI:
    def __init__(self, server_address="localhost:8188"):
        self.server_address = server_address
        self.client_id = str(uuid.uuid4())
    
    def queue_workflow(self, workflow_json, input_image):
        """Submit workflow to queue"""
        prompt = json.loads(workflow_json)
        
        # Update input image
        for node_id in prompt:
            if prompt[node_id]["class_type"] == "LoadImage":
                prompt[node_id]["inputs"]["image"] = input_image
        
        data = {
            "prompt": prompt,
            "client_id": self.client_id
        }
        
        response = requests.post(
            f"http://{self.server_address}/prompt",
            json=data
        )
        return response.json()
    
    def get_queue_status(self):
        """Check queue status"""
        response = requests.get(f"http://{self.server_address}/queue")
        return response.json()
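
Once a job finishes, its outputs are listed under ComfyUI's `/history/{prompt_id}` route and each image can be fetched through `/view`. The sketch below turns a history response into download URLs without making any network calls. The function name `output_image_urls` is an assumption, and `sample_history` is a hand-written stand-in for a real response, not captured server output.

```python
from urllib.parse import urlencode


def output_image_urls(history: dict, prompt_id: str,
                      server_address: str = "localhost:8188") -> list[str]:
    """Build /view URLs for every image listed in a /history/{prompt_id} response."""
    urls = []
    outputs = history.get(prompt_id, {}).get("outputs", {})
    for node_output in outputs.values():
        for image in node_output.get("images", []):
            query = urlencode({
                "filename": image["filename"],
                "subfolder": image.get("subfolder", ""),
                "type": image.get("type", "output"),
            })
            urls.append(f"http://{server_address}/view?{query}")
    return urls


# Hand-written stand-in for a real /history response.
sample_history = {
    "abc": {"outputs": {"14": {"images": [
        {"filename": "dreamweaver_phase1_00001_.png", "subfolder": "", "type": "output"}
    ]}}}
}
print(output_image_urls(sample_history, "abc"))
```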

Deployment Instructions

Step 1: Environment Setup

# Clone ComfyUI if not exists
git clone https://github.com/comfyanonymous/ComfyUI.git Project_Velocity/comfy_engine
cd Project_Velocity/comfy_engine

# Install Python dependencies
pip install -r requirements.txt

# Install torch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Step 2: Model Installation

# Create model directories
mkdir -p models/{checkpoints,controlnet,vae,sams,grounding-dino,ipadapter}

# Download RealVisXL V5.0
# Place in: models/checkpoints/realvisxlV50Lightning_v50Lightning.safetensors

# Download ControlNet models
# Place in: models/controlnet/
# - control_v11f1p_sd15_depth.pth
# - control_v11p_sd15_seg.pth
# - control_v11p_sd15_canny.pth
# - control_v11p_sd15_mlsd.pth

# Download SAM models
# Place in: models/sams/
# - sam_vit_l_0b3195.pth
# - sam_vit_h_4b8939.pth

# Download VAE
# Place in: models/vae/
# - sdxl_vae.safetensors

Step 3: Custom Node Installation

cd custom_nodes

# Install required nodes
./install_nodes.sh  # See Custom Node Requirements section

# Restart ComfyUI after installation

Step 4: Workflow Import

Launch ComfyUI: python main.py --force-fp16 --lowvram
Open browser to http://localhost:8188
Load workflow JSON via Load button
Verify all nodes resolve correctly
Test with sample image

Step 5: Performance Validation

Phase 1 Validation Checklist:

Image loads successfully
Depth map generates without error
SAM mask creates proper segmentation
Generation completes in < 15 seconds
Output preserves room geometry
VRAM usage stays below 11GB

Phase 2 Validation Checklist:

Multi-ControlNet loads correctly
All 3-4 ControlNets apply without OOM
Mask refinement prevents edge bleeding
IP-Adapter applies style reference
Generation completes in < 30 seconds

Phase 3 Validation Checklist:

Batch processing handles 8+ images
Mask caching works correctly
Dual GPU distribution functions
4K upscaling produces quality output
Queue management handles failures gracefully

Troubleshooting

Issue	Solution
OOM Error	Reduce resolution to 896x896, enable tiled VAE
ControlNet not loading	Verify model paths and file integrity
SAM mask poor quality	Adjust threshold or try different SAM model
Slow generation	Enable xformers, use Lightning sampler
Color distortion	Use RealVisXL native VAE instead of sdxl_vae
Edge bleeding	Increase mask grow amount, enable feathering

Appendix A: SHA256 Checksums

Verify model integrity with these checksums:

File	Expected SHA256
realvisxlV50Lightning_v50Lightning.safetensors	[Verify on Civitai]
control_v11f1p_sd15_depth.pth	[Verify on HuggingFace]
sam_vit_l_0b3195.pth	b3c0c6a63c96e3a3c6e6c5f8d3b8c9a2...
sdxl_vae.safetensors	[Verify on HuggingFace]
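
Checking a downloaded file against a published checksum takes only the standard library. The sketch below is one way to do it, assuming the full 64-character digest has been copied from the model's download page; the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte checkpoints are not loaded into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare case-insensitively against a full 64-character digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A call such as `verify(Path("models/sams/sam_vit_l_0b3195.pth"), expected)` returns False for a truncated or corrupted download; a truncated checksum cannot be verified this way.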

Appendix B: Resource Links

RealVisXL V5.0: https://civitai.com/models/139562
ControlNet v1.1: https://huggingface.co/lllyasviel/ControlNet-v1-1
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SAM: https://github.com/facebookresearch/segment-anything
IP-Adapter: https://github.com/tencent-ailab/IP-Adapter

Appendix C: Dynamic Keyword & LLM Prompt Expansion (Gateway v2)

API Gateway v2 introduces a dynamic prompt generation pipeline. Instead of relying solely on the five static style templates, users can provide free-form keywords (e.g., "blue marble", "gold veins", "renaissance") and a room type (e.g., "living_room", "bedroom").

Architecture

The expansion is handled by comfy_engine/scripts/prompt_expander.py, which uses a Chain-of-Thought (CoT) approach driven exclusively by a local LLM for strict data privacy.

Backend Model: Local Ollama running qwen3.5:27b (default). Cloud API calls (e.g. Gemini, OpenAI) have been completely removed.

The LLM is provided with:

Keywords: The raw list of aesthetic descriptors from the user.
Room Contexts: Contextual constraints for specific room types (e.g., a "bathroom" context explicitly instructs the model to include wet-area materials and avoid beds).
Few-Shot Examples: Hand-crafted prompt examples mapping keywords to complete Stable Diffusion XL positive and negative prompts.

Pipeline Flow

Client Request: The iOS app calls POST /dream-weaver with image, room_type, and keywords.
LLM Chain-of-Thought:
- Gateway calls expand_prompt() from prompt_expander.py.
- The LLM reasons about the core aesthetic and generates a rich positive_prompt (80-120 words), a structured negative_prompt, and recommended technical parameters (cfg, denoise, steps).
ComfyUI Injection: The expanded prompts are injected into the standard Phase 1 workflow (nodes 3 & 4) via dw_gateway_v2.py.
Queue & Poll: The image is generated through the ComfyUI API asynchronously.
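
Because the expanded prompt comes from an LLM, it is worth checking the result before injecting it into the workflow. The sketch below validates the constraints stated in the pipeline flow: a positive prompt of 80-120 words, a non-empty negative prompt, and numeric cfg, denoise, and steps. The flat dictionary layout and the `validate_expansion` function are assumptions; the actual return shape of expand_prompt() is not shown in this document.

```python
def validate_expansion(result: dict) -> list[str]:
    """Return a list of problems found in an expansion result (empty if valid)."""
    problems = []
    words = len(str(result.get("positive_prompt", "")).split())
    if not 80 <= words <= 120:
        problems.append(f"positive_prompt has {words} words, expected 80-120")
    if not str(result.get("negative_prompt", "")).strip():
        problems.append("negative_prompt is empty")
    for key in ("cfg", "denoise", "steps"):
        if not isinstance(result.get(key), (int, float)):
            problems.append(f"{key} is missing or not numeric")
    denoise = result.get("denoise")
    if isinstance(denoise, (int, float)) and not 0.0 < denoise <= 1.0:
        problems.append("denoise must be in (0, 1]")
    return problems


good = {
    "positive_prompt": " ".join(["word"] * 100),
    "negative_prompt": "blurry, distorted",
    "cfg": 1.5,
    "denoise": 0.75,
    "steps": 6,
}
print(validate_expansion(good))
```

Returning a list of problems rather than raising on the first one lets the gateway log every issue at once or retry the LLM call with targeted feedback.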

Endpoints (v2)

POST /dream-weaver: Main generation endpoint now accepts keywords and room_type as multipart form fields.
POST /dream-weaver/expand: Previews the LLM-expanded prompt without generating the image.
GET /room-types: Returns the list of supported room contexts and their descriptors.

Document End
