Introducing Modular Diffusers – Composable Building Blocks for Diffusion Pipelines




Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can mix and match blocks to create workflows tailored to your needs! It complements the existing DiffusionPipeline class with a more flexible, composable alternative.

In this post, we’ll walk through how Modular Diffusers works — from the familiar API for running a modular pipeline, to building fully custom blocks and composing them into your own workflow. We’ll also show how it integrates with Mellon, a node-based visual workflow interface you can use to wire Modular Diffusers blocks together.




Quickstart

Here’s a simple example of running inference with FLUX.2 Klein 4B using pre-built blocks:

import torch
from diffusers import ModularPipeline


pipe = ModularPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B"
)

pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")


image = pipe(
    prompt="a serene landscape at sunset",
    num_inference_steps=4,
).images[0]

image.save("output.png")

You get the same results as with a regular DiffusionPipeline, but the pipeline is very different under the hood: it’s composed of flexible blocks — text encoding, image encoding, denoising, and decoding — that you can inspect directly:

print(pipe.blocks)
Flux2KleinAutoBlocks(
  ...
  Sub-Blocks:
    [0] text_encoder (Flux2KleinTextEncoderStep)
    [1] vae_encoder (Flux2KleinAutoVaeEncoderStep)
    [2] denoise (Flux2KleinCoreDenoiseStep)
    [3] decode (Flux2DecodeStep)
)

Each block is self-contained with its own inputs and outputs. You can run any block independently as its own pipeline, or add, remove, and swap blocks freely — they dynamically recompose to work with whatever blocks remain. Use .init_pipeline() to convert blocks into a runnable pipeline, and .load_components() to load the model weights.


blocks = pipe.blocks

# Remove the text encoder block and run it as a standalone pipeline
text_blocks = blocks.sub_blocks.pop("text_encoder")

text_pipe = text_blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
text_pipe.load_components(torch_dtype=torch.bfloat16)
text_pipe.to("cuda")
prompt_embeds = text_pipe(prompt="a serene landscape at sunset").prompt_embeds

# The remaining blocks recompose into a pipeline that accepts the
# precomputed embeddings instead of a text prompt
remaining_pipe = blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
remaining_pipe.load_components(torch_dtype=torch.bfloat16)
remaining_pipe.to("cuda")
image = remaining_pipe(prompt_embeds=prompt_embeds, num_inference_steps=4).images[0]

For more on block types, composition patterns, lazy loading, and memory management with ComponentsManager, check out the Modular Diffusers documentation.



Custom Blocks

Modular Diffusers really shines when you create your own blocks. A custom block is a Python class that defines its components, inputs, outputs, and computation logic — and once defined, you can plug it into any workflow.



Writing a Custom Block

Here’s an example block that extracts depth maps from images using Depth Anything V2:

import torch
from diffusers import ComponentSpec, InputParam, ModularPipelineBlocks, OutputParam
# DepthPreprocessor comes from the image_gen_aux library
from image_gen_aux import DepthPreprocessor


class DepthProcessorBlock(ModularPipelineBlocks):
    @property
    def expected_components(self):
        return [
            ComponentSpec("depth_processor", DepthPreprocessor,
                          pretrained_model_name_or_path="depth-anything/Depth-Anything-V2-Large-hf")
        ]

    @property
    def inputs(self):
        return [
            InputParam("image", required=True,
                       description="Image(s) to extract depth maps from"),
        ]

    @property
    def intermediate_outputs(self):
        return [
            OutputParam("control_image", type_hint=torch.Tensor,
                        description="Depth map(s) of input image(s)"),
        ]

    @torch.no_grad()
    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        depth_map = components.depth_processor(block_state.image)
        block_state.control_image = depth_map.to(block_state.device)
        self.set_block_state(state, block_state)
        return components, state

  • expected_components defines which models the block needs — in this case, a depth estimation model. The pretrained_model_name_or_path parameter sets a default Hub repo to load from, so load_components automatically fetches the depth model unless you override it in modular_model_index.json.
  • inputs and intermediate_outputs define what goes in and what comes out.
  • __call__ is where the computation logic lives.



Composing Blocks into Workflows

Let’s use this block with Qwen’s ControlNet workflow. Extract the ControlNet workflow and insert the depth block at the beginning:


pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")

# List the workflows supported by this pipeline's blocks
print(pipe.blocks.available_workflows)

# Extract the ControlNet text-to-image workflow
blocks = pipe.blocks.get_workflow("controlnet_text2image")
print(blocks)

# Insert the custom depth block at position 0
blocks.sub_blocks.insert("depth", DepthProcessorBlock(), 0)

# Inspect the block's auto-generated documentation
blocks.sub_blocks["depth"].doc

Blocks in a sequence share data automatically: the depth block’s control_image output flows to the downstream blocks that need it, and its image input becomes a pipeline input, since no earlier block provides it.
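To make the sharing rule concrete, here is a toy sketch in plain Python — not the real diffusers internals, just the idea: each block declares inputs and outputs, outputs flow through a shared state to later blocks, and any input no earlier block produces is promoted to a pipeline-level input.

```python
class ToyBlock:
    """Minimal stand-in for a pipeline block: declared inputs, outputs, and a function."""

    def __init__(self, name, inputs, outputs, fn):
        self.name, self.inputs, self.outputs, self.fn = name, inputs, outputs, fn

    def __call__(self, state):
        # Read declared inputs from shared state, write declared outputs back
        args = {k: state[k] for k in self.inputs}
        state.update(zip(self.outputs, self.fn(**args)))
        return state


def pipeline_inputs(blocks):
    """Inputs not produced by any earlier block become user-facing pipeline inputs."""
    produced, required = set(), []
    for block in blocks:
        for name in block.inputs:
            if name not in produced and name not in required:
                required.append(name)
        produced.update(block.outputs)
    return required


depth = ToyBlock("depth", ["image"], ["control_image"],
                 lambda image: (f"depth({image})",))
denoise = ToyBlock("denoise", ["prompt", "control_image"], ["latents"],
                   lambda prompt, control_image: (f"latents({prompt},{control_image})",))

# control_image is produced internally, so only image and prompt surface as inputs
print(pipeline_inputs([depth, denoise]))  # ['image', 'prompt']

state = {"image": "img.png", "prompt": "an astronaut"}
for block in [depth, denoise]:
    state = block(state)
print(state["latents"])  # latents(an astronaut,depth(img.png))
```

The real implementation manages typed state and components, but the promotion logic reads the same way: whatever the sequence cannot supply internally becomes a pipeline input.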


import torch

from diffusers import AutoModel, ComponentsManager
from diffusers.utils import load_image

# ComponentsManager handles device placement and memory management
manager = ComponentsManager()

pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager)
pipeline.load_components(torch_dtype=torch.bfloat16)

# Load the ControlNet and register it with the pipeline
controlnet = AutoModel.from_pretrained("InstantX/Qwen-Image-ControlNet-Union", torch_dtype=torch.bfloat16)
pipeline.update_components(controlnet=controlnet)

# The depth block turns this image into the control_image consumed downstream
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg")
output = pipeline(
    prompt="an astronaut hatching from an egg, detailed, fantasy, Pixar, Disney",
    image=image,
).images[0]



Sharing Custom Blocks on the Hub

You can publish your custom block to the Hub so anyone can load it with trust_remote_code=True. We’ve created a template to get you started — check out the Building Custom Blocks guide for the full walkthrough.

pipeline.save_pretrained(local_dir, repo_id="your-username/your-block-name", push_to_hub=True)

The DepthProcessorBlock from this post is published at diffusers/depth-processor-custom-block — you can load and use it directly:

from diffusers import ModularPipelineBlocks

depth_block = ModularPipelineBlocks.from_pretrained(
    "diffusers/depth-processor-custom-block", trust_remote_code=True
)

We’ve published a set of ready-to-use custom blocks here.



Modular Repositories

ModularPipeline.from_pretrained works with any existing Diffusers repo out of the box, but Modular Diffusers also introduces a new kind of repo: the Modular Repository.

A modular repository can reference components from their original model repos. For example, diffusers/flux2-bnb-4bit-modular contains a quantized transformer and loads the remaining components from the original repo:


{
    "transformer": [
        "diffusers", 
        "Flux2Transformer2DModel", 
        {
            "pretrained_model_name_or_path": "diffusers/flux2-bnb-4bit-modular",
            "subfolder": "transformer",
            "type_hint": ["diffusers", "Flux2Transformer2DModel"]
        }
    ],
    "vae": [
        "diffusers", 
        "AutoencoderKLFlux2", 
        {
            "pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
            "subfolder": "vae",
            "type_hint": ["diffusers", "AutoencoderKLFlux2"]
        }
    ],
    ...
}
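Each entry maps a component name to its library, class, and loading kwargs, so a loader can resolve every component to a Hub location. As a hedged illustration (not the actual loader code), here is a sketch that reads an index shaped like the excerpt above and reports where each component comes from:

```python
# Index structure mirrors the modular_model_index.json excerpt above:
# name -> [library, class_name, {loading kwargs}]
index = {
    "transformer": ["diffusers", "Flux2Transformer2DModel", {
        "pretrained_model_name_or_path": "diffusers/flux2-bnb-4bit-modular",
        "subfolder": "transformer",
    }],
    "vae": ["diffusers", "AutoencoderKLFlux2", {
        "pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
        "subfolder": "vae",
    }],
}


def component_sources(index):
    """Map each component to the (repo, subfolder) it loads from."""
    return {
        name: (spec[2]["pretrained_model_name_or_path"], spec[2]["subfolder"])
        for name, spec in index.items()
    }


for name, (repo, sub) in component_sources(index).items():
    print(f"{name}: {repo}/{sub}")
# transformer: diffusers/flux2-bnb-4bit-modular/transformer
# vae: black-forest-labs/FLUX.2-dev/vae
```

The point is that nothing forces every component to live in the same repo: the quantized transformer stays local to the modular repo while the VAE resolves to black-forest-labs/FLUX.2-dev.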

Modular repositories can also host custom pipeline blocks as Python code and visual UI configurations for tools like Mellon — all in one place.


Community Pipelines

The community has already started building complete pipelines with Modular Diffusers and publishing them on the Hub, with model weights and ready-to-run code.

  • Krea Realtime Video — A 14B parameter real-time video generation model distilled from Wan 2.1, achieving 11 fps on a single B200 GPU. It supports text-to-video, video-to-video, and streaming video-to-video — all built as modular blocks. Users can modify prompts mid-generation, restyle videos on the fly, and see first frames within 1 second.
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("krea/krea-realtime-video", trust_remote_code=True)
pipe.load_components(
    trust_remote_code=True, 
    device_map="cuda",
    torch_dtype={"default": torch.bfloat16, "vae": torch.float16}
)
  • Waypoint-1 — A 2.3B parameter real-time diffusion world model from Overworld. It autoregressively generates interactive worlds from control inputs and text prompts — you can explore and interact with generated environments in real time on consumer hardware.

Teams can build novel architectures, package them as blocks, and publish the whole pipeline on the Hub for anyone to use with ModularPipeline.from_pretrained.

Check out the full collection of community pipelines for more.



Integration with Mellon

💡 Mellon is in early development and not yet ready for production use. Consider this a sneak peek at how the integration works!

Mellon is a visual workflow interface integrated with Modular Diffusers. If you’re familiar with node-based tools like ComfyUI, you’ll feel right at home — but there are some key differences:

  • Dynamic nodes — Instead of dozens of model-specific nodes, there is a small set of nodes that automatically adapt their interface based on the model you select. Learn them once, use them with any model.
  • Single-node workflows — Thanks to Modular Diffusers’ composable block system, you can collapse an entire pipeline into a single node. Run multiple workflows on the same canvas without the clutter.
  • Hub integration out of the box — Custom blocks published to the Hugging Face Hub work immediately in Mellon. We provide a utility function to automatically generate the node interface from your block definition — no UI code required.

This integration is possible because every block exposes the same properties (inputs, intermediate_outputs, expected_components). This consistent API means Mellon can automatically generate a node’s UI from any block definition and compose blocks into higher-level nodes.
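Since every block exposes the same properties, UI generation reduces to walking those declarations. Here is a minimal sketch of the idea; the block stand-in and the node-spec field names are illustrative, not Mellon’s actual schema or the real block classes:

```python
class GeminiPromptExpander:
    """Stand-in for a custom block; only the declared interface matters here."""
    inputs = [{"name": "prompt", "type": "str"}]
    intermediate_outputs = [{"name": "prompt", "type": "str"}]
    expected_components = []


def node_spec(block):
    """Derive a node definition purely from the block's declared properties."""
    return {
        "title": type(block).__name__,
        "input_sockets": [p["name"] for p in block.inputs],
        "output_sockets": [p["name"] for p in block.intermediate_outputs],
        "models": list(block.expected_components),
    }


spec = node_spec(GeminiPromptExpander())
print(spec["input_sockets"], spec["output_sockets"])  # ['prompt'] ['prompt']
```

Because the derivation never inspects the block’s internals, any block that declares these three properties gets a working node for free — which is exactly what makes Hub-published custom blocks usable in Mellon without UI code.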

For example, diffusers/FLUX.2-klein-4B-modular contains a pipeline definition, component references, and a mellon_pipeline_config.json — all in one repo. Load it in Python with ModularPipeline.from_pretrained("diffusers/FLUX.2-klein-4B-modular") or in Mellon to create either a single-node or multi-node workflow.

Here’s a quick example. We add a Gemini prompt expansion node — hosted as a modular repo at diffusers/gemini-prompt-expander-mellon — to an existing text-to-image workflow:

  1. Drag in a Dynamic Block node and enter the repo id (e.g. diffusers/gemini-prompt-expander-mellon)
  2. Click LOAD CUSTOM BLOCK — the node automatically grows a textbox for your prompt input and an output socket named “prompt”, all configured from the repo
  3. Type a short prompt, connect the output to the Encode Prompt node, and run

Gemini expands your short prompt into a detailed description before generating the image. No code, no configuration — just a Hub repo id.

This is just one example. For a detailed walkthrough, check out the Mellon x Modular Diffusers guide.



Conclusion

Modular Diffusers brings the composability and flexibility the community has been asking for, without compromising the features that make Diffusers powerful. It’s still early — we want your input to shape what comes next. Give it a try and let us know what works, what doesn’t, and what’s missing.



Resources

Thanks to Chun Te Lee for the thumbnail, and to Poli, Pedro, Lysandre, Linoy, Aritra, and Steven for their thoughtful reviews.


