Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can mix and match blocks to create workflows tailored to your needs! This complements the existing
DiffusionPipeline class with a more flexible, composable alternative.
In this post, we’ll walk through how Modular Diffusers works — from the familiar API to run a modular pipeline, to building fully custom blocks and composing them into your own workflow. We’ll also show how it integrates with Mellon, a node-based visual workflow interface that you can use to wire Modular Diffusers blocks together.
Quickstart
Here is a simple example of running inference with FLUX.2 Klein 4B using pre-built blocks:
import torch
from diffusers import ModularPipeline
pipe = ModularPipeline.from_pretrained(
"black-forest-labs/FLUX.2-klein-4B"
)
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")
image = pipe(
prompt="a serene landscape at sunset",
num_inference_steps=4,
).images[0]
image.save("output.png")
You get the same results as with a standard DiffusionPipeline, but the pipeline is very different under the hood: it’s composed of flexible blocks — text encoding, image encoding, denoising, and decoding — that you can inspect directly:
print(pipe.blocks)
Flux2KleinAutoBlocks(
...
Sub-Blocks:
[0] text_encoder (Flux2KleinTextEncoderStep)
[1] vae_encoder (Flux2KleinAutoVaeEncoderStep)
[2] denoise (Flux2KleinCoreDenoiseStep)
[3] decode (Flux2DecodeStep)
)
Each block is self-contained with its own inputs and outputs. You can run any block independently as its own pipeline, or add, remove, and swap blocks freely — they dynamically recompose to work with whatever blocks remain. Use .init_pipeline() to convert blocks into a runnable pipeline, and .load_components() to load the model weights.
blocks = pipe.blocks
text_blocks = blocks.sub_blocks.pop("text_encoder")
text_pipe = text_blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
text_pipe.load_components(torch_dtype=torch.bfloat16)
text_pipe.to("cuda")
prompt_embeds = text_pipe(prompt="a serene landscape at sunset").prompt_embeds
remaining_pipe = blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
remaining_pipe.load_components(torch_dtype=torch.bfloat16)
remaining_pipe.to("cuda")
image = remaining_pipe(prompt_embeds=prompt_embeds, num_inference_steps=4).images[0]
For more on block types, composition patterns, lazy loading, and memory management with ComponentsManager, check out the Modular Diffusers documentation.
Custom Blocks
Modular Diffusers really shines when creating your own blocks. A custom block is a Python class that defines its components, inputs, outputs, and computation logic — and once defined, you can plug it into any workflow.
Writing a Custom Block
Here’s an example block that extracts depth maps from images using Depth Anything V2.
import torch
from diffusers import ModularPipelineBlocks
from diffusers.modular_pipelines import ComponentSpec, InputParam, OutputParam
from image_gen_aux import DepthPreprocessor

class DepthProcessorBlock(ModularPipelineBlocks):
@property
def expected_components(self):
return [
ComponentSpec("depth_processor", DepthPreprocessor,
pretrained_model_name_or_path="depth-anything/Depth-Anything-V2-Large-hf")
]
@property
def inputs(self):
return [
InputParam("image", required=True,
description="Image(s) to extract depth maps from"),
]
@property
def intermediate_outputs(self):
return [
OutputParam("control_image", type_hint=torch.Tensor,
description="Depth map(s) of input image(s)"),
]
@torch.no_grad()
def __call__(self, components, state):
block_state = self.get_block_state(state)
depth_map = components.depth_processor(block_state.image)
block_state.control_image = depth_map.to(block_state.device)
self.set_block_state(state, block_state)
return components, state
`expected_components` defines what models the block needs — in this case, a depth estimation model. The `pretrained_model_name_or_path` parameter sets a default Hub repo to load from, so `load_components` automatically fetches the depth model unless you override it in `modular_model_index.json`. `inputs` and `intermediate_outputs` define what goes in and comes out. `__call__` is where the computation logic lives.
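To make the mechanics concrete, here is a toy sketch of the pattern `__call__` follows (plain Python stand-ins, not the actual diffusers classes): a shared state flows through every block, and each block reads the fields it declared as inputs and writes the fields it declared as outputs:

```python
# Toy sketch of the block/state protocol -- plain dicts instead of the real
# diffusers state objects. Each block reads its declared inputs from the
# shared state, computes, and writes its declared outputs back.

class ToyDepthBlock:
    inputs = ["image"]
    outputs = ["control_image"]

    def __call__(self, components, state):
        # read a declared input from the shared state
        image = state["image"]
        # "compute" a depth map (a stand-in transformation here)
        state["control_image"] = f"depth({image})"
        return components, state

components = {}
state = {"image": "astronaut.png"}
_, state = ToyDepthBlock()(components, state)
print(state["control_image"])  # depth(astronaut.png)
```

Because every block follows this same contract, any block's output is automatically visible to the blocks that run after it.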
Composing Blocks into Workflows
Let’s use this block with Qwen’s ControlNet workflow. Extract the ControlNet workflow and insert the depth block at the beginning:
pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")
print(pipe.blocks.available_workflows)
blocks = pipe.blocks.get_workflow("controlnet_text2image")
print(blocks)
blocks.sub_blocks.insert("depth", DepthProcessorBlock(), 0)
blocks.sub_blocks['depth'].doc
Blocks in a sequence share data automatically: the depth block’s control_image output flows to downstream blocks that need it, and its image input becomes a pipeline input since no earlier block provides it.
from diffusers import ComponentsManager, AutoModel
from diffusers.utils import load_image
manager = ComponentsManager()
pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager)
pipeline.load_components(torch_dtype=torch.bfloat16)
controlnet = AutoModel.from_pretrained("InstantX/Qwen-Image-ControlNet-Union", torch_dtype=torch.bfloat16)
pipeline.update_components(controlnet=controlnet)
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg")
output = pipeline(
prompt="an astronaut hatching from an egg, detailed, fantasy, Pixar, Disney",
image=image,
).images[0]
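The data-flow rule described above — each block's outputs feed downstream consumers, and any input no earlier block produces surfaces as a pipeline input — can be sketched in a few lines of plain Python (a conceptual model, not the actual diffusers resolution logic):

```python
# Conceptual sketch: which inputs of a block sequence become pipeline inputs?
# An input is user-facing only if no earlier block in the sequence produced it.

def pipeline_inputs(blocks):
    produced, needed = set(), []
    for block in blocks:
        for name in block["inputs"]:
            if name not in produced and name not in needed:
                needed.append(name)
        produced.update(block["outputs"])
    return needed

# A simplified version of the workflow above (block interfaces abbreviated).
blocks = [
    {"name": "depth",   "inputs": ["image"],                          "outputs": ["control_image"]},
    {"name": "encode",  "inputs": ["prompt"],                         "outputs": ["prompt_embeds"]},
    {"name": "denoise", "inputs": ["prompt_embeds", "control_image"], "outputs": ["latents"]},
]
print(pipeline_inputs(blocks))  # ['image', 'prompt']
```

Here `control_image` and `prompt_embeds` are satisfied internally, so only `image` and `prompt` need to come from the caller — matching the pipeline call above.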
Sharing Custom Blocks on the Hub
You can publish your custom block to the Hub so anyone can load it with trust_remote_code=True. We’ve created a template to get you started — check out the Building Custom Blocks guide for the full walkthrough.
pipeline.save_pretrained(local_dir, repo_id="your-username/your-block-name", push_to_hub=True)
The DepthProcessorBlock from this post is published at diffusers/depth-processor-custom-block — you can load and use it directly:
from diffusers import ModularPipelineBlocks
depth_block = ModularPipelineBlocks.from_pretrained(
"diffusers/depth-processor-custom-block", trust_remote_code=True
)
We’ve published a collection of ready-to-use custom blocks here.
Modular Repositories
ModularPipeline.from_pretrained works with any existing Diffusers repo out of the box, but Modular Diffusers also introduces a new kind of repo: the Modular Repository.
A modular repository can reference components from their original model repos. For example, diffusers/flux2-bnb-4bit-modular contains a quantized transformer and loads the remaining components from the original repo.
{
"transformer": [
"diffusers",
"Flux2Transformer2DModel",
{
"pretrained_model_name_or_path": "diffusers/flux2-bnb-4bit-modular",
"subfolder": "transformer",
"type_hint": ["diffusers", "Flux2Transformer2DModel"]
}
],
"vae": [
"diffusers",
"AutoencoderKLFlux2",
{
"pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
"subfolder": "vae",
"type_hint": ["diffusers", "AutoencoderKLFlux2"]
}
],
...
}
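As a small sketch of what this delegation means in practice (treating the index as a plain Python dict with the `[library, class_name, loading_spec]` entry shape shown above), you can read off which components load from the modular repo itself and which come from the original one:

```python
# Sketch: inspect a modular_model_index.json-style mapping to see where each
# component is loaded from. Each entry is [library, class_name, loading_spec].
index = {
    "transformer": ["diffusers", "Flux2Transformer2DModel",
                    {"pretrained_model_name_or_path": "diffusers/flux2-bnb-4bit-modular",
                     "subfolder": "transformer"}],
    "vae": ["diffusers", "AutoencoderKLFlux2",
            {"pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
             "subfolder": "vae"}],
}

for name, (_library, cls, spec) in index.items():
    print(f"{name}: {cls} <- {spec['pretrained_model_name_or_path']}")
```

The quantized transformer resolves to the modular repo, while the vae is pulled from the original black-forest-labs/FLUX.2-dev repo.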
Modular repositories can also host custom pipeline blocks as Python code and visual UI configurations for tools like Mellon — all in one place.
Community Pipelines
The community has already started building complete pipelines with Modular Diffusers and publishing them on the Hub, with model weights and ready-to-run code.
- Krea Realtime Video — A 14B parameter real-time video generation model distilled from Wan 2.1, achieving 11fps on a single B200 GPU. It supports text-to-video, video-to-video, and streaming video-to-video — all built as modular blocks. Users can modify prompts mid-generation, restyle videos on the fly, and see first frames within 1 second.
import torch
from diffusers import ModularPipeline
pipe = ModularPipeline.from_pretrained("krea/krea-realtime-video", trust_remote_code=True)
pipe.load_components(
trust_remote_code=True,
device_map="cuda",
torch_dtype={"default": torch.bfloat16, "vae": torch.float16}
)
- Waypoint-1 — A 2.3B parameter real-time diffusion world model from Overworld. It autoregressively generates interactive worlds from control inputs and text prompts — you can explore and interact with generated environments in real time on consumer hardware.
Teams can build novel architectures, package them as blocks, and publish the entire pipeline on the Hub for anyone to use with ModularPipeline.from_pretrained.
Check out the full collection of community pipelines for more.
Integration with Mellon
💡 Mellon is in early development and not ready for production use yet. Consider this a sneak peek of how the integration works!
Mellon is a visual workflow interface integrated with Modular Diffusers. If you’re familiar with node-based tools like ComfyUI, you’ll feel right at home — but there are some key differences:
- Dynamic nodes — Instead of dozens of model-specific nodes, we have a small set of nodes that automatically adapt their interface based on the model you select. Learn them once, use them with any model.
- Single-node workflows — Thanks to Modular Diffusers’ composable block system, you can collapse an entire pipeline into a single node. Run multiple workflows on the same canvas without the clutter.
- Hub integration out of the box — Custom blocks published to the Hugging Face Hub work immediately in Mellon. We provide a utility function to automatically generate the node interface from your block definition — no UI code required.
This integration is possible because every block exposes the same properties (inputs, intermediate_outputs, expected_components). This consistent API means Mellon can automatically generate a node’s UI from any block definition and compose blocks into higher-level nodes.
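Here is a toy sketch of why that consistent API matters (hypothetical names, not Mellon's actual code): given any object that exposes `inputs` and `intermediate_outputs`, a node interface can be derived mechanically:

```python
# Toy sketch: derive a node UI description from a block's declared interface.
# Any block exposing `inputs` / `intermediate_outputs` can be rendered this way,
# with no per-block UI code.

class PromptExpanderBlock:
    """Hypothetical block mirroring the declared-property API."""
    inputs = [{"name": "prompt", "required": True}]
    intermediate_outputs = [{"name": "prompt"}]

def node_interface(block):
    return {
        "input_sockets": [p["name"] for p in block.inputs],
        "output_sockets": [p["name"] for p in block.intermediate_outputs],
    }

print(node_interface(PromptExpanderBlock()))
# {'input_sockets': ['prompt'], 'output_sockets': ['prompt']}
```

This is the same shape as the prompt-expander node below: one input textbox and one output socket named "prompt", read straight off the block definition.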
For example, diffusers/FLUX.2-klein-4B-modular contains a pipeline definition, component references, and a mellon_pipeline_config.json — all in one repo. Load it in Python with ModularPipeline.from_pretrained("diffusers/FLUX.2-klein-4B-modular") or in Mellon to create either a single-node or multi-node workflow.
Here’s a quick example. We add a Gemini prompt expansion node — hosted as a modular repo at diffusers/gemini-prompt-expander-mellon — to an existing text-to-image workflow:
- Drag in a Dynamic Block node and enter the `repo_id` (i.e. `diffusers/gemini-prompt-expander-mellon`)
- Click LOAD CUSTOM BLOCK — the node automatically grows a textbox for your prompt input and an output socket named “prompt”, all configured from the repo
- Type a short prompt, connect the output to the Encode Prompt node, and run
Gemini expands your short prompt into a detailed description before generating the image. No code, no configuration — just a Hub repo id.
This is just one example. For a detailed walkthrough, check out the Mellon x Modular Diffusers guide.
Conclusion
Modular Diffusers brings the composability and flexibility the community has been asking for, without compromising the features that make Diffusers powerful. It’s still early — we want your input to shape what comes next. Give it a try and let us know what works, what doesn’t, and what’s missing.
Resources
Thanks to Chun Te Lee for the thumbnail, and to Poli, Pedro, Lysandre, Linoy, Aritra, and Steven for their thoughtful reviews.
