SegMoE is an exciting framework for creating Mixture-of-Experts Diffusion models from scratch! SegMoE is fully integrated within the Hugging Face ecosystem and comes supported in diffusers 🔥!
What’s SegMoE?
SegMoE models follow the same architecture as Stable Diffusion. Like Mixtral 8x7b, a SegMoE model comes with multiple models in one. The way this works is by replacing some feed-forward layers with a sparse MoE layer. A MoE layer contains a router network that selects which experts process which tokens most efficiently.
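To make the routing idea concrete, here is a minimal, self-contained PyTorch sketch of a sparse MoE feed-forward layer. This is an illustration only, not the actual segmoe implementation; all names in it are made up.

import torch
import torch.nn as nn

class SparseMoEFeedForward(nn.Module):
    """Illustrative sparse MoE layer: a router picks top-k experts per token."""

    def __init__(self, dim: int, num_experts: int, num_experts_per_tok: int):
        super().__init__()
        self.num_experts_per_tok = num_experts_per_tok
        # The router scores every token against every expert.
        self.router = nn.Linear(dim, num_experts)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        scores = self.router(x)  # (batch, tokens, num_experts)
        weights, selected = scores.topk(self.num_experts_per_tok, dim=-1)
        weights = weights.softmax(dim=-1)  # mix only the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.num_experts_per_tok):
            for e, expert in enumerate(self.experts):
                mask = selected[..., k] == e  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

Replacing some of a diffusion model's feed-forward blocks with layers like this is the core idea; the real implementation lives in the segmoe repository.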
You can use the segmoe package to create your own MoE models! The process takes just a few minutes. For further information, please visit the GitHub repository. We took inspiration from the popular library mergekit when designing segmoe. We thank the contributors of mergekit for such a useful library.
For more details on MoEs, see the Hugging Face 🤗 post: hf.co/blog/moe.
SegMoE release TL;DR
- Release of SegMoE-4x2, SegMoE-2x1, and SegMoE-SD-4x2 versions
- Release of custom MoE-making code
About the name
The SegMoE MoEs are called SegMoE-AxB, where A refers to the number of expert models MoE-d together, while the second number refers to the number of experts involved in the generation of each image. Only some layers of the model (the feed-forward blocks, attentions, or all) are replicated depending on the configuration settings; the rest of the parameters are the same as in a Stable Diffusion model. For more details about how MoEs work, please refer to the "Mixture of Experts Explained" post.
Inference
We release 3 merges on the Hub:
- SegMoE 2x1 has two expert models.
- SegMoE 4x2 has four expert models.
- SegMoE SD 4x2 has four Stable Diffusion 1.5 expert models.
Samples
Images generated using SegMoE 4x2:
Images generated using SegMoE 2x1:
Images generated using SegMoE SD 4x2:
Using 🤗 Diffusers
Please run the following command to install the segmoe package. Make sure you have the latest versions of diffusers and transformers installed.
pip install -U segmoe diffusers transformers
The following snippet loads the second model ("SegMoE 4x2") from the list above and runs generation on it.
from segmoe import SegMoEPipeline

# Load the SegMoE 4x2 model from the Hugging Face Hub
pipeline = SegMoEPipeline("segmind/SegMoE-4x2-v0", device="cuda")

prompt = "cosmic canvas, orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
img.save("image.png")
Using a Local Model
Alternatively, a local model can also be loaded; here, segmoe_v0 is the path to the directory containing the local SegMoE model. Check out Creating your Own SegMoE to learn how to build your own!
from segmoe import SegMoEPipeline

# Load a SegMoE model from a local directory
pipeline = SegMoEPipeline("segmoe_v0", device="cuda")

prompt = "cosmic canvas, orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
img.save("image.png")
Comparison
Prompt understanding seems to improve, as shown in the images below. Each image shows the following models from left to right: SegMoE-2x1-v0, SegMoE-4x2-v0, Base Model (RealVisXL_V3.0).
three green glass bottles
panda bear with aviator glasses on its head
the statue of Liberty next to the Washington Monument
Taj Mahal with its reflection. detailed charcoal sketch.
Creating your Own SegMoE
Simply prepare a config.yaml file with the following structure:
base_model: Base Model Path, Model Card or CivitAI Download Link
num_experts: Number of experts to use
moe_layers: Type of Layers to Mix (can be "ff", "attn" or "all"). Defaults to "attn"
num_experts_per_tok: Number of Experts to use
experts:
  - source_model: Expert 1 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive Prompt for computing gate weights
    negative_prompt: Negative Prompt for computing gate weights
  - source_model: Expert 2 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive Prompt for computing gate weights
    negative_prompt: Negative Prompt for computing gate weights
  - source_model: Expert 3 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive Prompt for computing gate weights
    negative_prompt: Negative Prompt for computing gate weights
  - source_model: Expert 4 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive Prompt for computing gate weights
    negative_prompt: Negative Prompt for computing gate weights
Any number of models can be combined. For detailed information on how to create a config file, please refer to the GitHub repository.
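For illustration, a hypothetical config.yaml combining two SDXL finetunes might look like the following. The expert model IDs here are placeholders, not real repositories:

base_model: stabilityai/stable-diffusion-xl-base-1.0
num_experts: 2
moe_layers: all
num_experts_per_tok: 1
experts:
  - source_model: my-org/sdxl-anime-finetune   # placeholder model ID
    positive_prompt: "anime artwork, vibrant colors, detailed illustration"
    negative_prompt: "photorealistic, blurry, low quality"
  - source_model: my-org/sdxl-photo-finetune   # placeholder model ID
    positive_prompt: "photorealistic, 8k, sharp focus, natural lighting"
    negative_prompt: "cartoon, painting, low quality"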
Note
Both Hugging Face and CivitAI models are supported. For CivitAI models, paste the download link of the model, for example: "https://civitai.com/api/download/models/239306"
Then run the following command:
segmoe config.yaml segmoe_v0
This will create a folder called segmoe_v0 with the following structure:
├── model_index.json
├── scheduler
│  └── scheduler_config.json
├── text_encoder
│  ├── config.json
│  └── model.safetensors
├── text_encoder_2
│  ├── config.json
│  └── model.safetensors
├── tokenizer
│  ├── merges.txt
│  ├── special_tokens_map.json
│  ├── tokenizer_config.json
│  └── vocab.json
├── tokenizer_2
│  ├── merges.txt
│  ├── special_tokens_map.json
│  ├── tokenizer_config.json
│  └── vocab.json
├── unet
│  ├── config.json
│  └── diffusion_pytorch_model.safetensors
└── vae
    ├── config.json
    └── diffusion_pytorch_model.safetensors
Alternatively, you can also use the Python API to create a mixture-of-experts model:
from segmoe import SegMoEPipeline
pipeline = SegMoEPipeline("config.yaml", device="cuda")
pipeline.save_pretrained("segmoe_v0")
Push to Hub
The model can be pushed to the Hub via the huggingface-cli:
huggingface-cli upload segmind/segmoe_v0 ./segmoe_v0
The model can also be pushed to the Hub directly from Python:
from huggingface_hub import create_repo, upload_folder

model_id = "segmind/SegMoE-v0"
repo_id = create_repo(repo_id=model_id, exist_ok=True).repo_id
upload_folder(
    repo_id=repo_id,
    folder_path="segmoe_v0",
    commit_message="Initial Commit",
    ignore_patterns=["step_*", "epoch_*"],
)
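Once uploaded, the model can be loaded back with the same SegMoEPipeline API shown earlier; the repository ID below is the one created above:

from segmoe import SegMoEPipeline

# Load the uploaded model directly from the Hub
pipeline = SegMoEPipeline("segmind/SegMoE-v0", device="cuda")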
Detailed usage can be found here.
Disclaimers and ongoing work
- Slower speed: If the number of experts per token is larger than 1, the MoE performs computation across several expert models. This makes it slower than a single SD 1.5 or SDXL model.
- High VRAM usage: MoEs run inference very quickly but still need a large amount of VRAM (and hence an expensive GPU). This makes it difficult to use them in local setups, but they are great for deployments with multiple GPUs. As a reference point, SegMoE-4x2 requires 24GB of VRAM in half-precision.
Conclusion
We built SegMoE to provide the community a new tool that can potentially create SOTA diffusion models with ease, just by combining pretrained models while keeping inference times low. We're excited to see what you can build with it!



