SDXL in 4 steps with Latent Consistency LoRAs




Latent Consistency Models (LCM) are a way to decrease the number of steps required to generate an image with Stable Diffusion (or SDXL) by distilling the original model into another version that requires fewer steps (4 to 8 instead of the original 25 to 50). Distillation is a type of training procedure that attempts to replicate the outputs from a source model using a new one. The distilled model may be designed to be smaller (that's the case of DistilBERT or the recently-released Distil-Whisper) or, in this case, require fewer steps to run. It's usually a lengthy and costly process that requires huge amounts of data, patience, and a few GPUs.

Well, that was the status quo before today!

We're delighted to announce a new method that can essentially make Stable Diffusion and SDXL faster, as if they had been distilled using the LCM process! How does it sound to run any SDXL model in about 1 second instead of 7 on a 3090, or 10x faster on a Mac? Read on for details!



Contents

  • Method Overview
  • Why does this matter?
  • Fast Inference with SDXL LCM LoRAs
  • Quality Comparison
  • Guidance Scale and Negative Prompts
  • Quality vs base SDXL
  • LCM LoRAs with other models
  • Full Diffusers Integration
  • Benchmarks
  • LCM LoRAs and Models Released Today
  • Bonus: Combine LCM LoRAs with regular SDXL LoRAs
  • Train LCM Models and LoRAs
  • Resources
  • Credits



Method Overview

So, what's the trick?
For latent consistency distillation, each model needs to be distilled separately. The core idea with LCM LoRA is to train only a small number of adapters, known as LoRA layers, instead of the full model. The resulting LoRAs can then be applied to any fine-tuned version of the model without having to distil them separately. If you are itching to see how this looks in practice, just jump to the next section to play with the inference code. If you want to train your own LoRAs, this is the process you'd use:

  1. Select an available teacher model from the Hub. For example, you can use SDXL (base), or any fine-tuned or dreamboothed version you like.
  2. Train an LCM LoRA on the model. LoRA is a type of parameter-efficient fine-tuning, or PEFT, that is much cheaper to perform than full model fine-tuning. For additional details on PEFT, please check this blog post or the diffusers LoRA documentation.
  3. Use the LoRA with any SDXL diffusion model and the LCM scheduler; bingo! You get high-quality inference in just a few steps.

For more details on the method, please download our paper.



Why does this matter?

Fast inference of Stable Diffusion and SDXL enables new use-cases and workflows. To name a few:

  • Accessibility: generative tools can be used effectively by more people, even if they don't have access to the latest hardware.
  • Faster iteration: get more images and multiple variants in a fraction of the time! This is great for artists and researchers, whether for personal or commercial use.
  • Production workloads may be possible on different accelerators, including CPUs.
  • Cheaper image generation services.

To gauge the speed difference we're talking about, generating a single 1024×1024 image on an M1 Mac with SDXL (base) takes about a minute. Using the LCM LoRA, we get great results in just ~6s (4 steps). This is an order of magnitude faster, and not having to wait for results is a game-changer. Using a 4090, we get an almost instant response (less than 1s). This unlocks the use of SDXL in applications where real-time responses are a requirement.



Fast Inference with SDXL LCM LoRAs

The version of diffusers released today makes it very easy to use LCM LoRAs:

from diffusers import DiffusionPipeline, LCMScheduler
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
lcm_lora_id = "latent-consistency/lcm-lora-sdxl"

# Standard SDXL pipeline
pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16")

# Apply the LCM LoRA and switch to the LCM scheduler
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.to(device="cuda", dtype=torch.float16)

prompt = "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux"
images = pipe(
    prompt=prompt,
    num_inference_steps=4,  # 4 steps are enough with the LCM LoRA
    guidance_scale=1,
).images[0]

Note how the code:

  • Instantiates a standard diffusion pipeline with the SDXL 1.0 base model.
  • Applies the LCM LoRA.
  • Changes the scheduler to the LCMScheduler, which is the one used in latent consistency models.
  • That's it!

This can result in the following full-resolution image:

Image generated with SDXL in 4 steps using an LCM LoRA.



Quality Comparison

Let's see how the number of steps impacts generation quality. The following code will generate images with 1 to 8 total inference steps:

images = []
for steps in range(8):
    # Fix the seed so that only the number of steps varies across generations
    generator = torch.Generator(device=pipe.device).manual_seed(1337)
    image = pipe(
        prompt=prompt,
        num_inference_steps=steps+1,
        guidance_scale=1,
        generator=generator,
    ).images[0]
    images.append(image)
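To view them side by side, you can tile the list into a grid, for instance with the make_image_grid helper available in recent versions of diffusers. A minimal sketch, assuming the images list produced by the loop above (the output filename is just an example):

from diffusers.utils import make_image_grid

# Arrange the 8 generations in a 2x4 grid for visual comparison
grid = make_image_grid(images, rows=2, cols=4)
grid.save("lcm_lora_steps_grid.png")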

These are the 8 images displayed in a grid:

LCM LoRA generations with 1 to 8 steps.

As expected, using just 1 step produces an approximate shape without discernible features and lacking texture. However, results quickly improve, and they are usually very satisfactory in just 4 to 6 steps. Personally, I find the 8-step image in the previous test to be a bit too saturated and “cartoony” for my taste, so I'd probably choose between the ones with 5 and 6 steps in this example. Generation is so fast that you can create a bunch of different variants using just 4 steps, then select the ones you like and iterate with a couple more steps and refined prompts as necessary.



Guidance Scale and Negative Prompts

Note that in the previous examples we used a guidance_scale of 1, which effectively disables it. This works well for most prompts, and it's fastest, but it ignores negative prompts. You can also explore using negative prompts by providing a guidance scale between 1 and 2 – we found that larger values don't work.
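As an illustration, reusing the LCM pipeline from above, this is a sketch of what that might look like; the negative prompt text and the 1.5 guidance value are just example choices:

# Guidance between 1 and 2 re-enables the negative prompt (values are illustrative)
negative_prompt = "blurry, low quality, render, 3D, oversaturated"
images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=4,
    guidance_scale=1.5,
).images[0]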



Quality vs base SDXL

How does this compare against the standard SDXL pipeline, in terms of quality? Let's see an example!

We can quickly revert our pipeline to a standard SDXL pipeline by unloading the LoRA weights and switching to the default scheduler:

from diffusers import EulerDiscreteScheduler

pipe.unload_lora_weights()
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

Then we can run inference as usual for SDXL. We'll gather results using a varying number of steps:

images = []
for steps in (1, 4, 8, 15, 20, 25, 30, 50):
    generator = torch.Generator(device=pipe.device).manual_seed(1337)
    image = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        generator=generator,
    ).images[0]
    images.append(image)

SDXL pipeline results (same prompt and random seed), using 1, 4, 8, 15, 20, 25, 30, and 50 steps.

As you can see, images in this example are pretty much useless until ~20 steps (second row), and quality still increases noticeably with more steps. The details in the final image are amazing, but it took 50 steps to get there.



LCM LoRAs with other models

This technique also works for any other fine-tuned SDXL or Stable Diffusion model. To demonstrate, let's see how to run inference on collage-diffusion, a model fine-tuned from Stable Diffusion v1.5 using Dreambooth.

The code is similar to the one we saw in the previous examples. We load the fine-tuned model, and then the LCM LoRA suitable for Stable Diffusion v1.5.

from diffusers import DiffusionPipeline, LCMScheduler
import torch

model_id = "wavymulder/collage-diffusion"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(lcm_lora_id)
pipe.to(device="cuda", dtype=torch.float16)

prompt = "collage style kid sits taking a look at the night sky, stuffed with stars"

generator = torch.Generator(device=pipe.device).manual_seed(1337)
images = pipe(
    prompt=prompt,
    generator=generator,
    negative_prompt=negative_prompt,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
images

LCM LoRA technique with a Dreambooth Stable Diffusion v1.5 model, allowing 4-step inference.



Full Diffusers Integration

The integration of LCM in diffusers makes it possible to take advantage of many features and workflows that are part of the diffusers toolbox. For example:

  • Out of the box mps support for Macs with Apple Silicon.
  • Memory and performance optimizations like flash attention or torch.compile().
  • Additional memory saving strategies for low-RAM environments, including model offload (see the sketch after this list).
  • Workflows like ControlNet or image-to-image.
  • Training and fine-tuning scripts.
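As an illustration, here is a minimal sketch combining two of these optimizations with the LCM LoRA pipeline; it assumes the same model and LoRA identifiers as before, torch.compile requires PyTorch 2.0+, and the actual speedup and memory savings depend on your hardware:

from diffusers import DiffusionPipeline, LCMScheduler
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", torch_dtype=torch.float16
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# Instead of moving the whole pipeline to the GPU, offload submodels to CPU when idle
pipe.enable_model_cpu_offload()

# Optionally compile the UNet for faster repeated inference (PyTorch 2.0+)
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)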



Benchmarks

This section is not meant to be exhaustive, but illustrative of the generation speed we achieve on various computers. Let us stress again how liberating it is to explore image generation so easily.

Hardware                                | SDXL LoRA LCM (4 steps) | SDXL standard (25 steps)
Mac, M1 Max                             | 6.5s                    | 64s
2080 Ti                                 | 4.7s                    | 10.2s
3090                                    | 1.4s                    | 7s
4090                                    | 0.7s                    | 3.4s
T4 (Google Colab Free Tier)             | 8.4s                    | 26.5s
A100 (80 GB)                            | 1.2s                    | 3.8s
Intel i9-10980XE CPU (1/36 cores used)  | 29s                     | 219s

These tests were run with a batch size of 1 in all cases, using this script by Sayak Paul.
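If you want a rough measurement on your own machine, a simple timing loop over the 4-step pipeline looks like this. It is a sketch, not the benchmark script itself, and it assumes an LCM LoRA pipeline and prompt like the ones built in the Fast Inference section; remember to warm up first so one-time setup costs don't skew the numbers:

import time

# Warm-up run (CUDA kernels, caches, possible compilation)
pipe(prompt=prompt, num_inference_steps=4, guidance_scale=1)

start = time.perf_counter()
pipe(prompt=prompt, num_inference_steps=4, guidance_scale=1)
print(f"4-step generation took {time.perf_counter() - start:.2f}s")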

For cards with a lot of capacity, such as A100, performance increases significantly when generating multiple images at once, which is usually the case for production workloads.



LCM LoRAs and Models Released Today



Bonus: Combine LCM LoRAs with regular SDXL LoRAs

Using the diffusers + PEFT integration, you can combine LCM LoRAs with regular SDXL LoRAs, giving them the superpower to run LCM inference in just 4 steps.

Here we are going to combine the CiroN2022/toy_face LoRA with the LCM LoRA:

from diffusers import DiffusionPipeline, LCMScheduler
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
lcm_lora_id = "latent-consistency/lcm-lora-sdxl"
pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Give each LoRA an explicit adapter name so we can reference them together
pipe.load_lora_weights(lcm_lora_id, adapter_name="lcm")
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")

pipe.set_adapters(["lcm", "toy"], adapter_weights=[1.0, 0.8])
pipe.to(device="cuda", dtype=torch.float16)

prompt = "a toy_face man"
negative_prompt = "blurry, low quality, render, 3D, oversaturated"
images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=4,
    guidance_scale=0.5,
).images[0]
images

Standard and LCM LoRAs combined for fast (4-step) inference.
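If the toy-face style comes through too strongly (or too weakly), you can re-balance the two adapters without reloading anything. A quick sketch, using the adapter names from the snippet above; the 0.6 weight is just an example value to experiment with:

# Re-balance: keep the LCM adapter at full strength, dial the style LoRA up or down
pipe.set_adapters(["lcm", "toy"], adapter_weights=[1.0, 0.6])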

Need ideas to explore some LoRAs? Check out our experimental LoRA the Explorer (LCM version) Space to test amazing creations by the community and get inspired!



Train LCM Models and LoRAs

As part of the diffusers release today, we are providing training and fine-tuning scripts developed in collaboration with the LCM team authors. They allow users to:

  • Perform full-model distillation of Stable Diffusion or SDXL models on large datasets such as Laion.
  • Train LCM LoRAs, which is a much easier process. As we've shown in this post, it also makes it possible to run fast inference with Stable Diffusion, without having to go through distillation training.

For more details, please check the instructions for SDXL or Stable Diffusion in the repo.

We hope these scripts inspire the community to try their own fine-tunes. Please, do let us know if you use them in your projects!



Resources



Credits

The amazing work on Latent Consistency Models was performed by the LCM Team, please make sure to check out their code, report and paper. This project is a collaboration between the diffusers team, the LCM team, and community contributor Daniel Gu. We believe it is a testament to the enabling power of open source AI, the cornerstone that allows researchers, practitioners and tinkerers to explore new ideas and collaborate. We'd also like to thank @madebyollin for their continued contributions to the community, including the float16 autoencoder we use in our training scripts.


