[docs] Modular diffusers #11931

Open: stevhliu wants to merge 12 commits into main from the modular-diffusers branch

Conversation

@stevhliu (Member) commented on Jul 15, 2025

Draft for a quickstart that should ideally briefly summarize everything a developer needs to know about Modular Diffusers without referencing other resources.

edit: expanding scope into other modular docs as well with additions like API references

(bit of a brain dump at the moment, not ready for review yet!)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu changed the title from "[docs] Modular diffusers quickstart" to "[docs] Modular diffusers" on Jul 16, 2025
@stevhliu force-pushed the modular-diffusers branch from e445322 to 0f5330b on July 18, 2025

@stevhliu (Member, Author) commented:

Ok, I think I have an initial version of the refactored modular docs @huggingface/diffusers!

  • Added an API section for modular here. Let me know if there is anything useful that is missing!
  • A bit undecided about the Quickstart/End-to-end doc at the moment. The Quickstart tries to cover everything through the lens of implementing Differential Diffusion. The End-to-end example focuses more on a 4-step process for implementing Differential Diffusion. I think the Quickstart is more comprehensive, but I'm not sure and would appreciate your feedback!
  • Split Guiders out into a separate doc.
  • There were many practical examples scattered throughout that I've omitted for the time being because they made the docs really long. I'm thinking of adding a "Recipes" section to showcase these practical examples.

@stevhliu force-pushed the modular-diffusers branch from 2aa3ef4 to 5ee815b on July 30, 2025
@stevhliu marked this pull request as ready for review on July 30, 2025
@stevhliu requested a review from yiyixuxu on July 30, 2025

@yiyixuxu (Collaborator) left a comment:

thanks a lot @stevhliu!


[Differential Diffusion](https://differential-diffusion.github.io/) differs from standard image-to-image in its `prepare_latents` and `denoise` blocks. All the other blocks can be reused, but you'll need to modify these two.

Create placeholder `PipelineBlocks` for `prepare_latents` and `denoise` by copying and modifying the existing ones.

Collaborator comment:

I think the flow here is we start with rewriting these two special blocks and then assemble them.
If we are going with this flow, we don't need to create placeholders here (my initial process was to create the complete structure first, hence the placeholders; I found it more efficient that way but less intuitive, so what you did here is more beginner friendly, so let's go with this).

We can just jump to the prepare_latents section, and in the denoise section, first go over the block's structure, then how to rewrite the custom sub-block, and finally assemble everything back together into a custom denoise before assembling the entire pipeline.

Member comment:

I agree. But not opposed to showing the denoise block structure early for illustration (without the placeholders)

```py
dd_pipeline = dd_auto_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff")
dd_pipeline.load_default_components(torch_dtype=torch.float16)
```

Collaborator comment:

let's link to the full implementation for this example here https://huggingface.co/YiYiXu/modular-diffdiff/blob/main/block.py

If a variable is modified in `block_state` but not declared as an `intermediate_outputs`, it won't be added to [`~modular_pipelines.PipelineState`].

Collaborator comment:

Suggested change
If a variable is modified in `block_state` but not declared as an `intermediate_outputs`, it won't be added to [`~modular_pipelines.PipelineState`].
If a new variable is added in `block_state` but not declared as an `intermediate_outputs`, it won't be added to [`~modular_pipelines.PipelineState`].

a bit confusing here I think, lol but it is what it is

@DN6 is doing some refactoring in #11969; we will be able to simplify things a lot and make this section of the doc more intuitive in a future PR
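
For readers following this thread, here is a minimal sketch of the behavior being discussed. The block, its parameters, and the imports are illustrative assumptions (it presumes `PipelineBlock`, `InputParam`, and `OutputParam` are exposed under `diffusers.modular_pipelines`, plus the `get_block_state`/`set_block_state` pattern quoted elsewhere in this review), not the doc's actual example.

```py
import torch
from diffusers.modular_pipelines import InputParam, OutputParam, PipelineBlock


class ScaleLatentsStep(PipelineBlock):
    # illustrative block: scales the latents by a user-provided factor
    model_name = "stable-diffusion-xl"

    @property
    def inputs(self):
        return [InputParam("scale", default=1.0)]

    @property
    def intermediate_inputs(self):
        return [InputParam("latents")]

    @property
    def intermediate_outputs(self):
        # without this declaration, `scaled_latents` would stay local to
        # block_state and never be written back to PipelineState
        return [OutputParam("scaled_latents", type_hint=torch.Tensor)]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        block_state.scaled_latents = block_state.latents * block_state.scale
        self.set_block_state(state, block_state)
        return components, state
```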

@@ -12,83 +12,63 @@ specific language governing permissions and limitations under the License.

# PipelineBlock

<Tip warning={true}>

Collaborator comment:

if we don't want to show a complete example of a pipeline block here, maybe we can link to some source code? https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/stable_diffusion_xl/encoders.py

* No need to call `self.get_block_state()` or `self.set_block_state()`
## Loop blocks

A loop block is a [`~modular_pipelines.PipelineBlock`], but the `__call__` method behaves differently.

Collaborator comment:

can we point out that the loop block is used to compose the self.loop_step in the LoopWrapper example above?


Finally, assemble your loop by adding the block(s) to the wrapper:
Use the [`~modular_pipelines.LoopSequentialPipelineBlocks.from_blocks_dict`] method to add the loop block to the loop wrapper to create [`~modular_pipelines.LoopSequentialPipelineBlocks`].

Collaborator comment:

maybe we can mention that this loop takes an initial value for x and adds 1 each iteration;
and in the next example, it will add 2 at each iteration -
just so that it's easier to understand how the loop wrapper and loop block work together
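
For context, a rough sketch of the kind of toy example being referenced: the wrapper owns the loop and calls `self.loop_step` once per iteration, and `from_blocks_dict` composes the loop block into it. Property names such as `loop_inputs` and the exact `loop_step` signature are approximations of the interface described in the doc, not confirmed API.

```py
from diffusers.modular_pipelines import (
    InputParam,
    LoopSequentialPipelineBlocks,
    OutputParam,
    PipelineBlock,
)


class LoopWrapper(LoopSequentialPipelineBlocks):
    # the wrapper defines the loop itself and delegates each iteration to self.loop_step
    model_name = "test"

    @property
    def loop_inputs(self):
        return [InputParam("num_steps", default=5), InputParam("x", default=0)]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        for i in range(block_state.num_steps):
            # loop_step runs the composed loop block(s) for this iteration
            components, block_state = self.loop_step(components, block_state, i=i)
        self.set_block_state(state, block_state)
        return components, state


class AddOneBlock(PipelineBlock):
    # the loop block works on block_state directly; no get/set_block_state needed
    model_name = "test"

    @property
    def intermediate_outputs(self):
        return [OutputParam("x")]

    def __call__(self, components, block_state, i):
        block_state.x += 1  # starts from the initial x and adds 1 each iteration
        return components, block_state


# compose the loop block into the wrapper
loop = LoopWrapper.from_blocks_dict({"add_one": AddOneBlock})
```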


🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
The main difference is to include an expected `output` argument in the pipeline.

Collaborator comment:

for running the pipeline, the main difference is just the `output` argument - we are eliminating this difference soon in #11944

so the main difference is really from the loading (a modular pipeline does not load components by default); since the code example here also includes loading, I think we should briefly mention it here as well.
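
To illustrate both points together, a hedged sketch: it assumes a `blocks` object assembled as in the guide, reuses the `init_pipeline`/`load_default_components` calls quoted above, and the exact return shape of the call may differ.

```py
import torch

# a modular pipeline does not load components by default: it is created from
# blocks first, and the components are loaded in a separate, explicit step
pipeline = blocks.init_pipeline("YiYiXu/modular-demo-auto")
pipeline.load_default_components(torch_dtype=torch.float16)

# for running it, the remaining difference is naming which output to return
image = pipeline(prompt="an astronaut riding a horse on mars", output="images")[0]
```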


When we create a `SequentialPipelineBlocks` from this preset, it instantiates each block class into actual block objects. Its `sub_blocks` attribute now contains these instantiated objects:
## Adding blocks

Collaborator comment:

can we add a transition from ModularPipeline to blocks here?

it feels like something is missing: it is a bit sudden and unclear why we are talking about adding blocks right after introducing the ModularPipeline

we can just say that we need blocks to create a pipeline, and that you can write your own blocks or use the official ones from diffusers (the case here)

and we really need to talk about how to create these ready-to-use blocks (if we want to talk about it in a different section, we can add a link here)
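
As one possible transition, something along these lines could bridge ModularPipeline and blocks (a sketch; it assumes `SequentialPipelineBlocks.from_blocks_dict` and that the `IMAGE2IMAGE_BLOCKS` preset is importable from the SDXL modular module, and the repo id is just an example):

```py
from diffusers.modular_pipelines import SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS

# a ModularPipeline is built from blocks: blocks you write yourself, or
# ready-made ones from diffusers, such as this image-to-image preset
blocks = SequentialPipelineBlocks.from_blocks_dict(IMAGE2IMAGE_BLOCKS)
print(blocks.sub_blocks)  # the preset's block classes, now instantiated

# the assembled blocks are then turned into a runnable pipeline
pipeline = blocks.init_pipeline("YiYiXu/modular-demo-auto")
```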


## Loading custom guiders

Guiders that are already saved on the Hub with a `modular_model_index.json` file are considered a `from_pretrained` component now instead of a `from_config` component.

Collaborator comment:

Suggested change
Guiders that are already saved on the Hub with a `modular_model_index.json` file are considered a `from_pretrained` component now instead of a `from_config` component.
Guiders that are already saved on the Hub and listed in a `modular_model_index.json` file are considered a `from_pretrained` component now instead of a `from_config` component.

@pcuenca (Member) left a comment:

Looking good! I just went through the Overview and Quickstart for now.

In the overview section, I'm missing a bit of context about what this provides. We could perhaps state that this is a way to implement new diffusion pipelines based on existing library components, making it possible for the community to use our model with diffusers without having to open a PR to diffusers. (Assuming that's the overall goal).


<Tip warning={true}>
> [!WARNING]
> ⚠︎ Modular Diffusers is still under active development and it's API may change.

Member comment:

Suggested change
> ⚠︎ Modular Diffusers is still under active development and it's API may change.
> ⚠︎ Modular Diffusers is still under active development and its API may change.


**Assemble Like LEGO®**: You can mix and match between blocks in flexible ways. This allows you to write dedicated blocks unique to specific workflows, and then assemble different blocks into a pipeline that can be used more conveniently for multiple workflows.
- A [quickstart](./quickstart) start for implementing an example workflow with Modular Diffusers.

Member comment:

Suggested change
- A [quickstart](./quickstart) start for implementing an example workflow with Modular Diffusers.
- A [quickstart](./quickstart) guide for implementing an example workflow with Modular Diffusers.


## ModularPipeline

Member comment:

From here on, it doesn't seem to match the structure in the toc tree.

- [`LoopSequentialPipelineBlocks`] is a multi-block that runs iteratively and is designed for iterative workflows.
- [`AutoPipelineBlocks`] is a collection of blocks for different workflows and it selects which block to run based on the input. It is designed to conveniently package multiple workflows into a single pipeline.

[Differential Diffusion](https://differential-diffusion.github.io/) is an image-to-image workflow. Start with the `IMAGE2IMAGE_BLOCKS` preset, a collection of `ModularPipelineBlocks` for image-to-image generation.

Member comment:

Suggested change
[Differential Diffusion](https://differential-diffusion.github.io/) is an image-to-image workflow. Start with the `IMAGE2IMAGE_BLOCKS` preset, a collection of `ModularPipelineBlocks` for image-to-image generation.
This may still sound too abstract, but we can usually get started with block _presets_ provided by modular diffusers. In this case, [Differential Diffusion](https://differential-diffusion.github.io/) is an image-to-image workflow, so we can adopt the `IMAGE2IMAGE_BLOCKS` preset, a collection of `ModularPipelineBlocks` for image-to-image generation.

Comment on lines +32 to +41
```py
IMAGE2IMAGE_BLOCKS = InsertableDict([
    ("text_encoder", StableDiffusionXLTextEncoderStep),
    ("image_encoder", StableDiffusionXLVaeEncoderStep),
    ("input", StableDiffusionXLInputStep),
    ("set_timesteps", StableDiffusionXLImg2ImgSetTimestepsStep),
    ("prepare_latents", StableDiffusionXLImg2ImgPrepareLatentsStep),
    ("prepare_add_cond", StableDiffusionXLImg2ImgPrepareAdditionalConditioningStep),
    ("denoise", StableDiffusionXLDenoiseStep),
    ("decode", StableDiffusionXLDecodeStep)
])
```

Member comment:

I don't understand this, are we overwriting the preset definition? I think we may just want to print IMAGE2IMAGE_BLOCKS to see what it contains:

```py
>>> print(IMAGE2IMAGE_BLOCKS)
InsertableDict([
  0: ('text_encoder', <class 'diffusers.modular_pipelines.stable_diffusion_xl.encoders.StableDiffusionXLTextEncoderStep'>),
  1: ('image_encoder', <class 'diffusers.modular_pipelines.stable_diffusion_xl.encoders.StableDiffusionXLVaeEncoderStep'>),
  2: ('input', <class 'diffusers.modular_pipelines.stable_diffusion_xl.before_denoise.StableDiffusionXLInputStep'>),
  3: ('set_timesteps', <class 'diffusers.modular_pipelines.stable_diffusion_xl.before_denoise.StableDiffusionXLImg2ImgSetTimestepsStep'>),
  4: ('prepare_latents', <class 'diffusers.modular_pipelines.stable_diffusion_xl.before_denoise.StableDiffusionXLImg2ImgPrepareLatentsStep'>),
  5: ('prepare_add_cond', <class 'diffusers.modular_pipelines.stable_diffusion_xl.before_denoise.StableDiffusionXLImg2ImgPrepareAdditionalConditioningStep'>),
  6: ('denoise', <class 'diffusers.modular_pipelines.stable_diffusion_xl.denoise.StableDiffusionXLDenoiseStep'>),
  7: ('decode', <class 'diffusers.modular_pipelines.stable_diffusion_xl.decoders.StableDiffusionXLDecodeStep'>)
])
```

Member comment:

Also some brief commentary might be helpful, like the denoise block being made with a loop block. This comes later in the guide, but we could still anticipate a bit so the pieces start to click.


### IP-Adapter

Stable Diffusion XL already has a preset IP-Adapter block that you can use and doesn't require any changes to the existing Differential Diffusion pipeline.

Member comment:

Suggested change
Stable Diffusion XL already has a preset IP-Adapter block that you can use and doesn't require any changes to the existing Differential Diffusion pipeline.
Stable Diffusion XL already has an IP-Adapter block preset that you can use, and it doesn't require any changes to work with the existing Differential Diffusion pipeline we created.

Stable Diffusion XL already has a preset IP-Adapter block that you can use and doesn't require any changes to the existing Differential Diffusion pipeline.

```py
from diffusers.modular_pipelines.stable_diffusion_xl.encoders import StableDiffusionXLAutoIPAdapterStep
```

Member comment:

We are not using a "preset" in the same sense as in the previous instances, just a block.
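
For what that could look like in practice, a tentative sketch; it assumes a `dd_blocks` object assembled earlier in the guide and an `InsertableDict`-style `insert(name, block, index)` method on `sub_blocks`, both of which are assumptions rather than confirmed API:

```py
from diffusers.modular_pipelines.stable_diffusion_xl.encoders import StableDiffusionXLAutoIPAdapterStep

# drop the ready-made IP-Adapter block in front of the existing diff-diff blocks;
# the Differential Diffusion blocks themselves stay untouched
dd_blocks.sub_blocks.insert("ip_adapter", StableDiffusionXLAutoIPAdapterStep(), 0)
```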


### AutoPipelineBlocks

The Differential Diffusion, IP-Adapter, and ControlNet workflows can be bundled into a single [`ModularPipeline`] by using [`AutoPipelineBlocks`]. This allows automatically selecting which sub-blocks to run based on the inputs like `control_image` or `ip_adapter_image`. If none of these inputs are passed, then it defaults to the Differential Diffusion.

Member comment:

Suggested change
The Differential Diffusion, IP-Adapter, and ControlNet workflows can be bundled into a single [`ModularPipeline`] by using [`AutoPipelineBlocks`]. This allows automatically selecting which sub-blocks to run based on the inputs like `control_image` or `ip_adapter_image`. If none of these inputs are passed, then it defaults to the Differential Diffusion.
The Differential Diffusion, IP-Adapter, and ControlNet workflows can be bundled into a single [`ModularPipeline`] by using [`AutoPipelineBlocks`]. This allows automatically selecting which sub-blocks to run based on the inputs like `control_image` or `ip_adapter_image`. If none of these inputs are passed, then it defaults to the standard Differential Diffusion implementation.
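
For illustration, the selection mechanism could be sketched like this: trigger inputs decide which sub-block runs, and `None` marks the default branch. The `SDXLDiffDiff*` class names are placeholders for the custom denoise blocks written earlier in the guide (see the linked block.py), not guaranteed names.

```py
from diffusers.modular_pipelines import AutoPipelineBlocks


class DiffDiffAutoDenoiseStep(AutoPipelineBlocks):
    # which sub-block runs is chosen from the inputs that were passed:
    # control_image -> ControlNet denoise, otherwise the plain diff-diff denoise
    block_classes = [SDXLDiffDiffControlNetDenoiseStep, SDXLDiffDiffDenoiseStep]
    block_names = ["controlnet_denoise", "denoise"]
    block_trigger_inputs = ["control_image", None]
```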


### ControlNet

Stable Diffusion XL already has a preset ControlNet block that can readily be used.

Member comment:

Same comments as in the ip-adapter case.


```py
components = ComponentsManager()

diffdiff_pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-diffdiff-0704", trust_remote_code=True, components_manager=components, collection="diffdiff")
```

Member comment:

Suggested change
diffdiff_pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-diffdiff-0704", trust_remote_code=True, components_manager=components, collection="diffdiff")
diffdiff_pipeline = ModularPipeline.from_pretrained(
    "YiYiXu/modular-diffdiff-0704",
    trust_remote_code=True,
    components_manager=components,
    collection="diffdiff"
)

For more clarity. Also, a comment about trust_remote_code could be helpful.
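
Something like the following might address both points; it assumes `ComponentsManager` and `ModularPipeline` are importable from `diffusers.modular_pipelines` as elsewhere in the doc:

```py
from diffusers.modular_pipelines import ComponentsManager, ModularPipeline

components = ComponentsManager()

diffdiff_pipeline = ModularPipeline.from_pretrained(
    "YiYiXu/modular-diffdiff-0704",
    # trust_remote_code executes custom block code downloaded from this Hub repo;
    # only enable it for repositories whose code you have reviewed and trust
    trust_remote_code=True,
    components_manager=components,
    collection="diffdiff",
)
```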
