[core] support attention backends for LTX #12021

sayakpaul · 2025-07-30T06:45:32Z

What does this PR do?

Working code:

import torch
from diffusers import LTXConditionPipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_image, load_video

def round_to_nearest_resolution_acceptable_by_vae(height, width):
    height = height - (height % pipe.vae_spatial_compression_ratio)
    width = width - (width % pipe.vae_spatial_compression_ratio)
    return height, width

pipe = LTXConditionPipeline.from_pretrained(
    "Lightricks/LTX-Video-0.9.8-13B-distilled", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/penguin.png")
video = load_video(export_to_video([image])) # compress the image using video compression as the model was trained on videos
condition1 = LTXVideoCondition(video=video, frame_index=0)
print(f"{image.size=}")

prompt = "A cute little penguin takes out a book and starts reading it"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
expected_height, expected_width = 480, 832
downscale_factor = 2 / 3
num_frames = 96

# Compute a downscaled, VAE-compatible resolution.
# Note: the call below runs at the expected resolution, so these values are not used.
downscaled_height, downscaled_width = int(expected_height * downscale_factor), int(expected_width * downscale_factor)
downscaled_height, downscaled_width = round_to_nearest_resolution_acceptable_by_vae(
    downscaled_height, downscaled_width
)
video = pipe(
    conditions=[condition1],
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=expected_width,
    height=expected_height,
    num_frames=num_frames,
    num_inference_steps=8,
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(video, "output.mp4", fps=24)

Output

output.mp4

Regarding the output:

I get the same output on main.
I didn't follow the three-stage pipeline of first generating low-res latents, upsampling them, and then running a few rounds of denoising, so I believe some quality loss is expected regardless of this PR.
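The resolution arithmetic in the snippet above can be checked without loading the pipeline. The following is a minimal sketch, not part of the PR: it takes the compression ratio as an argument instead of reading `pipe.vae_spatial_compression_ratio`, and the default of 32 is an assumption about that attribute's value. The name `round_down_to_multiple` is hypothetical. Note that the `pipe(...)` call in the snippet passes `expected_width` and `expected_height`, so the downscaled values are computed there but not used.

```python
from typing import Tuple


def round_down_to_multiple(height: int, width: int, ratio: int = 32) -> Tuple[int, int]:
    """Drop the remainder so both sides are divisible by `ratio`.

    `ratio` stands in for `pipe.vae_spatial_compression_ratio`; 32 is an assumed value.
    """
    return height - (height % ratio), width - (width % ratio)


expected_height, expected_width = 480, 832
downscale_factor = 2 / 3

# Same steps as the snippet: scale, truncate to int, then snap down to the VAE grid.
downscaled_height = int(expected_height * downscale_factor)
downscaled_width = int(expected_width * downscale_factor)
snapped = round_down_to_multiple(downscaled_height, downscaled_width)
print(snapped)
```

The helper only ever rounds down, so the snapped resolution never exceeds the requested one.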

HuggingFaceDocBuilderDev · 2025-07-30T06:52:47Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

a-r-r-o-w

Thanks! Just some minor asks in the refactoring

a-r-r-o-w · 2025-07-30T09:36:08Z

src/diffusers/models/transformers/transformer_ltx.py

        hidden_states = hidden_states.to(query.dtype)

        hidden_states = attn.to_out[0](hidden_states)
        hidden_states = attn.to_out[1](hidden_states)
        return hidden_states


+class LTXAttention(torch.nn.Module, AttentionModuleMixin):


super clean!
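For readers unfamiliar with the idea behind "attention backends", here is a toy, pure-Python sketch of the general pattern: several interchangeable attention implementations registered under names, with one entry point that dispatches by name. This is not the diffusers dispatcher and not the code in this PR. Every name in it (`register_backend`, `dispatch_attention`, the `"naive"` and `"stable"` backends) is hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]
AttentionFn = Callable[[Matrix, Matrix, Matrix], Matrix]

# Hypothetical registry: backend name -> attention implementation.
_BACKENDS: Dict[str, AttentionFn] = {}


def register_backend(name: str) -> Callable[[AttentionFn], AttentionFn]:
    def decorator(fn: AttentionFn) -> AttentionFn:
        _BACKENDS[name] = fn
        return fn
    return decorator


def _scores(q: Matrix, k: Matrix) -> Matrix:
    # Scaled dot products between every query row and every key row.
    scale = 1.0 / math.sqrt(len(q[0]))
    return [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]


def _weighted_sum(weights: Matrix, v: Matrix) -> Matrix:
    dim = len(v[0])
    return [[sum(w * vj[d] for w, vj in zip(row, v)) for d in range(dim)] for row in weights]


@register_backend("naive")
def naive_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Plain softmax over each row of scores.
    weights = []
    for row in _scores(q, k):
        exps = [math.exp(s) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return _weighted_sum(weights, v)


@register_backend("stable")
def stable_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Same math, but subtracts the row maximum before exponentiating.
    weights = []
    for row in _scores(q, k):
        row_max = max(row)
        exps = [math.exp(s - row_max) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return _weighted_sum(weights, v)


def dispatch_attention(q: Matrix, k: Matrix, v: Matrix, backend: str = "naive") -> Matrix:
    if backend not in _BACKENDS:
        raise ValueError(f"Unknown backend {backend!r}; available: {sorted(_BACKENDS)}")
    return _BACKENDS[backend](q, k, v)
```

Because every backend shares one signature, the calling module does not need to know which implementation runs; switching is a matter of changing the name passed to the dispatcher.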

sayakpaul · 2025-07-30T10:49:43Z

Thanks @a-r-r-o-w! Have you started Wan already? If so, cool! If not, can I start and test your parallelism PR a bit with that?

a-r-r-o-w · 2025-07-30T10:55:14Z

Wan attention backend support has already been merged in #11918. For testing parallelism, I've tested most of the implementations (even some outside diffusers) to validate the soundness of going forth with CP-plans. Some plans are available here: https://github.com/huggingface/finetrainers/blob/f476c3717da6cbfb1070505a99ee989426b46d9c/finetrainers/models/_metadata/transformer.py#L68

sayakpaul · 2025-07-30T11:05:05Z

Cool! Once this matures a bit, I would love to document some benchmarks across the board.

support attention backends for lTX

d513efa

sayakpaul requested review from DN6 and a-r-r-o-w July 30, 2025 06:45

a-r-r-o-w approved these changes Jul 30, 2025

sayakpaul and others added 2 commits July 30, 2025 16:14

Apply suggestions from code review

532711f

Co-authored-by: Aryan <aryan@huggingface.co>

reviewer feedback.

8c8b44a

sayakpaul added the refactor label Jul 30, 2025

Merge branch 'main' into attn-refactor-ltx

5009449

sayakpaul merged commit c052791 into main Jul 30, 2025
14 of 15 checks passed

sayakpaul deleted the attn-refactor-ltx branch July 30, 2025 11:05
