[wan2.2] follow-up #12024
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey, with both of them being accessed in one loop, would enable_model_cpu_offload still work? Like, when it hits the transformer_2, could it offload transformer? Cheers,
@JoeGaffney interesting question, I was trying to debug a similar issue. For wan2.2 i2v the execution path differs from the offload sequence: for me it causes the text_encoder to stay on GPU after encoding the initial image. Changing the sequence is also problematic, since it currently supports only a single position per component. I know this deserves its own issue; I'm still collecting examples. Coming back to your comment, I think it's a valid argument that the behavior depends on how the offload sequence is defined.
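For reference, a simplified sketch of the mechanism diffusers builds `enable_model_cpu_offload` on (accelerate's `cpu_offload_with_hook`). The component order below is an assumption for illustration, and `pipe` is assumed to be the loaded Wan pipeline; the real order comes from the pipeline's offload sequence:

```python
# Simplified sketch: each component is moved to the GPU when its forward runs,
# and the hook chained from the *previous* component offloads that one back to
# CPU. So when transformer_2 first executes, transformer gets offloaded.
from accelerate import cpu_offload_with_hook

hook = None
for model in (pipe.text_encoder, pipe.transformer, pipe.transformer_2, pipe.vae):
    _, hook = cpu_offload_with_hook(model, execution_device="cuda", prev_module_hook=hook)
```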
Hey @okaris Also, I think referencing something like this in the example:

```python
image_processor = ModularPipeline.from_pretrained("YiYiXu/WanImageProcessor14B", trust_remote_code=True)
image = image_processor(
    image="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG",
    output="processed_image"
)
```

...feels a bit fragile. It relies on remote, opaque behavior. I get that it's convenient, but it's not ideal for production integration or debugging. It would be great to also provide a minimal example that shows how to prepare inputs manually. Cheers,
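Something along these lines, perhaps: a minimal sketch of manual input preparation. This is an assumption of what the remote processor does (resize the image so both dimensions are multiples of 16, keeping the aspect ratio and a target pixel budget; the `mod_value` of 16 and `max_area` are assumed values for the 14B model):

```python
from diffusers.utils import load_image

image = load_image(
    "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG"
)

# Target pixel budget and rounding unit (assumed: VAE spatial scale factor
# times the transformer patch size gives 16 for the 14B model).
max_area = 480 * 832
mod_value = 16

aspect_ratio = image.height / image.width
height = round((max_area * aspect_ratio) ** 0.5) // mod_value * mod_value
width = round((max_area / aspect_ratio) ** 0.5) // mod_value * mod_value
image = image.resize((width, height))
```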
Some have mentioned that during their testing the first stage is relatively useless. Has anyone on the team actually done enough testing on WAN 2.2 to determine which approach works best?
So yes, text_encoder will stay on GPU after encoding the initial image, but it will get offloaded when the next model in the sequence is loaded. vae would stay on GPU until it's used again, including the time when the transformers are loaded, but it's relatively small, so it doesn't make that much difference. We could force-offload vae if it's needed, though.
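A force-offload could look something like this (a hypothetical workaround, assuming `pipe` is the Wan i2v pipeline with model offloading enabled):

```python
import torch

# Push the VAE back to CPU right after the initial image is encoded, so it
# doesn't hold GPU memory while the transformers run.
pipe.vae.to("cpu")
torch.cuda.empty_cache()
```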
* support lightx2v lora in wan
* add docs
* reviewer feedback
* empty
- To use only the high-noise stage (`transformer`), set `boundary_ratio` to 0.
- To use only the low-noise stage (`transformer_2`), set `boundary_ratio` to 1.
Two-Stage Denoising Loop

[Diagram: side-by-side comparison of `boundary_ratio = 0.9` (90%, two-stage) and `boundary_ratio = 1.0` (100%, single stage).]

Stage Breakdown

| `boundary_ratio` | Timestep condition | Active model | Guidance |
| --- | --- | --- | --- |
| 0.9 | t >= 900 | `transformer` | `guidance_scale` |
| 0.9 | t < 900 | `transformer_2` | `guidance_scale_2` |
| 1.0 | t >= 1000 (never true) | — | — |
| 1.0 | t < 1000 (always true) | `transformer_2` | `guidance_scale_2` |
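The per-timestep selection the table describes can be sketched as a small function (illustrative only; the names mirror the pipeline arguments, not the actual implementation inside the denoising loop):

```python
def select_expert(t: float, boundary_ratio: float, num_train_timesteps: int = 1000):
    """Return which transformer and guidance scale apply at timestep t."""
    boundary = boundary_ratio * num_train_timesteps  # e.g. 0.9 * 1000 = 900
    if t >= boundary:
        return "transformer", "guidance_scale"        # high-noise expert
    return "transformer_2", "guidance_scale_2"        # low-noise expert

# Reproduces the table: with boundary_ratio=1.0 the high-noise branch never fires.
assert select_expert(950, 0.9) == ("transformer", "guidance_scale")
assert select_expert(899, 0.9) == ("transformer_2", "guidance_scale_2")
assert select_expert(999, 1.0) == ("transformer_2", "guidance_scale_2")
```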