Open
Labels
bug (Something isn't working)
Description
Describe the bug
I am using group offloading to save GPU memory, and the results are worse than expected. On an A800 the cosine similarity is about 0.934. On a 4090 it is about 0.78, which is even worse.
Could anyone suggest how to fix this precision loss?
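For anyone trying to reproduce the numbers, here is a minimal sketch of how such a cosine similarity could be measured. It assumes the comparison is between the output of a run without offloading and the output of a run with group offloading; the report does not say what the reference is. The function name `cosine_similarity_flat` and the random arrays are hypothetical stand-ins, not part of the original report.

```python
import numpy as np


def cosine_similarity_flat(a, b):
    """Cosine similarity between two arrays, each treated as one flat vector."""
    # Upcast to float64 so the metric itself adds no low-precision error.
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("outputs must have the same number of elements")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


# Hypothetical stand-ins for the two outputs being compared.
rng = np.random.default_rng(0)
baseline = rng.standard_normal((1, 4, 8, 8))
perturbed = baseline + 0.1 * rng.standard_normal(baseline.shape)

print(cosine_similarity_flat(baseline, baseline))
print(cosine_similarity_flat(baseline, perturbed))
```

A torch tensor can be passed in after converting it with `tensor.detach().float().cpu().numpy()`. Running both configurations with the same seed and inputs keeps the comparison meaningful.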
Reproduction
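import torch
from diffusers.hooks import apply_group_offloading

# `transformer` and `self.local_rank` are defined elsewhere in my code.
apply_group_offloading(
    transformer,
    onload_device=torch.device(f"cuda:{self.local_rank}"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=1,
    non_blocking=True,
    use_stream=True,
)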
Logs
No response
System Info
I tried diffusers 0.33.1 and 0.34.
Who can help?
No response