This is the official implementation of WISA, designed to enhance Text-to-Video models by improving their ability to simulate the real world.
WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation
Jing Wang*, Ao Ma*, Ke Cao*, Jun Zheng, Zhanjie Zhang, Jiasong Feng, Shanyuan Liu, Yuhang Ma, Bo Cheng, Dawei Leng‡, Yuhui Yin, Xiaodan Liang‡ (*Equal Contribution, ‡Corresponding Authors)
- [2025.05.15] 🔥 We are excited to announce the official release of WISA's codebase and model weights on GitHub! This implementation is built upon the powerful finetrainers framework.
- [2025.03.28] We have uploaded the WISA-80K dataset to Hugging Face, including processed video clips and annotations.
- [2025.03.12] We have released our paper WISA and created a dedicated project homepage.
| Wan2.1-14B | WISA | Prompt |
|---|---|---|
| wan_1.mp4 | wisa_wan_1.mp4 | A dry clump of soil rests on a flat surface, with fine details of its texture and cracks visible. ... |
| wan_2.mp4 | wisa_wan_2.mp4 | The camera focuses on a toothpaste tube on the bathroom countertop. As a finger gently applies... |
| wan_3.mp4 | wisa_wan_3.mp4 | A bowl of clear water sits in the center of a freezer. As the temperature gradually drops... |
Clone this repository and install the required packages.
```bash
git clone https://github.com/360CVGroup/WISA.git
cd WISA
conda create -n wisa python=3.10
conda activate wisa
pip install -r requirements.txt
```
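To confirm the environment is usable, a quick optional check (not part of the original setup steps) that PyTorch imports cleanly and sees a CUDA device:

```bash
# Optional sanity check: PyTorch should import and detect a CUDA GPU.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```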
Please download the CogVideoX and Wan2.1 checkpoints from ModelScope and put them in `./pretrain_models/`.
```bash
mkdir ./pretrain_models
cd ./pretrain_models
pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B-Diffusers --local_dir ./Wan2.1-T2V-14B-Diffusers
modelscope download ZhipuAI/CogVideoX-5b --local_dir ./CogVideoX-5b-Diffusers
```
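Both checkpoint directories from the commands above should now be present; since the shell is still inside `./pretrain_models/`, a simple listing verifies this:

```bash
# Expect Wan2.1-T2V-14B-Diffusers and CogVideoX-5b-Diffusers in the listing.
ls
```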
Please download the WISA weights from Hugging Face and put them in `./pretrain_models/WISA/`.
```bash
git lfs install
git clone https://huggingface.co/qihoo360/WISA
cd ..
```
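Because the clone runs inside `./pretrain_models/`, the weights land at `./pretrain_models/WISA/` as expected; you can verify from the repository root:

```bash
# The WISA weights should now sit alongside the base checkpoints.
ls ./pretrain_models/WISA
```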
You can revise `MODEL_TYPE`, `GEN_TYPE`, `PROMPT_PATH`, `OUTPUT_FILE`, and `LORA_PATH` in `inference.sh` for different inference settings. Then run:

```bash
sh inference.sh
```
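For orientation, here is a minimal sketch of those variables; the values below are illustrative assumptions, not the defaults shipped in `inference.sh`:

```bash
# Hypothetical example values -- adjust to your own paths and model choice.
MODEL_TYPE="wan"                          # assumed name for the base-model selector
GEN_TYPE="t2v"                            # assumed name for the generation mode
PROMPT_PATH="./prompts/test_prompts.txt"  # assumed: text file of prompts
OUTPUT_FILE="./outputs"                   # assumed: where generated videos are written
LORA_PATH="./pretrain_models/WISA"        # WISA weights downloaded above
```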
Download the WISA-80K dataset from Hugging Face.
This project supports precomputing and saving video latents and text embeddings, so the VAE and text encoder do not need to be loaded onto the GPU during training, reducing GPU memory usage. This step is essential when training Wan2.1-14B; without it, training will run out of memory (OOM).
Step 1: Add the following parameters to the `dataset_cmd` in your training script (e.g., `examples/training/sft/wan/crush_smol_lora/train_wisa.sh`), and ensure you have sufficient storage space available.
```bash
dataset_cmd=(
  --dataset_config $TRAINING_DATASET_CONFIG
  --dataset_shuffle_buffer_size 10
  --precomputation_items 2000        # Number of samples to precompute
  --enable_precomputation            # Flag to activate precomputation
  --precomputation_once
  --precomputation_dir ./cache/path  # Directory for cached outputs
  --hash_save                        # Enable hash-based filename storage
  --first_samples
)
```
Step 2: Configure the dataset paths in `examples/training/sft/wan/crush_smol_lora/training_wisa.json`, then execute:

```bash
sh examples/training/sft/wan/crush_smol_lora/train_wisa.sh
```
"Note: Process data in batches to prevent CPU cache overload (recommended maximum: 12,000 samples per batch)."
Step 3: Disable the `--enable_precomputation` flag, then rerun the training script to train from the cached latents:

```bash
dataset_cmd=(
  --dataset_config $TRAINING_DATASET_CONFIG
  --dataset_shuffle_buffer_size 10
  --precomputation_items 2000        # Number of samples to precompute
  # --enable_precomputation          # Disabled: reuse the cache written in Step 2
  --precomputation_once
  --precomputation_dir ./cache/path  # Directory for cached outputs
  --hash_save                        # Enable hash-based filename storage
  --first_samples
)
```

```bash
sh examples/training/sft/wan/crush_smol_lora/train_wisa.sh
```
We have disabled validation during training: a bug in the validation phase introduces video-generation artifacts, so validation outputs deviate significantly from test-phase results.
This work stands on the shoulders of groundbreaking research and open-source contributions. We extend our deepest gratitude to the authors and contributors of the following projects:
- CogVideoX - For their pioneering work in video generation
- Wan2.1 - For their foundational contributions to large-scale video models
Special thanks to the finetrainers framework for enabling efficient model training - your excellent work has been invaluable to this project.
```bibtex
@misc{wang2025wisa,
      title={WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation},
      author={Jing Wang and Ao Ma and Ke Cao and Jun Zheng and Zhanjie Zhang and Jiasong Feng and Shanyuan Liu and Yuhang Ma and Bo Cheng and Dawei Leng and Yuhui Yin and Xiaodan Liang},
      year={2025},
      eprint={2503.08153},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.08153},
}
```