# A Unified Framework of Foundation and Generative Models for Efficient Compression of Scientific Data
We introduce CAESAR (Conditional AutoEncoder with Super-resolution for Augmented Reduction), a new framework for spatio-temporal scientific data reduction. The baseline model, CAESAR-V, is built on a standard variational autoencoder with scale hyperpriors and super-resolution modules to achieve high compression. It encodes data into a latent space and uses learned priors for a compact, information-rich representation.
The enhanced version, CAESAR-D, begins by compressing keyframes using an autoencoder and extends the architecture by incorporating conditional diffusion to interpolate the latent spaces of missing frames between keyframes. This enables high-fidelity reconstruction of intermediate data without requiring their explicit storage.
Additionally, we develop a GPU-accelerated postprocessing module that enforces error bounds on the reconstructed data, achieving real-time compression while maintaining rigorous accuracy guarantees. Together, these components offer a set of solutions that balance compression efficiency, reconstruction accuracy, and computational cost for scientific data workflows.
Experimental results across multiple scientific datasets demonstrate that our framework achieves significantly lower NRMSE than rule-based compressors such as SZ3, especially at higher compression ratios.
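For intuition, the sketch below illustrates one common way an error-bound postprocessing pass can work; this is a generic illustration under our own assumptions, not the repository's actual implementation. Where the reconstruction violates a pointwise absolute error bound, a quantized residual correction is stored alongside the compressed stream.

```python
import numpy as np

# Generic illustration (not this repository's implementation): enforce a
# pointwise absolute error bound by storing quantized residual corrections
# for the points where the reconstruction violates the bound.
def enforce_error_bound(original, recon, bound):
    residual = original - recon
    # Quantize residuals to steps of 2*bound; points already within the
    # bound quantize to code 0, so the correction array stays sparse.
    codes = np.round(residual / (2 * bound)).astype(np.int32)
    corrected = recon + codes * (2 * bound)
    assert np.all(np.abs(original - corrected) <= bound + 1e-9)
    return corrected, codes  # sparse codes compress well losslessly
```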
## Installation

```bash
git clone https://github.com/Shaw-git/CAESAR.git
cd CAESAR
```
We recommend Python 3.10+ and a virtual environment (e.g., conda or venv).

```bash
pip install -r requirements.txt
```
This project has been tested on:

- NVIDIA A100 80GB
- NVIDIA RTX 2080 24GB
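Assuming PyTorch is among the dependencies installed from `requirements.txt`, you can quickly confirm that a GPU is visible:

```python
import torch

# Sanity check: confirm a CUDA-capable GPU is visible to PyTorch.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```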
## Pretrained Models

We provide four pretrained models for evaluation:

| Model | Description | Download Link |
|---|---|---|
| `caesar_v.pth` | CAESAR-V | Google Drive |
| `caesar_d.pth` | CAESAR-D | Google Drive |
| `caesar_v_tuning_Turb-Rot.pth` | CAESAR-V fine-tuned on the Turb-Rot dataset | Google Drive |
| `caesar_d_tuning_Turb-Rot.pth` | CAESAR-D fine-tuned on the Turb-Rot dataset | Google Drive |
📂 Place downloaded models into the `./pretrained/` folder.
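As a quick smoke test, you can load a checkpoint and inspect its contents. This is a minimal sketch; the exact checkpoint layout is whatever the repository saved, so adjust accordingly:

```python
import torch

# Load a downloaded checkpoint on CPU and inspect its top-level structure.
# The model classes that consume these weights are defined in this
# repository; see eval_caesar.ipynb for end-to-end usage.
state = torch.load("./pretrained/caesar_v.pth", map_location="cpu")
print(type(state))
if isinstance(state, dict):
    print(list(state.keys())[:10])
```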
## Datasets

Example scientific datasets used in this work:

| Dataset | Description | Download Link |
|---|---|---|
| Turb-Rot | Rotating turbulence dataset | Google Drive |

Download and organize datasets into the `./data/` folder as per the instructions in `data/README.md`.
## Data Format

All datasets used in this work are stored in NumPy `.npz` format and follow a standardized 5D tensor structure:

```
[V, S, T, H, W]
```

- Variable (V): number of physical quantities
- Sections (S): number of independent spatial samples
- Frames (T): number of time steps per sample
- Height/Width (H, W): spatial resolution (height × width)
Save arrays in this layout under the `data` key:

```python
np.savez("path.npz", data=your_data)
```
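For example, a minimal round trip with a dummy tensor in the expected layout (shapes and file name here are illustrative only):

```python
import numpy as np

# Dummy data: 1 variable, 4 samples, 16 frames, 256x256 resolution.
data = np.random.rand(1, 4, 16, 256, 256).astype(np.float32)
np.savez("./data/example.npz", data=data)

# Load it back and confirm the [V, S, T, H, W] layout.
loaded = np.load("./data/example.npz")["data"]
assert loaded.shape == (1, 4, 16, 256, 256)
```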
For evaluation examples, see `eval_caesar.ipynb`.
## Citation

If you use CAESAR in your work, please cite:
```bibtex
@inproceedings{li2025foundation,
  title={Foundation Model for Lossy Compression of Spatiotemporal Scientific Data},
  author={Li, Xiao and Lee, Jaemoon and Rangarajan, Anand and Ranka, Sanjay},
  booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},
  pages={368--380},
  year={2025},
  organization={Springer}
}
```

```bibtex
@article{li2025generative,
  title={Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction},
  author={Li, Xiao and Zhu, Liangji and Rangarajan, Anand and Ranka, Sanjay},
  journal={arXiv preprint arXiv:2507.02129},
  year={2025}
}
```
## Contact

For questions or feedback, feel free to contact Xiao Li at xiao.li@ufl.edu.