
RecA: Reconstruction Alignment Improves Unified Multimodal Models

🚀 Just 6 × 80GB A100s for 4.5 hours is enough to boost BAGEL's performance across all tasks! Our RecA-tuned BAGEL outperforms FLUX-Kontext in image editing!

Paper · alphaXiv · Hugging Face Collection · HF Demo · Project Page

Ji Xie¹, Trevor Darrell¹, Luke Zettlemoyer², XuDong Wang¹*
¹UC Berkeley; ²University of Washington

🔥 News

  • 2025.9.15: 🔥 Added NF4, INT8, and DF11 versions of BAGEL-RecA! Thanks to @theunlikely!
  • 2025.9.14: 🔥 Added a ComfyUI guide! Try BAGEL-RecA in ComfyUI!
  • 2025.9.11: Harmon training code is released!
  • 2025.9.10: BAGEL training code is released! Harmon training code will be released soon.
  • 2025.9.9: Our finetuned weights and arXiv paper are available! We expect to release the training code tomorrow.

📑 Table of Contents

  • 🔧 Quick Start
  • 🏆 Model Zoo
  • 🍭 Results
  • 🎨 Edit Comparison
  • 🚧 TODO
  • License
  • 📮 Contact
  • 📄 Citation

🔧 Quick Start!

  1. Online Demo: Try out our enhanced BAGEL-RecA demo on Hugging Face Spaces!


  2. ComfyUI: See ComfyUI-BAGEL. Usage is exactly the same as the original ComfyUI-BAGEL, except that you replace the BAGEL weight models/bagel/BAGEL-7B-MoT/ema.safetensors with the RecA-tuned one (a Python download sketch is also provided after this list). The ComfyUI-BAGEL repo already supports NF4 and INT8 conversion of BAGEL.
# Download the RecA-tuned BAGEL weights (use resolve/, not blob/, so wget fetches the raw file)
wget https://huggingface.co/sanaka87/BAGEL-RecA/resolve/main/model_bf16.safetensors
# Swap it in for the original BAGEL checkpoint used by ComfyUI-BAGEL
mv model_bf16.safetensors models/bagel/BAGEL-7B-MoT/ema.safetensors

The NF4 and INT8 weights can also be downloaded from the BAGEL-RecA repository.

A DF11 version of BAGEL-RecA is available as well (heartfelt thanks to @theunlikely!).

  3. Local Setup: Follow the instructions in the BAGEL Installation Guide to set up the environment, then run BAGEL/inference.ipynb to test the model locally!

  4. Full Training & Evaluation: For detailed instructions on installation, training, and evaluation, please refer to the respective sub-repository READMEs.
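
If you prefer to script the weight download from step 2, the sketch below uses the huggingface_hub and safetensors Python packages. The repository ID, filename, and destination path come from the commands above; the copy step and the sanity check are illustrative assumptions, not part of the official tooling.

from pathlib import Path
import shutil

from huggingface_hub import hf_hub_download  # pip install huggingface_hub
from safetensors import safe_open            # pip install safetensors

# Fetch the RecA-tuned bf16 checkpoint from the Hugging Face Hub (cached locally).
ckpt = hf_hub_download(repo_id="sanaka87/BAGEL-RecA", filename="model_bf16.safetensors")

# Optional sanity check: confirm the file parses as a safetensors checkpoint.
with safe_open(ckpt, framework="pt", device="cpu") as f:
    print(f"{len(list(f.keys()))} tensors in checkpoint")

# Copy it to the path ComfyUI-BAGEL expects (adjust if your ComfyUI tree differs).
dst = Path("models/bagel/BAGEL-7B-MoT/ema.safetensors")
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(ckpt, dst)
print(f"Copied {ckpt} -> {dst}")

The same hf_hub_download call should work for the NF4/INT8/DF11 variants; only the filename argument would change to match the file listed in the BAGEL-RecA repository.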

🏆 Model Zoo

A collection of RecA models on Hugging Face with benchmark performance:

| Model Name | Parameters | GenEval | DPGBench | ImgEdit | GEdit |
|---|---|---|---|---|---|
| BAGEL-RecA (supports INT8, NF4) | 14B | 82.4 (+3.6) | 85.29 (+1.26) | 3.75 (+0.37) | 7.27 (+0.33) |
| Harmon-0.5B-RecA | 0.5B | 78.7 (+11.1) | 84.67 (+4.55) | - | - |
| Harmon-1.5B-RecA | 1.5B | 85.7 (+12.8) | 87.21 (+6.28) | - | - |
| Show-o-RecA | 1.3B | 61.9 (+5.3) | 75.70 (+5.05) | - | - |
| Show-o-512x512-RecA | 1.3B | 72.3 (+6.1) | 84.94 (+2.73) | - | - |
| Harmon-1.5B-RecA-plus | 1.5B | 90.0 | 88.15 | - | - |
| OpenUni-RecA | 3.6B | 74.1 (+12.2) | 82.75 (+3.73) | - | - |

🍭 Results

Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.

RecA achieves state-of-the-art performance on generation benchmarks with remarkable efficiency. Despite using only 1.5B parameters, RecA surpasses models with 7B-24B parameters, achieving GenEval 0.86 and DPGBench 87.21 without GPT-4o distillation data or reinforcement learning. RecA also improves BAGEL's editing performance significantly across all categories. Further two-stage fine-tuning with GPT-4o-Image distillation data enhances the score to 0.90 and 88.15 respectively.

We've tested RecA on various base architectures, including Show-o, OpenUni, Harmon, and BAGEL, consistently observing significant performance improvements across all models and benchmarks.

🎨 Edit Comparison

Our method demonstrates superior image editing capabilities compared to state-of-the-art models including ICEdit, FLUX-Kontext, and GPT-4o:


🚧 TODO

  • Release our model weights on Hugging Face.
  • Release BAGEL training code.
  • Release Harmon training code.
  • Add ComfyUI guide.
  • Release Show-o and OpenUni training code.
  • Further scale up BAGEL training.
  • Add support for new UMM architectures like Show-o2.

License

The majority of RecA is licensed under the Apache License; however, portions of the project are available under their own license terms: BAGEL and Show-o are licensed under Apache, while Harmon and OpenUni are licensed under the S-Lab license. If you later add other third-party code, please keep this license information updated, and let us know if that component is licensed under something other than Apache, CC-BY-NC, MIT, or CC0.

📮 Contact

For feedback or collaboration opportunities, feel free to reach out!

If you have any general questions, feel free to email us at sanaka@berkeley.edu and xdwang@eecs.berkeley.edu. For code or implementation-related questions, please email us or open an issue in this repository (we recommend opening an issue, since your questions may help others).

📄 Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@article{xie2025reconstruction,
  title={Reconstruction Alignment Improves Unified Multimodal Models},
  author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
  journal={arXiv preprint arXiv:2509.07295},
  year={2025}
}

If you find this project helpful, please consider giving it a star!

