A curated collection of papers, models, and resources for the field of Video Generation.
Note
This repository is proudly maintained by the frontline research mentors at QuenithAI (应达学术). It aims to provide the most comprehensive and cutting-edge map of papers and technologies in the field of video generation.
Your contributions are also vital: feel free to open an issue or submit a pull request to become a collaborator on this repository. We look forward to your participation!
If you require expert 1-on-1 guidance on your submissions to top-tier conferences and journals, we invite you to contact us via WeChat or E-mail.
This repository is built and continuously maintained by the team of frontline research mentors at QuenithAI (应达学术), with the goal of presenting the most comprehensive and up-to-date collection of papers in the field of video generation.
Your contributions matter greatly to us and to the community. We warmly invite you to open an issue or submit a pull request to become a collaborator on this project. We look forward to having you on board!
⚡ Latest Updates
- (Sep 13th, 2025): Added a new direction: 🎯 Reinforcement Learning for Video Generation.
- (Aug 21st, 2025): Added a new direction: 🗣️ Audio-Driven Video Generation.
- (Aug 20th, 2025): Initial commit and repository structure established.
- Controllable Video Generation: A Survey
- Diffusion Model-Based Video Editing: A Survey
- From Sora What We Can See: A Survey of Text-to-Video Generation
- A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
- A Survey on Video Diffusion Models
- Video Diffusion Models: A Survey
- Survey of Video Diffusion Models: Foundations, Implementations, and Applications
- Video Diffusion Generation: Comprehensive Review and Open Problems
- [CVPR 2025] AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
- [CVPR 2025] Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
- [CVPR 2025] Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
- [CVPR 2025] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
- [CVPR 2025] Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
- [CVPR 2025] TransPixeler: Advancing Text-to-Video Generation with Transparency
- [CVPR 2025] LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
- [CVPR 2025] Improving Text-to-Video Generation via Instance-aware Structured Caption
- [CVPR 2025] Compositional Text-to-Video Generation with Blob Video Representations
- [CVPR 2025] Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
- [ICCV 2025] T2Bs: Text‑to‑Character Blendshapes via Video Generation
- [ICCV 2025] Animate Your Word: Bringing Text to Life via Video Diffusion Prior
- [NeurIPS 2025] Safe‑Sora: Safe Text‑to‑Video Generation via Graphical Watermarking
- [ICCV 2025] Prompt‑A‑Video: Prompt Your Video Diffusion Model via Preference‑Aligned LLM
- [ICCV 2025] MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text‑to‑Video Generation
- [ICCV 2025] TITAN‑Guide: Taming Inference‑Time Alignment for Guided Text‑to‑Video Diffusion Models
- [ICCV 2025] Video‑T1: Test‑Time Scaling for Video Generation
- [ICCV 2025] AnimateYourMesh: Feed‑Forward 4D Foundation Model for Text‑Driven Mesh Animation
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-Video Generation
- [ICLR 2025] CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
- [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
- LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation
- S²Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation
- LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
- Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation
- V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
- QuaDreamer: Controllable Panoramic Video Generation for Quadruped Robots
- PoseGuard: Pose-Guided Generation with Safety Guardrails
- GV-VAD: Exploring Video Generation for Weakly-Supervised Video Anomaly Detection
- GVD: Guiding Video Diffusion Model for Scalable Video Distillation
- Compositional Video Synthesis by Temporal Object-Centric Learning
- Enhancing Scene Transition Awareness in Video Generation via Post-Training
- Yume: An Interactive World Generation Model
- EndoGen: Conditional Autoregressive Endoscopic Video Generation
- MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
- PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation
- TokensGen: Harnessing Condensed Tokens for Long Video Generation
- Conditional Video Generation for High-Efficiency Video Compression
- Taming Diffusion Transformer for Real-Time Mobile Video Generation
- LoViC: Efficient Long Video Generation with Context Compression
- World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving
- NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models
- Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
- Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
- Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions
- Scaling RL to Long Videos
- PromptTea: Let Prompts Tell TeaCache the Optimal Threshold
- Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions
- Omni-Video: Democratizing Unified Video Understanding and Generation
- Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation
- MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
- Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations
- PresentAgent: Multimodal Agent for Presentation Video Generation
- StreamDiT: Real-Time Streaming Text-to-Video Generation
- RefTok: Reference-Based Tokenization for Video Generation
- Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching
- Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
- LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
- LLM-based Realistic Safety-Critical Driving Video Generation
- Geometry-aware 4D Video Generation for Robot Manipulation
- Populate-A-Scene: Affordance-Aware Human Video Generation
- FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
- Epona: Autoregressive Diffusion World Model for Autonomous Driving
- VMoBA: Mixture-of-Block Attention for Video Diffusion Models
- SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation
- Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
- GenHSI: Controllable Generation of Human-Scene Interaction Videos
- SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution
- Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation
- VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
- FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
- RDPO: Real Data Preference Optimization for Physics Consistency Video Generation
- Emergent Temporal Correspondences from Video Diffusion Transformers
- Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
- FastInit: Fast Noise Initialization for Temporally Consistent Video Generation
- PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
- Causally Steered Diffusion for Automated Video Counterfactual Generation
- VideoMAR: Autoregressive Video Generation with Continuous Tokens
- M4V: Multi-Modal Mamba for Text-to-Video Generation
- GigaVideo‑1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning
- DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers
- Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
- MagCache: Fast Video Generation with Magnitude-Aware Cache
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
- How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models
- Self Forcing: Bridging the Train‑Test Gap in Autoregressive Video Diffusion
- From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models
- Frame Guidance: Training‑Free Guidance for Frame‑Level Control in Video Diffusion Models
- Hi‑VAE: Efficient Video Autoencoding with Global and Detailed Motion
- ContentV: Efficient Training of Video Generation Models with Limited Compute
- Astraea: A GPU‑Oriented Token‑wise Acceleration Framework for Video Diffusion Transformers
- FPSAttention: Training-Aware FP8 and Sparsity Co‑Design for Fast Video Diffusion
- LayerFlow: A Unified Model for Layer‑Aware Video Generation
- FullDiT2: Efficient In‑Context Conditioning for Video Diffusion Transformers
- DenseDPO: Fine‑Grained Temporal Preference Optimization for Video Diffusion Models
- Chipmunk: Training‑Free Acceleration of Diffusion Transformers with Dynamic Column‑Sparse Deltas
- Context as Memory: Scene‑Consistent Interactive Long Video Generation with Memory Retrieval
- CamCloneMaster: Enabling Reference‑based Camera Control for Video Generation
- Dual‑Expert Consistency Model for Efficient and High‑Quality Video Generation
- Sparse‑vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
- LumosFlow: Motion‑Guided Long Video Generation
- Motion aware video generative model
- Many‑for‑Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
- OpenS2V‑Nexus: A Detailed Benchmark and Million‑Scale Dataset for Subject‑to‑Video Generation
- Wan: Open and Advanced Large‑Scale Video Generative Models
- [CVPR 2024] Make Pixels Dance: High-Dynamic Video Generation
- [CVPR 2024] VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
- [CVPR 2024] GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
- [CVPR 2024] SimDA: Simple Diffusion Adapter for Efficient Video Generation
- [CVPR 2024] MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
- [CVPR 2024] Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
- [CVPR 2024] PEEKABOO: Interactive Video Generation via Masked-Diffusion
- [CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
- [CVPR 2024] A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
- [CVPR 2024] BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
- [CVPR 2024] Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
- [CVPR 2024] MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- [CVPR 2024] Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation
- [CVPR 2024] DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
- [CVPR 2024] Grid Diffusion Models for Text-to-Video Generation
- [ECCV 2024] Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
- [ECCV 2024] W.A.L.T.: Photorealistic Video Generation with Diffusion Models
- [ECCV 2024] MoVideo: Motion-Aware Video Generation with Diffusion Models
- [ECCV 2024] DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
- [ECCV 2024] MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
- [ECCV 2024] HARIVO: Harnessing Text-to-Image Models for Video Generation
- [ECCV 2024] MEVG: Multi-event Video Generation with Text-to-Video Models
- [NeurIPS 2024] DEMO: Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
- [ICML 2024] Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
- [ICLR 2024] VDT: General-purpose Video Diffusion Transformers via Mask Modeling
- [ICLR 2024] VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation
- [AAAI 2024] Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
- [AAAI 2024] E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning
- [AAAI 2024] ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation
- [AAAI 2024] F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis
- Gender Bias in Text-to-Video Generation Models: A case study of Sora
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
- Follow-Your-MultiPose: Tuning-Free Multi-Character Text-to-Video Generation via Pose Guidance
- CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training
- DirectorLLM for Human-Centric Video Generation
- Can Video Generation Replace Cinematographers? Research on the Cinematic Language of Generated Video
- LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
- T-SVG: Text-Driven Stereoscopic Video Generation
- Mojito: Motion Trajectory and Intensity Control for Video Generation
- SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
- Motion by Queries: Identity-Motion Trade-offs in Text-to-Video Generation
- STIV: Scalable Text and Image Conditioned Video Generation
- GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
- CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
- MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation
- Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
- Free$^2$Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models
- DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation
- InTraGen: Trajectory-controlled Video Generation for Object Interactions
- Optical-Flow Guided Prompt Optimization for Coherent Video Generation
- VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
- Motion Control for Enhanced Complex Action Video Generation
- GameGen-X: Interactive Open-world Game Video Generation
- Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
- ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
- Animating the Past: Reconstruct Trilobite via Video Generation
- ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
- T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
- The Dawn of Video Generation: Preliminary Explorations with SORA-like Models
- Compositional 3D-aware Video Generation with LLM Director
- Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation
- FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance
- Still-Moving: Customized Video Generation without Customized Video Data
- VEnhancer: Generative Space-Time Enhancement for Video Generation
- Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task
- VIMI: Grounding Video Generation through Multi-modal Instruction
- GVDIFF: Grounded Text-to-Video Generation with Diffusion Models
- Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
- Text-Animator: Controllable Visual Text Video Generation
- MotionBooth: Motion-Aware Customized Text-to-Video Generation
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation
- Compositional Video Generation as Flow Equalization
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation
- VideoTetris: Towards Compositional Text-To-Video Generation
- VideoPhy: Evaluating Physical Commonsense for Video Generation
- I4VGen: Image as Free Stepping Stone for Text-to-Video Generation
- DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control
- The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective
- TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
- MotionMaster: Training-free Camera Motion Transfer For Video Generation
- ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation
- Grid Diffusion Models for Text-to-Video Generation
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
- S2DM: Sector-Shaped Diffusion Models for Video Generation
- Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
- [CVPR 2023] Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models
- [CVPR 2023] Text2Video-Zero: Text-to-image Diffusion Models are Zero-shot Video Generators
- [CVPR 2023] Video Probabilistic Diffusion Models in Projected Latent Space
- [ICCV 2023] PYOCO: Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
- [ICCV 2023] Gen-1: Structure and Content-guided Video Synthesis with Diffusion Models
- [NeurIPS 2023] UniPi: Learning Universal Policies via Text-Guided Video Generation
- [NeurIPS 2023] VideoComposer: Compositional Video Synthesis with Motion Controllability
- [ICLR 2023] CogVideo: Large-scale Pretraining for Text-to-video Generation via Transformers
- [ICLR 2023] Make-A-Video: Text-to-video Generation without Text-video Data
- [ICLR 2023] Phenaki: Variable Length Video Generation From Open Domain Textual Description
- FlashVideo: A Framework for Swift Inference in Text-to-Video Generation
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
- Photorealistic Video Generation with Diffusion Models
- GenTron: Diffusion Transformers for Image and Video Generation
- Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
- ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models
- MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation
- FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
- Make Pixels Dance: High-Dynamic Video Generation
- VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning on Language-Video Foundation Models
- POS: A Prompts Optimization Suite for Augmenting Text-to-Video Generation
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
- Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
- LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
- VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
- Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation
- Dual-Stream Diffusion Net for Text-to-Video Generation
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
- Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
- DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation
- ControlVideo: Training-free Controllable Text-to-Video Generation
- Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
- [CVPR 2025] MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
- [CVPR 2025] MotionPro: A Precise Motion Controller for Image-to-Video Generation
- [CVPR 2025] Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
- [CVPR 2025] Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
- [CVPR 2025] I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
- [CVPR 2025] LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
- [ICCV 2025] AnyI2V: Animating Any Conditional Image with Motion Control
- [ICCV 2025] Versatile Transition Generation with Image-to-Video Diffusion
- [ICCV 2025] TIP‑I2V: A Million‑Scale Real Text and Image Prompt Dataset for Image‑to‑Video Generation
- [ICCV 2025] Unified Video Generation via Next‑Set Prediction in Continuous Domain
- [NeurIPS 2025] GenRec: Unifying Video Generation and Recognition with Diffusion Models
- [ICCV 2025] Precise Action‑to‑Video Generation Through Visual Action Prompts
- [ICCV 2025] STIV: Scalable Text and Image Conditioned Video Generation
- [ICLR 2025] FrameBridge: Improving Image‑to‑Video Generation with Bridge Models
- [ICLR 2025] SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
- [ICLR 2025] Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
- [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
- Physics‑Grounded Motion Forecasting via Equation Discovery for Trajectory‑Guided Image‑to‑Video Generation
- Enhancing Motion Dynamics of Image‑to‑Video Models via Adaptive Low‑Pass Guidance
- Frame In‑N‑Out: Unbounded Controllable Image‑to‑Video Generation
- Dynamic‑I2V: Exploring Image‑to‑Video Generation Models via Multimodal LLM
- Order Matters: On Parameter‑Efficient Image‑to‑Video Probing for Recognizing Nearly Symmetric Actions
- EvAnimate: Event‑Conditioned Image‑to‑Video Generation for Human Animation
- Step‑Video‑TI2V Technical Report: A State‑of‑the‑Art Text‑Driven Image‑to‑Video Generation Model
- DreamInsert: Zero‑Shot Image‑to‑Video Object Insertion from A Single Image
- I2V3D: Controllable image‑to‑video generation with 3D guidance
- Extrapolating and Decoupling Image‑to‑Video Generation Models: Motion Modeling Is Easier Than You Think
- Object‑Centric Image‑to‑Video Generation with Language Guidance
- VidCRAFT3: Camera, Object, and Lighting Control for Image‑to‑Video Generation
- MotionCanvas: Cinematic Shot Design with Controllable Image‑to‑Video Generation
- Through‑The‑Mask: Mask‑based Motion Trajectories for Image‑to‑Video Generation
- [CVPR 2024] Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
- [CVPR 2024] Your Image Is My Video: Reshaping the Receptive Field via Image-to-Video Differentiable AutoAugmentation and Fusion
- [CVPR 2024] TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
- [CVPR 2024] Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
- [ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
- [ECCV 2024] $R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
- [ECCV 2024] PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
- [ECCV 2024] Rethinking Image-to-Video Adaptation: An Object-Centric Perspective
- [NeurIPS 2024] TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
- [NeurIPS 2024] Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
- [ICML 2024] Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
- [SIGGRAPH 2024] I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models
- [SIGGRAPH 2024] Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
- [AAAI 2024] Continuous Piecewise-Affine Based Motion Model for Image Animation
- OmniDrag: Enabling Motion Control for Omnidirectional Image‑to‑Video Generation
- CamI2V: Camera‑Controlled Image‑to‑Video Diffusion Model
- Identifying and Solving Conditional Image Leakage in Image‑to‑Video Diffusion Model
- CamCo: Camera‑Controllable 3D‑Consistent Image‑to‑Video Generation
- CamViG: Camera Aware Image‑to‑Video Generation with Multimodal Transformers
- $R^2$‑Tuning: Efficient Image‑to‑Video Transfer Learning for Video Temporal Grounding
- TRIP: Temporal Residual Learning with Image Noise Prior for Image‑to‑Video Diffusion Models
- Your Image is My Video: Reshaping the Receptive Field via Image‑To‑Video Differentiable AutoAugmentation and Fusion
- Tuning‑Free Noise Rectification for High Fidelity Image‑to‑Video Generation
- AtomoVideo: High Fidelity Image‑to‑Video Generation
- ConsistI2V: Enhancing Visual Consistency for Image‑to‑Video Generation
- AIGCBench: Comprehensive Evaluation of Image‑to‑Video Content Generated by AI
- [CVPR 2025] VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
- [CVPR 2025] VideoDirector: Precise Video Editing via Text-to-Video Models
- [CVPR 2025] VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
- [CVPR 2025] Align-A-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing
- [CVPR 2025] Unity in Diversity: Video Editing via Gradient-Latent Purification
- [CVPR 2025] VEU-Bench: Towards Comprehensive Understanding of Video Editing
- [CVPR 2025] SketchVideo: Sketch-based Video Generation and Editing
- [CVPR 2025] FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video
- [CVPR 2025] Visual Prompting for One-shot Controllable Video Editing without Inversion
- [CVPR 2025] FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
- [ICCV 2025] Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
- [ICCV 2025] DIVE: Taming DINO for Subject-Driven Video Editing
- [ICCV 2025] DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors
- [ICCV 2025] QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
- [ICCV 2025] Teleportraits: Training-Free People Insertion into Any Scene
- [ICLR 2025] VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing
- [AAAI 2025] FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
- [AAAI 2025] EditBoard: Towards a Comprehensive Evaluation Benchmark for Text-Based Video Editing Models
- [AAAI 2025] VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
- [AAAI 2025] Re-Attentional Controllable Video Diffusion Editing
- [WACV 2025] IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion
- [WACV 2025] SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing
- [WACV 2025] MagicStick: Controllable Video Editing via Control Handle Transformations
- [WACV 2025] Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior
- [WACV 2025] FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing
- Consistent Video Editing as Flow‑Driven Image‑to‑Video Generation
- UNIC: Unified In‑Context Video Editing
- DreamVE: Unified Instruction‑based Image and Video Editing
- Controllable Pedestrian Video Editing for Multi‑View Driving Scenarios via Motion Sequence
- Low‑Cost Test‑Time Adaptation for Robust Video Editing
- From Long Videos to Engaging Clips: A Human‑Inspired Video Editing Framework with Multimodal Narrative Understanding
- STR‑Match: Matching SpatioTemporal Relevance Score for Training‑Free Video Editing
- Shape‑for‑Motion: Precise and Consistent Video Editing with 3D Proxy
- DFVEdit: Conditional Delta Flow Vector for Zero‑shot Video Editing
- Good Noise Makes Good Edits: A Training‑Free Diffusion‑Based Video Editing with Image and Text Prompts
- LoRA‑Edit: Controllable First‑Frame‑Guided Video Editing via Mask‑Aware LoRA Fine‑Tuning
- TV‑LiVE: Training‑Free, Text‑Guided Video Editing via Layer Informed Vitality Exploitation
- FADE: Frequency‑Aware Diffusion Model Factorization for Video Editing
- FlowDirector: Training‑Free Flow Steering for Precise Text‑to‑Video Editing
- FullDiT2: Efficient In‑Context Conditioning for Video Diffusion Transformers
- OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation
- Motion‑Aware Concept Alignment for Consistent Video Editing
- Zero‑to‑Hero: Zero‑Shot Initialization Empowering Reference‑Based Video Appearance Editing
- REGen: Multimodal Retrieval‑Embedded Generation for Long‑to‑Short Video Editing
- From Shots to Stories: LLM‑Assisted Video Editing with Unified Language Representations
- DAPE: Dual‑Stage Parameter‑Efficient Fine‑Tuning for Consistent Video Editing with Diffusion Models
- Photoshop Batch Rendering Using Actions for Stylistic Video Editing
- Efficient Temporal Consistency in Diffusion‑Based Video Editing with Adaptor Modules: A Theoretical Framework
- Vidi: Large Multimodal Models for Video Understanding and Editing
- Visual Prompting for One‑Shot Controllable Video Editing without Inversion
- CamMimic: Zero‑Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models
- VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
- Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology‑Inspired Computing Methods
- InstructVEdit: A Holistic Approach for Instructional Video Editing
- HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks
- VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation
- GIFT: Generated Indoor video frames for Texture‑less point tracking
- RASA: Replace Anyone, Say Anything — A Training‑Free Framework for Audio‑Driven and Universal Portrait Video Editing
- V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes
- Alias‑Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
- VACE: All‑in‑One Video Creation and Editing
- Get In Video: Add Anything You Want to the Video
- VideoPainter: Any‑length Video Inpainting and Editing with Plug‑and‑Play Context Control
- VideoGrain: Modulating Space‑Time Attention for Multi‑grained Video Editing
- VideoDiff: Human‑AI Video Co‑Creation with Alternatives
- SportsBuddy: Designing and Evaluating an AI‑Powered Sports Video Storytelling Tool Through Real‑World Deployment
- AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming and Keyframe Selection
- MotionCanvas: Cinematic Shot Design with Controllable Image‑to‑Video Generation
- SST‑EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing
- IP‑FaceDiff: Identity‑Preserving Facial Video Editing with Diffusion
- Qffusion: Controllable Portrait Video Editing via Quadrant‑Grid Attention Learning
- Text‑to‑Edit: Controllable End‑to‑End Video Ad Creation via Multimodal LLMs
- Enhancing Low‑Cost Video Editing with Lightweight Adaptors and Temporal‑Aware Inversion
- Edit as You See: Image‑Guided Video Editing via Masked Motion Modeling
- [CVPR 2024] A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
- [CVPR 2024] VidToMe: Video Token Merging for Zero-Shot Video Editing
- [CVPR 2024] Video-P2P: Video Editing with Cross-Attention Control
- [CVPR 2024] CCEdit: Creative and Controllable Video Editing via Diffusion Models
- [CVPR 2024] RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
- [CVPR 2024] DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
- [CVPR 2024] MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
- [CVPR 2024] MotionEditor: Editing Video Motion via Content-Aware Diffusion
- [CVPR 2024] CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-Driven Video Editing
- [ICLR 2024] Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
- [ICLR 2024] Video Decomposition Prior: Editing Videos Layer by Layer
- [ICLR 2024] FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
- [ICLR 2024] TokenFlow: Consistent Diffusion Features for Consistent Video Editing
- [ECCV 2024] VIDEOSHOP: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
- [ECCV 2024] WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing
- [ECCV 2024] DreamMotion: Space-Time Self-similar Score Distillation for Zero-Shot Video Editing
- [ECCV 2024] Object-Centric Diffusion for Efficient Video Editing
- [ECCV 2024] Video Editing via Factorized Diffusion Distillation
- [ECCV 2024] SAVE: Protagonist Diversification with Structure Agnostic Video Editing
- [ECCV 2024] DNI: Dilutional Noise Initialization for Diffusion Video Editing
- [ECCV 2024] MagDiff: Multi-alignment Diffusion for High-Fidelity Video Generation and Editing
- [ECCV 2024] DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
- MAKIMA: Tuning‑free Multi‑Attribute Open‑domain Video Editing via Mask‑Guided Attention Modulation
- DriveEditor: A Unified 3D Information‑Guided Framework for Controllable Object Editing in Driving Scenes
- Re‑Attentional Controllable Video Diffusion Editing
- MoViE: Mobile Diffusion for Video Editing
- DIVE: Taming DINO for Subject‑Driven Video Editing
- Trajectory Attention for Fine‑grained Video Motion Control
- VideoDirector: Precise Video Editing via Text‑to‑Video Models
- StableV2V: Stablizing Shape Consistency in Video‑to‑Video Editing
- OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models
- A Reinforcement Learning‑Based Automatic Video Editing Method Using Pre‑trained Vision‑Language Model
- Taming Rectified Flow for Inversion and Editing
- AutoVFX: Physically Realistic Video Editing from Natural Language Instructions
- Shaping a Stabilized Video by Mitigating Unintended Changes for Concept‑Augmented Video Editing
- RNA: Video Editing with ROI‑based Neural Atlas
- FreeMask: Rethinking the Importance of Attention Masks for Zero‑Shot Video Editing
- DNI: Dilutional Noise Initialization for Diffusion Video Editing
- Blended Latent Diffusion under Attention Control for Real‑World Video Editing
- DeCo: Decoupled Human‑Centered Diffusion Video Editing with Motion Consistency
- InVi: Object Insertion In Videos Using Off‑the‑Shelf Diffusion Models
- MVOC: A Training‑Free Multiple Video Object Composition Method with Diffusion Models
- VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
- COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
- NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
- FRAG: Frequency Adapting Group for Diffusion Video Editing
- Zero‑Shot Video Editing through Adaptive Sliding Score Distillation
- Ada‑VE: Training‑Free Consistent Video Editing Using Adaptive Motion Prior
- Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
- Temporally Consistent Object Editing in Videos using Extended Attention
- MotionFollower: Editing Video Motion via Lightweight Score‑Guided Diffusion
- Streaming Video Diffusion: Online Video Editing with Diffusion Models
- I2VEdit: First‑Frame‑Guided Video Editing via Image‑to‑Video Diffusion Models
- ReVideo: Remake a Video with Motion and Content Control
- Slicedit: Zero‑Shot Video Editing With Text‑to‑Image Diffusion Models Using Spatio‑Temporal Slices
- GenVideo: One‑shot Target‑image and Shape Aware Video Editing using T2I Diffusion Models
- Ctrl‑Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
- S3Editor: A Sparse Semantic‑Disentangled Self‑Training Framework for Face Video Editing
- ExpressEdit: Video Editing with Natural Language and Sketching
- EVA: Zero‑shot Accurate Attributes and Multi‑Object Video Editing
- Edit3K: Universal Representation Learning for Video Editing Components
- Videoshop: Localized Semantic Video Editing with Noise‑Extrapolated Diffusion Inversion
- AnyV2V: A Tuning‑Free Framework For Any Video‑to‑Video Editing Tasks
- DreamMotion: Space‑Time Self‑Similar Score Distillation for Zero‑Shot Video Editing
- EffiVED: Efficient Video Editing via Text‑instruction Diffusion Models
- AICL: Action In‑Context Learning for Video Diffusion Model
- Video Editing via Factorized Diffusion Distillation
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
- FastVideoEdit: Leveraging Consistency Models for Efficient Text‑to‑Video Editing
- Place Anything into Any Video
- UniEdit: A Unified Tuning‑Free Framework for Video Motion and Appearance Editing
- Anything in Any Scene: Photorealistic Video Object Insertion
- Object‑Centric Diffusion for Efficient Video Editing
- VASE: Object‑Centric Appearance and Shape Manipulation of Real Videos
- Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
- [CVPR 2025] IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner
- [CVPR 2025] AnimateAnything: Consistent and Controllable Animation for Video Generation
- [CVPR 2025] Customized Condition Controllable Generation for Video Soundtrack
- [CVPR 2025] StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
- [ICCV 2025] Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation
- [ICCV 2025] MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
- [ICCV 2025] MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
- [ICCV 2025] InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
- [ICCV 2025] Free-Form Motion Control (SynFMC): Controlling the 6D Poses of Camera and Objects in Video Generation
- [ICCV 2025] RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
- [ICCV 2025] MagicMotion: Video Generation with a Smart Director
- [ICCV 2025] UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving
- [ICLR 2025] MotionClone: Training-Free Motion Cloning for Controllable Video Generation
- [AAAI 2025] CAGE: Unsupervised Visual Composition and Animation for Controllable Video Generation
- [AAAI 2025] TrackGo: A Flexible and Efficient Method for Controllable Video Generation
- [WACV 2025] Fine-grained Controllable Video Generation via Object Appearance and Context
- IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
- ATI: Any Trajectory Instruction for Controllable Video Generation
- CamContextI2V: Context‑aware Controllable Video Generation
- Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
- MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
- MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent
- Controllable Video Generation with Provable Disentanglement
- [CVPR 2025] KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
- [CVPR 2025] AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
- [CVPR 2025] MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation
- [CVPR 2025] Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
- [CVPR 2025] INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
- [ICCV 2025] FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
- [ICCV 2025] GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars
- [ICCV 2025] ACTalker: Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
- [ICLR 2025] Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
- [ICLR 2025] Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
- [ICLR 2025] CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation
- [AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
- [AAAI 2025] PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
- Scaling Up Audio‑Synchronized Visual Animation: An Efficient Training Paradigm
- SpA2V: Harnessing Spatial Auditory Cues for Audio‑driven Spatially‑aware Video Generation
- OmniAvatar: Efficient Audio‑Driven Avatar Video Generation with Adaptive Body Animation
- InterActHuman: Multi‑Concept Human Animation with Layout‑Aligned Audio Conditions
- AlignHuman: Improving Motion and Fidelity via Timestep‑Segment Preference Optimization for Audio‑Driven Human Animation
- Audio‑Sync Video Generation with Multi‑Stream Temporal Control
- LLIA — Enabling Low‑Latency Interactive Avatars: Real‑Time Audio‑Driven Portrait Video Generation with Diffusion Models
- TalkingMachines: Real‑Time Audio‑Driven FaceTime‑Style Video via Autoregressive Diffusion Models
- [CVPR 2024] FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
- [ECCV 2024] UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model
- [ECCV 2024] Audio-Driven Talking Face Generation with Stabilized Synchronization Loss
- [NeurIPS 2024] VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
- AV‑Link: Temporally‑Aligned Diffusion Features for Cross‑Modal Audio‑Video Generation
- SAVGBench: Benchmarking Spatially Aligned Audio‑Video Generation
- SINGER: Vivid Audio‑driven Singing Video Generation with Multi‑scale Spectral Diffusion Model
- SyncFlow: Toward Temporally Aligned Joint Audio‑Video Generation from Text
- FLOAT: Generative Motion Latent Flow Matching for Audio‑driven Talking Portrait
- Stereo‑Talker: Audio‑driven 3D Human Synthesis with Prior‑Guided Mixture‑of‑Experts
- A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
- DiffTED: One‑shot Audio‑driven TED Talk Video Generation with Diffusion‑based Co‑speech Gestures
- [CVPR 2025] X-Dyna: Expressive Dynamic Human Image Animation
- [CVPR 2025] StableAnimator: High-Quality Identity-Preserving Human Image Animation
- [CVPR 2025] Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
- [ICCV 2025] DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
- [ICCV 2025] Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
- [ICCV 2025] Multi-identity Human Image Animation with Structural Video Diffusion
- [ICCV 2025] OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
- [ICCV 2025] AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion
- [ICCV 2025] Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
- [ICLR 2025] Animate-X: Universal Character Image Animation with Enhanced Motion Representation
- StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation
- HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions
- MTVCrafter: 4D Motion Tokenization for Open‑World Human Image Animation
- TT‑DF: A Large‑Scale Diffusion‑Based Dataset and Benchmark for Human Body Forgery Detection
- AnimateAnywhere: Rouse the Background in Human Image Animation
- UniAnimate‑DiT: Human Image Animation with Large‑Scale Video Diffusion Transformer
- Taming Consistency Distillation for Accelerated Human Image Animation
- Multi‑identity Human Image Animation with Structural Video Diffusion
- DreamActor‑M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
- DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High‑Quality Human Image Animation
- EvAnimate: Event‑conditioned Image‑to‑Video Generation for Human Animation
- [CVPR 2024] MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion
- [CVPR 2024] MotionEditor: Editing Video Motion via Content-Aware Diffusion
- [CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
- [ECCV 2024] Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
- [NeurIPS 2024] HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
- [NeurIPS 2024] TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
- [ICLR 2024] DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
- DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
- High Quality Human Image Animation using Regional Supervision and Motion Blur Condition
- Dormant: Defending against Pose-driven Human Image Animation
- TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation
- [CVPR 2025] Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
- [CVPR 2025] CausVid: From Slow Bidirectional to Fast Autoregressive VDMs
- [CVPR 2025] BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
- [ICCV 2025] AdaCache: Adaptive Caching for Faster Video Generation with Diffusion Transformers
- [ICCV 2025] TaylorSeer: From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
- [ICCV 2025] Accelerating Diffusion Transformer via Gradient-Optimized Cache
- [ICCV 2025] V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
- [ICCV 2025] DMDX: Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis
- [ICCV 2025] OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for DiT
- [ICLR 2025] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
- [ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
- [ICML 2025] Fast Video Generation with Sliding Tile Attention
- [ICML 2025] Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
- [ICML 2025] AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
- Less is Enough: Training‑Free Video Diffusion Acceleration via Runtime‑Adaptive Caching
- Compact Attention: Exploiting Structured Spatio‑Temporal Sparsity for Fast Video Generation
- MagCache: Fast Video Generation with Magnitude‑Aware Cache
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
- SuperGen: An Efficient Ultra‑high‑resolution Video Generation System with Sketching and Tiling
- MixCache: Mixture‑of‑Cache for Video Diffusion Transformer Acceleration
- SwiftVideo: A Unified Framework for Few‑Step Video Generation through Trajectory‑Distribution Alignment
- Taming Diffusion Transformer for Real‑Time Mobile Video Generation
- Sparse‑vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
- SRDiffusion: Accelerate Video Diffusion Inference via Sketching‑Rendering Cooperation
- Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic‑Aware Permutation
- DVD‑Quant: Data‑free Video Diffusion Transformers Quantization
- AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
- Region Masking to Accelerate Video Processing on Neuromorphic Hardware
- DSV: Exploiting Dynamic Sparsity to Accelerate Large‑Scale Video DiT Training
- [CVPR 2024] Cache Me if You Can: Accelerating Diffusion Models through Block Caching
- [NeurIPS 2024] Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
- [NeurIPS 2024] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
- [NeurIPS 2024] Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
- [IJCAI 2024] FasterVD: On Acceleration of Video Diffusion Models
- Accelerating Video Diffusion Models via Distribution Matching
- Adaptive Caching for Faster Video Generation with Diffusion Transformers
- OSV: One Step is Enough for High-Quality Image to Video Generation
- HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions
- AnimateDiff-Lightning: Cross-Model Diffusion Distillation
- [ICCV 2025] LongAnimation: Long Animation Generation with Dynamic Global‑Local Memory
- [ICLR 2025] DartControl: A Diffusion‑Based Autoregressive Motion Model for Real‑Time Text‑Driven Motion Control
- [ICLR 2025] FLIP: Flow‑Centric Generative Planning as General‑Purpose Manipulation World Model
- [CVPR 2025] VideoDPO: Omni‑Preference Alignment for Video Diffusion Generation
- LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
- VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
- EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi‑Modal and Multi‑Task Human Animation
- Video Perception Models for 3D Scene Synthesis
- RDPO: Real Data Preference Optimization for Physics Consistency Video Generation
- VQ‑Insight: Teaching VLMs for AI‑Generated Video Quality Understanding via Progressive Visual Reinforcement Learning
- Toward Rich Video Human‑Motion2D Generation
- AlignHuman: Improving Motion and Fidelity via Timestep‑Segment Preference Optimization for Audio‑Driven Human Animation
- Multimodal Large Language Models: A Survey
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
- ContentV: Efficient Training of Video Generation Models with Limited Compute
- Photography Perspective Composition: Towards Aesthetic Perspective Recommendation
- Scaling Image and Video Generation via Test‑Time Evolutionary Search
- InfLVG: Reinforce Inference‑Time Consistent Long Video Generation with GRPO
- AvatarShield: Visual Reinforcement Learning for Human‑Centric Video Forgery Detection
- RLVR‑World: Training World Models with Reinforcement Learning
- Diffusion‑NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
- DanceGRPO: Unleashing GRPO on Visual Generation
- VideoHallu: Evaluating and Mitigating Multi‑modal Hallucinations on Synthetic Video Understanding
- Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
- SkyReels‑V2: Infinite‑length Film Generative Model
- FingER: Content Aware Fine‑grained Evaluation with Reasoning for AI‑Generated Videos
- Aligning Anime Video Generation with Human Feedback
- Discriminator‑Free Direct Preference Optimization for Video Diffusion
- Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments
- OmniCam: Unified Multimodal Video Generation via Camera Control
- VPO: Aligning Text‑to‑Video Generation Models with Prompt Optimization
- Zero‑Shot Human‑Object Interaction Synthesis with Multimodal Priors
- Judge Anything: MLLM as a Judge Across Any Modality
- MagicID: Hybrid Preference Optimization for ID‑Consistent and Dynamic‑Preserved Video Customization
- Unified Reward Model for Multimodal Understanding and Generation
- Pre‑Trained Video Generative Models as World Simulators
- Harness Local Rewards for Global Benefits: Effective Text‑to‑Video Generation Alignment with Patch‑level Reward Models
- IPO: Iterative Preference Optimization for Text‑to‑Video Generation
- MJ‑VIDEO: Fine‑Grained Benchmarking and Rewarding Video Preferences in Video Generation
- HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human‑Centric Alignment
- Zeroth‑order Informed Fine‑Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
- Improving Video Generation with Human Feedback
- VisionReward: Fine‑Grained Multi‑Dimensional Human Preference Learning for Image and Video Generation
- OnlineVPO: Align Video Diffusion Model with Online Video‑Centric Preference Optimization
- The Matrix: Infinite‑Horizon World Generation with Real‑Time Moving Control
- Improving Dynamic Object Interactions in Text‑to‑Video Generation with AI Feedback
- Free$^2$Guide: Gradient‑Free Path Integral Control for Enhancing Text‑to‑Video Generation with Large Vision‑Language Models
- A Reinforcement Learning‑Based Automatic Video Editing Method Using Pre‑trained Vision‑Language Model
- Video to Video Generative Adversarial Network for Few‑shot Learning Based on Policy Gradient
- WorldSimBench: Towards Video Generation Models as World Simulators
- Animating the Past: Reconstruct Trilobite via Video Generation
- VideoAgent: Self‑Improving Video Generation
- E‑Motion: Future Motion Simulation via Event Sequence Diffusion
- SePPO: Semi‑Policy Preference Optimization for Diffusion Alignment
- Video Diffusion Alignment via Reward Gradients
- InstructVideo: Instructing Video Diffusion Models with Human Feedback
- AdaDiff: Adaptive Step Selection for Fast Diffusion Models
QuenithAI is a professional organization composed of top researchers, dedicated to providing high-quality 1-on-1 research mentoring for university students worldwide. Our mission is to help students bridge the gap from theoretical knowledge to cutting-edge research and publish their work in top-tier conferences and journals.
Maintaining this Awesome Video Generation list requires significant effort, just as completing a high-quality paper requires focused dedication and expert guidance. If you are looking for one-on-one support from top scholars on your own research project, to quickly identify innovative ideas and turn them into publications, we invite you to contact us.
➡️ Contact us via WeChat or E-mail to start your research journey.
QuenithAI (应达学术) is a professional organization made up of top researchers, dedicated to providing high-quality 1-on-1 research mentoring for university students worldwide. Our mission is to help students develop outstanding research skills and publish their work in top-tier conferences and journals.
Maintaining a GitHub survey repository takes enormous effort, just as completing a high-quality paper depends on focused dedication and professional guidance. If you would like one-on-one support from top scholars on your own research project, we sincerely invite you to get in touch.
➡️ Contact us via WeChat or E-mail to start your research journey.
Contributions are welcome! Please see our Contribution Guidelines for details on how to add new papers, correct information, or improve the repository.
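For quick reference, a new paper entry can simply follow the bullet style already used throughout this list; the snippet below is only an illustrative template, and the venue tag and title are placeholders rather than a real paper:

```markdown
<!-- Illustrative template only: replace the venue tag and title with the actual paper,
     and place the entry in the section and year block where it belongs -->
- [CVPR 2025] Your Paper Title Here
```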
Join our community to stay up-to-date with the latest advancements, share your work, and collaborate with other researchers and developers in the field of video generation.