Awesome Video Generation by QuenithAI

A curated collection of papers, models, and resources for the field of Video Generation.

Awesome · PRs Welcome · Issues Welcome

Note

This repository is proudly maintained by the frontline research mentors at QuenithAI (应达学术). It aims to provide the most comprehensive and cutting-edge map of papers and technologies in the field of video generation.

Your contributions are also vital: feel free to open an issue or submit a pull request to become a collaborator on this repository. We look forward to your participation!

If you require expert 1-on-1 guidance on your submissions to top-tier conferences and journals, we invite you to contact us via WeChat or E-mail.


This repository is built and continuously maintained by the frontline research mentors at QuenithAI (应达学术), and aims to present the most comprehensive and cutting-edge collection of papers in the field of video generation.

Your contributions matter to us and to the community: we sincerely invite you to open an issue or submit a pull request to become a collaborator on this project. We look forward to having you join us!

If you need professional 1-on-1 guidance on the road to top-tier conferences, feel free to contact us via WeChat or E-mail.

⚡ Latest Updates

📚 Table of Contents

  • 📜 Papers & Models
    • ✍️ Survey Papers
    • 🎥 Text-to-Video (T2V) Generation
    • 🖼️ Image-to-Video (I2V) Generation
    • ✂️ Video-to-Video (V2V) Editing
    • 🕹️ Controllable Video Generation
    • 🗣️ Audio-Driven Video Generation
    • 💃 Human Image Animation
    • ⚡ Fast Video Generation (Acceleration)
    • 🎯 Reinforcement Learning for Video Generation
  • 🗂️ Datasets
  • 🎓 About Us
  • 🤝 Contributing
  • 💬 Join the Community


📜 Papers & Models

✍️ Survey Papers

🎥 Text-to-Video (T2V) Generation

✨ 2025

✅ Published Papers

  • [CVPR 2025] AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
    ArXiv GitHub

  • [CVPR 2025] Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
    ArXiv GitHub

  • [CVPR 2025] Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
    ArXiv Project Page GitHub

  • [CVPR 2025] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
    ArXiv Project Page GitHub

  • [CVPR 2025] Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
    ArXiv Project Page GitHub

  • [CVPR 2025] TransPixeler: Advancing Text-to-Video Generation with Transparency
    ArXiv Project Page GitHub

  • [CVPR 2025] LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
    ArXiv GitHub

  • [CVPR 2025] Improving Text-to-Video Generation via Instance-aware Structured Caption
    ArXiv GitHub

  • [CVPR 2025] Compositional Text-to-Video Generation with Blob Video Representations
    ArXiv Project Page

  • [CVPR 2025] Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
    ArXiv Project Page

  • [ICCV 2025] T2Bs: Text‑to‑Character Blendshapes via Video Generation
    Paper Project Page GitHub Hugging Face

  • [ICCV 2025] Animate Your Word: Bringing Text to Life via Video Diffusion Prior
    Paper Project Page GitHub Hugging Face

  • [NeurIPS 2025] Safe‑Sora: Safe Text‑to‑Video Generation via Graphical Watermarking
    Paper Project Page GitHub Hugging Face

  • [ICCV 2025] Prompt‑A‑Video: Prompt Your Video Diffusion Model via Preference‑Aligned LLM
    Paper Project Page GitHub Hugging Face

  • [ICCV 2025] MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text‑to‑Video Generation
    Paper Project Page GitHub Hugging Face

  • [ICCV 2025] TITAN‑Guide: Taming Inference‑Time Alignment for Guided Text‑to‑Video Diffusion Models
    Paper Project Page GitHub Hugging Face

  • [ICCV 2025] Video‑T1: Test‑Time Scaling for Video Generation
    Paper Project Page GitHub Hugging Face

  • [ICCV 2025] AnimateYourMesh: Feed‑Forward 4D Foundation Model for Text‑Driven Mesh Animation
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-Video Generation
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
    Paper

  • [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
    Paper Project Page GitHub Hugging Face

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] Vlogger: Make Your Dream A Vlog
    ArXiv GitHub

  • [CVPR 2024] Make Pixels Dance: High-Dynamic Video Generation
    ArXiv Project Page Demo

  • [CVPR 2024] VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
    ArXiv Project Page GitHub

  • [CVPR 2024] GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
    ArXiv Project Page

  • [CVPR 2024] SimDA: Simple Diffusion Adapter for Efficient Video Generation
    ArXiv Project Page GitHub

  • [CVPR 2024] MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
    ArXiv Project Page Demo

  • [CVPR 2024] Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
    ArXiv Project Page

  • [CVPR 2024] PEEKABOO: Interactive Video Generation via Masked-Diffusion
    ArXiv Project Page GitHub Demo

  • [CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
    ArXiv Project Page GitHub

  • [CVPR 2024] A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
    ArXiv Project Page GitHub

  • [CVPR 2024] BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
    ArXiv Project Page

  • [CVPR 2024] Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
    ArXiv Project Page

  • [CVPR 2024] MotionDirector: Motion Customization of Text-to-Video Diffusion Models
    ArXiv GitHub

  • [CVPR 2024] Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation
    Paper Project Page

  • [CVPR 2024] DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
    Paper GitHub

  • [CVPR 2024] Grid Diffusion Models for Text-to-Video Generation
    ArXiv GitHub Demo

  • [ECCV 2024] Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
    ArXiv Project Page

  • [ECCV 2024] W.A.L.T.: Photorealistic Video Generation with Diffusion Models
    Paper Project Page

  • [ECCV 2024] MoVideo: Motion-Aware Video Generation with Diffusion Models
    Paper

  • [ECCV 2024] DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
    Paper Project Page GitHub

  • [ECCV 2024] MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
    Paper

  • [ECCV 2024] HARIVO: Harnessing Text-to-Image Models for Video Generation
    Paper Project Page

  • [ECCV 2024] MEVG: Multi-event Video Generation with Text-to-Video Models
    Paper Project Page

  • [NeurIPS 2024] DEMO: Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
    Paper GitHub

  • [ICML 2024] Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
    Paper Project Page GitHub Hugging Face

  • [ICLR 2024] VDT: General-purpose Video Diffusion Transformers via Mask Modeling
    ArXiv Project Page GitHub

  • [ICLR 2024] VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation
    Paper

  • [AAAI 2024] Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
    ArXiv Project Page GitHub

  • [AAAI 2024] E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning
    ArXiv

  • [AAAI 2024] ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation
    ArXiv Project Page GitHub

  • [AAAI 2024] F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis
    ArXiv

💡 Pre-Print Papers

✨ 2023

✅ Published Papers

  • [CVPR 2023] Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models
    ArXiv Project Page GitHub

  • [CVPR 2023] Text2Video-Zero: Text-to-image Diffusion Models are Zero-shot Video Generators
    Paper Project Page GitHub Demo

  • [CVPR 2023] Video Probabilistic Diffusion Models in Projected Latent Space
    Paper GitHub

  • [ICCV 2023] PYOCO: Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
    Paper Project Page

  • [ICCV 2023] Gen-1: Structure and Content-guided Video Synthesis with Diffusion Models
    Paper Project Page

  • [NeurIPS 2023] Video Diffusion Models
    ArXiv Project Page

  • [NeurIPS 2023] UniPi: Learning Universal Policies via Text-Guided Video Generation
    Paper Project Page GitHub

  • [NeurIPS 2023] VideoComposer: Compositional Video Synthesis with Motion Controllability
    ArXiv Project Page GitHub

  • [ICLR 2023] CogVideo: Large-scale Pretraining for Text-to-video Generation via Transformers
    Paper GitHub Demo

  • [ICLR 2023] Make-A-Video: Text-to-video Generation without Text-video Data
    ArXiv Project Page GitHub

  • [ICLR 2023] Phenaki: Variable Length Video Generation From Open Domain Textual Description
    Paper GitHub

💡 Pre-Print Papers

⇧ Back to ToC

🖼️ Image-to-Video (I2V) Generation

✨ 2025

✅ Published Papers

  • [CVPR 2025] MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
    ArXiv

  • [CVPR 2025] MotionPro: A Precise Motion Controller for Image-to-Video Generation
    ArXiv GitHub

  • [CVPR 2025] Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
    ArXiv

  • [CVPR 2025] Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
    ArXiv

  • [CVPR 2025] I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
    Paper

  • [CVPR 2025] LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
    Paper GitHub

  • [ICCV 2025] AnyI2V: Animating Any Conditional Image with Motion Control
    Paper GitHub

  • [ICCV 2025] Versatile Transition Generation with Image-to-Video Diffusion
    Paper

  • [ICCV 2025] TIP‑I2V: A Million‑Scale Real Text and Image Prompt Dataset for Image‑to‑Video Generation
    Paper Project Page GitHub Hugging Face

  • [ICCV 2025] Unified Video Generation via Next‑Set Prediction in Continuous Domain
    Paper

  • [NeurIPS 2025] GenRec: Unifying Video Generation and Recognition with Diffusion Models
    Paper GitHub Hugging Face

  • [ICCV 2025] Precise Action‑to‑Video Generation Through Visual Action Prompts
    Paper Project Page

  • [ICCV 2025] STIV: Scalable Text and Image Conditioned Video Generation
    Paper Project Page Hugging Face

  • [ICLR 2025] FrameBridge: Improving Image‑to‑Video Generation with Bridge Models
    Paper GitHub Project Page

  • [ICLR 2025] SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
    Paper Project Page GitHub

  • [ICLR 2025] Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
    Paper

  • [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
    Paper Project Page GitHub Hugging Face

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] Animate Anyone: Consistent and Controllable Image-to-video Synthesis for Character Animation
    ArXiv Project Page GitHub

  • [CVPR 2024] Your Image Is My Video: Reshaping the Receptive Field via Image-to-Video Differentiable AutoAugmentation and Fusion
    Paper

  • [CVPR 2024] TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
    Paper

  • [CVPR 2024] Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
    Paper

  • [ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
    Paper GitHub

  • [ECCV 2024] R²-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
    Paper GitHub

  • [ECCV 2024] PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
    Paper GitHub

  • [ECCV 2024] Rethinking Image-to-Video Adaptation: An Object-Centric Perspective
    Paper

  • [NeurIPS 2024] TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
    Paper

  • [NeurIPS 2024] Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
    Paper GitHub

  • [ICML 2024] Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
    Paper Project Page GitHub Hugging Face

  • [SIGGRAPH 2024] I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models
    Paper GitHub

  • [SIGGRAPH 2024] Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
    Paper GitHub

  • [AAAI 2024] Continuous Piecewise-Affine Based Motion Model for Image Animation
    Paper GitHub

💡 Pre-Print Papers

✨ 2023

✅ Published Papers

  • [ICCV 2023] DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
    Paper GitHub

  • [ICCV 2023] Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
    Paper GitHub

💡 Pre-Print Papers

⇧ Back to ToC

✂️ Video-to-Video (V2V) Editing

✨ 2025

✅ Published Papers

  • [CVPR 2025] VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
    Paper

  • [CVPR 2025] VideoDirector: Precise Video Editing via Text-to-Video Models
    Paper GitHub

  • [CVPR 2025] VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
    Paper

  • [CVPR 2025] Align-A-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing
    Paper

  • [CVPR 2025] Unity in Diversity: Video Editing via Gradient-Latent Purification
    Paper

  • [CVPR 2025] VEU-Bench: Towards Comprehensive Understanding of Video Editing
    Paper GitHub

  • [CVPR 2025] SketchVideo: Sketch-based Video Generation and Editing
    Paper GitHub

  • [CVPR 2025] FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video
    Paper GitHub

  • [CVPR 2025] Visual Prompting for One-shot Controllable Video Editing without Inversion
    Paper

  • [CVPR 2025] FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
    Paper GitHub

  • [ICCV 2025] VACE: All-in-One Video Creation and Editing
    Paper Project Page GitHub

  • [ICCV 2025] Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
    Paper Project Page

  • [ICCV 2025] DIVE: Taming DINO for Subject-Driven Video Editing
    Paper Project Page

  • [ICCV 2025] DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors
    Paper Project Page

  • [ICCV 2025] QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
    Paper

  • [ICCV 2025] Teleportraits: Training-Free People Insertion into Any Scene
    Paper

  • [ICLR 2025] VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing
    Paper GitHub

  • [AAAI 2025] FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
    Paper GitHub

  • [AAAI 2025] EditBoard: Towards a Comprehensive Evaluation Benchmark for Text-Based Video Editing Models
    Paper GitHub

  • [AAAI 2025] VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
    Paper GitHub

  • [AAAI 2025] Re-Attentional Controllable Video Diffusion Editing
    Paper GitHub

  • [WACV 2025] IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion
    Paper

  • [WACV 2025] SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing
    Paper GitHub

  • [WACV 2025] MagicStick: Controllable Video Editing via Control Handle Transformations
    Paper

  • [WACV 2025] Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior
    Paper

  • [WACV 2025] FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing
    Paper GitHub

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
    Paper GitHub

  • [CVPR 2024] VidToMe: Video Token Merging for Zero-Shot Video Editing
    Paper GitHub

  • [CVPR 2024] Video-P2P: Video Editing with Cross-Attention Control
    Paper GitHub

  • [CVPR 2024] CCEdit: Creative and Controllable Video Editing via Diffusion Models
    Paper

  • [CVPR 2024] RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
    Paper GitHub

  • [CVPR 2024] DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
    Paper

  • [CVPR 2024] MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
    Paper

  • [CVPR 2024] MotionEditor: Editing Video Motion via Content-Aware Diffusion
    Paper GitHub

  • [CVPR 2024] CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-Driven Video Editing
    Paper GitHub

  • [ICLR 2024] Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
    Paper GitHub

  • [ICLR 2024] Video Decomposition Prior: Editing Videos Layer by Layer
    Paper

  • [ICLR 2024] FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
    Paper

  • [ICLR 2024] TokenFlow: Consistent Diffusion Features for Consistent Video Editing
    Paper GitHub

  • [ECCV 2024] VIDEOSHOP: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
    Paper GitHub

  • [ECCV 2024] DragVideo: Interactive Drag-Style Video Editing
    Paper GitHub

  • [ECCV 2024] WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing
    Paper

  • [ECCV 2024] DreamMotion: Space-Time Self-similar Score Distillation for Zero-Shot Video Editing
    Paper

  • [ECCV 2024] Object-Centric Diffusion for Efficient Video Editing
    Paper GitHub

  • [ECCV 2024] Video Editing via Factorized Diffusion Distillation
    Paper

  • [ECCV 2024] SAVE: Protagonist Diversification with Structure Agnostic Video Editing
    Paper

  • [ECCV 2024] DNI: Dilutional Noise Initialization for Diffusion Video Editing
    Paper

  • [ECCV 2024] MagDiff: Multi-alignment Diffusion for High-Fidelity Video Generation and Editing
    Paper

  • [ECCV 2024] DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
    Paper

💡 Pre-Print Papers

⇧ Back to ToC

🕹️ Controllable Video Generation

✨ 2025

✅ Published Papers

  • [CVPR 2025] IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner
    Paper

  • [CVPR 2025] AnimateAnything: Consistent and Controllable Animation for Video Generation
    Paper

  • [CVPR 2025] Customized Condition Controllable Generation for Video Soundtrack
    Paper GitHub

  • [CVPR 2025] StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
    Paper

  • [ICCV 2025] Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation
    Paper Project Page GitHub

  • [ICCV 2025] MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
    Paper GitHub

  • [ICCV 2025] MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
    Paper Project Page GitHub

  • [ICCV 2025] InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
    Paper Project Page GitHub

  • [ICCV 2025] Free-Form Motion Control (SynFMC): Controlling the 6D Poses of Camera and Objects in Video Generation
    Paper Project Page GitHub

  • [ICCV 2025] RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
    Paper Project Page GitHub

  • [ICCV 2025] MagicMotion: Video Generation with a Smart Director
    Paper Project Page GitHub

  • [ICCV 2025] UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving
    Paper Project Page GitHub

  • [ICLR 2025] MotionClone: Training-Free Motion Cloning for Controllable Video Generation
    Paper GitHub

  • [AAAI 2025] CAGE: Unsupervised Visual Composition and Animation for Controllable Video Generation
    Paper GitHub

  • [AAAI 2025] TrackGo: A Flexible and Efficient Method for Controllable Video Generation
    Paper

  • [WACV 2025] Fine-grained Controllable Video Generation via Object Appearance and Context
    Paper

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
    Paper GitHub

  • [CVPR 2024] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
    Paper GitHub

  • [AAAI 2024] Decouple Content and Motion for Conditional Image-to-Video Generation
    Paper

💡 Pre-Print Papers

✨ 2023

✅ Published Papers

  • [CVPR 2023] Conditional Image-to-Video Generation with Latent Flow Diffusion Models
    Paper GitHub

💡 Pre-Print Papers

⇧ Back to ToC

🗣️ Audio-Driven Video Generation

✨ 2025

✅ Published Papers

  • [CVPR 2025] KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
    ArXiv Project Page GitHub

  • [CVPR 2025] AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
    ArXiv Project Page

  • [CVPR 2025] MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation
    ArXiv

  • [CVPR 2025] Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
    ArXiv Project Page

  • [CVPR 2025] INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
    ArXiv Project Page

  • [ICCV 2025] FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
    Paper Project Page GitHub

  • [ICCV 2025] GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars
    Paper Project Page GitHub

  • [ICCV 2025] ACTalker: Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
    Paper Project Page GitHub

  • [ICLR 2025] Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
    ArXiv Project Page GitHub

  • [ICLR 2025] Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
    ArXiv Project Page

  • [ICLR 2025] CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation
    ArXiv Project Page

  • [AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
    ArXiv Project Page GitHub Hugging Face

  • [AAAI 2025] PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
    ArXiv

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
    ArXiv Project Page GitHub

  • [ECCV 2024] UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model
    ArXiv Project Page GitHub

  • [ECCV 2024] Audio-Driven Talking Face Generation with Stabilized Synchronization Loss
    ArXiv

  • [NeurIPS 2024] VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
    ArXiv Project Page

💡 Pre-Print Papers

⇧ Back to ToC

💃 Human Image Animation

✨ 2025

✅ Published Papers

  • [CVPR 2025] X-Dyna: Expressive Dynamic Human Image Animation
    ArXiv GitHub

  • [CVPR 2025] StableAnimator: High-Quality Identity-Preserving Human Image Animation
    ArXiv Project Page GitHub

  • [CVPR 2025] Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
    Paper Project Page

  • [ICCV 2025] DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
    Paper Project Page

  • [ICCV 2025] Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
    Paper Project Page GitHub

  • [ICCV 2025] Multi-identity Human Image Animation with Structural Video Diffusion
    Paper

  • [ICCV 2025] OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
    Paper Project Page

  • [ICCV 2025] AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion
    Paper Project Page

  • [ICCV 2025] Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
    Paper

  • [ICLR 2025] Animate-X: Universal Character Image Animation with Enhanced Motion Representation
    ArXiv Project Page GitHub

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion
    ArXiv Project Page GitHub

  • [CVPR 2024] MotionEditor: Editing Video Motion via Content-Aware Diffusion
    ArXiv Project Page GitHub

  • [CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
    Paper Project Page GitHub

  • [ECCV 2024] Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
    Paper Project Page GitHub

  • [NeurIPS 2024] HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
    Paper Project Page GitHub

  • [NeurIPS 2024] TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
    Paper

  • [ICLR 2024] DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
    ArXiv Project Page GitHub

💡 Pre-Print Papers

⇧ Back to ToC

⚡ Fast Video Generation (Acceleration)

✨ 2025

✅ Published Papers

  • [CVPR 2025] Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
    Paper Project Page GitHub

  • [CVPR 2025] CausVid: From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
    Paper Project Page GitHub

  • [CVPR 2025] BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
    Paper

  • [ICCV 2025] AdaCache: Adaptive Caching for Faster Video Generation with Diffusion Transformers
    Paper Project Page GitHub

  • [ICCV 2025] TaylorSeer: From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
    Paper Project Page GitHub

  • [ICCV 2025] Accelerating Diffusion Transformer via Gradient-Optimized Cache
    Paper

  • [ICCV 2025] V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
    Paper Project Page

  • [ICCV 2025] DMDX: Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis
    Paper

  • [ICCV 2025] OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for DiT
    Paper

  • [ICLR 2025] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
    Paper GitHub

  • [ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
    Paper Project Page GitHub

  • [ICML 2025] Fast Video Generation with Sliding Tile Attention
    Paper Project Page GitHub

  • [ICML 2025] Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
    Paper GitHub

  • [ICML 2025] AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
    Paper

💡 Pre-Print Papers

✨ 2024

✅ Published Papers

  • [CVPR 2024] Cache Me if You Can: Accelerating Diffusion Models through Block Caching
    Paper

  • [NeurIPS 2024] Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
    Paper Project Page GitHub

  • [NeurIPS 2024] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
    Paper Project Page GitHub

  • [NeurIPS 2024] Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
    Paper GitHub

  • [IJCAI 2024] FasterVD: On Acceleration of Video Diffusion Models
    Paper

💡 Pre-Print Papers

⇧ Back to ToC

🎯 Reinforcement Learning for Video Generation

✨ 2025

✅ Published Papers

  • [ICCV 2025] LongAnimation: Long Animation Generation with Dynamic Global‑Local Memory
    Paper Project Page GitHub Hugging Face

  • [ICCV 2025] TesserAct: Learning 4D Embodied World Models
    Paper Project Page GitHub Hugging Face

  • [ICLR 2025] DartControl: A Diffusion‑Based Autoregressive Motion Model for Real‑Time Text‑Driven Motion Control
    Paper Project Page GitHub

  • [ICLR 2025] FLIP: Flow‑Centric Generative Planning as General‑Purpose Manipulation World Model
    Paper Project Page GitHub

  • [CVPR 2025] VideoDPO: Omni‑Preference Alignment for Video Diffusion Generation
    Paper Project Page GitHub

💡 Pre-Print Papers

⇧ Back to ToC


🗂️ Datasets

| Dataset | Year | Modalities | Task | Links |
| --- | --- | --- | --- | --- |
| UCF101 | 2012 | Video | Unconditional Video Generation | Paper · Website |
| TaiChi-HD | 2019 | Video | Unconditional Video Generation | Paper · Website |
| SkyTimeLapse | 2020 | Video | Unconditional Video Generation | Paper · Website |
| WebVid-10M | 2021 | Text, Video | Text-to-Video Generation | Paper · Website |
| HD-VG-130M | 2023 | Text, Video | Text-to-Video Generation | Paper · Website |
| FETV | 2023 | Text, Video | Text-to-Video Generation | Paper · Website |
| InternVid | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| VidProM | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| Panda-70M | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| SafeSora | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| ChronoMagic-Pro | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| T2V-CompBench | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| VidGen-1M | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| PhyGenBench | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| DH-FaceVid-1K | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| StoryEval | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| HOIGen-1M | 2025 | Text, Video | Text-to-Video Generation | Paper · Website |
| OpenVid-1M | 2025 | Text, Video | Text-to-Video Generation | Paper · Website |
| HumanVid | 2024 | Image, Video | Image-to-Video Generation | Paper · Website |
| TIP-I2V | 2024 | Text, Image, Video | Image-to-Video Generation | Paper · Website |
| TC-Bench | 2024 | Text, Image, Video | Text-to-Video, Image-to-Video | Paper · Website |
| AnimeShooter | 2025 | Text, Image, Video | Text-to-Video, Image-to-Video | Paper · Website |
| VE-Bench | 2024 | Text, Video | Video Editing | Paper · Website |
| DAVIS-Edit | 2024 | Text, Video, Image | Video Editing | Paper · Website |
| DAVIS | 2017 | Video, Image | Video Editing | Paper · Website |
| VIVID-10M | 2024 | Text, Video | Video Editing | Paper · Website |
| Señorita-2M | 2025 | Text, Video | Video Editing | Paper · Website |
| FiVE-Bench | 2025 | Text, Video | Video Editing | Paper · Website |
| InsViE-1M | 2025 | Text, Video | Video Editing | Paper · Website |
| VEU-Bench | 2025 | Text, Video | Video Editing | Paper · Website |
| OpenS2V-5M | 2025 | Text, Video, Audio | Text-to-Video, Image-to-Video, Subject-to-Video | Paper · Website |
| SpeakerVid-5M | 2025 | Text, Video, Audio | Audio-Driven Video Generation | Paper · Website |
| AIGVQA-DB | 2025 | Text, Video, Ratings | Text-to-Video Generation | Paper · Website |
| EvalCrafter | 2024 | Text, Video | Text-to-Video Generation | Paper · Website |
| EditBoard | 2025 | Video, Instruction, Edited Video | Video-to-Video Editing | Paper · Website |
| VE-Bench DB | 2025 | Video, Text, Edited Video | Video-to-Video Editing | Paper · Website |
| SAVGBench | 2024 | Video, Audio, Spatial-Temporal Event | Audio-Driven Video Generation | Paper · Website |
| Morpheus | 2025 | Video | Reinforcement Learning for Video Generation | Paper · Website |
| MJ-Video | 2025 | Video, Text, Rating | Text-to-Video Generation | Paper · Website |
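If you want to query this catalog programmatically, for example to shortlist datasets for a given task, a minimal sketch follows. The rows are transcribed from the table above (only a handful are shown), and the helper is purely illustrative, not part of any repository tooling.

```python
# Illustrative sketch: a few rows transcribed from the datasets table above,
# so the catalog can be filtered programmatically. Not repository tooling.

DATASETS = [
    {"name": "WebVid-10M", "year": 2021, "modalities": ("Text", "Video"), "task": "Text-to-Video Generation"},
    {"name": "OpenVid-1M", "year": 2025, "modalities": ("Text", "Video"), "task": "Text-to-Video Generation"},
    {"name": "TIP-I2V", "year": 2024, "modalities": ("Text", "Image", "Video"), "task": "Image-to-Video Generation"},
    {"name": "VE-Bench", "year": 2024, "modalities": ("Text", "Video"), "task": "Video Editing"},
    {"name": "Morpheus", "year": 2025, "modalities": ("Video",), "task": "Reinforcement Learning for Video Generation"},
]

def filter_datasets(rows, task=None, min_year=None):
    """Return names of datasets matching an exact task and/or a minimum year."""
    return [
        r["name"]
        for r in rows
        if (task is None or r["task"] == task)
        and (min_year is None or r["year"] >= min_year)
    ]

print(filter_datasets(DATASETS, task="Video Editing"))  # ['VE-Bench']
print(filter_datasets(DATASETS, min_year=2025))         # ['OpenVid-1M', 'Morpheus']
```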

⇧ Back to ToC


🎓 About Us

QuenithAI is a professional organization composed of top researchers, dedicated to providing high-quality 1-on-1 research mentoring for university students worldwide. Our mission is to help students bridge the gap from theoretical knowledge to cutting-edge research and publish their work in top-tier conferences and journals.

Maintaining this Awesome Video Generation list requires significant effort, just as completing a high-quality paper requires focused dedication and expert guidance. If you're looking for one-on-one support from top scholars on your own research project, to quickly identify innovative ideas and publish your work, we invite you to contact us.

➡️ Contact us via WeChat or E-mail to start your research journey.


QuenithAI (应达学术) is a professional organization of top researchers dedicated to providing high-quality 1-on-1 research mentoring to university students worldwide. Our mission is to help students develop outstanding research skills and publish their work in top-tier conferences and journals.

Maintaining a GitHub survey repository takes enormous effort, just as completing a high-quality paper does: both depend on focused dedication and expert guidance. If you would like one-on-one support from top scholars on your own research project, we sincerely invite you to get in touch.

➡️ Contact us via WeChat or E-mail to begin your research journey.

⇧ Back to ToC


🤝 Contributing

Contributions are welcome! Please see our Contribution Guidelines for details on how to add new papers, correct information, or improve the repository.
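To make the expected format concrete, the sketch below composes a new entry in the `[Venue Year] Title` plus link-badges convention used throughout this list. The helper is hypothetical, not part of the repository's tooling, and the exact Markdown markup may differ from the source; the Contribution Guidelines remain authoritative.

```python
# Hypothetical helper (not repository tooling) that renders a paper entry
# in this list's "[Venue Year] Title" + link-badges convention.

def format_entry(venue: str, year: int, title: str, links: dict[str, str]) -> str:
    """Render one entry as Markdown in the list's style."""
    badges = " ".join(f"[{label}]({url})" for label, url in links.items())
    return f"- **[{venue} {year}]** {title}\n  {badges}"

print(format_entry(
    "CVPR", 2025,
    "Identity-Preserving Text-to-Video Generation by Frequency Decomposition",
    {"ArXiv": "https://arxiv.org/abs/XXXX.XXXXX"},  # placeholder URL; fill in the real link
))
```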


💬 Join the Community

Join our community to stay up-to-date with the latest advancements, share your work, and collaborate with other researchers and developers in the field of video generation.