Tele-AI/CtrlVDiff

CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion

Dianbing Xi1,2,*, Jiepeng Wang2,*,‡, Yuanzhi Liang2, Xi Qiu2, Jialun Liu2, Hao Pan3, Yuchi Huo1, Rui Wang1,†, Haibin Huang2, Chi Zhang2, Xuelong Li2,†

*Equal contribution.   †Corresponding author.   ‡Project leader.

1State Key Laboratory of CAD&CG, Zhejiang University
2Institute of Artificial Intelligence, China Telecom (TeleAI)
3Tsinghua University

📄 Paper   ·   🌐 Project Page

📌 Intro

CtrlVDiff unifies forward and inverse video generation within a single model, enabling the extraction of all modalities in a single pass. It provides layer-wise control over appearance and structure, facilitating applications such as material editing and object insertion.

✨ Highlights

  • Unified Video Framework: A single model supports both forward and inverse video generation. It can function as a renderer to synthesize videos, and as a decomposer to extract all multimodal representations in just one forward pass.

  • Layer-wise Control Strategy: To enable a unified model to flexibly handle arbitrary combinations and numbers of input modalities, we introduce a Hybrid Modality Control Strategy (HMCS), which provides hierarchical control over video generation across geometry, appearance, structure, and semantics.

  • MMVideo Dataset: To support the scale and diversity required for this task, we construct the MMVideo dataset, which includes both real-world and synthetic scenes. It contains 350k video clips paired with rich multimodal annotations, enabling high-quality video generation and decomposition across diverse domains.
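The idea in the second highlight, that one model accepts arbitrary combinations and numbers of input modalities, can be pictured with a small sketch. This is only an illustration under stated assumptions, not the HMCS implementation: the helper names `split_modalities` and `condition_mask` are hypothetical, and the four layer labels are simply the control axes named above (geometry, appearance, structure, semantics). It shows how a caller might mark which layers are supplied as conditions and which are left for the model to produce.

```python
from typing import Dict, List, Optional

# Control layers named in the highlight above. How the real model represents
# them is not described in this README, so these are plain labels (assumption).
LAYERS: List[str] = ["geometry", "appearance", "structure", "semantics"]


def _check_names(inputs: Dict[str, Optional[object]]) -> None:
    """Reject layer names that are not in LAYERS."""
    unknown = set(inputs) - set(LAYERS)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")


def split_modalities(inputs: Dict[str, Optional[object]]) -> Dict[str, List[str]]:
    """Partition layers into those given as conditions and those left to predict.

    Hypothetical helper: a layer counts as "given" when it is present in
    `inputs` with a non-None value.
    """
    _check_names(inputs)
    given = [name for name in LAYERS if inputs.get(name) is not None]
    missing = [name for name in LAYERS if inputs.get(name) is None]
    return {"condition": given, "predict": missing}


def condition_mask(inputs: Dict[str, Optional[object]]) -> List[int]:
    """Return a 0/1 flag per layer, in LAYERS order (1 = supplied as a condition)."""
    _check_names(inputs)
    return [1 if inputs.get(name) is not None else 0 for name in LAYERS]


if __name__ == "__main__":
    # Any subset of layers may be supplied; the rest are left to the model.
    example = {"geometry": "geometry_clip", "semantics": "semantics_clip"}
    print(split_modalities(example))
    print(condition_mask(example))
```

A per-layer flag of this kind is one common way to let a single network distinguish supplied inputs from absent ones; whether CtrlVDiff does this is not stated here, and the paper is the authoritative description.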

📜 Citation

@misc{xdb2025ctrlvdiff,
      title={CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion}, 
      author={Dianbing Xi and Jiepeng Wang and Yuanzhi Liang and Xi Qiu and Jialun Liu and Hao Pan and Yuchi Huo and Rui Wang and Haibin Huang and Chi Zhang and Xuelong Li},
      year={2025},
      eprint={2511.21129},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.21129}, 
}
