✨Bonjour, I am a Ph.D. candidate in the Department of Computer Science, University of Rochester (UR), advised by Prof. Jiebo Luo.
Prior to that, I received my master's degree from Peking University (PKU), advised by Prof. Li Yuan and Prof. Jie Chen. I obtained my bachelor's degree with honors from the University of Electronic Science and Technology of China (UESTC).
My research interests focus on building autonomous intelligence, specifically:
🗺️ Research Map: A schematic overview of my research vision.
🌟 Project Prometheus: Fetching the "fire🔥" of real-world physics to spark autonomous AI.


My current research mainly focuses on multimodal understanding and generation. (*Equal Contribution)
Area: Text-to-Video Generation, Diffusion Model, Time-lapse Videos
Existing text-to-video generation models do not adequately encode physical knowledge of the real world,
so their generated videos tend to have limited motion and little variation. In this paper, we propose MagicTime, a
metamorphic time-lapse video generation model that learns real-world physics knowledge from time-lapse
videos and uses it to generate metamorphic videos.
Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval
To move beyond coarse-grained global interactions, we explicitly model video frames and text words as players in a
cooperative game. We propose Hierarchical Banzhaf Interaction (HBI) to value the fine-grained
correspondence between video frames and text words, enabling sensitive and explainable cross-modal contrast across
different semantic levels.
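As a rough illustration of the game-theoretic idea, the sketch below computes the standard pairwise Banzhaf interaction index from cooperative game theory on a toy game. It is not the HBI implementation: the players `f1`, `f2`, `w1` and the value function `toy_value` are hypothetical stand-ins for video frames, text words, and a similarity score.

```python
from itertools import combinations


def banzhaf_interaction(players, value, i, j):
    """Pairwise Banzhaf interaction index of players i and j under `value`."""
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    # Enumerate every coalition S that contains neither i nor j.
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Extra value gained when i and j join S together
            # rather than separately.
            total += value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
    # Average over all 2^(n-2) coalitions of the remaining players.
    return total / 2 ** len(others)


def toy_value(coalition):
    """Hypothetical score: frame f1 and word w1 only pay off together."""
    synergy = 1.0 if {"f1", "w1"} <= coalition else 0.0
    return synergy + 0.1 * len(coalition)


players = ["f1", "f2", "w1"]
matched = banzhaf_interaction(players, toy_value, "f1", "w1")
unmatched = banzhaf_interaction(players, toy_value, "f2", "w1")
print(matched, unmatched)
```

In this toy game the matching pair (`f1`, `w1`) receives a clearly positive interaction, while the unrelated pair (`f2`, `w1`) receives an interaction close to zero, which is the kind of fine-grained signal the paragraph above refers to.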
Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval, Video Captioning
To address the modality gap in the video-text feature space, we propose Expectation-Maximization
Contrastive Learning (EMCL) to learn compact video-and-language representations. We use the
Expectation-Maximization algorithm to find a compact set of bases for the latent space, so that the features
can be concisely represented as linear combinations of these bases.
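The idea of finding a few bases and re-expressing features as linear combinations of them can be sketched generically as follows. This is a minimal illustration, not the EMCL code: the soft assignment via a softmax over inner products, the `temperature` parameter, and the toy 2-D features are all assumptions made for the example.

```python
import math


def em_bases(features, bases, iters=10, temperature=0.1):
    """Alternate soft assignment (E-step) and base re-estimation (M-step)."""
    dim = len(features[0])
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each base for each feature
        # (softmax over bases of the scaled inner product; an assumption).
        resp = []
        for f in features:
            logits = [sum(a * b for a, b in zip(f, mu)) / temperature for mu in bases]
            top = max(logits)
            exps = [math.exp(v - top) for v in logits]
            norm = sum(exps)
            resp.append([e / norm for e in exps])
        # M-step: each base becomes the responsibility-weighted mean of the features.
        new_bases = []
        for k in range(len(bases)):
            weight = sum(r[k] for r in resp)
            new_bases.append(
                [sum(r[k] * f[d] for r, f in zip(resp, features)) / weight
                 for d in range(dim)]
            )
        bases = new_bases
    return bases, resp


def reconstruct(resp, bases):
    """Re-express every feature as a linear combination of the bases."""
    dim = len(bases[0])
    return [
        [sum(r[k] * bases[k][d] for k in range(len(bases))) for d in range(dim)]
        for r in resp
    ]


# Toy 2-D features forming two loose groups (hypothetical data).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
bases, resp = em_bases(features, bases=[[1.0, 0.0], [0.0, 1.0]])
compact = reconstruct(resp, bases)
print(bases)
print(compact)
```

Here four features are summarized by two bases, and each reconstructed feature is a weighted sum of those bases with weights given by its responsibilities.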
I maintain several repositories to track the latest research.