✨Bonjour, I am a Ph.D. candidate in the Department of Computer Science, University of Rochester (UR), advised by Prof. Jiebo Luo.
Before that, I received my master's degree from Peking University (PKU), where I was advised by Prof. Li Yuan and Prof. Jie Chen. I received my bachelor's degree with honors from the University of Electronic Science and Technology of China (UESTC).
My long-term goal is to build multimodal, interactive AI systems that ground, reason, and generate within a closed loop. I conceptualize this pursuit as the Prometheus framework:
Research Map: A schematic overview of my research vision.


My current research mainly focuses on vision+language and generative models. (*Equal Contribution)
Area: Text-to-Video Generation, Diffusion Model, Time-lapse Videos
Existing text-to-video generation models do not adequately encode physical knowledge of the real world, so the videos they generate tend to have limited motion and little variation. In this paper, we propose MagicTime, a metamorphic time-lapse video generation model that learns real-world physics knowledge from time-lapse videos and uses it to generate metamorphic videos.
Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval, Video Captioning
To address the modality gap in the video-text feature space, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations. We use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, so that the features can be concisely represented as linear combinations of these bases.
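The idea of finding a compact set of bases with Expectation-Maximization can be illustrated with a minimal NumPy sketch. This is not the EMCL implementation: the function name `em_bases`, the softmax-style E-step, the `temperature` parameter, and the number of bases and iterations are all assumptions chosen for illustration only.

```python
import numpy as np

def em_bases(features, num_bases=4, num_iters=10, temperature=1.0, seed=0):
    """Illustrative EM loop (hypothetical helper, not the EMCL code).

    features: array of shape (N, D), e.g. pooled video and text features.
    Returns the bases (K, D), the responsibilities (N, K), and the
    reconstruction (N, D) built as linear combinations of the bases.
    """
    rng = np.random.default_rng(seed)
    n, _ = features.shape
    # Assumption: initialize the bases from randomly chosen feature rows.
    bases = features[rng.choice(n, size=num_bases, replace=False)].copy()

    def responsibilities(current_bases):
        # E-step: soft assignment of every feature to every basis.
        logits = features @ current_bases.T / temperature
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        return weights / weights.sum(axis=1, keepdims=True)

    for _ in range(num_iters):
        resp = responsibilities(bases)
        # M-step: each basis becomes the responsibility-weighted mean of the features.
        bases = (resp.T @ features) / (resp.sum(axis=0)[:, None] + 1e-8)

    # Recompute the assignments for the final bases, then reconstruct each
    # feature as a linear (here convex) combination of those bases.
    resp = responsibilities(bases)
    reconstructed = resp @ bases
    return bases, resp, reconstructed
```

In this sketch, each row of `reconstructed` lies in the span of only `num_bases` vectors, which is one way to read "compact" in the description above.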
I maintain several repositories to track the latest research.