I aim to build multimodal interactive AI systems that can not only ground, reason over, and generate from external-world signals to understand human language, but also assist humans in decision-making and in efficiently addressing societal concerns, e.g., robotics and medicine.
As steps towards this goal, my research interests include but are not limited to multimodal understanding, multimodal generation and multimodal foundation model post-training.
[2025/04] 1 paper (MagicTime) is accepted by TPAMI 2025.
[2025/03] 1 paper (AR-Visual Survey) is accepted by TMLR 2025.
[2025/02] 1 paper (ConsisID) is accepted by CVPR 2025.
[2025/01] 2 papers (1 Poster and 1 Spotlight) are accepted by ICLR 2025.
[2025/01] 1 paper (Medical LLM Survey) is accepted by Nature Reviews Bioengineering 2025.
[2025/01] Started the research internship at Google, USA, supervised by Jiageng Zhang and Dr. Eric Li.
[2024/12] Happy New Year 🥳! 1 paper (PromptLLM) is accepted by TPAMI 2025.
[2024/12] 1 paper is accepted by AAAI 2025.
[2024/12] 1 short paper is accepted by COLING 2025.
[2024/11] 1 paper is accepted by ACM Transactions on Intelligent Systems and Technology (TIST) 2024.
[2024/11] Winter is coming ❄️! 1 paper is accepted by npj Digital Medicine (Impact Factor: 15.357).
[2024/11] 1 survey is accepted by CAAI Transactions on Intelligence Technology (Impact Factor: 8.4), which aims to promote camouflaged object detection and related tasks: Awesome Concealed Object Segmentation.
[2024/10] 🔥🔥🔥 We release a GitHub repository and survey that aim to promote the application of autoregressive models in the vision domain: Awesome Autoregressive Models in Vision.
[2024/09] 1 paper (Spotlight) is accepted by NeurIPS 2024 Datasets & Benchmarks Track.
[2024/09] 1 paper is accepted by EMNLP 2024 Findings.
[2024/05] 1 paper is accepted by ACL 2024 Findings.
[2024/04] 🔥🔥🔥 We are thrilled to present MagicTime, a metamorphic time-lapse video generation model, together with a new dataset, ChronoMagic, supporting both U-Net- and DiT-based T2V frameworks.
[2024/01] 1 paper is accepted by ICLR 2024.
[2023/11] 🔥🔥🔥 We release a GitHub repository to promote research on medical Large Language Models, with the vision of applying LLMs to real-life medical scenarios: A Practical Guide for Medical Large Language Models.
[2023/11] 🔥🔥🔥 How could LMMs contribute to social good? We are excited to release a preliminary exploration of GPT-4V(ision) for social multimedia: GPT-4V(ision) as A Social Media Analysis Engine.
[2023/09] Joined the VIStA Lab as a Ph.D. student working on vision and language.
[2023/07] 1 paper is accepted by ACM MM 2023.
[2023/05] I was awarded the 2023 Peking University Excellent Graduation Thesis.
[2023/04] 1 paper is accepted by TIP 2023.
[2023/04] 1 paper is accepted by IJCAI 2023.
[2023/02] 1 paper (Top 10% Highlight) is accepted by CVPR 2023.
[2022/09] 1 paper is accepted by ICRA 2023.
[2022/09] 1 paper (Spotlight) is accepted by NeurIPS 2022.
Education
University of Rochester (UR), USA
Ph.D. Student in Computer Science • Sep. 2023 - Present
Advisor: Prof. Jiebo Luo
Peking University (PKU), China
Master's Degree in Computer Science • Sep. 2020 - Jun. 2023
Advisors: Prof. Li Yuan and Prof. Jie Chen
University of Electronic Science and Technology of China (UESTC), China
Bachelor's Degree in Software Engineering • Sep. 2016 - Jun. 2020
Advisor: Prof. Xucheng Luo
Research Experience
Core ML / Applied ML, Google, USA • Student Researcher • Jan. 2025 - Present
Advisors: Jiageng Zhang and Dr. Eric Li.
Existing text-to-video generation models do not adequately encode physical knowledge of the real world, so their generated videos tend to exhibit limited motion and poor variation.
In this work, we propose MagicTime, a metamorphic time-lapse video generation model that learns real-world physical knowledge from time-lapse videos and enables metamorphic video generation.
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen
IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 (Highlight, Top 2.5%)
[Paperlink], [Code], [Page], Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval, Video Captioning
To solve the problem of the modality gap in video-text feature space, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations.
We use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, where the features could be concisely represented as the linear combinations of these bases.
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Peng Jin*, Jinfa Huang*, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen
Conference on Neural Information Processing Systems, NeurIPS 2022 (Spotlight Presentation, Top 5%)
[Paperlink], [Code], Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval, Video Captioning
Selected Honors & Scholarships
OpenAI Researcher Access Program, OpenAI 2025
Peking University Excellent Graduation Thesis (Top 10%), PKU 2023
Outstanding Graduate of University of Electronic Science and Technology of China (UESTC), 2020
Selected entrant for Google Machine Learning Winter Camp 2019 (100 people worldwide), 2019
National Inspirational Scholarship, 2018
China Collegiate Programming Contest (ACM-CCPC), Jilin, Bronze, 2018
Talk
"Can Video Generation Models Serve as World Simulators?", 3D视觉工坊, 2025.01, [Live]
Teaching
Teaching Assistant, CSC 240/440 Data Mining, Prof. Thaddeus E. Pawlicki, University of Rochester, 2025 Spring
Teaching Assistant, CSC 240/440 Data Mining, Prof. Monika Polak, University of Rochester, 2024 Fall
Personal Interests
Anime: In my spare time, I watch a lot of Japanese anime about romance, sports, and sci-fi.
Literature: My favorite writer is Xiaobo Wang, whose wisdom about life inspires me. My favorite philosopher is Friedrich Wilhelm Nietzsche, and I am grateful that his philosophy has accompanied me through many difficult times in my life.