𝑱𝙞𝒏𝙛𝒂 𝑯𝙪𝒂𝙣𝒈

✨Bonjour, I am a Ph.D. candidate in the Department of Computer Science, University of Rochester (UR), advised by Prof. Jiebo Luo.

My long-term goal is to build multimodal, interactive AI systems that can ground, reason, and generate across real-world signals for understanding human language, assisting decision-making, and addressing pressing social challenges in domains such as robotics and medicine.

Toward this vision, my research focuses on vision and language, with a particular interest in efficient, faithful, and scalable reasoning for vision–language understanding and generation.

Before that, I received my master's degree from Peking University (PKU) in 2023, advised by Prof. Li Yuan and Prof. Jie Chen. I received my bachelor's degree with honors from the University of Electronic Science and Technology of China (UESTC) in 2020.

Email / Google Scholar / Github / Twitter / Zhihu / LinkedIn

Winter 2024, Puerto Rico✨

News

[2025/07] 🔥🔥🔥 1 paper (MoE-LLaVA) is accepted by TMM 2025.

[2025/07] 🔥🔥🔥 We release LatentCoT-Horizon, the companion GitHub repo for our paper A Survey on Latent Reasoning, which aims to consolidate and promote the latest research, code, and practical resources on latent reasoning.

[2025/06] Finally, finished my cold start stage, my research career has achieved

[2025/05] Our workshop on MLLM for Unified Comprehension and Generation (MUCG) will be held at @ACMMM2025!

[2025/05] MagicTime (TPAMI 2025) is featured by University of Rochester News.

[2025/04] 🎉🎉🎉 I have officially passed my area/qualification exam and am now a Ph.D. candidate!

[2025/04] 🔥🔥🔥 1 paper (MagicTime) is accepted by TPAMI 2025.

[2025/03] 1 paper (AR-Visual Survey) is accepted by TMLR 2025.

[2025/02] 🔥🔥🔥 1 paper (ConsisID) is accepted by CVPR 2025 as a Highlight (Top 3%) 🌟.

[2025/01] 2 papers (1 Poster and 1 Spotlight) are accepted by ICLR 2025.

[2025/01] 1 paper (Medical LLM Survey) is accepted by Nature Reviews Bioengineering 2025.

[2025/01] Started the research internship at Google, USA, supervised by Jiageng Zhang and Dr. Eric Li.

[2024/12] Happy New Year🥳! 1 paper (PromptLLM) is accepted by TPAMI 2025.

[2024/12] 1 paper is accepted by AAAI 2025.

[2024/12] 1 short paper is accepted by COLING 2025.

[2024/11] 1 paper is accepted by ACM Transactions on Intelligent Systems and Technology (TIST) 2024.

[2024/11] Winter is coming❄️! 1 paper is accepted by npj Digital Medicine (Impact Factor: 15.357).

[2024/11] 1 survey is accepted by CAAI Transactions on Intelligence Technology (Impact Factor: 8.4). It aims to promote research on camouflaged object detection and related tasks: Awesome Concealed Object Segmentation.

[2024/10] 🔥🔥🔥 We release a GitHub repository and survey that aim to promote the application of autoregressive models in the vision domain: Awesome Autoregressive Models in Vision.

[2024/09] 1 paper (Spotlight) is accepted by NeurIPS 2024 Datasets & Benchmarks Track.

[2024/09] 1 paper is accepted by EMNLP 2024 Findings.

[2024/06] 🔥🔥🔥 We are excited to present 𝐂𝐡𝐫𝐨𝐧𝐨𝐌𝐚𝐠𝐢𝐜-𝐁𝐞𝐧𝐜𝐡, a benchmark for metamorphic evaluation of text-to-video generation, which provides valuable insights for T2V model selection.

[2024/05] Started the research internship at ByteDance Seed, Bellevue, USA, supervised by Quanzeng You & Yongfei Liu & Jianbo Yuan.

[2024/05] 1 paper is accepted by ACL 2024 Findings.

[2024/04] 🔥🔥🔥 We are thrilled to present 𝐌𝐚𝐠𝐢𝐜𝐓𝐢𝐦𝐞, a metamorphic time-lapse video generation model, and a new dataset, ChronoMagic, with support for U-Net or DiT-based T2V frameworks.

[2024/01] 1 paper is accepted by ICLR 2024.

[2023/11] 🔥🔥🔥 We release a GitHub repository to promote research on medical Large Language Models, with the vision of applying LLMs to real-life medical scenarios: A Practical Guide for Medical Large Language Models.

[2023/11] 🔥🔥🔥 How could LMMs contribute to social good? We are excited to release a new preliminary exploration of GPT-4V(ision) for social multimedia: GPT-4V(ision) as A Social Media Analysis Engine.

[2023/09] Joined the VIStA Lab as a Ph.D. student working on vision and language.

[2023/07] 1 paper is accepted by ACMMM 2023.

[2023/05] I was awarded the 2023 Peking University Excellent Graduation Thesis.

[2023/04] 1 paper is accepted by TIP 2023.

[2023/04] 1 paper is accepted by IJCAI 2023.

[2023/02] 1 paper (Top 10% Highlight) is accepted by CVPR 2023.

[2022/09] 1 paper is accepted by ICRA 2023.

[2022/09] 1 paper (Spotlight) is accepted by NeurIPS 2022.

Education

	University of Rochester (UR), USA • Ph.D. Student in Computer Science • Sep. 2023 - Present • Advisor: Prof. Jiebo Luo
	Peking University (PKU), China • Master's Degree in Computer Science • Sep. 2020 - Jun. 2023 • Advisors: Prof. Li Yuan and Prof. Jie Chen
	University of Electronic Science and Technology of China (UESTC), China • Bachelor's Degree in Software Engineering • Sep. 2016 - Jun. 2020 • Advisor: Prof. Xucheng Luo

Research Experience

International Machine Learning, Amazon, USA
Applied Scientist Intern • Jul. 2025 - Present
Advisors: Dr. Yang Liu, Dr. Chien-Chih Wang and Dr. Huidong Liu

Core ML Applied ML, Google, USA
Student Researcher • Jan. 2025 - May 2025
Advisors: Jiageng Zhang and Dr. Eric Li

Seed-Foundation-Model, ByteDance
Research Intern • May 2024 - Aug. 2024
Advisors: Dr. Quanzeng You & Dr. Yongfei Liu & Dr. Jianbo Yuan

Artificial Intelligence Center, Pengcheng Lab
Research Intern • Sep. 2020 - Aug. 2022
Advisors: Dr. Guoli Song & Prof. Jie Chen

Multimedia Computing Team, KDDI Research
Research Intern • Nov. 2019 - Feb. 2020
Advisors: Dr. Yanan Wang & Dr. Jianming Wu

X-Data Research Group, Tencent IEG
Engineering Intern • Jan. 2019 - Jul. 2019
Advisors: Boya Yin & Dr. Yang Chao

Selected Publications

My current research mainly focuses on vision+language and generative models. *Equal contribution.

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Shenghai Yuan*, Jinfa Huang*, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo
IEEE Transactions on Pattern Analysis and Machine Intelligence, TPAMI 2025
(Github Repo 1300+ Stars🌟)
[Paperlink], [Code], [News], [Page]

Area: Text-to-Video Generation, Diffusion Model, Time-lapse Videos

Existing text-to-video generation models have not adequately encoded physical knowledge of the real world, so the videos they generate tend to have limited motion and poor variation. In this paper, we propose MagicTime, a metamorphic time-lapse video generation model that learns real-world physics knowledge from time-lapse videos and implements metamorphic video generation.

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen
IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
(Highlight, Top 2.5%)
[Paperlink], [Code], [Page]

Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval, Video Captioning

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Peng Jin*, Jinfa Huang*, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen
Conference on Neural Information Processing Systems, NeurIPS 2022
(Spotlight Presentation, Top 5%)
[Paperlink], [Code]

Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval, Video Captioning

To address the modality gap in the video-text feature space, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations. We use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, where features can be concisely represented as linear combinations of these bases.

Selected Surveys

I am a primary maintainer and organizer for the following survey GitHub repositories, a role that stems from my personal interest in tracking and sharing the latest research.

[Under Construction] [AIGC] Awesome-Personalized-Video-Creation
[Arxiv 2025] [Reasoning] A Survey on Latent Reasoning
[TMLR 2025] [AIGC] Autoregressive Models in Vision Survey
[Nature Reviews Bioengineering 2025] [Medical] A Practical Guide for Medical Large Language Models

Selected Benchmarks

I am dedicated to building benchmarks, as I believe that establishing a standardized evaluation for a complex problem is the essential prerequisite to clearly defining it and driving breakthrough progress for AGI.

[Arxiv 2025] [AIGC] OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation, [PDF], [OpenS2V-5M], [OpenS2V-Eval]
[NeurIPS D&B 2024 Spotlight] [AIGC] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation, [PDF], [ChronoMagic-Pro], [ChronoMagic-Bench]
[TIST 2025] [MLLM] GPT-4V(ision) as A Social Media Analysis Engine, [PDF], [Data]

Selected Honors & Scholarships

OpenAI Researcher Access Program, OpenAI 2025

Peking University Excellent Graduation Thesis (Top 10%), PKU 2023

Outstanding Graduate of University of Electronic Science and Technology of China (UESTC), 2020

Selected entrant for Google Machine Learning Winter Camp 2019 (100 people worldwide), 2019

National Inspirational Scholarship, 2018

China Collegiate Programming Contest (ACM-CCPC), Jilin, Bronze, 2018

Talks

"Text-to-video AI blossoms with New Metamorphic Video Capabilities", University of Rochester News, 2025.05, [News]

"Can Video Generation Models as World Simulators?", 3D视觉工坊, 2025.01, [Live]

Teaching

Teaching Assistant, CSC 240/440 Data Mining, Prof. Monika Polak, University of Rochester, 2025 Fall

Teaching Assistant, CSC 240/440 Data Mining, Prof. Thaddeus E. Pawlicki, University of Rochester, 2025 Spring

Teaching Assistant, CSC 240/440 Data Mining, Prof. Monika Polak, University of Rochester, 2024 Fall

Personal Interests

Anime: In my spare time, I watch a lot of Japanese anime about love, sports, and sci-fi.

Literature: My favorite writer is Xiaobo Wang; the wisdom of his life inspires me. My favorite philosopher is Friedrich Wilhelm Nietzsche, and I am grateful that his philosophy has accompanied me through many difficult times in my life.

LaTeX Beamer: I maintain the open-source PKU-Beamer-Theme, a clean, bilingual presentation style tailored for academic reports, theses, and talks at Peking University.

Academic Service

Program Member: MUCG@ACMMM2025, ER@NeurIPS2025

PC Member: CVPR'23/24/25, NeurIPS'22/23/25, ICLR'23/24/25, ICCV'23/25, ACM MM'24/25, ECCV'24, AAAI'25/26, COLM'25, ACL'25

Journal Reviewer: IEEE TPAMI(x2), IEEE TCSVT, NEJM AI

My hometown is in Guangdong; you can call me by my Cantonese name, Gamfaat Wong.
Last updated in Jul. 2025.

This awesome template is inspired by this good man.