π’₯𝒾𝓃𝒻𝒢 β„‹π“Šπ’Άπ“ƒβ„Š      

✨Bonjour, I am a Ph.D. student in the Department of Computer Science at the University of Rochester (UR), advised by Prof. Jiebo Luo.

I aim to build multimodal interactive AI systems that not only ground, reason, and generate over external world signals to understand human language, but also assist humans in decision-making and in efficiently addressing societal needs, e.g., robotics and medicine. As steps toward this goal, my research interests include, but are not limited to, multimodal understanding, multimodal generation, and multimodal foundation model post-training.

Prior to that, I received my master's degree from Peking University (PKU) in 2023, advised by Prof. Li Yuan and Prof. Jie Chen, and my bachelor's degree with honors from the University of Electronic Science and Technology of China (UESTC) in 2020.

Email  /  Google Scholar  /  Github  /  Twitter  /  Zhihu  /  LinkedIn

Winter 2024, Puerto Rico✨

News

  • [2025/01]    2 papers (1 Poster and 1 Spotlight) are accepted by ICLR 2025.
  • [2025/01]    1 paper (Medical LLM Survey) is accepted by Nature Reviews Bioengineering 2025.
  • [2025/01]    Started a research internship at Google, USA, supervised by Jiageng Zhang and Dr. Eric Li.
  • [2024/12]    Happy New Year πŸ₯³! 1 paper is accepted by TPAMI 2025.
  • [2024/12]    1 paper is accepted by AAAI 2025.
  • [2024/12]    1 short paper is accepted by COLING 2025.
  • [2024/11]    1 paper is accepted by ACM Transactions on Intelligent Systems and Technology (TIST) 2024.
  • [2024/11]    Winter is coming ❄️! 1 paper is accepted by npj Digital Medicine (Impact Factor: 15.357).
  • [2024/11]    1 survey is accepted by CAAI Transactions on Intelligence Technology (Impact Factor: 8.4), which aims to promote camouflaged object detection and related tasks: GitHub Repo stars Awesome Concealed Object Segmentation.
  • [2024/10]    πŸ”₯πŸ”₯πŸ”₯ We release a GitHub repository and survey aimed at promoting the application of autoregressive models in the vision domain: GitHub Repo stars Awesome Autoregressive Models in Vision.
  • [2024/09]    1 paper (Spotlight) is accepted by NeurIPS 2024 Datasets & Benchmarks Track.
  • [2024/09]    1 paper is accepted by EMNLP 2024 Findings.
  • [2024/06]   πŸ”₯πŸ”₯πŸ”₯ We are excited to present ChronoMagic-Bench, a benchmark for metamorphic evaluation of text-to-video generation, which provides valuable insights for T2V model selection. GitHub Repo stars
  • [2024/05]    Started a research internship at ByteDance Seed, Bellevue, USA, supervised by Quanzeng You, Yongfei Liu, and Jianbo Yuan.
  • [2024/05]    1 paper is accepted by ACL 2024 Findings.
  • [2024/04]   πŸ”₯πŸ”₯πŸ”₯ We are thrilled to present MagicTime, a metamorphic time-lapse video generation model, together with a new dataset, ChronoMagic, supporting both U-Net- and DiT-based T2V frameworks. GitHub Repo stars
  • [2024/01]    1 paper is accepted by ICLR 2024.
  • [2023/11]   πŸ”₯πŸ”₯πŸ”₯ We release a GitHub repository to promote medical Large Language Model research, with the vision of applying LLMs to real-life medical scenarios: GitHub Repo stars A Practical Guide for Medical Large Language Models.
  • [2023/11]   πŸ”₯πŸ”₯πŸ”₯ How could LMMs contribute to social good? We are excited to release a preliminary exploration of GPT-4V(ision) for social multimedia: GPT-4V(ision) as A Social Media Analysis Engine.
  • [2023/09]   Joined the VIStA Lab as a Ph.D. student working on vision and language.
  • [2023/07]   1 paper is accepted by ACMMM 2023.
  • [2023/05]   I was awarded the 2023 Peking University Excellent Graduation Thesis.
  • [2023/04]   1 paper is accepted by TIP 2023.
  • [2023/04]   1 paper is accepted by IJCAI 2023.
  • [2023/02]   1 paper (Top 10% Highlight) is accepted by CVPR 2023.
  • [2022/09]   1 paper is accepted by ICRA 2023.
  • [2022/09]   1 paper (Spotlight) is accepted by NeurIPS 2022.

    Education

    University of Rochester (UR), USA
    Ph.D. Student in Computer Science      • Sep. 2023 - Present
    Advisor: Prof. Jiebo Luo

    Peking University (PKU), China
    Master's Degree in Computer Science      • Sep. 2020 - Jun. 2023
    Advisors: Prof. Li Yuan and Prof. Jie Chen

    University of Electronic Science and Technology of China (UESTC), China
    Bachelor's Degree in Software Engineering      • Sep. 2016 - Jun. 2020
    Advisor: Prof. Xucheng Luo

    Research Experience

    Core ML Applied ML, Google, USA
    Student Researcher       • Jan. 2025 - Present
    Advisors:   Jiageng Zhang and Dr. Eric Li.

    Seed-Foundation-Model, ByteDance
    Research Intern       • May 2024 - Aug. 2024
    Advisors:   Dr. Quanzeng You & Dr. Yongfei Liu & Dr. Jianbo Yuan

    Artificial Intelligence Center, Pengcheng Lab
    Research Intern       • Sep. 2020 - Aug. 2022
    Advisors:   Dr. Guoli Song & Prof. Jie Chen

    Multimedia Computing Team, KDDI Research
    Research Intern       • Nov. 2019 - Feb. 2020
    Advisors:   Dr. Yanan Wang & Dr. Jianming Wu

    X-Data Research Group, Tencent IEG
    Engineering Intern       • Jan. 2019 - Jul. 2019
    Advisors:   Boya Yin & Dr. Yang Chao

    Selected Publications
    MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
    Shenghai Yuan*, Jinfa Huang*, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo
    arXiv preprint
    (GitHub Repo 1300+ Stars 🌟)
    [Paperlink], [Code], [Page], GitHub Repo stars
    Area: Text-to-Video Generation, Diffusion Model, Time-lapse Videos

    Existing text-to-video generation models have not adequately encoded physical knowledge of the real world, so the generated videos tend to exhibit limited motion and poor variation. In this paper, we propose MagicTime, a metamorphic time-lapse video generation model that learns real-world physics knowledge from time-lapse videos and implements metamorphic video generation.

    Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
    Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen
    IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2023
    (Highlight, Top 2.5%)
    [Paperlink], [Code], [Page], GitHub Repo stars
    Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval, Video Captioning

    To achieve fine-grained video-text alignment, we formulate cross-modal representation learning as a cooperative game between video clips and text words, and propose Hierarchical Banzhaf Interaction to measure the degree of cooperation between them, yielding more interpretable and better-aligned video-and-language representations.

    Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
    Peng Jin*, Jinfa Huang*, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen
    Conference on Neural Information Processing Systems, NeurIPS 2022
    (Spotlight Presentation, Top 5%)
    [Paperlink], [Code], GitHub Repo stars
    Area: Video-and-Language Representation, Machine Learning, Video-Text Retrieval, Video Captioning

    To solve the problem of the modality gap in video-text feature space, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations. We use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, where the features could be concisely represented as the linear combinations of these bases.

    All Publications [Google Scholar]

    My current research mainly focuses on multimodal generation and understanding. (*Equal Contribution)

    arXiv preprints

    [1] Shenghai Yuan*, Jinfa Huang*, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo. "MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators" [PDF][Code][Project page] GitHub Repo stars

    [2] Bin Zhu, Peng Jin, Munan Ning, Bin Lin, Jinfa Huang, Qi Song, Mingjun Pan, Li Yuan. "LLMBind: A Unified Modality-Task Integration Framework" [PDF][Code] GitHub Repo stars

    [3] Bin Lin, Zhenyu Tang, Yang Ye, Jinfa Huang, Junwu Zhang, Yatian Pang, Peng Jin, Munan Ning, Jiebo Luo, Li Yuan. "MoE-LLaVA: Mixture of Experts for Large Vision-Language Models" [PDF][Code] GitHub Repo stars

    [4] Cong Jin, Jingru Fan, Jinfa Huang, Jinyuan Fu, Yi Zhang, Tao Mei, Li Yuan, Jiebo Luo. "Next-Gen AIGC: Harnessing Advanced Multimodal Foundation Models for Text-to-Media Innovations"

    [5] Yongdong Luo, Xiawu Zheng, Xiao Yang, Guilin Li, Haojia Lin, Jinfa Huang, Jiayi Ji, Fei Chao, Jiebo Luo, Rongrong Ji. "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension" [PDF][Code] GitHub Repo stars

    [6] Shenghai Yuan, Jinfa Huang, Xianyi He, Yunyuan Ge, Yujun Shi, Liuhan Chen, Jiebo Luo, Li Yuan. "Identity-Preserving Text-to-Video Generation by Frequency Decomposition" [PDF][Code][Page] GitHub Repo stars

    [7] Jing Xiong, Gongye Liu, Lun Huang, Chengyue Wu, Taiqiang Wu, Yao Mu, Yuan Yao, Hui Shen, Zhongwei Wan, Jinfa Huang, Chaofan Tao, Shen Yan, Huaxiu Yao, Lingpeng Kong, Hongxia Yang, Mi Zhang, Guillermo Sapiro, Jiebo Luo, Ping Luo, Ngai Wong. "Autoregressive Models in Vision: A Survey" [PDF][Code] GitHub Repo stars

    2025

    [1] Jinfa Huang*, Jinsheng Pan*, Zhongwei Wan, Hanjia Lyu, Jiebo Luo. "Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection", COLING 2025, short paper, [PDF] [Code] [Poster] GitHub Repo stars

    [2] Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang. "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval", AAAI 2025, [PDF][Code] GitHub Repo stars

    [3] Fenglin Liu, Xian Wu, Jinfa Huang, Kim Branson, Patrick Schwab, Lei Clifton, Ping Zhang, Jiebo Luo, Yefeng Zheng, and David A. Clifton. "Aligning, Autoencoding and Prompting Large Language Models for Novel Thorax Disease Reporting", TPAMI 2025, [PDF][Code] GitHub Repo stars

    [4] Hongjian Zhou*, Fenglin Liu*, Boyang Gu*, Xinyu Zou*, Jinfa Huang*, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton. "A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges", Nature Reviews Bioengineering 2025, [PDF][Code] GitHub Repo stars

    [5] Shaofeng Zhang, Qiang Zhou, Sitong Wu, Haoru Tan, Zhibin Wang, Jinfa Huang, Junchi Yan. "CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning", ICLR 2025, [PDF][Code] GitHub Repo stars

    [6] Chunming He, Chengyu Fang, Yulun Zhang, Longxiang Tang, Jinfa Huang, Kai Li, Zhenhua Guo, Xiu Li, Sina Farsiu. "Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model", ICLR 2025 Spotlight, [PDF][Code] GitHub Repo stars

    [7] Fenglin Liu, Zheng Li, Qingyu Yin, Jinfa Huang, Xian Wu, Anshul Thakur, Kim Branson, Patrick Schwab, Bing Yin, Yefeng Zheng, Jiebo Luo, and David A. Clifton. "A Multimodal Multidomain Multilingual Medical Foundation Model for Zero-Shot Clinical Diagnosis", npj Digital Medicine, [PDF][Github] GitHub Repo stars

    2024

    [1] Meng Cao*, Haoran Tang*, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li. "RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter", ACL 2024 Findings, [PDF]

    [2] Shaofeng Zhang, Jinfa Huang, Qiang Zhou, Zhibin Wang, Fan Wang, Jiebo Luo, Junchi Yan. "Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach", ICLR 2024, [PDF][Code] GitHub Repo stars

    [3] Zhongwei Wan*, Ziang Wu*, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan. "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference", EMNLP 2024 Findings, [PDF][Code] GitHub Repo stars

    [4] Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan. "ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation", NeurIPS 2024 D&B Spotlight, [PDF][Code][Project page] GitHub Repo stars

    [5] Fengyang Xiao, Sujie Hu, Yuqi Shen, Chengyu Fang, Jinfa Huang, Chunming He, Longxiang Tang, Ziyun Yang, Xiu Li. "A Survey of Camouflaged Object Detection and Beyond", CAAI Transactions on Intelligence Technology 2024, [PDF][Github] GitHub Repo stars

    [6] Hanjia Lyu*, Jinfa Huang*, Daoan Zhang*, Yongsheng Yu*, Xinyi Mou*, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo. "GPT-4V (ision) as a Social Media Analysis Engine", ACM Transactions on Intelligent Systems and Technology (TIST) 2024, [PDF][Code] GitHub Repo stars

    2023

    [1] Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen. "Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning", CVPR 2023 Highlight, [PDF][Code][Project page] GitHub Repo stars

    [2] Jingyi Wang, Jinfa Huang, Can Zhang, Zhidong Deng. "Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs", ICRA 2023, [PDF][Code] GitHub Repo stars

    [3] Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen. "Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment", IJCAI 2023, [PDF][Code] GitHub Repo stars

    [4] Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen. "Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering", TIP 2023, [PDF]

    [5] Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng. "Improving Scene Graph Generation with Superpixel-Based Interaction Learning", ACMMM 2023, [PDF]

    2022 and Earlier

    [1] Peng Jin*, Jinfa Huang*, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David Clifton, Jie Chen. "Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations", NeurIPS 2022 Spotlight, [PDF][Code] GitHub Repo stars

    [2] Yingmei Guo, Jinfa Huang, Yanlong Dong, Mingxing Xu. "Guoym at SemEval-2020 task 8: Ensemble-based Classification of Visuo-lingual Metaphor in Memes", SemEval-2020, [PDF]

    [3] Yanan Wang, Jianming Wu, Jinfa Huang, Gen Hattori, Yasuhiro Takishima, Shinya Wada, Rui Kimura, Jie Chen, Satoshi Kurihara. "LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding", ICMI 2020, [PDF][Code] GitHub Repo stars

    Selected Honors & Scholarships

  • Peking University Excellent Graduation Thesis (Top 10%), PKU  2023
  • Outstanding Graduate of University of Electronic Science and Technology of China (UESTC),  2020
  • Selected participant, Google Machine Learning Winter Camp 2019 (100 participants worldwide),  2019
  • National Inspirational Scholarship,  2018
  • China Collegiate Programming Contest (ACM-CCPC), Jilin, Bronze,  2018

    Talks

  • "Can Video Generation Models as World Simulators?β€œ, 3D视觉ε·₯坊, 2025.01, [Live]

    Teaching

  • Teaching Assistant, CSC 240/440 Data Mining, Prof. Thaddeus E. Pawlicki, University of Rochester, 2025 Spring
  • Teaching Assistant, CSC 240/440 Data Mining, Prof. Monika Polak, University of Rochester, 2024 Fall

    Personal Interests

    Anime: In my spare time, I watch a lot of Japanese anime about romance, sports, and sci-fi.

    Literature: My favorite writer is Xiaobo Wang; the wisdom of his life inspires me. My favorite philosopher is Friedrich Wilhelm Nietzsche, and I am grateful that his philosophy has accompanied me through many difficult times in my life.

    Academic Service

  • PC Member:   CVPR'23/24/25, NeurIPS'22/23, ICLR'23/24/25, ICCV'23/25, ACM MM'24/25, ECCV'24, AAAI'25, COLM'25
  • Journal Reviewer:   IEEE TCSVT, IEEE TPAMI, NEJM AI


  • My hometown is in Guangdong; you can call me by my Cantonese name, Gamfaat Wong.
    Last updated in February 2025.

    This awesome template is inspired by this good man.