All Publications [Google Scholar]
My current research mainly focuses on multimodal generation and understanding. (*Equal Contribution)
arXiv preprints
[1] Shenghai Yuan*, Jinfa Huang*, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo. "MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators"
[PDF][Code][Project page]
[2] Bin Zhu, Peng Jin, Munan Ning, Bin Lin, Jinfa Huang, Qi Song, Mingjun Pan, Li Yuan. "LLMBind: A Unified Modality-Task Integration Framework"
[PDF][Code]
[3] Bin Lin, Zhenyu Tang, Yang Ye, Jinfa Huang, Junwu Zhang, Patian Pang, Peng Jin, Munan Ning, Jiebo Luo, Li Yuan. "MoE-LLaVA: Mixture of Experts for Large Vision-Language Models"
[PDF][Code]
[4] Cong Jin, Jingru Fan, Jinfa Huang, Jinyuan Fu, Yi Zhang, Tao Mei, Li Yuan, Jiebo Luo. "Next-Gen AIGC: Harnessing Advanced Multimodal Foundation Models for Text-to-Media Innovations"
[5] Yongdong Luo, Xiawu Zheng, Xiao Yang, Guilin Li, Haojia Lin, Jinfa Huang, Jiayi Ji, Fei Chao, Jiebo Luo, Rongrong Ji. "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
[PDF][Code]
[6] Shenghai Yuan, Jinfa Huang, Xianyi He, Yunyuan Ge, Yujun Shi, Liuhan Chen, Jiebo Luo, Li Yuan. "Identity-Preserving Text-to-Video Generation by Frequency Decomposition"
[PDF][Code][Page]
[7] Jing Xiong, Gongye Liu, Lun Huang, Chengyue Wu, Taiqiang Wu, Yao Mu, Yuan Yao, Hui Shen, Zhongwei Wan, Jinfa Huang, Chaofan Tao, Shen Yan, Huaxiu Yao, Lingpeng Kong, Hongxia Yang, Mi Zhang, Guillermo Sapiro, Jiebo Luo, Ping Luo, Ngai Wong. "Autoregressive Models in Vision: A Survey"
[PDF][Code]
2025
[1] Jinfa Huang*, Jinsheng Pan*, Zhongwei Wan, Hanjia Lyu, Jiebo Luo. "Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection", COLING 2025, short paper,
[PDF] [Code] [Poster]
[2] Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang. "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval", AAAI 2025,
[PDF][Code]
[3] Fenglin Liu, Xian Wu, Jinfa Huang, Kim Branson, Patrick Schwab, Lei Clifton, Ping Zhang, Jiebo Luo, Yefeng Zheng, and David A. Clifton. "Aligning, Autoencoding and Prompting Large Language Models for Novel Thorax Disease Reporting", TPAMI 2025,
[PDF][Code]
[4] Hongjian Zhou*, Fenglin Liu*, Boyang Gu*, Xinyu Zou*, Jinfa Huang*, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton. "A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges", Nature Reviews Bioengineering 2025,
[PDF][Code]
[5] Shaofeng Zhang, Qiang Zhou, Sitong Wu, Haoru Tan, Zhibin Wang, Jinfa Huang, Junchi Yan. "CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning", ICLR 2025,
[PDF][Code]
[6] Chunming He, Chengyu Fang, Yulun Zhang, Longxiang Tang, Jinfa Huang, Kai Li, Zhenhua Guo, Xiu Li, Sina Farsiu. "Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model", ICLR 2025 Spotlight,
[PDF][Code]
[7] Fenglin Liu, Zheng Li, Qingyu Yin, Jinfa Huang, Xian Wu, Anshul Thakur, Kim Branson, Patrick Schwab, Bing Yin, Yefeng Zheng, Jiebo Luo, and David A. Clifton. "A Multimodal Multidomain Multilingual Medical Foundation Model for Zero-Shot Clinical Diagnosis", npj Digital Medicine,
[PDF][Github]
2024
[1] Meng Cao*, Haoran Tang*, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li. "RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter", ACL 2024 Findings,
[PDF]
[2] Shaofeng Zhang, Jinfa Huang, Qiang Zhou, Zhibin Wang, Fan Wang, Jiebo Luo, Junchi Yan. "Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach", ICLR 2024,
[PDF][Code]
[3] Zhongwei Wan*, Ziang Wu*, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan. "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference", EMNLP 2024 Findings,
[PDF][Code]
[4] Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan. "ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation", NeurIPS 2024 D&B Spotlight,
[PDF][Code][Project page]
[5] Fengyang Xiao, Sujie Hu, Yuqi Shen, Chengyu Fang, Jinfa Huang, Chunming He, Longxiang Tang, Ziyun Yang, Xiu Li. "A Survey of Camouflaged Object Detection and Beyond", CAAI 2024,
[PDF][Github]
[6] Hanjia Lyu*, Jinfa Huang*, Daoan Zhang*, Yongsheng Yu*, Xinyi Mou*, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo. "GPT-4V(ision) as a Social Media Analysis Engine", ACM Transactions on Intelligent Systems and Technology (TIST) 2024,
[PDF][Code]
2023
[1] Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen. "Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning", CVPR 2023 Highlight,
[PDF][Code][Project page]
[2] Jingyi Wang, Jinfa Huang, Can Zhang, Zhidong Deng. "Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs", ICRA 2023,
[PDF][Code]
[3] Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen. "Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment", IJCAI 2023,
[PDF][Code]
[4] Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen. "Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering", TIP 2023,
[PDF]
[5] Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng. "Improving Scene Graph Generation with Superpixel-Based Interaction Learning", ACMMM 2023,
[PDF]
2022 and Earlier
[1] Peng Jin*, Jinfa Huang*, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David Clifton, Jie Chen. "Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations", NeurIPS 2022 Spotlight,
[PDF][Code]
[2] Yingmei Guo, Jinfa Huang, Yanlong Dong, Mingxing Xu. "Guoym at SemEval-2020 Task 8: Ensemble-based Classification of Visuo-lingual Metaphor in Memes", SemEval-2020,
[PDF]
[3] Yanan Wang, Jianming Wu, Jinfa Huang, Gen Hattori, Yasuhiro Takishima, Shinya Wada, Rui Kimura, Jie Chen, Satoshi Kurihara. "LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding", ICMI 2020,
[PDF][Code]