All Publications [Google Scholar]
My current research mainly focuses on multimodal generation and understanding. (*Equal Contribution)
arXiv preprints
[1] Shenghai Yuan*, Jinfa Huang*, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo. "MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators"
[PDF][Code][Project page]
[2] Bin Zhu, Peng Jin, Munan Ning, Bin Lin, Jinfa Huang, Qi Song, Mingjun Pan, Li Yuan. "LLMBind: A unified modality-task integration framework"
[PDF][Code]
[3] Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan. "MoE-LLaVA: Mixture of Experts for Large Vision-Language Models"
[PDF][Code]
[4] Hanjia Lyu*, Jinfa Huang*, Daoan Zhang*, Yongsheng Yu*, Xinyi Mou*, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo. "GPT-4V(ision) as a Social Media Analysis Engine"
[PDF][Code]
[5] Hongjian Zhou*, Fenglin Liu*, Boyang Gu*, Xinyu Zou*, Jinfa Huang*, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton. "A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges"
[PDF][Code]
[6] Jinfa Huang*, Jinsheng Pan*, Zhongwei Wan, Hanjia Lyu, Jiebo Luo. "Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection"
[PDF]
[7] Cong Jin, Jingru Fan, Jinfa Huang, Jinyuan Fu, Yi Zhang, Tao Mei, Li Yuan, Jiebo Luo. "Next-Gen AIGC: Harnessing Advanced Multimodal Foundation Models for Text-to-Media Innovations"
[8] Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang. "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval"
[PDF][Code]
2024
[1] Meng Cao*, Haoran Tang*, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li. "RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter", ACL 2024 Findings,
[PDF][Code]
[2] Shaofeng Zhang, Jinfa Huang, Qiang Zhou, Zhibin Wang, Fan Wang, Jiebo Luo, Junchi Yan. "Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach", ICLR 2024,
[PDF][Code]
[3] Zhongwei Wan*, Ziang Wu*, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan. "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference", EMNLP 2024 Findings,
[PDF][Code]
[4] Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan. "ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation", NeurIPS 2024 D&B Spotlight,
[PDF][Code][Project page]
[5] Fengyang Xiao, Sujie Hu, Yuqi Shen, Chengyu Fang, Jinfa Huang, Chunming He, Longxiang Tang, Ziyun Yang, Xiu Li. "A Survey of Camouflaged Object Detection and Beyond", CAAI 2025,
[PDF][Github]
[6] Fenglin Liu, Zheng Li, Qingyu Yin, Jinfa Huang, Xian Wu, Anshul Thakur, Kim Branson, Patrick Schwab, Bing Yin, Yefeng Zheng, Jiebo Luo, and David A. Clifton. "A Multimodal Multidomain Multilingual Medical Foundation Model for Zero-Shot Clinical Diagnosis", npj Digital Medicine,
[PDF][Github]
2023
[1] Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen. "Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning", CVPR 2023 Highlight,
[PDF][Code][Project page]
[2] Jingyi Wang, Jinfa Huang, Can Zhang, Zhidong Deng. "Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs", ICRA 2023,
[PDF][Code]
[3] Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen. "Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment", IJCAI 2023,
[PDF][Code]
[4] Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen. "Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering", TIP 2023,
[PDF]
[5] Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng. "Improving Scene Graph Generation with Superpixel-Based Interaction Learning", ACMMM 2023,
[PDF]
2022 and Earlier
[1] Peng Jin*, Jinfa Huang*, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David Clifton, Jie Chen. "Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations", NeurIPS 2022 Spotlight,
[PDF][Code]
[2] Yingmei Guo, Jinfa Huang, Yanlong Dong, Mingxing Xu. "Guoym at SemEval-2020 Task 8: Ensemble-based Classification of Visuo-lingual Metaphor in Memes", SemEval-2020,
[PDF]
[3] Yanan Wang, Jianming Wu, Jinfa Huang, Gen Hattori, Yasuhiro Takishima, Shinya Wada, Rui Kimura, Jie Chen, Satoshi Kurihara. "LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding", ICMI 2020,
[PDF][Code]