Biography

I am a first-year Ph.D. student at Shanghai Jiao Tong University and Shanghai Innovation Institute. Prior to this, I received my B.E. degree in Artificial Intelligence from Harbin Institute of Technology in 2025.

My current research focuses on advancing the fine-grained perception capabilities of large vision-language models, with a particular emphasis on spatial intelligence, visual grounding, and segmentation. I am also interested in embodied and interactive AI agents, especially their perception, reasoning, and action in open-world environments.

Please feel free to contact me if you're interested in related research or would like to discuss potential collaborations!

News

  • Jan 2026: SeC and GPT4Scene were accepted to ICLR 2026.
  • Oct 2025: SeC served as the foundational solution for the majority of teams in the 7th LSVOS Challenge.
  • May 2025: SongGen was accepted to ICML 2025.

Publications

Full list on Google Scholar

* Equal contribution · Project lead · Corresponding author

(Co-)First Author Publications

New!
arXiv 2026 SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction
Zhixiong Zhang*, Yizhuo Li*, Shuangrui Ding, Yuhang Zang, Shengyuan Ding, Long Xing, Yibin Wang, Qiaosheng Zhang, Jiaqi Wang
Paper Code 🤗 Model 🤗 Dataset
ICLR 2026 SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
Zhixiong Zhang*, Shuangrui Ding*, Xiaoyi Dong, Songxin He, Jianfan Lin, Junsong Tang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
Paper Project Code 🤗 Model 🤗 Benchmark
ICLR 2026 GPT4Scene: Understand 3D Scenes from Videos with Vision Language Models
Zhangyang Qi*, Zhixiong Zhang*, Ye Fang, Jiaqi Wang, Hengshuang Zhao
Paper Project Code 🤗 Dataset

Co-Author Publications

New!
arXiv 2026 WildClawBench: An In-the-Wild Benchmark for AI Agents in the OpenClaw Environment
Shuangrui Ding*, Xuanlang Dai*, Long Xing*, Shengyuan Ding, Ziyu Liu, Jingyi Yang, Penghui Yang, Zhixiong Zhang, Xilin Wei, Xinyu Fang, Yubo Ma, Haodong Duan, Jing Shao, Jiaqi Wang, Dahua Lin, Kai Chen, Yuhang Zang
Paper Project Code 🤗 Benchmark
New!
arXiv 2026 LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
Feng Han, Zhixiong Zhang, Zheming Liang, Yibin Wang, Jiaqi Wang
Paper Project Code 🤗 Checkpoints
ICML 2025 SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Zihan Liu, Shuangrui Ding, Zhixiong Zhang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
Paper Project Code
arXiv 2025 VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning
Zhangyang Qi, Zhixiong Zhang, Yizhou Yu, Jiaqi Wang, Hengshuang Zhao
Paper Project Code
arXiv 2025 CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
Zeyi Sun, Yuhang Cao, Jianze Liang, Qiushi Sun, Ziyu Liu, Zhixiong Zhang, Yuhang Zang, Xiaoyi Dong, Kai Chen, Dahua Lin, Jiaqi Wang
Paper Code

Honors & Awards

  • National Scholarship × 2 (2022, 2024)
  • Xiaomi Special Scholarship, ¥20,000 (2023)
  • HIT Excellent Bachelor's Thesis (2025)
  • Heilongjiang Provincial Excellent Graduate (2025)
  • Heilongjiang Provincial Merit Student (2024)
  • National 1st Prize, China Collegiate IoT Design Competition (2023)
  • Meritorious Winner, Mathematical Contest in Modeling and Interdisciplinary Contest in Modeling (MCM/ICM) (2023)

Academic Services

Conference Reviewer: CVPR 2026, ICML 2026, NeurIPS 2026