Zhixiong Zhang

Biography

I am a first-year Ph.D. student at Shanghai Jiao Tong University and Shanghai Innovation Institute. Before that, I received my B.E. degree in Artificial Intelligence from Harbin Institute of Technology in 2025.

My current research focuses on advancing Fine-grained Perception capabilities of Large Vision-Language Models, with a particular emphasis on Spatial Intelligence, Visual Grounding, and Segmentation. I am also interested in embodied and interactive AI agents, especially their perception, reasoning, and action in open-world environments.

Please feel free to contact me if you are interested in related research or would like to discuss potential collaborations!

News

Jan 2026 SeC and GPT4Scene were accepted by ICLR 2026.
Oct 2025 SeC served as the foundational solution for the majority of teams in the 7th LSVOS Challenge.
May 2025 SongGen was accepted by ICML 2025.

Publications

Full list on Google Scholar

^* Equal contribution · ^‡ Project lead · ^† Corresponding author

(Co-)First Author Publications

New!

Arxiv 2026 SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction

Zhixiong Zhang^*, Yizhuo Li^*, Shuangrui Ding^‡, Yuhang Zang^†, Shengyuan Ding, Long Xing, Yibin Wang, Qiaosheng Zhang, Jiaqi Wang^†
Paper Code 🤗 Model 🤗 Dataset

ICLR 2026 SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Zhixiong Zhang^*, Shuangrui Ding^*, Xiaoyi Dong^†, Songxin He, Jianfan Lin, Junsong Tang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang^†
Paper Project Code 🤗 Model 🤗 Benchmark

ICLR 2026 GPT4Scene: Understand 3D Scenes from Videos with Vision Language Models

Zhangyang Qi^*, Zhixiong Zhang^*, Ye Fang, Jiaqi Wang^†, Hengshuang Zhao^†
Paper Project Code 🤗 Dataset

Co-Author Publications

New!

Arxiv 2026 WildClawBench: An In-the-Wild Benchmark for AI Agents in the OpenClaw Environment

Shuangrui Ding^*, Xuanlang Dai^*, Long Xing^*, Shengyuan Ding, Ziyu Liu, Jingyi Yang, Penghui Yang, Zhixiong Zhang, Xilin Wei, Xinyu Fang, Yubo Ma, Haodong Duan, Jing Shao, Jiaqi Wang, Dahua Lin, Kai Chen, Yuhang Zang^†
Paper Project Code 🤗 Benchmark

New!

Arxiv 2026 LoMo: Local Modality Substitution for Deeper Vision-Language Fusion

Feng Han, Zhixiong Zhang, Zheming Liang, Yibin Wang, Jiaqi Wang^†
Paper Project Code 🤗 Checkpoints

ICML 2025 SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Zihan Liu, Shuangrui Ding, Zhixiong Zhang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang^†
Paper Project Code

Arxiv 2025 VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning

Zhangyang Qi, Zhixiong Zhang, Yizhou Yu^†, Jiaqi Wang^†, Hengshuang Zhao^†
Paper Project Code

Arxiv 2025 CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Zeyi Sun, Yuhang Cao, Jianze Liang, Qiushi Sun, Ziyu Liu, Zhixiong Zhang, Yuhang Zang^†, Xiaoyi Dong, Kai Chen, Dahua Lin, Jiaqi Wang^†
Paper Code

Honors & Awards

National Scholarship × 2 2022, 2024
Xiaomi Special Scholarship (¥ 20,000) 2023
HIT Excellent Bachelor's Thesis 2025
Heilongjiang Provincial Excellent Graduate 2025
Heilongjiang Provincial Merit Student 2024
National First Prize, China Collegiate IoT Design Competition 2023
Meritorious Winner, Mathematical Contest in Modeling and Interdisciplinary Contest in Modeling (MCM/ICM) 2023

Academic Services

Conference Reviewer: CVPR 2026, ICML 2026, NeurIPS 2026