Xiaoshuai Hao

Researcher at Xiaomi EV

I am currently a researcher at Xiaomi EV, focusing on embodied multimodal large models. Previously, I received my Ph.D. from the Institute of Information Engineering at the Chinese Academy of Sciences, advised by Prof. Bo Li.

We have several academic visitor and intern positions at Xiaomi EV. We actively work on Embodied Foundation Model, 3D Foundation Model, Vision Language Action Model and Automatic Driving Perception. If you like what we do, don't hesitate to contact me.

Research Interests

Embodied Foundation Model
3D Foundation Model
Vision Language Action Model
Automatic Driving Perception
Multimodal Learning

News

[01/2026] - One paper was accepted by ICASSP 2026!
[12/2025] - TLA was accepted by Robot Learning! Congrats to Peng Hao!
[12/2025] - One paper was accepted by EAAI! Congrats to Zhihui Zhang!
[11/2025] - Groundbreaking Announcement! The MiMo-Embodied: X-Embodied Foundation Model Technical Report is now available!
[11/2025] - One paper was accepted by AAAI 2026! Congrats to Lingfeng Zhang!
[10/2025] - Exciting news! Our paper, “RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation,” has been awarded the Best Paper Award and Best Poster Award at the IROS 2025 RODGE Workshop. Congrats to Yingbo Tang!
[10/2025] - Our team achieved excellent results at the IROS 2025 RoboSense Challenge, placing second in Track #2: Social Navigation and third in Track #4: Cross-Modal Drone Navigation!
[10/2025] - Exciting news! Our paper, “RPGCN: Relational Probabilistic Graphs for EEG-based Emotion Mining,” has been awarded the Best Special Session Paper Award at ADMA 2025.
[09/2025] - Exciting news! One paper was accepted by International Journal of Robotics Research (IJRR)! Congrats to Dingzhe Li!
[09/2025] - Exciting news! We won first place in the Live Broadcasting Video Quality Assessment Challenge, IEEE VCIP 2025. Congrats to Erjia Xiao and Lingfeng Zhang!
[09/2025] - Two papers have been accepted by NeurIPS 2025!
[09/2025] - One paper was accepted by Information Fusion! Congrats to Ye Ni!
[08/2025] - Exciting news! We won third place in the ICCV EVQA-SnapUGC Challenge, with our model achieving the best single-modality performance. Congrats to Erjia Xiao and Lingfeng Zhang!
[08/2025] - Exciting news! Our paper, “Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models,” has been awarded the Best Student Paper Award at the IJCAI 2025 Workshop and Challenge on Deepfake Detection, Localization, and Interpretability.
[08/2025] - One paper was accepted by ADMA 2025!
[08/2025] - One paper was accepted by CIKM 2025!
[08/2025] - One paper was accepted by Neurocomputing! Congrats to Cheng Shang!
[08/2025] - Four papers have been accepted by ACM Multimedia 2025 Dataset Track!
[07/2025] - DADA++ was accepted by ACM ToMM!
[07/2025] - One paper was accepted by Data Intelligence! Congrats to Yang Lei!
[07/2025] - Exciting news! The RoboBrain 2.0 Technical Report is now available!
[07/2025] - One paper was accepted by ACM MM 2025! Congrats to Yuting Zhao and Yuheng Ji!
[06/2025] - One paper was accepted by IJCAI 2025 Workshop on Deepfake Detection!
[06/2025] - Two papers have been accepted by ICCV 2025! Congrats to Zhihui and Yinuo!
[06/2025] - Two papers have been accepted by IROS 2025! Congrats to Yingbo and Shuaike!
[05/2025] - MapNav was accepted by ACL 2025! Congrats to Lingfeng Zhang!
[05/2025] - One paper was accepted by TIP!
[05/2025] - SafeMap was accepted by ICML 2025.
[04/2025] - One paper was accepted by IJCAI 2025.
[04/2025] - One paper was accepted by ICMR 2025.
[04/2025] - BCTR was accepted by Information Fusion. Congrats to Peng Hao!
[03/2025] - Three papers have been accepted by ICME 2025!
[03/2025] - One paper was accepted by KBS! Congrats to Jinglin He!
[02/2025] - RoboBrain was accepted by CVPR 2025.
[02/2025] - MapFusion was accepted by Information Fusion.
[01/2025] - ESC-MISR has been selected as a candidate for the Best Paper Award at Multimedia Modelling 2025. Congrats to Zhihui Zhang!
[01/2025] - TASAR was accepted by ICLR 2025.
[12/2024] - KALAHash was accepted by AAAI 2025.
[09/2024] - MapBench was accepted by NeurIPS 2024.
[07/2024] - MapDistill was accepted by ECCV 2024.

Industrial Experience

Amazon Web Services

2021.09-2023.01

Mentors: Yi Zhu, Mu Li

Samsung Research China - Beijing (SRC-B)

2023.01-2024.09

Mentors: Hui Zhang, Weiming Li

Beijing Academy of Artificial Intelligence

2024.09-2025.08

Mentor: Zhongyuan Wang

Xiaomi EV

2025.08-Present

Mentor: Long Chen

Recent Publications

* equal contributions ‡ project lead § corresponding author

A Hierarchical Reinforcement Learning Framework for Multi-UAV Combat Using Leader-Follower Strategy

Jinhui Pang, Jinglin He, Noureldin Mohamed Abdelaal Ahmed Mohamed, Changqing Lin, Zhihui Zhang, Xiaoshuai Hao^§

Knowledge-based Systems (KBS), 2025

PDF

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

Yuheng Ji^*, Huajie Tan^*, Jiayu Shi^*, Xiaoshuai Hao^{* ‡}, Yuan Zhang, et al.

Computer Vision and Pattern Recognition (CVPR), 2025

PDF | Home

TASAR: Transfer-Based Attack on Skeletal Action Recognition

Yunfeng Diao, Baiqi Wu^§, Ruixuan Zhang, Ajian Liu, Xiaoshuai Hao, Xingxing Wei, Meng Wang, He Wang^§

International Conference on Learning Representations (ICLR), 2025

PDF | Code

AS-GCL: Asymmetric Spectral Augmentation on Graph Contrastive Learning

Ruyue Liu, Rong Yin^§, Yong Liu, Xiaoshuai Hao, Haichao Shi, Can Ma, Weiping Wang

IEEE Transactions on Multimedia (TMM), 2025

PDF

MapFusion: A novel BEV feature fusion network for multi-modal map construction

Xiaoshuai Hao, Yunfeng Diao^§, Mengchuan Wei, Yifan Yang, Peng Hao, Rong Yin^§, Hui Zhang, et al.

Information Fusion, 2025

PDF

STViT+: improving self-supervised multi-camera depth estimation with spatial-temporal context and adversarial geometry regularization

Zhuo Chen^*, Haimei Zhao^*, Xiaoshuai Hao, Bo Yuan, Xiu Li

Applied Intelligence, 2025

PDF

Is Your HD Map Constructor Reliable under Sensor Corruptions?

Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, et al.

Conference on Neural Information Processing Systems (NeurIPS), 2024

PDF | Home

MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation

Xiaoshuai Hao^*, Ruikai Li^*, Hui Zhang, Dingzhe Li, Rong Yin, et al.

European Conference on Computer Vision (ECCV), 2024

PDF

KALAHash: Knowledge-Anchored Low-Resource Adaptation for Deep Hashing

Shu Zhao, Tan Yu, Xiaoshuai Hao, Wenchao Ma, Vijaykrishnan Narayanan

Association for the Advancement of Artificial Intelligence (AAAI), 2025

PDF | Code

FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning

Jinhui Pang, Changqing Lin, Xiaoshuai Hao^§, Rong Yin, Zixuan Wang, et al.

ACM Multimedia (MM), 2024

PDF | Code

MBFusion: A New Multi-modal BEV Feature Fusion Method for HD Map Construction

Xiaoshuai Hao, Hui Zhang, Yifan Yang, Yi Zhou, Sangil Jung, et al.

IEEE International Conference on Robotics and Automation (ICRA), 2024

PDF

Customized Treatment Per Pixel for Blind Image Super-Resolution

Guanqun Liu, Xiaoshuai Hao^§

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

PDF

Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network

Yamin Mao, Zhihua Liu, Weiming Li, SoonYong Cho, Qiang Wang, Xiaoshuai Hao^§

British Machine Vision Conference (BMVC), 2024

PDF

ESC-MISR: Enhancing Spatial Correlations for Multi-Image Super-Resolution in Remote Sensing

Zhihui Zhang, Jinhui Pang, Jianan Li, Xiaoshuai Hao

International Conference on MultiMedia Modeling (MMM), 2024

Best Paper Candidate Award

PDF

Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval

Xiaoshuai Hao, Wanqian Zhang^§, Dayan Wu, Fei Zhu, Bo Li

Computer Vision and Pattern Recognition (CVPR), 2023

PDF

Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval

Xiaoshuai Hao, Wanqian Zhang^§

Conference on Neural Information Processing Systems (NeurIPS), 2023

PDF

MixGen: A New Multi-Modal Data Augmentation

Xiaoshuai Hao^*, Yi Zhu^*, Srikar Appalaraju^*, Aston Zhang, Wanqian Zhang, Bo Li, Mu Li

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023

PDF

Listen and Look: Multi-Modal Aggregation and Co-Attention Network for Video-Audio Retrieval

Xiaoshuai Hao, Wanqian Zhang^§, Dayan Wu, Fei Zhu, Bo Li

IEEE International Conference on Multimedia & Expo (ICME), 2022

PDF

Multi-Feature Graph Attention Network for Cross-Modal Video-Text Retrieval

Xiaoshuai Hao, Yucan Zhou^§, Dayan Wu, Wanqian Zhang, Bo Li, Weiping Wang

International Conference on Multimedia Retrieval (ICMR), 2021

PDF

What Matters: Attentive and Relational Feature Aggregation Network for Video-Text Retrieval

Xiaoshuai Hao, Yucan Zhou^§, Dayan Wu, Wanqian Zhang, Bo Li, Weiping Wang, Dan Meng

IEEE International Conference on Multimedia & Expo (ICME), 2021

PDF

Unpublished Manuscript

* equal contributions ‡ project lead § corresponding author

TLA: Tactile-Language-Action Model for Contact-Rich Manipulation

Peng Hao^*, Chaofan Zhang^*, Dingzhe Li, Xiaoge Cao, Xiaoshuai Hao, Shaowei Cui, Shuo Wang

arXiv

PDF

AffordGrasp: In-Context Affordance Reasoning for Open-Vocabulary Task-Oriented Grasping in Clutter

Yingbo Tang, Shuaike Zhang, Xiaoshuai Hao^{§ ‡}, Pengwei Wang, Jianlong Wu, Zhongyuan Wang, Shanghang Zhang

arXiv

PDF

Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation

Yuheng Ji^*, Yue Liu^*, Zhicheng Zhang, Zhao Zhang, Yuting Zhao, Xiaoshuai Hao, Gang Zhou, Xingwei Zhang, Xiaolong Zheng^§

arXiv

PDF

What Foundation Models can Bring for Robot Learning in Manipulation: A Survey

Dingzhe Li, Yixiang Jin, YuHao Sun, Yong A, Hongze Yu, Jun Shi, Xiaoshuai Hao, et al.

arXiv

PDF

BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation

Peng Hao, Xiaobing Wang, Yingying Jiang, Hanchao Jia, Xiaoshuai Hao^§

arXiv

PDF

Communication-Efficient Personalized Federal Graph Learning via Low-Rank Decomposition

Ruyue Liu, Rong Yin^§, Xiangzhen Bo, Xiaoshuai Hao, Xingrui Zhou, Yong Liu, Can Ma, Weiping Wang

arXiv

PDF

DWCL: Dual-Weighted Contrastive Learning for Multi-View Clustering

Hanning Yuan^*, Zhihui Zhang^*, Lianhua Chi, Qi Guo, Sijie Ruan, Jinhui Pang^§, Xiaoshuai Hao^§

arXiv

PDF

MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation

Lingfeng Zhang^*, Xiaoshuai Hao^{* ‡}, Qinwen Xu, Qiang Zhang, Xinyao Zhang, Pengwei Wang, Jing Zhang, Zhongyuan Wang, Shanghang Zhang^§, Renjing Xu^§

arXiv

PDF

MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception

Xiaoshuai Hao, Guanqun Liu, Yuting Zhao, Yuheng Ji, Mengchuan Wei, Haimei Zhao, Lingdong Kong, Rong Yin, Yu Liu

arXiv

PDF | Home

International Competition

EPIC-Kitchens Dataset Challenges
Multi-Instance Action Retrieval Track 2021

Xiaoshuai Hao, Wanqian Zhang, Dejie Yang, Shu Zhao, Dayan Wu, Bo Li, Weiping Wang

First Place

IEEE/CVF Computer Vision and Pattern Recognition (CVPR)

EPIC-Kitchens Dataset Challenges
Interaction Recognition Track 2023

Yuqi Li, Yizhi Luo, Xiaoshuai Hao, Chuanguang Yang, Zhulin An, Dantong Song, Wei Yi

Third Place Award

IEEE/CVF Computer Vision and Pattern Recognition (CVPR)

EPIC-Kitchens Dataset Challenges
Multi-Instance Action Retrieval Track 2022

Xiaoshuai Hao, Yufan Liu, Wanqian Zhang, Dayan Wu, Bo Li

Third Place Award (Joint)

IEEE/CVF Computer Vision and Pattern Recognition (CVPR)

The RoboDrive Challenge
Track 2: Robust Map Segmentation

Xiaoshuai Hao, Yifan Yang, Hui Zhang, Mengchuan Wei, Yi Zhou, Haimei Zhao, Jing Zhang

The Innovative Solution (Honorable Mention)

IEEE International Conference on Robotics and Automation (ICRA)

The RoboDrive Challenge
Track 2: Robust Map Segmentation

Xiaoshuai Hao, Yifan Yang, Hui Zhang, Mengchuan Wei, Yi Zhou, Haimei Zhao, Jing Zhang

The 3rd place in the category

IEEE International Conference on Robotics and Automation (ICRA)

A Challenge for Out-of-Distribution Generalization in Computer Vision (OOD-CV)
OOD-CV: Classification Track (Self-Supervised)

Yuqi Li, Yizhi Luo, Chuanguang Yang, Zhulin An, Xiaoshuai Hao, Yihang Zhou

Third Place

IEEE/CVF International Conference on Computer Vision (ICCV)

Academic Services

Conference Reviewer

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
IEEE/CVF International Conference on Computer Vision (ICCV)
European Conference on Computer Vision (ECCV)
Conference on Neural Information Processing Systems (NeurIPS)
International Conference on Learning Representations (ICLR)
International Conference on Machine Learning (ICML)
Association for the Advancement of Artificial Intelligence (AAAI)
IEEE International Conference on Robotics and Automation (ICRA)
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Journal Reviewer

International Journal of Computer Vision (IJCV)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
IEEE Transactions on Intelligent Vehicles (TIV)
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
IEEE Transactions on Multimedia (TMM)
IEEE Robotics and Automation Letters (RA-L)