Shijie Lian

Lian Shijie (连仕杰)

Shijie Lian is a Ph.D. student at the School of Computer Science & Technology, Huazhong University of Science and Technology (HUST), under the supervision of Prof. Laurence Tianruo Yang. He is also one of the inaugural Ph.D. students at Beijing Zhongguancun Academy, where he works under the leadership of Prof. Kai Chen. His research focuses on Vision-Language Models, Spatial Intelligence, Embodied General Intelligence, and Computer Vision.

Recent News

[06/2026] I appeared in a CCTV 13 Xinwen Lianbo (新闻联播) news report (video timestamps 17:24-20:40).
[05/2026] Our work LangForce has been accepted to ICML 2026 and featured by Xinzhiyuan. Check out the WeChat article and Tencent News report.
[04/2026] Our work PhysBrain was cited by \(\pi_{0.7}\) on 16 Apr 2026. See the Google Scholar record.
[04/2026] We released the PhysBrain 1.0 series (PhysBrain Data + TwinBrainVLA Architecture + LangForce Algorithm) at the Zhongguancun Forum. Check out our WeChat article and Project Page. See our agenda on Mar 27 and Mar 28.
[03/2026] I appeared in the CCTV 13 news report "迈向'十五五'美丽图景·一线见闻" ("Toward the Beautiful Vision of the 15th Five-Year Plan: Frontline Observations").
[03/2026] Thanks to Synced (机器之心) for covering our work on PhysBrain, TwinBrainVLA, and LangForce: WeChat article.
[02/2026] Thanks to 具身纪元 for covering our work LangForce: WeChat article.
[01/2026] Thanks to 具身智能之心 for covering our work TwinBrainVLA: WeChat article.
[10/2025] Thanks to Synced (机器之心) for covering our work Euclid's Gift: WeChat article / Zhihu.
[09/2025] We released our paper Euclid's Gift on arXiv and the Euclid30K dataset on Hugging Face.
[06/2025] I was interviewed by Xinhua News Agency and mentioned in its report.

Publications (* denotes equal contribution)

PhysBrain 1.0 Technical Report
Shijie Lian, Bin Yu, Xiaopeng Lin, Changti Wu, Hang Yuan, Xiaolin Hu, Zhaolong Shen, Yuzhuo Miao, Haishan Liu, Yuxuan Tian, Yukun Shi, Cong Huang, Kai Chen
arXiv, 2026
Paper | BibTex
PhysBrain 1.0 converts large-scale human egocentric video into structured physical commonsense supervision, then transfers these physical priors to VLA policies through capability-preserving, language-sensitive adaptation.

StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
starVLA Community (Shijie Lian is the 3rd contributor among the community contributors)
arXiv, 2026
Paper | Code | BibTex
A Lego-like codebase for developing Vision-Language-Action (VLA) models, providing modular and extensible components for robotics research.

3D-Mix for VLA: A Plug-and-Play Module for Integrating VGGT-based 3D Information into Vision-Language-Action Models
Bin Yu, Shijie Lian*, Xiaopeng Lin, Zhaolong Shen, Yuliang Wei, Haishan Liu, Changti Wu, Hang Yuan, Bailing Wang, Cong Huang, Kai Chen
arXiv, 2026
Paper | BibTex
We present 3D-Mix, a plug-and-play module that integrates VGGT-based 3D information into diverse VLA architectures. Experiments across six MLLM series on SIMPLER and LIBERO show that 3D-Mix delivers consistent performance gains, averaging +7.0% on the out-of-distribution (OOD) SIMPLER benchmark.

LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
Shijie Lian, Bin Yu, Xiaopeng Lin, Laurence Tianruo Yang, Zhaolong Shen, Changti Wu, Yuzhuo Miao, Cong Huang, Kai Chen
ICML 2026
Paper | Code | starVLA Integration | alphaXiv | BibTex
We propose LangForce, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we address the Information Collapse problem in VLA models.

TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
Bin Yu, Shijie Lian*, Xiaopeng Lin, Yuliang Wei, Zhaolong Shen, Changti Wu, Yuzhuo Miao, Xinming Wang, Bailing Wang, Cong Huang, Kai Chen
arXiv, 2026
Paper | Code | BibTex
We introduce TwinBrainVLA, a novel architecture that coordinates a generalist VLM and a specialist VLM for joint robotic control via Asymmetric Mixture-of-Transformers (AsyMoT), achieving superior manipulation performance while preserving comprehensive visual understanding capabilities.

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Xiaopeng Lin, Shijie Lian*, Bin Yu, Ruoqi Yang, Changti Wu, Yuzhuo Miao, Yurun Jin, Yukun Shi, Cong Huang, Bojun Cheng, Kai Chen
arXiv, 2025
Paper | Code | Project Page | BibTex
We propose an Egocentric2Embodiment translation pipeline that transforms first-person videos into multi-level VQA supervision, enabling the construction of the E2E-3M dataset at scale. PhysBrain exhibits substantially improved egocentric understanding and enables more sample-efficient VLA fine-tuning.

Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
Shijie Lian, Changti Wu, Laurence Tianruo Yang, Hang Yuan, Bin Yu, Lei Zhang, Kai Chen
CVPR 2026 Findings
Paper | Code | Dataset (Euclid30K) | Project Page | BibTex
We propose solving Euclidean geometry problems as a surrogate task and construct Euclid30K, a dataset of roughly 30K 2D and 3D geometry questions.

TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussians
Shijie Lian, Ziyi Zhang, Laurence Tianruo Yang, Mengyu Ren, Debin Liu, Hua Li
ICME 2026 (Spotlight)
Paper | Code | Project Page | BibTex
We propose TUGS, which addresses the challenge of modeling the complex interactions between object geometries and water media while achieving a significant parameter reduction.

WaterFlow: Explicit Physics-Prior Rectified Flow for Underwater Saliency Mask Generation
Runting Li, Shijie Lian*, Hua Li, Yutong Li, Wenhui Wu, Sam Kwong
ICASSP, 2026
Paper | Code | BibTex
We propose WaterFlow, a rectified flow-based framework for underwater salient object detection that incorporates underwater physical imaging information as explicit priors directly into the network training process.

Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation
Hua Li, Shijie Lian*, Zhiyuan Li, Runmin Cong, Chongyi Li, Laurence Tianruo Yang, Weidong Zhang, Sam Kwong
arXiv, 2025
Paper | Code | Dataset (UIIS10K) | BibTex
We propose UIIS10K, which includes 10,048 underwater images with pixel-level annotations for 10 categories. We also introduce UWSAM, an efficient model designed for automatic and accurate segmentation of underwater instances.

TMANet: Triple Multi-Scale Attention based Network with Boundary Association Loss for Superpixel Segmentation
Ziyi Zhang, Shijie Lian, Hua Li
ICASSP, 2025
Paper | BibTex
We propose a Triple Multi-Scale Attention based Network for superpixel segmentation, trained with a Boundary Association Loss to obtain fine boundaries and contours.

Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
Shijie Lian, Ziyi Zhang, Hua Li, Wenjie Li, Laurence Tianruo Yang, Sam Kwong, Runmin Cong
ICML, 2024
Paper | Code | Dataset (USIS10K Dataset) | BibTex
We apply SAM to underwater salient instance segmentation (USIS), aiming to improve segmentation accuracy in complex underwater scenes. We also present the largest existing USIS dataset, which provides pixel-level annotations for 10,632 images.

WaterMask: Instance Segmentation for Underwater Imagery
Shijie Lian, Hua Li, Runmin Cong, Suqi Li, Wei Zhang, Sam Kwong
ICCV, 2023
Paper | Code | Dataset (UIIS Dataset) | BibTex
We present the first general underwater image instance segmentation dataset, containing 4,628 images with pixel-level annotations.

DSMISR: Differential Siamese Multi-scale Attention Network for Iris Image Super Resolution
Jin Hao, Shijie Lian, Suqi Li, Hua Li
UIC, 2022
Paper | BibTex
We propose a super-resolution neural network for iris image super-resolution.

Honors & Awards

[09/2023] I was awarded the Chinese National Scholarship.