Lian Shijie (连仕杰)

Google Scholar  /  GitHub

Shijie Lian is a Ph.D. student at the School of Computer Science & Technology, Huazhong University of Science and Technology (HUST), under the supervision of Prof. Laurence T. Yang. He is also one of the inaugural Ph.D. students at Beijing Zhongguancun Institute. His research focuses on Vision-Language Models (VLMs), spatial intelligence, embodied general intelligence, and computer vision.

Recent News

  • [03/2026] I appeared in the CCTV-13 news report "迈向'十五五'美丽图景·一线见闻" ("Toward the Beautiful Vision of the 15th Five-Year Plan: Frontline Observations").
  • [03/2026] Thanks to Synced (机器之心) for covering our works PhysBrain, TwinBrainVLA, and LangForce: WeChat article.
  • [02/2026] Thanks to 具身纪元 for covering our work LangForce: WeChat article.
  • [01/2026] Thanks to 具身智能之心 for covering our work TwinBrainVLA: WeChat article.
  • [10/2025] Thanks to Synced (机器之心) for covering our work Euclid's Gift: WeChat article / Zhihu.
  • [09/2025] We released our paper Euclid's Gift on arXiv and the Euclid30K dataset on Hugging Face.
  • [06/2025] I was interviewed by Xinhua News Agency and mentioned in their report.
Publications (* denotes equal contribution)

LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

Shijie Lian*,  Bin Yu*,  Xiaopeng Lin*,  Laurence T Yang,  Zhaolong Shen,  Changti Wu,  Yuzhuo Miao,  Cong Huang,  Kai Chen

arXiv, 2026      

Paper  |  Code  |  starVLA Integration  |  alphaXiv  |  BibTex

We propose LangForce, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we address the Information Collapse problem in VLA models.
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

Bin Yu*,  Shijie Lian*,  Xiaopeng Lin*,  Yuliang Wei,  Zhaolong Shen,  Changti Wu,  Yuzhuo Miao,  Xinming Wang,  Bailing Wang,  Cong Huang,  Kai Chen

arXiv, 2026      

Paper  |  Code  |  BibTex

We introduce TwinBrainVLA, a novel architecture that coordinates a generalist VLM and a specialist VLM for joint robotic control via Asymmetric Mixture-of-Transformers (AsyMoT), achieving superior manipulation performance while preserving comprehensive visual understanding capabilities.
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Xiaopeng Lin*,  Shijie Lian*,  Bin Yu*,  Ruoqi Yang,  Changti Wu,  Yuzhuo Miao,  Yurun Jin,  Yukun Shi,  Cong Huang,  Bojun Cheng,  Kai Chen

arXiv, 2025      

Paper  |  Code  |  Project Page  |  BibTex

We propose an Egocentric2Embodiment translation pipeline that transforms first-person videos into multi-level, schema-driven VQA supervision, enabling the construction of the E2E-3M dataset at scale. PhysBrain exhibits substantially improved egocentric understanding and enables more sample-efficient VLA fine-tuning.
Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks

Shijie Lian*,  Changti Wu*,   Laurence Tianruo Yang,   Hang Yuan,   Bin Yu,   Lei Zhang,   Kai Chen

CVPR 2026 Findings      

Paper  |  Code  |  Dataset (Euclid30K)  |  Project Page  |  BibTex

We propose solving Euclidean geometry problems as a surrogate task and construct Euclid30K, a dataset of roughly 30K 2D and 3D geometry questions.
TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussians

Shijie Lian*,  Ziyi Zhang*,   Laurence Tianruo Yang,   Mengyu Ren,   Debin Liu,   Hua Li

ICME 2026 (Spotlight)      

Paper  |  Code  |  Project Page  |  BibTex

We propose Tensorized Underwater Gaussian Splatting (TUGS), which addresses the challenge of modeling the complex interactions between object geometries and water media while significantly reducing the number of parameters.
WaterFlow: Explicit Physics-Prior Rectified Flow for Underwater Saliency Mask Generation

Runting Li*,  Shijie Lian*,   Hua Li,   Yutong Li,   Wenhui Wu,   Sam Kwong

ICASSP, 2026      

Paper  |  Code  |  BibTex

We propose WaterFlow, a rectified flow-based framework for underwater salient object detection that incorporates underwater physical imaging information as explicit priors directly into network training.
Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation

Hua Li*,  Shijie Lian*,   Zhiyuan Li,   Runmin Cong,   Chongyi Li,   Laurence T. Yang,   Weidong Zhang,   Sam Kwong

arXiv, 2025      

Paper  |  Code  |  Dataset (UIIS10K)  |  BibTex

We present UIIS10K, a large-scale underwater instance segmentation dataset of 10,048 images with pixel-level annotations for 10 categories. We also introduce UWSAM, an efficient model designed for automatic and accurate segmentation of underwater instances.
TMANet: Triple Multi-Scale Attention based Network with Boundary Association Loss for Superpixel Segmentation

Ziyi Zhang*,  Shijie Lian*,   Hua Li

ICASSP, 2025

Paper  |  BibTex

We propose a Triple Multi-Scale Attention based Network with a Boundary Association Loss for superpixel segmentation, which yields fine boundaries and contours.
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset

Shijie Lian*,  Ziyi Zhang*,   Hua Li,   Wenjie Li,    Laurence Tianruo Yang,   Sam Kwong,   Runmin Cong

ICML, 2024      

Paper  |  Code  |  Dataset (USIS10K Dataset)  |  BibTex

We apply SAM to underwater salient instance segmentation (USIS), aiming to improve segmentation accuracy in complex underwater scenes. We also present the largest existing USIS dataset, with pixel-level annotations for 10,632 images.
WaterMask: Instance Segmentation for Underwater Imagery

Shijie Lian,   Hua Li,   Runmin Cong,   Suqi Li,   Wei Zhang,   Sam Kwong

ICCV, 2023      

Paper  |  Code  |  Dataset (UIIS Dataset)  |  BibTex

We present the first general underwater image instance segmentation dataset, containing 4,628 images with pixel-level annotations.
DSMISR: Differential Siamese Multi-scale Attention Network for Iris Image Super Resolution

Jin Hao,   Shijie Lian,   Suqi Li,   Hua Li

UIC, 2022

Paper  |  BibTex

We propose a neural network for iris image super-resolution.
Honors & Awards
  • [09/2023] I was awarded the Chinese National Scholarship.