About me

Hi, I’m Ganlin Zhang (张甘霖), a PhD student at the Technical University of Munich, supervised by Prof. Daniel Cremers. I currently focus on Visual SLAM, Structure from Motion, and 3D reconstruction.

Previously, I received my Master’s degree in Computer Science from ETH Zurich, where I worked on 3D Vision research projects with Prof. Luc Van Gool and Prof. Marc Pollefeys. Before that, I obtained my Bachelor’s degree in Computer Science from ShanghaiTech University, supervised by Prof. Laurent Kneip. During my undergraduate studies, I also spent a year at UC Berkeley, as a visiting student.

I’m at Google Zurich as a Student Researcher this summer.

Experience

Student Researcher, Google · Summer 2026

PhD (ongoing), Computer Science, TU Munich

MSc, Computer Science, ETH Zurich

BEng, Computer Science, ShanghaiTech University

Visiting Student, EECS, UC Berkeley

News

06/2026 I have joined Google as a Student Researcher Intern!
06/2026 Flow4R has been accepted to ECCV 2026!
01/2026 NOVA3R has been accepted to ICLR 2026!
11/2025 ViSTA-SLAM has been accepted to 3DV 2026!
11/2025 SNI-SLAM++ has been accepted to T-PAMI!
09/2025 BA-Track has been selected as a Best Paper Candidate at ICCV 2025! Congrats to Weirong!
06/2025 BA-Track has been accepted to ICCV 2025 and selected for an oral presentation!
08/2024 I have joined the TUM Computer Vision Group as a PhD student, supervised by Prof. Daniel Cremers!
03/2023 My first first-author paper "Revisiting Rotation Averaging: Uncertainties and Robust Losses" has been accepted to CVPR 2023! Thanks to my advisors and coauthors Dr. Viktor Larsson and Dr. Dániel Béla Baráth for the huge amount of help. The code is already available on GitHub.

Publications

BA-T: An Iterative Transformer for Two-View Bundle Adjustment

Ganlin Zhang, Weirong Chen, Daniel Cremers, Xi Wang

arXiv 2026 arXiv Code (coming soon)

BA-T is an iterative Transformer for two-view bundle adjustment that implements BA-style structured updates as a single lightweight, repeatable layer in implicit token space. Rather than relying on deep attention stacks, it refines poses and local geometry from latent residuals across iterations, achieving stronger cross-view consistency and matching or surpassing much larger models while using only 16% of their decoder parameters.

Flow4R: Unifying 4D Reconstruction and Tracking with Scene Flow

Shenhan Qian, Ganlin Zhang, Shangzhe Wu, Daniel Cremers

ECCV 2026 Project arXiv

Flow4R is a feed-forward framework for dynamic 4D reconstruction and tracking from unposed image pairs. By modeling camera-space scene flow as a unified representation of geometry, object motion, and camera motion, it predicts 3D position and bidirectional motion in a single forward pass without explicit pose regression or bundle adjustment, achieving state-of-the-art accuracy and temporal consistency.

NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers

ICLR 2026 Project Paper Code

NOVA3R is a feed-forward method for non-pixel-aligned 3D reconstruction from unposed images. It learns a global, view-agnostic scene representation via scene tokens and a diffusion-based 3D decoder, enabling complete and physically plausible geometry and outperforming the state of the art in accuracy and completeness.

ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

Ganlin Zhang, Shenhan Qian, Xi Wang, Daniel Cremers

3DV 2026 Project arXiv Code

ViSTA-SLAM is a real-time monocular dense SLAM pipeline that combines a Symmetric Two-view Association (STA) frontend with Sim(3) pose graph optimization and loop closure, enabling accurate camera trajectories and high-quality 3D scene reconstruction from RGB inputs.

SNI-SLAM++: Tightly-coupled Semantic Neural Implicit SLAM

Siting Zhu, Guangming Wang, Hermann Blum, Zhong Wang, Ganlin Zhang, Daniel Cremers, Marc Pollefeys, Hesheng Wang

TPAMI 2025 Project Paper

SNI-SLAM++ is a tightly coupled semantic SLAM system that achieves robust tracking and dense semantic mapping through hierarchical semantic encoding, cross-attention feature fusion, and a semantics-coupled tracking framework.

Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction

Weirong Chen, Ganlin Zhang, Felix Wimbauer, Rui Wang, Nikita Araslanov, Andrea Vedaldi, Daniel Cremers

ICCV 2025 Best Paper Candidate Project arXiv Code

A method for consistent dynamic scene reconstruction via motion decoupling, bundle adjustment, and global refinement.

Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians

Erik Sandström*, Ganlin Zhang*, Keisuke Tateno, Michael Oechsle, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald, Federico Tombari

CVPRW 2025 Paper Code

We use a keyframe-based frame-to-frame tracker built on dense optical flow and connected to a pose graph for global consistency. For dense mapping, we use a 3D Gaussian Splatting (3DGS) representation, which supports both dense geometry extraction and rendering.

GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM

Ganlin Zhang*, Erik Sandström*, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald

arXiv 2024 Project arXiv Code

1. A monocular SLAM pipeline with a deformable neural point cloud scene representation.
2. A novel DSPO layer for bundle adjustment that jointly optimizes the depth map, depth scale, and camera pose.

Revisiting Rotation Averaging: Uncertainties and Robust Losses

Ganlin Zhang, Viktor Larsson, Daniel Barath

CVPR 2023 arXiv Code

1. Better model the underlying noise distributions by directly propagating the uncertainty from the point correspondences into the rotation averaging.
2. Integrate a variant of the MAGSAC++ loss into the rotation averaging, instead of using the classical robust losses.

Selected Projects

EgoSpot: Egocentric Multimodal Control for Hands-Free Mobile Manipulation

Ganlin Zhang*, Deheng Zhang*, Longteng Duan*, Guo Han*

Course project for Mixed Reality 2022 at ETH Zurich

ICRA 2026 Workshop Project Code

In this project, we design, implement and deploy a mixed-reality-based method with HoloLens 2 that enables users to control the Boston Dynamics Spot robot.

NICE-SLAM with Adaptive Feature Grids

Ganlin Zhang, Deheng Zhang, Feichi Lu, Anqi Li

Course project for 3D Vision 2022 at ETH Zurich

Code

In this project, we present a sparse version of NICE-SLAM that incorporates the idea of voxel hashing into the NICE-SLAM framework. Instead of initializing feature grids over the whole space, voxel features near the surface are adaptively added and optimized.

Optimization by Particle Swarm Using Surrogates via Bunch-Kaufman Pivoting and Standard Optimization

Ganlin Zhang*, Deheng Zhang*, Junpeng Gao*, Yu Hong*

Course project for Advanced System Lab 2022 at ETH Zurich

Code

We focus on speeding up the black-box optimization algorithm OPUS from the paper "Particle Swarm with Radial Basis Function Surrogates for Expensive Black-box Optimization" by Rommel G. Regis.
In addition, we implement a sped-up C++ version of Bunch-Kaufman pivoting.

Improved PSMNet for Deep Stereo Disparity Estimation

Ganlin Zhang*, Haokai Pang*, Xinyu Shen*, Yunying Zhu*

Course project for Deep Learning 2021 at ETH Zurich

Code

We combine PSMNet, group-wise correlation, a dilated ResNet, and semantic segmentation information to efficiently estimate accurate disparity for stereo image pairs.

Robot Art: Using Robot Arm to Draw Pictures

Ganlin Zhang*, Teng Xu*, Weijie Lyu*, Zhenzhong Tang*, Ziyuan Hu*

Course project for Introduction to Robotics 2019 at UC Berkeley

Project Code

We design a path-finding algorithm that generates a path for drawing a portrait or character in a single stroke, then use our self-designed control system to draw this path. The project can be used with any robot arm that has at least 4 joints.

Teaching

Practical Course: Deep Learning for Spatial AI

2026 Summer, 2025 Summer

Master Seminar: Modern Methods for 3D Representation and Reconstruction

2025/26 Winter

Master Seminar: 3D Vision Foundation Models

2025/26 Winter