About me
Hi, I’m Ganlin Zhang (张甘霖), a PhD student at the Technical University of Munich, supervised by Prof. Daniel Cremers. I currently focus on Visual SLAM, Structure from Motion, and 3D reconstruction.
Previously, I received my Master’s degree in Computer Science from ETH Zurich, where I worked on 3D Vision research projects with Prof. Luc Van Gool and Prof. Marc Pollefeys. Before that, I obtained my Bachelor’s degree in Computer Science from ShanghaiTech University, supervised by Prof. Laurent Kneip. During my undergraduate studies, I also spent a year at UC Berkeley as a visiting student.
Experience
News
- 01/2026: NOVA3R has been accepted to ICLR 2026!
- 11/2025: ViSTA-SLAM has been accepted to 3DV 2026!
- 11/2025: SNI-SLAM++ has been accepted to T-PAMI!
- 09/2025: BA-Track has been selected as a Best Paper Candidate at ICCV 2025! Congrats to Weirong!
- 06/2025: BA-Track has been accepted to ICCV 2025 and selected for an oral presentation!
- 08/2024: I have joined the TUM Computer Vision Group as a PhD student, supervised by Prof. Daniel Cremers!
- 03/2023: My first first-author paper "Revisiting Rotation Averaging: Uncertainties and Robust Losses" has been accepted to CVPR 2023! Thanks to my advisors and coauthors Dr. Viktor Larsson and Dr. Dániel Béla Baráth for their tremendous help. The code is already available on GitHub.
Publications

Flow4R is a feed-forward framework for dynamic 4D reconstruction and tracking from unposed image pairs. By modeling camera-space scene flow as a unified representation of geometry, object motion, and camera motion, it predicts 3D position and bidirectional motion in a single forward pass without explicit pose regression or bundle adjustment, achieving state-of-the-art accuracy and temporal consistency.

NOVA3R is a feed-forward method for non-pixel-aligned 3D reconstruction from unposed images. It learns a global, view-agnostic scene representation via scene tokens and a diffusion-based 3D decoder, enabling complete and physically plausible geometry and outperforming the state of the art in accuracy and completeness.

ViSTA-SLAM is a real-time monocular dense SLAM pipeline that combines a Symmetric Two-view Association (STA) frontend with Sim(3) pose graph optimization and loop closure, enabling accurate camera trajectories and high-quality 3D scene reconstruction from RGB inputs.

SNI-SLAM++ is a tightly coupled semantic SLAM system that achieves robust tracking and dense semantic mapping through hierarchical semantic encoding, cross-attention feature fusion, and a semantics-coupled tracking framework.

A method for consistent dynamic scene reconstruction via motion decoupling, bundle adjustment, and global refinement.

We use a keyframe-based frame-to-frame tracker built on dense optical flow and connected to a pose graph for global consistency. For dense mapping, we use a 3DGS representation, which is suitable for both extracting dense geometry and rendering.

1. A monocular SLAM pipeline with a deformable neural point cloud scene representation.
2. A novel DSPO layer for BA that jointly optimizes the depth map, depth scale, and camera pose.

1. Better model the underlying noise distributions by propagating the uncertainty from the point correspondences directly into the rotation averaging.
2. Integrate a variant of the MAGSAC++ loss into the rotation averaging instead of using classical robust losses.
Selected Projects

In this project, we design, implement, and deploy a mixed-reality method on HoloLens 2 that enables users to control the Boston Dynamics Spot robot.


In this project, we present a sparse version of NICE-SLAM: a SLAM system that incorporates the idea of Voxel Hashing into the NICE-SLAM framework. Instead of initializing feature grids over the whole space, voxel features near the surface are adaptively added and optimized.

We focus on speeding up the black-box optimization algorithm OPUS from the paper "Particle Swarm with Radial Basis Function Surrogates for Expensive Black-box Optimization" by Rommel G. Regis.
In addition, we implement a sped-up C++ version of Bunch-Kaufman pivoting.

We combine PSMNet, group-wise correlation, a dilated ResNet, and semantic segmentation information to efficiently estimate accurate disparity for stereo image pairs.

We design a path-finding algorithm that generates a path for drawing a portrait or character in a single stroke. We then use our self-designed control system to draw this path. This project can be used with any robot arm that has at least 4 joints.
