ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

1 TUM, 2 MCML, 3 ETH Zurich
By combining the proposed lightweight frontend, the Symmetric Two-view Association (STA) model, with Sim(3) pose graph optimization and loop closure as the backend, ViSTA-SLAM achieves high-quality reconstruction and accurate trajectory estimation on challenging scenes while running in real time.

Abstract

We present ViSTA-SLAM, a real-time monocular visual SLAM system that operates without requiring camera intrinsics, making it broadly applicable across diverse camera setups. At its core, the system employs a lightweight symmetric two-view association (STA) model as the frontend, which simultaneously estimates relative camera poses and regresses local pointmaps from only two RGB images. This design significantly reduces model complexity (our frontend is only 35% the size of comparable state-of-the-art methods) while enhancing the quality of the two-view constraints used in the pipeline. In the backend, we construct a specially designed Sim(3) pose graph that incorporates loop closures to address accumulated drift. Extensive experiments demonstrate that our approach achieves superior performance in both camera tracking and dense 3D reconstruction quality compared to current methods.

ViSTA-SLAM Overview

Given sequential video frames without intrinsics as input, our frontend model takes in view pairs and predicts local pointmaps and relative poses within each pair. We then use the pair-wise predictions to construct a Sim(3) pose graph with loop closures and optimize it via the Levenberg-Marquardt algorithm. The frontend model employs a fully symmetric design, which keeps the model lightweight and supports more flexible pose graph optimization. In the pose graph and the final results, the blue edges correspond to connections between neighboring nodes (views), the orange edges correspond to loop closures, and the light blue frustums represent the estimated camera poses.
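To make the role of the Sim(3) pose graph more concrete, here is a minimal sketch of how an edge residual could be evaluated and why a loop closure exposes accumulated drift. It is not taken from the ViSTA-SLAM code. The 4x4 matrix parameterization, the function names (`sim3`, `relative`, `edge_residual`), the residual layout, and the toy square trajectory with 2% scale drift per step are all assumptions made for illustration. The Levenberg-Marquardt solver is omitted, so only the residuals it would minimize are shown.

```python
import numpy as np

def sim3(scale, rotation, translation):
    """Build a 4x4 similarity transform x -> scale * R @ x + t."""
    T = np.eye(4)
    T[:3, :3] = scale * np.asarray(rotation, dtype=float)
    T[:3, 3] = np.asarray(translation, dtype=float)
    return T

def rot_z(angle):
    """Rotation about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def relative(T_i, T_j):
    """Pose of node j expressed in the frame of node i."""
    return np.linalg.inv(T_i) @ T_j

def edge_residual(T_i, T_j, T_ij_measured):
    """7-vector [log-scale, rotation, translation] error of one edge.

    The error transform E is the identity when the node poses agree with
    the measurement, in which case the residual is zero.
    """
    E = np.linalg.inv(T_ij_measured) @ relative(T_i, T_j)
    sR = E[:3, :3]
    s = np.cbrt(np.linalg.det(sR))  # det(s * R) = s**3 for s > 0
    R = sR / s
    # Skew-symmetric part of R: equals sin(theta) * axis, which is close
    # to the axis-angle vector for small rotation errors.
    w = 0.5 * np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return np.concatenate([[np.log(s)], w, E[:3, 3]])

# Hypothetical ground-truth trajectory: four views on the corners of a square.
corners = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
gt = [sim3(1.0, rot_z(k * np.pi / 2), c) for k, c in enumerate(corners)]

# Neighboring-view edges with an assumed 2% scale drift per step.
drift = sim3(1.02, np.eye(3), np.zeros(3))
odometry = [relative(gt[k], gt[k + 1]) @ drift for k in range(3)]

# Chaining the drifting edges gives the initial node estimates.
est = [gt[0]]
for m in odometry:
    est.append(est[-1] @ m)

# A loop-closure edge from the last view back to the first, assumed accurate.
loop_measurement = relative(gt[3], gt[0])
loop_residual = edge_residual(est[3], est[0], loop_measurement)
print("loop-closure residual on the chained estimate:", loop_residual)
```

In this toy setup the neighboring-view edges are satisfied exactly by the chained estimate, so only the loop-closure edge carries a nonzero residual. A pose graph optimizer would redistribute that error over all nodes, including their scales, which is why the graph is defined over Sim(3) rather than SE(3) when metric scale is not observable from monocular input.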




ViSTA-SLAM Qualitative Results

7-Scenes office
7-Scenes redkitchen
BundleFusion apt1
BundleFusion office0
TUM-RGBD floor
TUM-RGBD room

BibTeX


@misc{zhang2025vista,
      title={{ViSTA-SLAM}: Visual {SLAM} with Symmetric Two-view Association}, 
      author={Ganlin Zhang and Shenhan Qian and Xi Wang and Daniel Cremers},
      year={2025},
      eprint={2509.01584},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.01584}, 
}