ParticleSfM

Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild

ECCV 2022

1Tsinghua University   2ETH Zurich   3ByteDance Inc.   4Texas A&M University

ParticleSfM provides reliable dense geometry and camera localization on dynamic scenes (from DAVIS).

Video


Abstract

TL;DR: We present ParticleSfM, a structure-from-motion system for videos based on dense point trajectories that generalizes well to in-the-wild sequences with complex foreground motion.

Estimating the pose of a moving camera from monocular video is a challenging problem, especially due to the presence of moving objects in dynamic environments, where the performance of existing camera pose estimation methods is susceptible to pixels that are not geometrically consistent. To tackle this challenge, we present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence initialized from pairwise optical flow. Our key idea is to optimize long-range video correspondence as dense point trajectories and use them to learn robust motion segmentation. A novel neural network architecture is proposed for processing irregular point trajectory data. Camera poses are then estimated and optimized with global bundle adjustment over the portion of long-range point trajectories that are classified as static. Experiments on the MPI Sintel dataset show that our system produces significantly more accurate camera trajectories than existing state-of-the-art methods. In addition, our method retains reasonable accuracy of camera poses on fully static scenes, consistently outperforming strong state-of-the-art dense correspondence based methods with end-to-end deep learning, demonstrating the potential of dense indirect methods based on optical flow and point trajectories. As the point trajectory representation is general, we further present results and comparisons on in-the-wild monocular videos with complex motion of dynamic objects.


Method



Given an input video, we first accumulate and optimize over pairwise optical flow to acquire high-quality dense point trajectories. Then, a specially designed network architecture processes the irregular point trajectory data to predict per-trajectory motion labels. Finally, the optimized dense point trajectories, together with the motion labels, are fed into global bundle adjustment (BA) to optimize the final camera poses.
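The first stage, accumulating pairwise optical flow into long-range point trajectories, can be illustrated with a minimal sketch. This is not the paper's implementation (which additionally optimizes the trajectories and uses sub-pixel interpolation); the function name `chain_flows`, the grid sampling step, and the nearest-pixel flow lookup are all simplifying assumptions made for illustration.

```python
import numpy as np

def chain_flows(flows, step=8):
    """Chain pairwise optical flows into dense point trajectories (illustrative sketch).

    flows: list of (H, W, 2) arrays; flows[t][y, x] is the (dx, dy) flow
           from frame t to frame t+1 at pixel (x, y).
    Returns an array of shape (num_points, num_frames, 2) holding the
    (x, y) position of each tracked point in every frame.
    """
    H, W, _ = flows[0].shape
    # Seed trajectories on a regular pixel grid in the first frame.
    ys, xs = np.mgrid[0:H:step, 0:W:step]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)
    traj = [pts.copy()]
    for flow in flows:
        # Nearest-pixel flow lookup for brevity (a real system would
        # interpolate the flow field at sub-pixel positions).
        xi = np.clip(np.round(pts[:, 0]).astype(int), 0, W - 1)
        yi = np.clip(np.round(pts[:, 1]).astype(int), 0, H - 1)
        pts = pts + flow[yi, xi]
        traj.append(pts.copy())
    return np.stack(traj, axis=1)  # (N, T+1, 2)
```

Each resulting trajectory is one irregular sequence of 2D positions; in the full system these are the inputs to the motion segmentation network, and the subset labeled static feeds the global bundle adjustment.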


Additional Results

In-the-wild Sequences
Motion segmentation (0–34s); structure-from-motion (34s to end).

 

 

Sintel Dataset

Sample frames

Tartan-VO

DROID-SLAM

COLMAP

ParticleSfM

 

 

ScanNet Dataset

Sample frames

Tartan-VO

DROID-SLAM

COLMAP

ParticleSfM



More Related Projects

DynaSLAM. Bescos et al. DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes. IROS 2018.

TrianFlow. Zhao et al. Towards Better Generalization: Joint Depth-Pose Learning without PoseNet. CVPR 2020.

VOLDOR. Min et al. VOLDOR-SLAM: For the times when feature-based or direct methods are not good enough. ICRA 2021.

DROID-SLAM. Teed et al. DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras. NeurIPS 2021.

BibTeX

@inproceedings{zhao2022particlesfm,
      author    = {Zhao, Wang and Liu, Shaohui and Guo, Hengkai and Wang, Wenping and Liu, Yong-Jin},
      title     = {ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild},
      booktitle = {European Conference on Computer Vision (ECCV)},
      year      = {2022}
  }

Acknowledgements

We thank anonymous reviewers for their valuable feedback. This work was supported by the Natural Science Foundation of China (61725204) and Tsinghua University Initiative Scientific Research Program.