November 30, 2024

SelfSplat: Pose-Free 3D Reconstruction from Monocular Video


SelfSplat: Pose-Free 3D Reconstruction with Gaussian Splatting

Three-dimensional scene reconstruction from images is a central topic in computer vision. Methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3D-GS) have made impressive progress in recent years. They enable the creation of photorealistic images from new viewpoints by learning from a set of images with known camera positions. Despite their power, these methods are often computationally intensive and require precise camera calibration data, limiting their practical use.

To overcome these hurdles, generalizable 3D reconstruction models have been developed that predict 3D geometry and texture from a few images without per-scene iterative optimization. These models are typically trained on large datasets of synthetic and real 3D scenes and use pixel-aligned features to extract scene priors from the input images. However, these approaches also require calibrated images with accurate camera poses, during both training and inference.

A promising approach to solving this problem is the integration of camera estimation into the 3D reconstruction process. Pose-free generalizable methods aim to learn reliable 3D geometries from uncalibrated images and generate accurate 3D representations in a single step. However, existing approaches in this area face challenges such as dependence on error-prone, pre-trained flow models for pose estimation or the need for scene-specific fine-tuning.

SelfSplat: A New Approach

SelfSplat is a novel training framework for pose-free, generalizable 3D reconstruction from monocular videos. It builds on the 3D-GS representation and uses a feed-forward pipeline for estimating Gaussian primitives that enables fast, high-quality reconstructions. By integrating 3D-GS with self-supervised depth and pose estimation methods, SelfSplat can simultaneously predict depth, camera poses, and 3D Gaussian attributes within a unified neural network architecture.
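In pixel-aligned Gaussian pipelines of this kind, the predicted depth map, the camera intrinsics, and the estimated pose together determine where each Gaussian sits in 3D: every pixel is back-projected along its camera ray. A minimal numpy sketch of that unprojection step (the function name and the toy inputs are illustrative assumptions, not SelfSplat's actual code):

```python
import numpy as np

def unproject_to_gaussian_means(depth, K, cam_to_world):
    """Lift a per-pixel depth map to 3D Gaussian centers.

    depth: (H, W) predicted depth; K: (3, 3) intrinsics;
    cam_to_world: (4, 4) estimated camera pose.
    """
    h, w = depth.shape
    # Pixel-center grid in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)
    # Back-project: X_cam = depth * K^-1 [u, v, 1]^T
    x_cam = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Transform the camera-frame points into the world frame.
    x_h = np.concatenate([x_cam, np.ones((h, w, 1))], axis=-1)
    return (x_h @ cam_to_world.T)[..., :3]

# Toy example: unit depth, identity intrinsics and pose.
means = unproject_to_gaussian_means(np.ones((4, 4)), np.eye(3), np.eye(4))
```

Because the Gaussian centers are a deterministic function of depth and pose, any error in either quantity displaces the Gaussians directly, which is why joint, consistent estimation matters.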

The challenge in jointly predicting Gaussian attributes and camera poses lies in the sensitivity of 3D-GS to errors in 3D positioning. SelfSplat addresses this by combining the strengths of self-supervised learning and 3D-GS: the geometric consistency enforced by self-supervised learning improves the placement of the 3D Gaussians, while the explicit 3D-GS representation in turn improves the accuracy of camera pose estimation.
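The self-supervised signal in such setups is typically a photometric reprojection loss: the predicted depth and relative pose warp one frame into a neighbouring view, and the warped image is compared with the real one. A simplified numpy sketch of that standard formulation (nearest-neighbour sampling keeps it short; a real pipeline would use bilinear sampling and an SSIM term):

```python
import numpy as np

def photometric_loss(tgt_img, src_img, tgt_depth, K, tgt_to_src):
    """Warp src_img into the target view via predicted depth and relative
    pose, then compare with tgt_img (mean L1)."""
    h, w = tgt_depth.shape
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)
    # Back-project target pixels, move them to the source frame, re-project.
    x_tgt = (pix @ np.linalg.inv(K).T) * tgt_depth[..., None]
    x_h = np.concatenate([x_tgt, np.ones((h, w, 1))], axis=-1)
    x_src = (x_h @ tgt_to_src.T)[..., :3]
    proj = x_src @ K.T
    uu = np.clip((proj[..., 0] / proj[..., 2]).astype(int), 0, w - 1)
    vv = np.clip((proj[..., 1] / proj[..., 2]).astype(int), 0, h - 1)
    warped = src_img[vv, uu]
    return float(np.abs(warped - tgt_img).mean())

# Sanity check: identical frames under an identity relative pose.
K = np.array([[100.0, 0.0, 3.0], [0.0, 100.0, 2.5], [0.0, 0.0, 1.0]])
img = np.random.default_rng(0).random((5, 6, 3))
loss = photometric_loss(img, img, np.ones((5, 6)), K, np.eye(4))
```

Minimizing a loss of this form only requires raw video frames, which is what makes the training pose-free: depth and pose are whatever values make the warp consistent.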

Improved Geometry Consistency

To further improve the accuracy of pose estimation and the consistency of depth estimation, SelfSplat introduces two additional modules: a Matching-Aware Pose Network and a Depth Refinement Module. The Matching-Aware Pose Network utilizes information from multiple views to increase geometric accuracy. The Depth Refinement Module uses the estimated poses as embedding features to generate consistent depth maps, which are essential for accurate 3D scene geometry.
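The article does not spell out the Depth Refinement Module's internals, so the following is only a hypothetical sketch of the stated idea, i.e. using the estimated pose as an embedding that conditions a per-pixel depth correction; the weight shapes and the residual design are assumptions:

```python
import numpy as np

def refine_depth(depth, pose, w1, w2):
    """Hypothetical sketch: flatten the estimated 4x4 pose into a 16-dim
    embedding, broadcast it to every pixel, and predict a per-pixel
    residual depth correction with a tiny two-layer network."""
    h, w = depth.shape
    pose_emb = pose.reshape(-1)                                # (16,)
    feats = np.concatenate(
        [depth[..., None], np.broadcast_to(pose_emb, (h, w, 16))], axis=-1
    )                                                          # (H, W, 17)
    hidden = np.tanh(feats @ w1)                               # (H, W, 8)
    residual = (hidden @ w2)[..., 0]                           # (H, W)
    return depth + residual

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(17, 8))
w2 = rng.normal(scale=0.1, size=(8, 1))
refined = refine_depth(np.ones((4, 4)), np.eye(4), w1, w2)
```

The point of conditioning on the pose is that depth maps predicted per frame can disagree across views; a pose-aware correction lets the network reconcile them into a consistent scene geometry.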

Applications and Results

After self-supervised training, SelfSplat can be applied to various tasks, including pose and depth estimation as well as 3D reconstruction with fast novel view synthesis. Evaluations on the RealEstate10K, ACID, and DL3DV datasets show that SelfSplat outperforms previous methods in both rendering quality and geometric accuracy, and exhibits strong cross-dataset generalization.

Key Elements of SelfSplat:

- Pose-free and 3D-prior-free self-supervised learning from monocular videos.
- Combination of self-supervised learning with the 3D-GS representation.
- Matching-Aware Pose Network for improved pose estimation.
- Depth Refinement Module for consistent depth estimation.
- Superior results in experiments and ablation studies.