SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting

¹INRIA  ²TUM
CVPRW 2025

Fast inference with SparSplat. We present the first generalizable feed-forward model that predicts 2DGS parameters from multi-view images. It achieves state-of-the-art performance on the sparse DTU 3D reconstruction benchmark, with inference that is orders of magnitude faster than competing methods based on volume rendering of implicit representations. Deep features from the input views are homography-warped into the target view. A two-stage network performs deep multi-view stereo and pixel-aligned regression of 2D surface element (surfel) attributes. Perspective-accurate splatting of these surfels enables real-time novel view synthesis and mesh extraction. A sketch of the feature-warping step follows below.
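To illustrate the homography-based feature warping mentioned above, here is a minimal PyTorch sketch of plane-sweep warping of a source-view feature map into the reference (target) view. The function name, tensor shapes, and the single depth-plane simplification are our own assumptions for exposition, not the exact SparSplat implementation.

import torch
import torch.nn.functional as F

def warp_features(src_feat, K_ref, K_src, R, t, depth):
    """Warp source-view features onto the reference view via the homography
    induced by a fronto-parallel plane at the given depth (illustrative sketch).
    src_feat: (B, C, H, W) source-view deep features
    K_ref, K_src: (B, 3, 3) intrinsics; R: (B, 3, 3), t: (B, 3, 1) relative pose
    depth: scalar depth hypothesis in the reference frame
    """
    B, C, H, W = src_feat.shape
    n = torch.tensor([0.0, 0.0, 1.0], device=src_feat.device).view(1, 3, 1)
    # Plane-induced homography: H = K_src (R - t n^T / d) K_ref^{-1}
    Hmat = K_src @ (R - t @ n.transpose(1, 2) / depth) @ torch.inverse(K_ref)
    # Build a homogeneous pixel grid in the reference view and map it through H
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()  # (3, H, W)
    grid = grid.view(1, 3, -1).expand(B, 3, H * W).to(src_feat.device)
    warped = Hmat @ grid                                   # (B, 3, H*W)
    # Clamping z pushes points behind the camera far out of range, where
    # grid_sample's zero padding discards them
    warped = warped[:, :2] / warped[:, 2:].clamp(min=1e-6)
    # Normalize pixel coordinates to [-1, 1] for grid_sample
    warped[:, 0] = 2.0 * warped[:, 0] / (W - 1) - 1.0
    warped[:, 1] = 2.0 * warped[:, 1] / (H - 1) - 1.0
    grid_norm = warped.permute(0, 2, 1).reshape(B, H, W, 2)
    return F.grid_sample(src_feat, grid_norm, align_corners=True)

In a full deep-MVS pipeline this warp would be repeated over a sweep of depth hypotheses to build a cost volume; the single-plane version shown here keeps the sketch short.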

Abstract

Recovering 3D information from scenes via multi-view stereo reconstruction (MVS) and novel view synthesis (NVS) is inherently challenging, particularly in sparse-view setups. The advent of 3D Gaussian Splatting (3DGS) enabled real-time, photorealistic NVS. Following this, 2D Gaussian Splatting (2DGS) leveraged perspective-accurate rasterization of 2D Gaussian primitives to represent geometry more faithfully during rendering, improving 3D scene reconstruction while maintaining real-time performance. Recent approaches have tackled sparse real-time NVS using 3DGS within a generalizable, MVS-based learning framework that regresses 3D Gaussian parameters. Our work extends this line of research by addressing generalizable sparse 3D reconstruction and NVS jointly, and performs strongly on both tasks. We propose an MVS-based learning pipeline that regresses 2DGS surface element parameters in a feed-forward fashion to perform 3D shape reconstruction and NVS from sparse-view images. We further show that our generalizable pipeline benefits from pre-existing foundational multi-view deep visual features. The resulting model attains state-of-the-art results on the DTU sparse 3D reconstruction benchmark in terms of Chamfer distance to ground truth, as well as state-of-the-art NVS quality. It also generalizes well to the BlendedMVS and Tanks and Temples datasets. Notably, our model outperforms the prior state of the art in feed-forward sparse-view reconstruction based on volume rendering of implicit representations, while offering nearly two orders of magnitude faster inference.
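To make the pixel-aligned feed-forward regression concrete, below is a minimal PyTorch sketch of a head that maps fused multi-view features to per-pixel 2DGS surfel attributes. The channel layout, activations, and the SurfelHead name are illustrative assumptions, not the authors' architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SurfelHead(nn.Module):
    """Pixel-aligned 2DGS attribute regression (illustrative sketch)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # 1 depth residual + 2 scales + 4 quaternion + 1 opacity + 3 RGB = 11
        self.conv = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, 11, 1),
        )

    def forward(self, feats):
        out = self.conv(feats)                   # (B, 11, H, W)
        d_res = out[:, 0:1]                      # depth refinement residual
        scales = F.softplus(out[:, 1:3])         # positive tangential scales
        quat = F.normalize(out[:, 3:7], dim=1)   # unit rotation quaternion
        opacity = torch.sigmoid(out[:, 7:8])     # opacity in (0, 1)
        rgb = torch.sigmoid(out[:, 8:11])        # color (view dependence omitted)
        return d_res, scales, quat, opacity, rgb

Each pixel's predicted depth (MVS depth plus the residual) is unprojected to place the surfel center in 3D, after which the surfels can be splatted with a standard 2DGS rasterizer for rendering and mesh extraction.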

Comparison Results

Visual comparisons in the sparse reconstruction setting of the DTU dataset.

Visual comparison of surface reconstruction results on the BlendedMVS dataset.

Visual comparison of novel view synthesis results on the DTU dataset.

BibTeX

@article{jena2025sparsplat,
  title={SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting},
  author={Jena, Shubhendu and Reddy, Shishir and Boukhayma, Adnane},
  journal={arXiv preprint arXiv:},
  year={2025}
}