Ultra-Sparse Viewpoints Novel View Synthesis via Global Feature-Driven Spatial Structure Comprehension

Authors: Qijun He, Jingfu Yan, Jiahui Li, Yifeng Li
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 210-221
Keywords: Transformer, ViT, NeRF, Sparse Views

Abstract

Our research tackles the substantial artifacts and geometric distortions that arise when synthesizing novel views from extremely sparse input views. We find that strengthening global features under such sparse conditions helps the network better comprehend the scene's spatial relationships, thereby improving rendering quality. Our method comprises two principal components: a geometric reasoner and a classical neural renderer. The geometric reasoner proceeds in stages. First, it extracts global features and uses them to give the network a comprehensive grasp of the scene's overall layout and structure; in particular, it deciphers spatial relationships between different views, enabling geometric reasoning and the construction of expressive 3D scene representations. A subsequent fusion stage employs a mechanism akin to the vision transformer to fuse features from the input views across multiple scales and levels, enriching the model's understanding of abstract spatial relationships and refining the radiance and density attributes of the 3D points. The second component renders the color of any ray passing through the scene with a classical volume renderer. Experiments on widely used real forward-facing and synthetic datasets show that our approach achieves state-of-the-art performance, producing richer detail and more complete silhouette structure than prior strong work on novel view synthesis.
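To make the two components concrete, below is a minimal PyTorch sketch pairing a ViT-style cross-view fusion block with classical volume rendering along a ray. All names (ViewFusionBlock, volume_render), shapes, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ViewFusionBlock(nn.Module):
    """Fuses per-view features for a batch of 3D sample points with
    multi-head self-attention across the view dimension (ViT-style)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (points, views, dim) -- one feature token per input view.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # cross-view attention
        x = x + self.mlp(self.norm2(x))                    # per-token MLP
        return x

def volume_render(rgb: torch.Tensor, sigma: torch.Tensor,
                  deltas: torch.Tensor) -> torch.Tensor:
    """Classical alpha compositing along each ray.
    rgb: (rays, samples, 3); sigma, deltas: (rays, samples)."""
    alpha = 1.0 - torch.exp(-sigma * deltas)           # opacity per sample
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]),      # T_1 = 1
                   1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[:, :-1]                                # T_i = prod_{j<i}(1 - alpha_j)
    weights = alpha * trans
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)    # expected color per ray

# Toy usage: 1024 sample points seen from 3 input views, 64-dim features.
fused = ViewFusionBlock(dim=64)(torch.randn(1024, 3, 64))
```

In a pipeline of this shape, each 3D sample point carries one feature token per input view; attention across those tokens supplies the global, cross-view context the abstract emphasizes, and the standard alpha-compositing integral turns predicted colors and densities into pixel values.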