Self-Supervised Monocular Depth Estimation Based on Dual-Branch DepthNet and Multi-Attention Fusion

Authors: Yuanyuan Wang, Dianxi Shi, Junze Zhang, Luoxi Jing, Xueqi Li, and Xucan Chen
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 482-498
Keywords: Monocular Depth Estimation · Self-Supervised · Dual-Branch Network

Abstract

Monocular depth estimation infers the 3D geometric structure of a scene from a single RGB image, with significant applications in autonomous driving, robot navigation, and other fields. While current self-supervised learning methods avoid dependence on ground-truth depth data, they still exhibit notable limitations in complex scenarios: traditional encoder-decoder architectures inevitably lose high-frequency detail features when acquiring global context through repeated downsampling, resulting in blurred edges and texture distortion in the predicted depth maps. To alleviate these issues, we propose a novel approach named HyperDetailNet, which significantly improves detail preservation in depth estimation. Specifically, our method contains two key components: (1) a dual-branch detail-global feature extraction network, in which the detail branch adopts an enlarge-then-reduce strategy to preserve high-frequency texture information while the global branch extracts the overall structural information of the scene; and (2) a multi-attention fusion module that combines spatial attention, channel attention, and sliding-window self-attention mechanisms to fuse the features from both branches and enhance the model's perception of detailed regions. Experimental results demonstrate that HyperDetailNet achieves excellent performance on both the KITTI and Make3D datasets, with significant improvements in depth estimation for edges and texture-rich areas. Ablation experiments further verify the effectiveness of the dual-branch detail-global feature extraction DepthNet and the multi-attention fusion module.
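To make the first component concrete, the sketch below shows one plausible reading of the dual-branch detail-global extractor in PyTorch. The layer choices, channel widths, and the exact enlarge-then-reduce schedule are illustrative assumptions, not the authors' implementation; the only properties taken from the abstract are that the detail branch upsamples before any striding (so high-frequency texture survives) and that the global branch is a conventional downsampling encoder.

```python
# Minimal sketch of the dual-branch idea (hypothetical layer choices).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetailBranch(nn.Module):
    """Enlarge-then-reduce: upsample first so fine texture survives
    the subsequent strided convolutions (assumed schedule)."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.conv_up = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.reduce = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Enlarge: 2x bilinear upsampling before any downsampling occurs.
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = F.relu(self.conv_up(x))
        # Reduce: two stride-2 convs bring the map to half input resolution.
        return self.reduce(x)

class GlobalBranch(nn.Module):
    """Conventional downsampling encoder for scene-level structure."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.encoder(x)

if __name__ == "__main__":
    img = torch.randn(1, 3, 64, 64)
    f_detail = DetailBranch()(img)  # (1, 64, 32, 32)
    f_global = GlobalBranch()(img)  # (1, 64, 32, 32)
```

Both branches are sized here to emit feature maps of matching resolution so they can be fused downstream; how the paper actually aligns the two streams is not specified in the abstract.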
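For the second component, the following is a minimal sketch of a fusion module combining the three attention types named in the abstract: channel attention (SE-style gating assumed), spatial attention (CBAM-style gating assumed), and self-attention over non-overlapping sliding windows (Swin-like partitioning assumed). The composition order, residual connection, and window size are all assumptions for illustration.

```python
# Hypothetical multi-attention fusion sketch; module forms are assumed.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel gating (assumed form)."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class SpatialAttention(nn.Module):
    """CBAM-style spatial gating from pooled channel statistics (assumed form)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate

class WindowSelfAttention(nn.Module):
    """Self-attention inside non-overlapping windows; H and W are
    assumed divisible by the (assumed) window size."""
    def __init__(self, ch, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        ws = self.window
        # Partition the map into ws x ws windows: (B * nWindows, ws*ws, C).
        x = x.view(b, c, h // ws, ws, w // ws, ws)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        x, _ = self.attn(x, x, x)
        # Reverse the partition back to (B, C, H, W).
        x = x.view(b, h // ws, w // ws, ws, ws, c)
        return x.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

class MultiAttentionFusion(nn.Module):
    def __init__(self, ch, window=8):
        super().__init__()
        self.merge = nn.Conv2d(2 * ch, ch, 1)  # concat-then-project fusion
        self.ca = ChannelAttention(ch)
        self.sa = SpatialAttention()
        self.wsa = WindowSelfAttention(ch, window)

    def forward(self, f_detail, f_global):
        x = self.merge(torch.cat([f_detail, f_global], dim=1))
        x = self.sa(self.ca(x))
        return x + self.wsa(x)  # residual window self-attention (assumed)

if __name__ == "__main__":
    f_detail = torch.randn(1, 64, 32, 32)  # stand-ins for branch outputs
    f_global = torch.randn(1, 64, 32, 32)
    fused = MultiAttentionFusion(64, window=8)(f_detail, f_global)  # (1, 64, 32, 32)
```

The channel and spatial gates cheaply reweight the merged features, while the window self-attention adds local context exchange at quadratic cost only within each window; the abstract does not state how the paper orders or weights these three mechanisms.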