PCDepth: Pattern-Based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds

Haotian Liu, Sanqing Qu, Fan Lu, Zongtao Bu, Florian Roehrbein, Alois Knoll, Guang Chen

Abstract

Event cameras can record scene dynamics with high temporal resolution, providing rich scene details for monocular depth estimation (MDE) even at low-level illumination. Therefore, existing complementary learning approaches for MDE fuse intensity information from images and scene details from event data for better scene understanding. However, most methods directly fuse two modalities at pixel level, ignoring that the attractive complementarity mainly impacts high-level patterns that only occupy a few pixels. For example, event data is likely to complement contours of scene objects. In this paper, we discretize the scene into a set of high-level patterns to explore the complementarity and propose a Pattern-based Complementary learning architecture for monocular Depth estimation (PCDepth). Concretely, PCDepth comprises two primary components: a complementary visual representation learning module for discretizing the scene into high-level patterns and integrating complementary patterns across modalities, and a refined depth estimator aimed at scene reconstruction and depth prediction while maintaining an efficiency-accuracy balance. Through pattern-based complementary learning, PCDepth fully exploits two modalities and achieves more accurate predictions than existing methods, especially in challenging nighttime scenarios. Extensive experiments on MVSEC and DSEC datasets verify the effectiveness and superiority of our PCDepth. Remarkably, compared with state-of-the-art, PCDepth achieves a 37.9% improvement in accuracy in MVSEC nighttime scenarios.

Index terms

Computer Vision for Automation; Deep Learning Methods; Visual Learning