Abstract
Point clouds play an important role in 3D analysis, which has broad applications in robotics and autonomous driving. The pre-training and fine-tuning paradigm has shown great potential in the point cloud domain. Full fine-tuning is generally effective but imposes a heavy storage and computational burden, which becomes increasingly inefficient and unacceptable as pre-trained models scale. Although efficient fine-tuning approaches have made significant progress in other domains, they generally perform worse on point clouds. To overcome this dilemma, we revisit the official Point-MAE implementation and find that aggregation plays a critical role in fine-tuning performance. Inspired by this finding, we propose a novel dynamic aggregation (DA) method to replace the static aggregation, such as mean or max pooling, previously used for pre-trained point cloud Transformers. Besides standard metrics such as accuracy and mIoU, we report the number of tunable parameters and the additional FLOPs for a fair comparison between our method and other fine-tuning approaches. We construct several DA variants and validate them through extensive experiments. Experimental results demonstrate that DA performs competitively against full fine-tuning and other efficient fine-tuning approaches. The code is publicly available at https://github.com/JaronTHU/DynamicAggregation.
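To make the distinction between static and input-dependent aggregation concrete, the following minimal sketch contrasts mean and max pooling over token features with a generic softmax-weighted pooling. It is an illustration only and does not reproduce the DA method or any of its variants: the function `weighted_pool` and its parameter `score_weights` are hypothetical names introduced here, and the scoring rule (a dot product followed by a softmax) is an assumption chosen for simplicity.

```python
import math

def mean_pool(tokens):
    """Static aggregation: average each feature dimension over all tokens."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def max_pool(tokens):
    """Static aggregation: take the per-dimension maximum over all tokens."""
    return [max(col) for col in zip(*tokens)]

def weighted_pool(tokens, score_weights):
    """Input-dependent aggregation (illustrative assumption, not the paper's DA).

    Each token is scored by a dot product with `score_weights`, which stands in
    for a small set of tunable parameters; the scores are normalized with a
    softmax and used to form a weighted sum of the tokens.
    """
    scores = [sum(w * x for w, x in zip(score_weights, tok)) for tok in tokens]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tokens[0])
    return [sum(a * tok[d] for a, tok in zip(weights, tokens)) for d in range(dim)]

# Three tokens with two feature dimensions each (toy values).
tokens = [[1.0, 0.0], [3.0, 4.0], [2.0, 2.0]]

static_mean = mean_pool(tokens)
static_max = max_pool(tokens)
uniform = weighted_pool(tokens, [0.0, 0.0])    # zero scores give uniform weights
peaked = weighted_pool(tokens, [10.0, 10.0])   # large scores concentrate on one token
```

With all-zero scoring weights the weighted pooling reduces to mean pooling, and with large weights it concentrates on the highest-scoring token, so in this toy setting the two static operators appear as limiting cases of a single parameterized aggregation.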