

Poster

ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Predictions

Dubing Chen · Jin Fang · Wencheng Han · Xinjing Cheng · Junbo Yin · Cheng-zhong Xu · Fahad Khan · Jianbing Shen


Abstract:

Vision-based semantic occupancy and flow prediction provide critical spatiotemporal cues for real-world tasks such as autonomous driving and robotics. In this work, we introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation. First, we propose an occlusion-aware adaptive lifting mechanism with depth denoising that improves the robustness of the 2D-to-3D feature transformation and reduces reliance on depth priors. Second, we enhance semantic consistency between 3D and 2D features using shared semantic prototypes that jointly constrain both modalities, supported by confidence- and category-based sampling to tackle the long-tail challenges of 3D space. Third, to ease the feature-encoding burden of joint semantics and flow prediction, we introduce a BEV cost volume-based method that connects flow and semantic features through the cost volume and applies a classification-regression supervision scheme to handle the varying flow scales in dynamic scenes. Our purely convolutional framework achieves state-of-the-art results across multiple benchmarks for 3D semantic occupancy prediction and joint semantic occupancy-flow prediction, and it is the 2nd-place solution in the Occupancy and Flow in Autonomous Driving Challenge. We provide multiple model variants that balance efficiency and performance; our real-time version exceeds all existing real-time methods in both speed and accuracy, showcasing unmatched deployability. Code and models will be publicly released.
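
The abstract only names the lifting mechanism, which builds on depth-distribution-based 2D-to-3D lifting. As a rough illustration of that base operation (not the paper's occlusion-aware adaptation or depth denoising), the PyTorch sketch below lifts per-pixel image features into a camera-frustum volume via an outer product with a predicted depth distribution; the module name, head design, and shapes are illustrative assumptions.

```python
# Minimal sketch of depth-distribution-based 2D-to-3D lifting (LSS-style),
# the general operation that an occlusion-aware adaptive lifting would refine.
# Module and shape choices here are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class DepthLifting(nn.Module):
    def __init__(self, in_channels: int, feat_channels: int, num_depth_bins: int):
        super().__init__()
        # One head predicts a per-pixel depth distribution, another the context features.
        self.depth_head = nn.Conv2d(in_channels, num_depth_bins, kernel_size=1)
        self.feat_head = nn.Conv2d(in_channels, feat_channels, kernel_size=1)

    def forward(self, img_feats: torch.Tensor) -> torch.Tensor:
        # img_feats: (B, C_in, H, W) multi-view image features (views folded into B).
        depth_prob = self.depth_head(img_feats).softmax(dim=1)    # (B, D, H, W)
        context = self.feat_head(img_feats)                       # (B, C, H, W)
        # Outer product spreads each pixel feature along its depth distribution,
        # yielding a camera-frustum volume to be splatted into 3D/BEV space.
        frustum = depth_prob.unsqueeze(1) * context.unsqueeze(2)  # (B, C, D, H, W)
        return frustum

# Example: lift a 256-channel feature map over 64 depth bins.
feats = torch.randn(1, 256, 16, 44)
frustum = DepthLifting(256, 128, 64)(feats)
print(frustum.shape)  # torch.Size([1, 128, 64, 16, 44])
```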
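
The BEV cost volume-based flow head is likewise only named here. Below is a minimal sketch of the general technique, assuming a correlation-style cost volume over discrete BEV offsets and a soft-argmax readout that treats the offset bins as a classification and takes their expectation as a continuous (regressed) flow. The function names, offset range, and cell size are hypothetical and not the paper's implementation.

```python
# Hedged sketch of a BEV cost volume between consecutive frames and a
# classification-regression style flow readout. All hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def bev_cost_volume(curr: torch.Tensor, prev: torch.Tensor, max_disp: int = 4) -> torch.Tensor:
    """Correlate current BEV features with shifted previous-frame features.

    curr, prev: (B, C, H, W) BEV feature maps.
    Returns: (B, (2*max_disp+1)**2, H, W) matching costs over candidate offsets.
    """
    B, C, H, W = curr.shape
    prev_pad = F.pad(prev, [max_disp] * 4)
    costs = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = prev_pad[:, :, dy:dy + H, dx:dx + W]
            costs.append((curr * shifted).mean(dim=1, keepdim=True))
    return torch.cat(costs, dim=1)

def flow_from_costs(costs: torch.Tensor, max_disp: int = 4, cell_size: float = 0.4) -> torch.Tensor:
    """Soft-argmax readout: classify over discrete offset bins, then take the
    expectation to obtain a continuous flow (scaled by an assumed BEV cell size)."""
    B, K, H, W = costs.shape
    prob = costs.softmax(dim=1)                                   # (B, K, H, W)
    offsets = torch.arange(-max_disp, max_disp + 1, dtype=costs.dtype)
    dy, dx = torch.meshgrid(offsets, offsets, indexing="ij")
    grid = torch.stack([dx, dy], dim=0).reshape(2, K)             # (2, K) candidate offsets
    flow = torch.einsum("bkhw,ck->bchw", prob, grid) * cell_size  # (B, 2, H, W)
    return flow

# Example: flow between two random 50x50 BEV feature maps.
curr, prev = torch.randn(1, 64, 50, 50), torch.randn(1, 64, 50, 50)
flow = flow_from_costs(bev_cost_volume(curr, prev))
print(flow.shape)  # torch.Size([1, 2, 50, 50])
```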
