Poster

LONG3R: Long Sequence Streaming 3D Reconstruction

Zhuoguang Chen · Minghui Qin · Tianyuan Yuan · Zhe Liu · Hang Zhao


Abstract:

Recent advancements in sparse multi-view scene reconstruction have been significant, yet existing methods face limitations when processing streams of input images. These methods either rely on time-consuming offline optimization or are restricted to short sequences, hindering their applicability in real-time scenarios. In this work, we propose LONG3R (LOng sequence streamiNG 3D Reconstruction), a novel model designed for streaming multi-view 3D scene reconstruction over longer sequences. Our model achieves real-time processing by operating recurrently, maintaining and updating memory with each new observation. We introduce a refined decoder that facilitates coarse-to-fine interaction between memory and new observations using memory gating and a dual-source attention structure. To effectively capture long-sequence memory, we propose a 3D spatio-temporal memory that dynamically prunes redundant spatial information while adaptively adjusting resolution across the scene. To enhance our model’s performance on long sequences while maintaining training efficiency, we employ a two-stage curriculum training strategy, with each stage targeting specific capabilities. Experiments on multiple multi-view reconstruction datasets demonstrate that LONG3R outperforms state-of-the-art streaming methods, particularly on longer sequences, while maintaining real-time inference speed.
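The recurrent memory mechanism described above — gated blending of memory with new observations, plus spatial pruning of redundant memory tokens — can be sketched at a toy level. This is an illustrative sketch only, not the paper's implementation: the voxel-based deduplication, the scalar gate, and all function names here are assumptions.

```python
import numpy as np

def prune_memory(points, feats, voxel=0.1):
    """Keep one memory token per voxel, dropping spatially redundant ones (toy stand-in
    for the paper's dynamic pruning of redundant spatial information)."""
    keys = np.floor(points / voxel).astype(int)
    _, keep = np.unique(keys, axis=0, return_index=True)
    keep = np.sort(keep)
    return points[keep], feats[keep]

def gated_update(mem_feat, new_feat, gate):
    """Blend memory features with new-observation features via a gate in [0, 1]
    (toy stand-in for the paper's memory gating)."""
    return gate * mem_feat + (1.0 - gate) * new_feat

# Streaming loop: each incoming frame contributes points and features;
# memory is merged, gated, and pruned at every step so it stays bounded.
rng = np.random.default_rng(0)
mem_pts = np.empty((0, 3))
mem_feat = np.empty((0, 4))
for _ in range(5):
    pts = rng.uniform(0.0, 1.0, (200, 3))   # hypothetical per-frame 3D points
    feat = rng.normal(size=(200, 4))        # hypothetical per-frame features
    mem_pts = np.concatenate([mem_pts, pts])
    mem_feat = np.concatenate([mem_feat, feat])
    mem_pts, mem_feat = prune_memory(mem_pts, mem_feat, voxel=0.2)
    mem_feat = gated_update(mem_feat, rng.normal(size=mem_feat.shape), gate=0.7)

# With voxel=0.2 on points in [0, 1), memory size is bounded by 5**3 = 125 voxels.
print(len(mem_pts))
```

The point of the sketch is the bound: no matter how many frames stream in, pruning caps memory at one token per occupied voxel, which is what keeps per-step cost roughly constant for long sequences.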