

Workshop

Learning to See: Advancing Spatial Understanding for Embodied Intelligence

Hongyang Li, Philipp Krähenbühl, Kashyap Chitta, Eric Jang, Andrei Bursuc, Huijie Wang

Sun 19 Oct, 11 a.m. PDT

The world is three-dimensional. This fact was first seen by trilobites, the first organisms capable of sensing light. From that moment, nervous systems began to evolve, gradually transforming mere sight into insight, understanding, and action. Together, these give rise to intelligence. Despite remarkable technological advances in recent decades, modern embodied systems remain far from achieving full intelligence. They fall short in several key respects. The representations they rely on should (i) contain the information necessary for physical interaction, such as the temporal dynamics of the scene; (ii) carry a prior over semantic relevance, focusing on task-relevant features such as objects and their relationships; and (iii) be compact, omitting irrelevant details such as background elements. Attempts have been made to close this gap, including integrating foundation models and utilizing large-scale data. Yet the path to true intelligence remains long, with significant progress still required.
