Poster
CObL: Toward Zero-Shot Ordinal Layering without User Prompting
Aneel Damaraju · Dean Hazineh · Todd Zickler
Vision benefits from grouping pixels into objects and understanding their spatial relationships, both laterally and in depth. This is captured by a scene representation comprising an occlusion-ordered stack of "object layers," each containing an isolated and amodally completed object. To infer this representation from an image, we introduce a diffusion-based architecture named Concurrent Object Layers (CObL). CObL generates a stack of object layers concurrently, using Stable Diffusion as a prior for natural objects and using inference-time guidance to ensure the inferred layers composite back to the input image. We train CObL on a few thousand synthetically generated images of multi-object tabletop scenes, and we find that it zero-shot generalizes to real-world tabletop scenes with varying numbers of novel objects. In contrast to recent models for amodal object completion, CObL reconstructs multiple partially occluded objects without any user prompting and without knowing the number of objects beforehand; and unlike previous models for object-centric representation learning, CObL is not limited to the closed world it was trained in.
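The compositing constraint mentioned in the abstract can be realized as reconstruction-style guidance applied to the denoised layer estimates during sampling. The sketch below is a minimal illustration of that general idea, not the authors' implementation; the function names (composite_layers, guidance_step) and the guidance_scale parameter are hypothetical, and the alpha-over compositing order is an assumption.

```python
import torch
import torch.nn.functional as F

def composite_layers(layers, alphas):
    """Alpha-composite an occlusion-ordered stack of object layers, back to front.

    layers: (K, 3, H, W) RGB object layers, index 0 = farthest object.
    alphas: (K, 1, H, W) per-layer opacity masks in [0, 1].
    """
    canvas = torch.zeros_like(layers[0])
    for rgb, a in zip(layers, alphas):
        canvas = a * rgb + (1.0 - a) * canvas  # nearer layers occlude farther ones
    return canvas

def guidance_step(x0_layers, alphas, image, guidance_scale=1.0):
    """One reconstruction-guidance update on the predicted clean layers.

    Nudges the denoised layer estimates so that their composite matches the
    input image; a guidance term like this would be applied at each sampling step.
    Hypothetical helper for illustration only.
    """
    x0_layers = x0_layers.detach().requires_grad_(True)
    recon = composite_layers(x0_layers, alphas)
    loss = F.mse_loss(recon, image)
    grad = torch.autograd.grad(loss, x0_layers)[0]
    return x0_layers - guidance_scale * grad
```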