

Poster

SD$^2$Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation

lijiayi jiayi


Abstract: Language-conditioned robot manipulation across a continuous spectrum of object states remains a persistent challenge due to the difficulty of mapping states to target actions. Previous methods are limited in how effectively they model object states, primarily because they rely on executing ambiguous instructions devoid of explicit state information. In response, we present SD$^2$Actor, a zero-shot robotic manipulation framework capable of generating precise actions in continuous states. Specifically, given novel instructions, we aim to generate accurate, instruction-following robot manipulation actions. Instead of requiring time-consuming optimization or finetuning, our zero-shot method generalizes to any object state across a wide range of translations and versatile rotations. At its core, we quantize multiple base states in the training set and use their combination to refine the target action generated by the diffusion model. To obtain novel state representations, we first employ LLMs to extract the novel state from the instruction and decompose it into multiple learned base states. We then use a linear combination of base state embeddings to produce novel state features. Moreover, we introduce an orthogonalization loss to constrain the state embedding space, which ensures the validity of linear interpolation. Experiments demonstrate that SD$^2$Actor outperforms state-of-the-art methods across a diverse range of manipulation tasks in the ARNOLD Benchmark. Moreover, SD$^2$Actor can effectively learn generalizable policies from a limited number of human demonstrations, achieving promising accuracy on a variety of real-world manipulation tasks.
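
The abstract describes composing a novel state feature as a linear combination of learned base-state embeddings, regularized by an orthogonalization loss so that interpolation between base states stays well-behaved. Below is a minimal, hypothetical PyTorch sketch of that idea; the module name, number of base states, and the way combination weights are supplied (here, given directly, e.g., from an LLM-derived decomposition of the instruction) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BaseStateBank(nn.Module):
    """Hypothetical sketch: learned base-state embeddings combined linearly
    into a novel-state feature, with an orthogonalization penalty."""

    def __init__(self, num_base_states: int = 8, embed_dim: int = 128):
        super().__init__()
        # One learnable embedding per quantized base state.
        self.base_embeddings = nn.Parameter(torch.randn(num_base_states, embed_dim))

    def compose(self, weights: torch.Tensor) -> torch.Tensor:
        # weights: (batch, num_base_states), e.g. obtained from an LLM's
        # decomposition of the instruction's state into base states.
        return weights @ self.base_embeddings  # (batch, embed_dim)

    def orthogonalization_loss(self) -> torch.Tensor:
        # Push pairwise cosine similarities of distinct base embeddings toward
        # zero so linear interpolation between them remains valid.
        e = F.normalize(self.base_embeddings, dim=-1)
        gram = e @ e.t()                                   # (K, K)
        off_diag = gram - torch.eye(gram.size(0), device=gram.device)
        return off_diag.pow(2).mean()

# Usage sketch: combine two base states and add the penalty to a task loss.
bank = BaseStateBank()
w = torch.tensor([[0.7, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])  # hypothetical weights
novel_state_feature = bank.compose(w)
reg = bank.orthogonalization_loss()
```

In this sketch the orthogonalization term penalizes off-diagonal entries of the Gram matrix of normalized embeddings; the composed feature would then condition the diffusion-based action generator described in the abstract.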
