Poster
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Ruining Li · Chuanxia Zheng · Christian Rupprecht · Andrea Vedaldi
We present Puppet-Master, a video generator designed to capture the internal, part-level motion dynamics of objects as a proxy to understand object dynamics universally. Given an image of an object and a set of "drags" specifying the trajectory of a few points of the object, Puppet-Master synthesizes a video where the object parts move accordingly. We extend a pre-trained image-to-video generator with a module that encodes the input drags, and introduce all-to-first attention, a novel alternative to conventional spatial attention that mitigates artifacts caused by fine-tuning a video generator on out-of-domain data. Instead of using real videos, which often intertwine part-level motion with overall object motion, camera movement, and occlusion, we fine-tune Puppet-Master on Objaverse-Animation-HQ, a new dataset of curated part-level motion clips obtained by rendering synthetic 3D animations. We extensively filter out sub-optimal animations and augment the synthetic renderings with meaningful drags to emphasize the internal dynamics of objects. We demonstrate that by using this synthetic dataset, Puppet-Master learns to generate part-level motions, unlike other motion-conditioned video generators that mostly move the object as a whole, and generalizes well to real images, outperforming existing methods on real-world benchmarks in a zero-shot manner.
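The abstract names all-to-first attention, in which tokens of every frame attend to the tokens of the first frame rather than performing per-frame spatial self-attention. The following is a minimal single-head sketch of that idea in NumPy; the function name, the identity query/key/value projections, and the tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def all_to_first_attention(frames):
    """Sketch of all-to-first attention (shapes are assumptions).

    frames: array of shape (T, N, d) -- token features for T video
    frames, N spatial tokens per frame, d channels.

    Every frame's tokens act as queries, but keys and values are
    taken from frame 0 only, instead of each frame attending to
    itself as in conventional spatial self-attention.
    """
    T, N, d = frames.shape
    q = frames                      # queries: tokens of all frames, (T, N, d)
    k = frames[0]                   # keys:   tokens of the first frame, (N, d)
    v = frames[0]                   # values: tokens of the first frame, (N, d)
    scores = q @ k.T / np.sqrt(d)   # scaled dot-product scores, (T, N, N)
    attn = softmax(scores, axis=-1)
    return attn @ v                 # (T, N, d)

# Tiny usage example with random features.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 8))   # 4 frames, 16 tokens, 8 channels
out = all_to_first_attention(x)
print(out.shape)                      # (4, 16, 8)
```

Because keys and values are pinned to the first frame (the conditioning image), later frames stay anchored to its appearance, which is the motivation the abstract gives for this design over plain spatial attention.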