Poster
Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction
Yanwen Fang · Wenqi Jia · Xu Cao · Peng-Tao Jiang · Guodong Li · Jintai CHEN
Multi-person motion prediction becomes particularly challenging in highly interactive scenarios involving extreme motions. Previous works focused more on the case of 'moderate' motions (e.g., walking together), where predicting each pose in isolation often yields reasonable results. However, these approaches fall short in modeling extreme motions such as lindy-hop dances, which require a more comprehensive understanding of cross-person dependencies. To bridge this gap, we introduce the Proxy-bridged Game Transformer (PGformer), a Transformer-based foundation model that captures the interactions driving extreme multi-person motions. PGformer incorporates a novel cross-query attention module to learn bidirectional dependencies between pose sequences, together with a proxy unit that subtly controls the bidirectional spatial information flow. We evaluate PGformer on the challenging ExPI dataset, which involves large collaborative movements. Both quantitative and qualitative results demonstrate the superiority of PGformer in short- and long-term predictions. We also test the proposed method on the moderate-movement datasets CMU-Mocap and MuPoTS-3D, generalizing PGformer to scenarios with more than two individuals with promising results.
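To illustrate the idea of cross-query attention between two interacting pose sequences, the sketch below shows one plausible reading: each person's update is driven by *queries taken from the other person's* feature sequence, with a proxy representation mediating the exchange. This is a minimal, hedged sketch, not the paper's implementation: the function names, the choice of which tensors serve as keys and values, and the proxy construction (here a simple average of the two sequences) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention for single sequences
    # of shape (T, D): output is a (T, D) weighted mix of v.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def cross_query_attention(x_a, x_b):
    """Hypothetical cross-query exchange between two pose feature
    sequences x_a, x_b of shape (T, D).

    Assumption: the proxy is an elementwise average of the two
    sequences; each person's queries come from the OTHER person,
    and the proxy supplies the keys, so spatial information flows
    in both directions through a shared intermediate.
    """
    proxy = 0.5 * (x_a + x_b)          # illustrative proxy unit
    out_a = attention(x_b, proxy, x_a)  # B's queries update A
    out_b = attention(x_a, proxy, x_b)  # A's queries update B
    return out_a, out_b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_a = rng.standard_normal((4, 8))   # 4 time steps, 8 features
    x_b = rng.standard_normal((4, 8))
    out_a, out_b = cross_query_attention(x_a, x_b)
    print(out_a.shape, out_b.shape)     # shapes are preserved
```

The design point this sketch tries to convey is the bidirectionality: both outputs depend on both inputs through the shared proxy, which is what distinguishes this from predicting each pose sequence in isolation.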