

Poster

DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions

Hengyuan Zhang · Zhe Li · Xingqun Qi · Mengze Li · Muyi Sun · Siye Wang · Man Zhang · Sirui Han


Abstract: Generating coherent and diverse human dances from music signals has made tremendous progress in animating virtual avatars. While existing methods synthesize dances directly, they overlook that giving users editable dance movements is more practical in real choreography scenarios. Moreover, the lack of high-quality dance datasets that incorporate iterative editing also limits progress on this challenge. To address this, we first construct $\textbf{DanceRemix}$, a large-scale multi-turn editable dance dataset with prompts, featuring over 12.6M dance frames and 42K pairs. In addition, we propose $\textbf{DanceEditor}$, a novel framework for iterative and editable dance generation that is coherently aligned with the given music signals. Since dance motion should both follow the musical rhythm and support iterative editing through user descriptions, our framework is built upon a prediction-then-editing paradigm that unifies multi-modal conditions. At the initial prediction stage, our framework improves the authority of the generated results by directly modeling dance movements from tailored, aligned music. At the subsequent iterative editing stages, we incorporate text descriptions as conditioning information to produce the edited results through a specifically designed $\textbf{Cross-modality Edition Module (CEM)}$. Specifically, CEM adaptively integrates the initial prediction with music and text prompts as temporal motion cues to guide the synthesized sequences. As a result, the generated dances remain in harmony with the music while preserving fine-grained semantic alignment with the text descriptions. Extensive experiments demonstrate that our method outperforms state-of-the-art models on our newly collected DanceRemix dataset.
