ICCV Poster Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene

Poster

Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene

Donggeun Lim · Jinseok Bae · Inwoo Hwang · Seungmin Lee · Hwanhee Lee · Young Min Kim

Exhibit Hall I #1071

[ Abstract ]

Wed 22 Oct 2:15 p.m. PDT — 4:15 p.m. PDT

Abstract:

In this work, we propose a framework that creates a lively virtual dynamic scene with contextual motions of multiple humans. Generating multi-human contextual motion requires holistic reasoning over dynamic relationships among human-human and human-scene interactions. We adapt the power of a large language model (LLM) to digest the contextual complexity within textual input and convert the task into tangible subproblems such that we can generate multi-agent behavior beyond the scale that was not considered before. Specifically, our event generator formulates the temporal progression of a dynamic scene into a sequence of small events. Each event calls for a well-defined motion involving relevant characters and objects. Next, we synthesize the motions of characters at positions sampled based on spatial guidance. We employ a high-level module to deliver scalable yet comprehensive context, translating events into relative descriptions that enable the retrieval of precise coordinates. As the first to address this problem at scale and with diversity, we offer a benchmark to assess diverse aspects of contextual reasoning. Benchmark results and user studies show that our framework effectively captures scene context with high scalability.

Live content is unavailable. Log in and register to view live content