

Poster

Generic Event Boundary Detection via Denoising Diffusion

Jaejun Hwang · Dayoung Gong · Manjin Kim · Minsu Cho


Abstract:

Generic event boundary detection (GEBD) aims to identify natural boundaries in a video, segmenting it into distinct and meaningful chunks. Despite the inherent subjectivity of event boundaries, previous methods have focused on deterministic predictions, overlooking the diversity of plausible solutions. In this paper, we introduce a novel diffusion-based boundary detection model, dubbed DiffGEBD, that tackles the problem of GEBD from a generative perspective. The proposed model encodes relevant changes across adjacent frames via temporal self-similarity and then iteratively decodes random noise into plausible event boundaries, conditioned on the encoded features. Classifier-free guidance allows the degree of diversity to be controlled during denoising. In addition, we introduce a new evaluation metric that assesses the quality of predictions in terms of both diversity and fidelity. Experiments show that our method achieves strong performance on two standard benchmarks, TAPOS and Kinetics-GEBD, generating diverse and plausible event boundaries.
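The abstract names two mechanisms: a temporal self-similarity encoding of frame features, and a classifier-free-guided diffusion sampler that turns random noise into per-frame boundary predictions. The sketch below is not the authors' implementation; it is a minimal illustration assuming a generic DDPM-style ancestral sampler and a hypothetical learned network `denoiser(x_t, t, cond)` that predicts noise (the conditioning, schedule, and guidance weight `w` are all stand-ins for details not given in the abstract).

```python
import torch
import torch.nn.functional as F

def temporal_self_similarity(frame_feats: torch.Tensor) -> torch.Tensor:
    """Cosine self-similarity across frames: (T, D) -> (T, T).

    Entries near boundaries tend to drop, since adjacent frames
    on opposite sides of an event change are less similar.
    """
    normed = F.normalize(frame_feats, dim=-1)
    return normed @ normed.transpose(-1, -2)

@torch.no_grad()
def cfg_sample(denoiser, cond, num_frames, steps=50, w=2.0):
    """Denoise Gaussian noise into per-frame boundary scores.

    Classifier-free guidance: combine conditional and unconditional
    noise estimates with weight w (w=0 ignores the condition entirely,
    larger w trades diversity for fidelity to the conditioning video).
    """
    betas = torch.linspace(1e-4, 0.02, steps)      # toy linear schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(num_frames)                    # start from pure noise
    for t in reversed(range(steps)):
        eps_c = denoiser(x, t, cond)               # conditional estimate
        eps_u = denoiser(x, t, None)               # unconditional estimate
        eps = eps_u + w * (eps_c - eps_u)          # guided noise estimate
        # DDPM posterior mean; inject noise except at the final step
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return torch.sigmoid(x)                        # boundary probabilities

# Toy usage with a placeholder (untrained) denoiser:
feats = torch.randn(100, 256)                      # 100 frames, 256-d features
tsm = temporal_self_similarity(feats)              # conditioning signal
denoiser = lambda x, t, c: torch.zeros_like(x)     # stand-in for the learned net
scores = cfg_sample(denoiser, tsm, num_frames=100)
```

Because sampling starts from fresh noise each time, repeated calls with the same conditioning yield different plausible boundary sets, which is the generative behavior the paper exploits to model the subjectivity of event boundaries.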
