Poster
Open-World Skill Discovery from Unsegmented Demonstration Videos
Jingwen Deng · Zihao Wang · Shaofei Cai · Anji Liu · Yitao Liang
Learning skills in open-world environments is essential for developing agents capable of handling a variety of tasks by combining basic skills. Online demonstration videos are typically long and unsegmented, making them difficult to segment and label with skill identifiers. Unlike existing methods that rely on sequence sampling or human labeling, we have developed a self-supervised learning-based approach to segment these long videos into a series of semantic-aware and skill-consistent segments. Drawing inspiration from human cognitive event segmentation theory, we introduce Skill Boundary Detection (SBD), an annotation-free temporal video segmentation algorithm. SBD detects skill boundaries in a video by leveraging prediction errors from a pretrained unconditional action-prediction model. This approach is based on the assumption that a significant increase in prediction error indicates a shift in the skill being executed. We evaluated our method in the Minecraft environment, a rich open-world simulator with extensive gameplay videos available online. Our SBD-generated segments improved the average performance of two conditioned policies by 63.7% and 52.1% on short-term atomic skill tasks, and their corresponding hierarchical agents by 11.3% and 20.8% on long-horizon tasks.
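The abstract describes SBD as flagging a skill boundary whenever the unconditional action-prediction model's error rises sharply. The sketch below illustrates that idea under assumed details: `detect_skill_boundaries`, the sliding-window baseline, and the z-score threshold are hypothetical stand-ins, not the paper's exact procedure.

```python
import numpy as np

def detect_skill_boundaries(losses: np.ndarray,
                            window: int = 32,
                            z_thresh: float = 3.0) -> list[int]:
    """Return frame indices where the per-frame action-prediction loss spikes.

    losses:   prediction errors from a pretrained unconditional action model.
    window:   number of preceding frames used to estimate the local baseline.
    z_thresh: standard deviations above the baseline that count as a spike
              (an assumed heuristic for "significant increase").
    """
    boundaries = []
    for t in range(window, len(losses)):
        recent = losses[t - window:t]
        mu, sigma = recent.mean(), recent.std() + 1e-8
        # A sharp jump in prediction error suggests the demonstrated skill changed.
        if (losses[t] - mu) / sigma > z_thresh:
            boundaries.append(t)
    return boundaries

# Usage: cut a long demonstration into skill-consistent segments at the boundaries.
losses = np.random.rand(1000)              # placeholder for real per-frame losses
cuts = detect_skill_boundaries(losses)
segments = np.split(np.arange(len(losses)), cuts)
```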