Tutorial
Responsible Vision-Language Generative Models
Changhoon Kim · Yezhou Yang · Sijia Liu
Vision-language generative models, such as text-to-image and image-to-text systems, have rapidly transitioned from research prototypes to widely deployed tools across domains such as education, journalism, and design. However, their real-world adoption has introduced critical challenges surrounding robustness, controllability, and ethical risk, including prompt misalignment, unauthorized content generation, adversarial attacks, and data memorization. This tutorial provides a comprehensive overview of these concerns and emerging solutions, covering recent advances and failure modes in state-of-the-art models, robust concept erasure techniques for diffusion models, and adversarial vulnerabilities and defenses in image-to-text systems. Grounded in theoretical foundations, participants will examine failure scenarios, explore attack and defense strategies, and gain practical insight into enhancing the trustworthiness of multimodal generative models. Designed for researchers and practitioners in vision, language, and AI safety, this tutorial focuses on the responsible deployment of these models, bridging technical rigor with societal impact and offering guidance for future research directions in secure and reliable generative AI.