Workshop

Closing the Loop Between Vision and Language (Decade Mark)

Mohamed Elhoseiny, Angel Chang, Anna Rohrbach, Marcus Rohrbach, Xin Eric Wang, Krishna Kumar, Kilichbek Haydarov, Eslam Abdelrahman, Austin Wang, Yiming Zhang, Tobias Wieczorek, Qianqi (Jackie) Yan

Mon 20 Oct, 11 a.m. PDT

This workshop explores the intersection of computer vision and natural language processing (NLP), focusing on joint vision-language understanding. Recent advances, particularly large-scale multimodal pretraining with transformers, have driven progress across many vision-language tasks. Topics include visual-linguistic representation learning, visual question answering (VQA), captioning, visual dialog, referring expressions, vision-and-language navigation, embodied question answering, and text-to-image generation. We place particular emphasis on joint video-language understanding because of its unique challenges. We also welcome critical work on dataset and algorithmic bias, generalization issues, and efforts toward transparency and explainability.
