Poster
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin · Yannic Neuhaus · Matthias Hein
Vision-language models (VLMs) are prone to object hallucinations, where they erroneously indicate the presence of certain objects in an image. Existing benchmarks quantify hallucinations using relatively small, labeled datasets. However, this approach is i) insufficient to assess hallucinations that arise in open-world settings, where VLMs are widely used, and ii) inadequate for detecting systematic errors in VLMs. We propose DASH (Detection and Assessment of Systematic Hallucinations), an automatic, large-scale pipeline designed to identify systematic hallucinations of VLMs on real-world images in an open-world setting. A key component is DASH-OPT for image-based retrieval, where we optimize over the “natural image manifold” to generate images that mislead the VLM. The output of DASH consists of clusters of real and semantically similar images for which the VLM hallucinates an object. We apply DASH to PaliGemma and two LLaVA-NeXT models across 380 object classes and, in total, find more than 15k clusters with 650k images. We study the transfer of the identified systematic hallucinations to other VLMs and show that fine-tuning PaliGemma with the model-specific images obtained with DASH mitigates object hallucinations.
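Below is a minimal sketch of the object-hallucination check the abstract describes: given images known not to contain a target object, ask the VLM whether the object is present and flag "yes" answers. The helpers `query_vlm` and the exact prompt wording are hypothetical stand-ins (any VLM such as PaliGemma or LLaVA-NeXT could back them), not the authors' implementation.

```python
from typing import Callable, List

def find_hallucinations(
    images: List[str],                      # paths to images that do NOT contain `obj`
    obj: str,                               # target object class, e.g. "traffic light"
    query_vlm: Callable[[str, str], str],   # hypothetical helper: (image_path, prompt) -> answer text
) -> List[str]:
    """Return the images for which the VLM wrongly claims `obj` is present."""
    prompt = f"Is there a {obj} in the image? Answer yes or no."
    hallucinated = []
    for path in images:
        answer = query_vlm(path, prompt).strip().lower()
        if answer.startswith("yes"):        # object is absent, yet the model says yes
            hallucinated.append(path)
    return hallucinated
```

In the full pipeline such flagged images would then be grouped by semantic similarity to surface the clusters of systematic hallucinations reported above.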