ICCV Poster ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection

Hongchi Ma · Guanglei Yang · Debin Zhao · Yanli Ji · Wangmeng Zuo

Abstract:

Industrial visual inspection is crucial for detecting defects in manufactured products, but it has traditionally relied on human operators, which is inefficient. Industrial Visual Anomaly Detection (IVAD) has emerged as a promising alternative, with zero-shot, few-shot, and reconstruction-based methods. However, zero-shot methods struggle with subtle anomalies, and reconstruction-based methods fail to capture fine-grained details. Few-shot methods, which use a limited number of samples together with prompts, offer a more efficient approach. Despite their promise, challenges remain in managing intra-class variation among the reference samples and in extracting more representative anomaly features.

This paper presents Retrieval-enhanced Multi-modal Prompt Fusion Anomaly Detection (ReMP-AD), a framework that introduces Intra-Class Token Retrieval (ICTR) to reduce noise in the memory bank and Vision-Language Prior Fusion (VLPF) to guide the encoder in capturing more distinctive and relevant features of anomalies. Experiments on the VisA and MVTec-AD datasets demonstrate that ReMP-AD outperforms existing methods, achieving 97.8%/94.1% performance in 4-shot anomaly segmentation and classification. Our approach also shows strong results on the PCB-Bank dataset, highlighting its effectiveness in few-shot industrial anomaly detection.
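For readers unfamiliar with the memory-bank idea mentioned above, the following is a minimal sketch of generic few-shot memory-bank scoring. It is not the paper's ICTR or VLPF. It assumes patch features have already been extracted by some encoder, and the names `cosine_distance`, `patch_scores`, and `image_score`, as well as the toy 2-D feature vectors, are hypothetical and chosen only for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def patch_scores(query_patches, memory_bank):
    """Score each query patch by its distance to the nearest reference patch."""
    return [
        min(cosine_distance(q, m) for m in memory_bank)
        for q in query_patches
    ]


def image_score(query_patches, memory_bank):
    """Image-level score: the most anomalous patch decides (an assumption)."""
    return max(patch_scores(query_patches, memory_bank))


# Toy data (hypothetical): patch features from a few normal reference images.
memory_bank = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]

# A query whose patches resemble the references, and one with an outlier patch.
normal_query = [[1.0, 0.05], [0.85, 0.15]]
defect_query = [[1.0, 0.05], [0.0, 1.0]]

normal_score = image_score(normal_query, memory_bank)
defect_score = image_score(defect_query, memory_bank)
print(normal_score < defect_score)
```

In this sketch the per-patch scores play the role of a segmentation map and the maximum over patches plays the role of a classification score. A noisy or highly varied memory bank inflates nearest-neighbor distances for normal patches, which is the kind of problem that retrieval over reference tokens is meant to address.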
