Poster
Bridging the Gap between Brain and Machine in Interpreting Visual Semantics: Towards Self-adaptive Brain-to-Text Decoding
Jiaxuan Chen · Yu Qi · Yueming Wang · Gang Pan
Neural decoding has recently made significant progress in reconstructing images and text from brain activity, yet achieving biologically valid semantic alignment between artificial models and the brain remains challenging. Large pre-trained foundation models such as CLIP excel at capturing rich semantic details in complex visual scenes. In contrast, because of selective attention, only part of the visual semantics in a stimulus may be preferentially represented in the neural patterns when subjects view images. Past studies have generally assumed that stimulus images and their evoked brain recordings are strictly semantically equivalent, potentially leading to semantic misalignment between supervision signals and neural recordings. To address this, we propose a novel self-adaptive semantic decoding method (Mind-SA), designed to dynamically detect the regions within stimulus images that the brain actually focuses on and use them as supervision to guide brain-to-text reconstruction. We find that Mind-SA reduces the semantic gap between supervision signals (i.e., stimulus images) and neural representations, enabling the reconstruction model to focus on the parts that the brain actually perceives. Experiments demonstrate that Mind-SA improves the quality of neural representations and achieves state-of-the-art brain-to-text performance.
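To make the idea of self-adaptive supervision concrete, the sketch below illustrates one plausible realization under stated assumptions: CLIP patch features of the stimulus image are re-weighted by their similarity to an embedding predicted from brain activity, so the supervision target emphasizes the regions the subject likely attended to rather than the whole image. This is not the authors' implementation; the function and tensor names (`adaptive_semantic_target`, `patch_feats`, `brain_emb`) are hypothetical.

```python
# Minimal sketch (assumed, not the authors' code): re-weight CLIP patch
# features by their agreement with a brain-derived embedding so that the
# supervision signal reflects brain-attended regions.

import torch
import torch.nn.functional as F


def adaptive_semantic_target(patch_feats, brain_emb, temperature=0.07):
    """Build a supervision target from CLIP patch features, re-weighted by
    their similarity to an embedding decoded from neural recordings.

    patch_feats: (num_patches, dim) CLIP visual patch embeddings of the stimulus.
    brain_emb:   (dim,) embedding predicted from brain activity.
    Returns a (dim,) target vector emphasizing brain-attended regions.
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    brain_emb = F.normalize(brain_emb, dim=-1)

    # Soft attention over patches: higher weight where visual and neural
    # semantics agree, approximating selective attention.
    weights = F.softmax(patch_feats @ brain_emb / temperature, dim=0)  # (num_patches,)

    # Weighted pooling yields an adaptive target instead of assuming the
    # whole image is semantically equivalent to the brain response.
    return weights @ patch_feats  # (dim,)


if __name__ == "__main__":
    torch.manual_seed(0)
    patches = torch.randn(49, 512)   # e.g. a 7x7 CLIP ViT patch grid
    brain = torch.randn(512)         # embedding regressed from brain activity
    target = adaptive_semantic_target(patches, brain)
    loss = 1 - F.cosine_similarity(target, brain, dim=0)  # possible alignment term
    print(target.shape, loss.item())
```

In such a setup, the adaptive target could replace the global image embedding as the supervision signal for the brain-to-text reconstruction model; the exact weighting and loss used by Mind-SA are described in the paper itself.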