Skip to yearly menu bar Skip to main content


Poster

LLM-assisted Entropy-based Adaptive Distillation for Unsupervised Fine-grained Visual Representation Learning

Jianfeng Dong · Danfeng Luo · Daizong Liu · Jie Sun · Xiaoye Qu · Xun Yang · Dongsheng Liu · Xun Wang


Abstract:

Unsupervised Fine-grained Visual Represent Learning (FVRL) aims to learn discriminative features to distinguish subtle differences among visually similar categories without using labeled fine-grained data. Existing works, which typically learn representation from target data, often struggle to capture subtle inter-class variations due to the limited prior fine-grained knowledge. To alleviate it, this paper proposes LLM-assisted Entropy-based Adaptive Distillation (LEAD), a novel unsupervised FVRL framework that selectively distills fine-grained knowledge from a powerful teacher model built upon pre-trained models. Specifically, we first harness the powerful reasoning capabilities of Large Language Models (LLMs) to generate contextual knowledge of fine-grained category-aware descriptions, enriching semantic priors in the teacher model. These descriptions are then used to form a prototype-driven fine-grained classifier, which acts as an assistant to generate rich knowledge with a frozen vision-language model. Besides, to achieve effective knowledge transfer, we further introduce an entropy-based adaptive mechanism, which dynamically adjusts the distillation strength based on the information entropy to identify and prioritize valuable knowledge. Extensive experimental results on three fine-grained datasets demonstrate the effectiveness and efficiency of our proposed LEAD for unsupervised FVRL. Our source code is available at https://anonymous.4open.science/r/EAD-FFAB.

Live content is unavailable. Log in and register to view live content