Poster
VRM: Knowledge Distillation via Virtual Relation Matching
Weijia Zhang · Fei Xie · Weidong Cai · Chao Ma
Knowledge distillation (KD) aims to transfer the knowledge of a more capable yet cumbersome teacher model to a lightweight student model. In recent years, relation-based KD methods have fallen behind, as their instance-matching counterparts have come to dominate in performance. In this paper, we revive relational KD by identifying and tackling several of its key issues, including its susceptibility to overfitting and spurious responses. Specifically, we exploit virtual views and relations as a new kind of knowledge and transfer newly constructed affinity graphs that compactly encapsulate a wealth of beneficial inter-sample, inter-class, and inter-view correlations. As a result, the student has access to richer guidance signals and stronger regularisation throughout the distillation process. To further mitigate the adverse impact of spurious responses, we prune the affinity graphs by dynamically detaching redundant and unreliable edges. Extensive experiments on the CIFAR-100, ImageNet, and MS-COCO datasets demonstrate the superior performance of the proposed virtual relation matching (VRM) method across a range of tasks, architectures, and set-ups. For instance, VRM is the first method to reach 74.0% accuracy for ResNet50-to-MobileNetV2 distillation on ImageNet, and it improves DeiT-Ti by 14.44% on CIFAR-100 with a ResNet56 teacher. Code and models will be released.
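To make the relation-matching idea concrete, the sketch below builds batch affinity graphs from an original and an augmented ("virtual") view for both teacher and student and matches them, with a simple threshold standing in for edge pruning. It is only an illustration under assumed choices (cosine-similarity graphs, an MSE matching loss, and the hypothetical prune_threshold heuristic); the abstract does not specify VRM's actual graph construction, virtual-view generation, or pruning criterion.

```python
# Minimal sketch of relation matching via batch affinity graphs (PyTorch).
# Illustrative only: graph construction, virtual views, and the pruning rule
# here are assumptions, not the exact VRM formulation from the paper.

import torch
import torch.nn.functional as F


def affinity_graph(features: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity graph over a batch of feature vectors."""
    z = F.normalize(features, dim=1)   # (B, D) -> unit-norm rows
    return z @ z.t()                   # (B, B) affinity matrix


def virtual_relation_loss(f_s, f_s_virtual, f_t, f_t_virtual,
                          prune_threshold=0.05):
    """Match student and teacher affinity graphs built from two views.

    f_*         : features from the original view, shape (B, D)
    f_*_virtual : features from an augmented ("virtual") view, shape (B, D)
    """
    # Concatenate the two views so the graph also carries inter-view edges.
    g_s = affinity_graph(torch.cat([f_s, f_s_virtual], dim=0))       # (2B, 2B)
    with torch.no_grad():
        g_t = affinity_graph(torch.cat([f_t, f_t_virtual], dim=0))   # (2B, 2B)

    # Crude stand-in for edge pruning: drop near-zero teacher edges,
    # treated here as unreliable guidance.
    mask = (g_t.abs() > prune_threshold).float()
    return ((g_s - g_t) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)
```

In practice such a loss would be added to the usual task loss on the student, with the teacher kept frozen; the weighting and the choice of feature layers are further design decisions not covered by the abstract.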