Online Knowledge Distillation with Feature Disentanglement

Authors: Yifan Li, Zhengzhong Zhu, Pei Zhou, Kejiang Chen, and Jiangping Zhu
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 1602-1616
Keywords: Knowledge Distillation, Variational Autoencoders, Knowledge Transfer

Abstract

Knowledge distillation is a method that trains a student model to approximate the performance of a teacher model. However, in real-world applications, the architectural discrepancy between teacher and student models often impedes the comprehensive transfer of knowledge from teacher to student. Moreover, the reduced number of learnable parameters in student models makes it difficult to acquire high-dimensional knowledge from the teacher model: owing to the complexity and redundancy of the teacher model's high-dimensional features, the student model may struggle to learn them. To address this challenge, this study proposes a knowledge distillation method based on variational autoencoders (VAEs). We use a VAE to compress the teacher model's high-dimensional features into low-dimensional robust features, which are extracted and transferred to the student model through the variational autoencoder loss function. Experimental results show that student models trained with this method achieve significant performance improvements on multiple benchmark datasets. Our research indicates that the low-dimensional robust features extracted by the VAE can effectively enhance the student model's learning process, providing a new approach to knowledge distillation tasks.
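The following is a minimal sketch of the idea described in the abstract: a VAE compresses the teacher's high-dimensional features into a low-dimensional latent code, and the student is trained to match that code in addition to its usual task loss. The module names, feature dimensions, projection head, and loss weights below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: VAE-based feature compression for knowledge distillation.
# Assumed dimensions and loss weighting; not the paper's exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureVAE(nn.Module):
    """Compresses teacher features (t_dim) into a z_dim-dimensional latent."""

    def __init__(self, t_dim=2048, z_dim=128):
        super().__init__()
        self.enc = nn.Linear(t_dim, 512)
        self.mu = nn.Linear(512, z_dim)
        self.logvar = nn.Linear(512, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(), nn.Linear(512, t_dim))

    def forward(self, f_t):
        h = F.relu(self.enc(f_t))
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)   # reparameterization trick
        recon = self.dec(z)
        return z, mu, logvar, recon


def vae_loss(f_t, recon, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    rec = F.mse_loss(recon, f_t)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl


def distill_step(f_t, f_s, proj_s, vae, alpha=1.0, beta=1.0):
    """Feature-transfer part of one training step (assumed formulation).

    f_t: teacher features (detached), f_s: student features,
    proj_s: projects student features to the latent dimension.
    """
    z, mu, logvar, recon = vae(f_t.detach())
    loss_vae = vae_loss(f_t.detach(), recon, mu, logvar)
    loss_kd = F.mse_loss(proj_s(f_s), mu)       # student matches the robust latent mean
    return alpha * loss_vae + beta * loss_kd


if __name__ == "__main__":
    # Random tensors stand in for backbone features of a teacher/student pair.
    f_t = torch.randn(8, 2048)                  # teacher features
    f_s = torch.randn(8, 512)                   # student features
    vae = FeatureVAE(t_dim=2048, z_dim=128)
    proj_s = nn.Linear(512, 128)
    loss = distill_step(f_t, f_s, proj_s, vae)
    loss.backward()
    print(loss.item())
```

In this sketch the distillation signal is the latent mean of the VAE rather than the raw teacher features, which is one plausible way to realize the "low-dimensional robust features" the abstract describes; the paper itself should be consulted for the exact loss formulation.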