Adaptive Knowledge Distillation with Dynamic Weight Allocation

Authors: Shaokang Zhang, Yixin Zhang, Ning Ran, and Liang Wang
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 994-1006
Keywords: Knowledge Distillation · Pre-trained Language Models · Adaptive Weight Distillation

Abstract

Knowledge distillation has become a key technique for compressing pre-trained language models. However, existing methods suffer from two limitations. First, the student model can only imitate the teacher model; the teacher cannot adapt to the student's ability. Second, the student model should focus on learning the knowledge it is unfamiliar with, whereas existing methods that distill all of the teacher's knowledge may introduce redundant information. To address these issues, we propose Dynamic Weighted Adaptive Knowledge Distillation, which adaptively updates the teacher model and weights the distillation. Specifically, the teacher model is updated according to feedback on the performance of the distilled student model on an independent quiz dataset. We introduce a dynamic weight assignment mechanism that controls the knowledge learned by the student model based on the difference between the teacher model and the student model. Experimental results show that our method outperforms several state-of-the-art methods on multiple datasets.
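The sketch below illustrates the dynamic-weighting idea summarized in the abstract, assuming a PyTorch setup: each sample's distillation loss is weighted by the current teacher-student disagreement, so the student emphasizes knowledge it has not yet learned. The function names (dynamic_weights, distillation_loss), the per-sample KL-based weighting rule, and the temperature/alpha hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of difference-based dynamic weight allocation for
# knowledge distillation (illustrative; not the paper's exact method).
import torch
import torch.nn.functional as F

def dynamic_weights(teacher_logits, student_logits, temperature=2.0):
    """Weight each sample by teacher-student disagreement (per-sample KL),
    so samples the student handles poorly receive larger weights."""
    t = F.softmax(teacher_logits / temperature, dim=-1)
    s = F.log_softmax(student_logits / temperature, dim=-1)
    per_sample_kl = F.kl_div(s, t, reduction="none").sum(dim=-1)  # shape: (batch,)
    # Normalize over the batch; detach so the weights are not back-propagated through.
    return F.softmax(per_sample_kl.detach(), dim=0)

def distillation_loss(teacher_logits, student_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine the weighted soft-label distillation term with the hard-label task loss."""
    w = dynamic_weights(teacher_logits, student_logits, temperature)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    s = F.log_softmax(student_logits / temperature, dim=-1)
    kd = (F.kl_div(s, t, reduction="none").sum(dim=-1) * w).sum() * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Usage example with random logits (batch of 8, 3 classes):
if __name__ == "__main__":
    teacher_logits = torch.randn(8, 3)
    student_logits = torch.randn(8, 3, requires_grad=True)
    labels = torch.randint(0, 3, (8,))
    loss = distillation_loss(teacher_logits, student_logits, labels)
    loss.backward()
    print(loss.item())
```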