A Review of the Evolution and Applications of AI Knowledge Distillation Technology

MAO KeBiao, DAI Wang, GUO ZhongHua, SUN XueHong, XIAO LiuRui
1. School of Physics and Electronic-Electrical Engineering, Ningxia University, Yinchuan 750021, China
2. State Key Laboratory of Efficient Utilization of Arid and Semi-arid Arable Land in Northern China, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China

Received date: 2025-03-28

Accepted date: 2025-05-06

Online published: 2025-06-23

Abstract

Knowledge Distillation (KD) in Artificial Intelligence (AI) compresses large models into lightweight ones through a teacher-student framework and has emerged as a key technology for addressing the performance-efficiency bottleneck in deep learning. This paper systematically analyzes KD's theoretical framework from the perspective of algorithm evolution, categorizing knowledge transfer paths into four paradigms: response-based, feature-based, relation-based, and structure-based. It establishes a comparative evaluation system for dynamic and static KD methods. We examine in depth innovative mechanisms such as cross-modal feature alignment, adaptive distillation architectures, and multi-teacher collaborative validation, and analyze fusion strategies such as progressive knowledge transfer and adversarial distillation. Through empirical analysis in computer vision and natural language processing, we assess KD's practicality in scenarios such as image classification, semantic segmentation, and text generation. Notably, we highlight KD's potential in agriculture and the geosciences, where it enables efficient deployment in resource-constrained settings for precision agriculture and geospatial analysis. Current methods still face issues such as ambiguous knowledge selection mechanisms and insufficient theoretical interpretability. Accordingly, we discuss the feasibility of automated distillation systems and multimodal knowledge fusion, offering new technical pathways for edge intelligence deployment and privacy-preserving computing, particularly suited to agricultural intelligence and geoscience research.
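As a concrete illustration of the response-based paradigm named above, the sketch below implements the classic temperature-scaled soft-label distillation loss (in the spirit of Hinton et al.) in PyTorch. The teacher and student networks, the temperature T, and the weight alpha are illustrative stand-ins, not configurations evaluated in this review.

```python
# Minimal sketch of response-based knowledge distillation (soft-label transfer).
# All models and hyperparameters here are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Combine a soft-target KL term (teacher guidance) with a hard-label CE term."""
    # Soft targets: match the student's softened distribution to the teacher's.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so the soft term stays comparable to the CE term
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Illustrative usage with stand-in teacher/student models:
teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(16, 32)          # dummy batch of features
y = torch.randint(0, 10, (16,))  # dummy labels

with torch.no_grad():            # the teacher is frozen; only the student is trained
    t_logits = teacher(x)

optimizer.zero_grad()
loss = distillation_loss(student(x), t_logits, y)
loss.backward()
optimizer.step()
```

Feature-, relation-, and structure-based variants replace or augment the soft-target term with losses defined on intermediate representations, pairwise sample relations, or structured outputs, respectively.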

Cite this article

MAO KeBiao, DAI Wang, GUO ZhongHua, SUN XueHong, XIAO LiuRui. A Review of the Evolution and Applications of AI Knowledge Distillation Technology[J]. Journal of Agricultural Big Data, 2025, 7(2): 144-154. DOI: 10.19788/j.issn.2096-6369.000106
