
A Review of the Evolution and Applications of AI Knowledge Distillation Technology

  • MAO KeBiao ,
  • DAI Wang ,
  • GUO ZhongHua ,
  • SUN XueHong ,
  • XIAO LiuRui
  • 1. State Key Laboratory of Efficient Utilization of Arid and Semi-arid Arable Land in Northern China, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China
    2. School of Physics and Electronic-Electrical Engineering, Ningxia University, Yinchuan 750021, China

Corresponding author: MAO KeBiao, E-mail: maokebiao@caas.cn

Received date: 2025-03-28

  Accepted date: 2025-05-06

  Online published: 2025-06-23

Funding

Central Public-interest Scientific Institution Basal Research Fund (Y2025YC86); Key Project of the Natural Science Foundation of Ningxia (2024AC02032)


Abstract

Knowledge distillation (KD) in artificial intelligence (AI) produces lightweight models through a teacher-student framework, and has become a key technique for resolving the performance-efficiency bottleneck in deep learning. From the perspective of algorithmic evolution, this paper systematically analyzes the theoretical framework of KD, categorizing knowledge transfer paths into four paradigms (response-based, feature-based, relation-based, and structure-based) and establishing a comparative evaluation system for dynamic and static KD methods. We examine innovative mechanisms such as cross-modal feature alignment, adaptive distillation architectures, and multi-teacher collaborative validation, and analyze fusion strategies such as progressive knowledge transfer and adversarial distillation. Through empirical analyses in computer vision and natural language processing, we assess the practicality of KD in scenarios such as image classification, semantic segmentation, and text generation. In particular, we highlight KD's potential in agriculture and the geosciences, where it enables efficient deployment for precision agriculture and geospatial analysis in resource-constrained settings. Current models commonly suffer from bottlenecks such as ambiguous knowledge-selection mechanisms and insufficient theoretical interpretability. Accordingly, we discuss the feasibility of frontier directions such as automated distillation systems and multimodal knowledge fusion, offering new technical pathways for edge intelligence deployment and privacy-preserving computing, particularly suited to agricultural intelligence and geoscience research.
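As a concrete illustration of the response-based paradigm summarized above, the classic soft-label formulation of Hinton et al. can be sketched in a few lines. This is a minimal NumPy sketch, not the paper's own implementation; the temperature T=4.0 and weight alpha=0.7 are illustrative defaults.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: higher T yields a softer distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.7):
    """Response-based distillation loss:
    alpha * T^2 * KL(teacher_soft || student_soft)
    + (1 - alpha) * CE(hard label, student)."""
    p_t = softmax(teacher_logits, T)   # softened teacher targets
    p_s = softmax(student_logits, T)   # softened student predictions
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))        # soft-label term
    ce = -np.log(softmax(student_logits)[true_label])     # hard-label term
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

A student whose logits already match the teacher's incurs only the hard-label term, so the loss decreases as the student's output distribution approaches the teacher's; the T^2 factor keeps the gradient scale of the soft term comparable across temperatures.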

Cite this article

MAO KeBiao, DAI Wang, GUO ZhongHua, SUN XueHong, XIAO LiuRui. A Review of the Evolution and Applications of AI Knowledge Distillation Technology[J]. Journal of Agricultural Big Data, 2025, 7(2): 144-154. DOI: 10.19788/j.issn.2096-6369.000106

