Research on a Tomato Object Detection Algorithm Based on Improved YOLOv8

doi:10.19788/j.issn.2096-6369.000075

Abstract

As agriculture becomes increasingly intelligent, the application of deep learning, robotics, and other artificial intelligence technologies to agricultural production has attracted growing attention. Existing tomato fruit recognition methods suffer from high false recognition rates, low localization accuracy, and low picking efficiency in complex environments. To address these problems, this paper proposes an improved YOLOv8 network model intended to raise the detection accuracy and speed of automated tomato picking. The network takes YOLOv8 as its base model. A deformable convolution (DCN) module is added to the backbone, which improves detection accuracy for small targets and reduces the missed-detection rate. An SE attention module is introduced in the neck to strengthen the focus on detection targets. The Inner-IoU loss function replaces the original CIoU loss function to improve bounding-box regression accuracy. Compared with the SSD, YOLOv4, YOLOv5, and YOLOv7 network models, the improved YOLOv8 model raises average precision by 7.2, 6.4, 6.6, and 7.7 percentage points, respectively. Compared with the original YOLOv8 model, precision increases by 3.8 percentage points, recall by 0.6 percentage points, and mAP@0.5 and mAP@[0.5:0.95] by about 2.6 and 1.9 percentage points, respectively. The results show that the improved YOLOv8 model can effectively improve the detection accuracy and speed of automated tomato picking, which is of great significance for realizing automated tomato harvesting.

Key words: tomato, YOLOv8, target recognition, deformable convolution, attention mechanism
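The SE attention module named in the abstract can be illustrated with a minimal pure-Python sketch of the squeeze, excitation, and scale steps. The toy feature map and the weight matrices `w_reduce` and `w_expand` are hypothetical values chosen for illustration, biases are omitted, and the sketch does not reproduce where or how the authors insert the module into the YOLOv8 neck.

```python
import math


def se_attention(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation channel attention.

    feature_map: list of C channels, each an H x W nested list of floats.
    w_reduce:    (C // r) x C weights of the first fully connected layer.
    w_expand:    C x (C // r) weights of the second fully connected layer.
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expanding layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: reweight every channel by its gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of 2 x 2 activations, reduced to 1 hidden unit.
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1
]
w_reduce = [[1.0, 1.0]]     # hypothetical weights, shape 1 x 2
w_expand = [[1.0], [-1.0]]  # hypothetical weights, shape 2 x 1
scaled, gates = se_attention(features, w_reduce, w_expand)
print(gates)
```

With these weights the first channel receives a gate close to 1 and the second a gate close to 0, which is the channel reweighting behaviour the abstract describes as raising attention on detection targets.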

WU Dan, MA XiaoJun, LIU DeSheng, SONG Wei, SU WenXian. Tomato Object Detection Algorithm Based on YOLOv8[J]. Journal of Agricultural Big Data, 2025, 7(3): 281-293.

Configuration	Item	Details
Hardware	CPU	Intel(R) Core(TM) i7-11700
	GPU	NVIDIA GeForce RTX 3080 Ti
	GPU memory	12 GB
	RAM	32 GB
Software	Operating system	Windows 11
	Python	3.8.10
	PyTorch	torch-1.9.0+cu111
	CUDA	11.7

Parameter	Value	Parameter	Value
epochs	300	close_mosaic	10
patience	50	warmup_epochs	3.0
batch	16	lrf	0.01
imgsz	224×224, 416×416, 640×640, 800×800, 960×960	lr0	0.01
workers	4	momentum	0.937
optimizer	SGD	weight_decay	0.0005
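The parameter names in this table match the training arguments of the Ultralytics YOLO package, so the settings can be written down as a keyword dictionary. The sketch below assumes that package was used. The dataset file `tomato.yaml` and the checkpoint `yolov8n.pt` are hypothetical placeholders, because the table names neither the dataset configuration nor the model scale. It covers only baseline training, since the DCN, SE, and Inner-IoU modifications would need a custom model definition.

```python
# Hyperparameters copied from the table; keys follow Ultralytics naming
# (the table's "warmupepochs" is spelled warmup_epochs in that package).
TRAIN_ARGS = {
    "epochs": 300,
    "patience": 50,
    "batch": 16,
    "workers": 4,
    "optimizer": "SGD",
    "close_mosaic": 10,
    "warmup_epochs": 3.0,
    "lr0": 0.01,
    "lrf": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
}

# Square input sizes listed in the imgsz row.
IMAGE_SIZES = [224, 416, 640, 800, 960]

# In Ultralytics, lrf is a fraction of lr0, so the final learning rate is lr0 * lrf.
FINAL_LR = TRAIN_ARGS["lr0"] * TRAIN_ARGS["lrf"]


def train_all_sizes(data_yaml="tomato.yaml", weights="yolov8n.pt"):
    """Train one baseline model per input size.

    Both default arguments are hypothetical placeholders, not values from the paper.
    """
    # Imported lazily so the settings above can be inspected without the package.
    from ultralytics import YOLO

    for size in IMAGE_SIZES:
        model = YOLO(weights)
        model.train(data=data_yaml, imgsz=size, **TRAIN_ARGS)


print(FINAL_LR)
```

Calling `train_all_sizes()` requires the `ultralytics` package and a dataset description file; the dictionary and the derived final learning rate can be checked without either.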

ratio	P(%)	R(%)	mAP@0.5(%)	mAP@[0.5:0.95](%)
0.5	55.9	44.3	45.4	27.9
0.7	56.1	45.6	45.2	27.6
1	57.7	44.2	46.1	28.3
1.25	56.4	45.3	45.8	28.5
1.5	56.9	45.7	46.0	28.1
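The ratio column above appears to be the scale factor of the auxiliary bounding boxes used by Inner-IoU. That reading is an assumption, since the table carries no caption here. Under it, the following minimal sketch computes the IoU of two boxes after scaling both about their own centers by `ratio`. It covers only the Inner-IoU term and does not show how the authors combine it with other loss components; the example boxes are invented for illustration.

```python
def inner_iou(box_a, box_b, ratio=1.0):
    """IoU of auxiliary boxes obtained by scaling both boxes about their centers.

    Boxes are (x_center, y_center, width, height) tuples.
    """
    def scaled_corners(box):
        xc, yc, w, h = box
        half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
        return xc - half_w, yc - half_h, xc + half_w, yc + half_h

    ax1, ay1, ax2, ay2 = scaled_corners(box_a)
    bx1, by1, bx2, by2 = scaled_corners(box_b)

    # Overlap of the two auxiliary boxes, clamped at zero when they are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def inner_iou_loss(box_a, box_b, ratio=1.0):
    """Loss term that falls to zero as the auxiliary boxes coincide."""
    return 1.0 - inner_iou(box_a, box_b, ratio)


# Hypothetical prediction and ground truth, offset by 10 pixels in x.
pred = (50.0, 50.0, 40.0, 40.0)
target = (60.0, 50.0, 40.0, 40.0)
for ratio in (0.5, 0.7, 1.0, 1.25, 1.5):
    print(ratio, round(inner_iou(pred, target, ratio), 3))
```

With `ratio=1.0` the auxiliary boxes equal the original boxes and the value reduces to the ordinary IoU. Ratios below 1 shrink the boxes, so the same offset costs more overlap, while ratios above 1 enlarge them and soften the penalty.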

YOLOv8	DCN	SE	Inner-IoU	P (%)	R (%)	mAP@0.5 (%)	mAP@[0.5:0.95] (%)
√				57.7	60.1	58.5	45.3
√	√			59.2	61.7	59.5	45.9
√		√		60.5	59.9	59.4	46.4
√			√	57.7	44.2	46.1	28.3
√	√	√		61.5	60.2	60.0	46.8
√	√	√	√	61.5	60.7	61.1	47.2
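The improvements quoted in the abstract can be checked against the first and last rows of this ablation table. The short sketch below takes the differences in percentage points and, for comparison, the relative change; the variable names are illustrative only.

```python
METRICS = ("P", "R", "mAP@0.5", "mAP@[0.5:0.95]")
baseline = (57.7, 60.1, 58.5, 45.3)    # first row: YOLOv8 only
full_model = (61.5, 60.7, 61.1, 47.2)  # last row: YOLOv8 + DCN + SE + Inner-IoU

# Absolute gains in percentage points, rounded to the table's precision.
gains = {
    name: round(after - before, 1)
    for name, before, after in zip(METRICS, baseline, full_model)
}

# Relative gains in percent of the baseline value.
relative = {
    name: round(100.0 * (after - before) / before, 1)
    for name, before, after in zip(METRICS, baseline, full_model)
}

print(gains)
print(relative)
```

The absolute differences are 3.8, 0.6, 2.6, and 1.9, matching the abstract, which indicates that the reported gains are percentage-point differences rather than relative changes.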

Input image size (pixels × pixels)	GPU parameter count	Precision P (%)	Recall R (%)	mAP@0.5 (%)
224×224	5.80×10⁸	56.2	56.6	57.2
416×416	1.83×10⁹	57.3	59.6	58.1
640×640	4.52×10⁹	57.7	60.1	58.5
800×800	6.27×10⁹	57.7	60.5	58.9
960×960	9.47×10⁹	57.0	61.4	58.2