Journal of Agricultural Big Data, 2024, Vol. 6, Issue (4): 522-531. doi: 10.19788/j.issn.2096-6369.000066

Spatial Feature Fusion-Based ViT Method for Fine-Grained Classification of Wolfberry Pests

SUN LuLu1, LIU JianPing1,2,*, ZHOU GuoMin3,4, WANG Jian5, LIU LiBo6

  1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, Yinchuan 750021, China
  3. Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
  4. National Agriculture Science Data Center, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  5. Agricultural Information Institute of CAAS, Beijing 100091, China
  6. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Received: 2024-09-09 Accepted: 2024-10-14 Published: 2024-12-26 Online: 2024-12-02
  • Corresponding author: LIU JianPing, E-mail: liujianping01@nmu.edu.cn
  • About the author: SUN LuLu, E-mail: 20227515@stu.nmu.edu.cn
  • Funding:
    National Natural Science Foundation of China (32460444); Key Scientific Research Project of North Minzu University (2023ZRLG12); Graduate Innovation Project of North Minzu University (YCX23168)

Abstract:

To address the challenge of fine-grained pest classification in wolfberry cultivation, we propose a fine-grained agricultural pest classification model, the Spatial Feature Fusion-based Data-Augmented Vision Transformer (ESF-ViT). First, the model uses the self-attention mechanism to crop the foreground target from each image, which augments the input and supplies additional detail. Second, it combines the self-attention mechanism with a Graph Convolutional Network (GCN) to extract spatial information from the pest regions and learn the spatial posture features of the pests. To validate the model, we ran experiments on CUB-200-2011, IP102, and WPIT9K, a Ningxia wolfberry pest dataset. The results show that the proposed method improves on the baseline ViT model by 1.83%, 2.09%, and 2.01% on these three datasets, respectively, and surpasses existing state-of-the-art pest classification models. The proposed model can effectively handle fine-grained pest image classification in agricultural pest recognition and provides a visual model for efficient pest monitoring and early warning.

Key words: wolfberry, vision transformer, fine-grained image classification, spatial feature fusion, data augmentation
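
The two steps named in the abstract, attention-guided foreground cropping and graph convolution over pest-region patches, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the crop box or the graph is built, so the mean-attention threshold, the Gaussian adjacency on patch coordinates, and the names `attention_crop_box`, `gcn_layer`, `thresh_ratio`, and `sigma` are all assumptions made for illustration.

```python
import numpy as np


def attention_crop_box(cls_attn, grid, patch, thresh_ratio=1.0):
    """Bounding box (top, left, bottom, right) in pixels around salient patches.

    cls_attn: CLS-token attention over the grid*grid image patches.
    Assumption: a patch is "foreground" when its attention exceeds
    thresh_ratio times the mean attention (not stated in the abstract).
    """
    amap = np.asarray(cls_attn, dtype=float).reshape(grid, grid)
    mask = amap > thresh_ratio * amap.mean()
    if not mask.any():
        # No patch stands out: fall back to the full image.
        return 0, 0, grid * patch, grid * patch
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return (int(rows[0]) * patch, int(cols[0]) * patch,
            (int(rows[-1]) + 1) * patch, (int(cols[-1]) + 1) * patch)


def gcn_layer(features, coords, weight, sigma=1.0):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W).

    features: (n, d) patch features; coords: (n, 2) patch grid positions;
    weight: (d, k) projection matrix.
    Assumption: edges are weighted by a Gaussian kernel on the distance
    between patch positions; the diagonal equals 1, giving self-loops.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    adj = np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * sigma ** 2))
    deg = adj.sum(axis=1)
    norm_adj = adj / np.sqrt(np.outer(deg, deg))
    return np.maximum(norm_adj @ features @ weight, 0.0)
```

In a pipeline of this kind, the crop box would be applied to the original image, the crop resized to the ViT input size and fed back as an additional input, and the GCN output fused with the class-token feature before classification. Those wiring details are likewise assumptions here rather than details taken from the paper.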