Journal of Agricultural Big Data ›› 2024, Vol. 6 ›› Issue (4): 522-531. doi: 10.19788/j.issn.2096-6369.000066


Spatial Feature Fusion-Based ViT Method for Fine-Grained Classification of Wolfberry Pests

SUN LuLu1, LIU JianPing1,2,*, ZHOU GuoMin3,4, WANG Jian5, LIU LiBo6

    1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
    2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, Yinchuan 750021, China
    3. Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
    4. National Agriculture Science Data Center, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
    5. Agricultural Information Institute of CAAS, Beijing 100091, China
    6. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Received: 2024-09-09 Accepted: 2024-10-14 Online: 2024-12-26 Published: 2024-12-02
  • Contact: LIU JianPing

Abstract:

To address the challenge of fine-grained pest classification in wolfberry cultivation, we propose a fine-grained agricultural pest classification model, the Spatial Feature Fusion-based Data Augmented Visual Transformer (ESF-ViT). First, the model uses the self-attention mechanism to locate and crop the foreground target, which augments the image input and supplies more detailed representations. Second, it combines the self-attention mechanism with a Graph Convolutional Network (GCN) to extract spatial information from the pest regions and learn the spatial posture features of the pests. To validate the proposed model, we conducted experiments on CUB-200-2011, IP102, and WPIT9K, a Ningxia wolfberry pest dataset. The results show that the proposed method outperforms the baseline ViT model by 1.83%, 2.09%, and 2.01% on the three datasets, respectively, and surpasses existing state-of-the-art pest classification models. The proposed model effectively addresses fine-grained image classification in agricultural pest recognition and provides a visual model for efficient pest monitoring and early warning.
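
The two steps named in the abstract, attention-guided foreground cropping and graph convolution over pest-region tokens, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the abstract does not give the thresholding rule, the graph construction, or the layer design, and the names `attention_crop_box`, `gcn_layer`, and `thresh_scale` are hypothetical. The sketch thresholds the class-token attention at its mean to get a bounding box, and applies one standard symmetric-normalized graph-convolution step.

```python
import numpy as np

def attention_crop_box(cls_attn, grid, patch, thresh_scale=1.0):
    """Pixel box (top, left, bottom, right) around high-attention patches.

    cls_attn: class-token attention over grid*grid patches (assumed input).
    A patch counts as foreground if its attention exceeds
    thresh_scale * mean attention (an assumed rule, not the paper's).
    """
    amap = np.asarray(cls_attn, dtype=float).reshape(grid, grid)
    mask = amap > thresh_scale * amap.mean()
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    if rows.size == 0:  # nothing above threshold: keep the whole image
        return 0, 0, grid * patch, grid * patch
    return (int(rows[0]) * patch, int(cols[0]) * patch,
            (int(rows[-1]) + 1) * patch, (int(cols[-1]) + 1) * patch)

def gcn_layer(tokens, adj, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    tokens: (n, d) patch features; adj: (n, n) non-negative adjacency;
    weight: (d, d_out) projection. Self-loops keep every degree positive.
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ tokens @ weight, 0.0)

# Toy example: a 4x4 patch grid (16-pixel patches) whose central
# 2x2 patches receive most of the attention.
attn = np.full(16, 0.01)
attn[[5, 6, 9, 10]] = 0.2
box = attention_crop_box(attn, grid=4, patch=16)

# Toy graph over the four foreground patches, fully connected.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
adj = np.ones((4, 4)) - np.eye(4)
out = gcn_layer(feats, adj, rng.normal(size=(8, 8)))
```

In a full pipeline the cropped region would be resized and fed back to the transformer as an additional input, and the GCN output would be fused with the class token before classification; how ESF-ViT does either is not specified in the abstract.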

Key words: wolfberry pests, vision transformer, fine-grained image classification, spatial feature fusion, data augmentation