Journal of Agricultural Big Data ›› 2024, Vol. 6 ›› Issue (4): 509-521. doi: 10.19788/j.issn.2096-6369.000067

D-PAG: Cross-modal Wolfberry Pest Recognition Model Based on Parameter-Efficient Fine-Tuning

XING JiaLu1, LIU JianPing1,2,*, ZHOU GuoMin3,4, LIU LiBo5, WANG Jian6

    1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
    2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, Yinchuan 750021, China
    3. Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
    4. National Agriculture Science Data Center, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
    5. School of Information Engineering, Ningxia University, Yinchuan 750021, China
    6. Agricultural Information Institute of CAAS, Beijing 100081, China
  • Received: 2024-09-09 Accepted: 2024-10-14 Online: 2024-12-26 Published: 2024-12-02
  • Contact: LIU JianPing

Abstract:

As multimodal foundation models (large models) mature, transferring them efficiently to specific domains and tasks has become an active research topic. This study takes the multimodal large model CLIP as its base model and applies parameter-efficient fine-tuning methods, such as Prompt and Adapter, to adapt CLIP to wolfberry (goji berry) pest identification. The resulting cross-modal parameter-efficient fine-tuning model for wolfberry pest recognition is named D-PAG. First, learnable Prompts and Adapters are embedded in the input or hidden layers of the CLIP encoder to capture pest features. Gated units then integrate the Prompt and the Adapter to balance their learning capacity. Within the Adapter, a GCS-Adapter is designed to strengthen the attention mechanism for cross-modal semantic information fusion. To validate the method, experiments were conducted on a wolfberry pest dataset and on the fine-grained dataset IP102. Using only 20% of the samples, the model reached an accuracy of 98.8% on the wolfberry dataset, and 99.5% with 40% of the samples. On IP102 it reached an accuracy of 75.6%, comparable to ViT. The approach transfers the foundational knowledge of multimodal large models to the specific domain of pest recognition with minimal additional parameters, offering a new technical solution for agricultural image processing problems.
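
The gated integration of the Prompt and Adapter branches can be pictured with a small sketch. The abstract does not give the gate's actual form, so everything below is an assumption for illustration only: a single learnable scalar gate, a sigmoid squashing, a convex blend of the two branch outputs, and the hypothetical names `gated_fusion`, `prompt_feature`, `adapter_feature`, and `gate_logit`. It is not the D-PAG implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued gate logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(prompt_feature, adapter_feature, gate_logit):
    """Blend two branch outputs with one scalar gate (assumed form).

    g = sigmoid(gate_logit) weights the Prompt branch and (1 - g) weights
    the Adapter branch. In training, gate_logit would be a learnable
    parameter; here it is passed in as a plain float.
    """
    if len(prompt_feature) != len(adapter_feature):
        raise ValueError("branch features must have the same dimension")
    g = sigmoid(gate_logit)
    return [g * p + (1.0 - g) * a for p, a in zip(prompt_feature, adapter_feature)]


# Toy feature vectors standing in for the two branch outputs.
prompt_feature = [1.0, 0.0, 2.0]
adapter_feature = [0.0, 1.0, 4.0]

# A gate logit of 0 gives g = 0.5, i.e. an equal blend of both branches.
fused = gated_fusion(prompt_feature, adapter_feature, gate_logit=0.0)
print(fused)
```

Under this assumed form, a large positive gate logit lets the Prompt branch dominate and a large negative one favours the Adapter branch, which is one simple way a gate could balance the learning capacity of the two components.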

Key words: wolfberry, pest identification, parameter-efficient fine-tuning, large model, CLIP