Journal of Agricultural Big Data, 2024, Vol. 6, Issue (3): 412-423. doi: 10.19788/j.issn.2096-6369.000052

Construction Process and Technological Prospects of Large Language Models in the Agricultural Vertical Domain

ZHANG YuQin1,2, ZHU JingQuan3, DONG Wei2, LI FuZhong1,*, GUO LeiFeng2,*

  1. College of Software, Shanxi Agricultural University, Jinzhong 030801, Shanxi, China
  2. Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081, China
  3. The National Agro-Tech Extension and Service Center, Beijing 100125, China
  • Received: 2024-05-23; Accepted: 2024-06-23; Online: 2024-09-26; Published: 2024-10-01
  • Contact: LI FuZhong, GUO LeiFeng

Abstract:

With the spread of the internet, agricultural knowledge and information have become easier to access. However, this information is often static and generic and fails to provide solutions tailored to specific situations. To address this, agricultural vertical-domain models combine agricultural data with large language models (LLMs), using natural language processing and semantic understanding to answer agricultural questions in real time, and they play a key role in agricultural decision-making and extension. This paper details the construction process of LLMs for the agricultural vertical domain, covering data collection and preprocessing, selection of a suitable pre-trained base LLM, fine-tuning, Retrieval-Augmented Generation (RAG), and evaluation. It also discusses the application of the LangChain framework in agricultural question-answering (Q&A) systems. Finally, it summarizes the challenges of building agricultural vertical-domain LLMs, namely data security, model forgetting, and model hallucination, and proposes future directions for agricultural models: the use of multimodal data, real-time data updates, the integration of multilingual knowledge, and the reduction of fine-tuning costs, with the aim of further advancing the intelligence and modernization of agricultural production.

Key words: LLMs, RAG, LangChain, agricultural Q&A systems
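The abstract lists RAG as one construction step but does not show how retrieval grounds an answer. Below is a minimal sketch of the idea under the following assumptions. It uses a tiny hypothetical in-memory knowledge base. It scores documents with plain bag-of-words cosine similarity, whereas a production agricultural Q&A system would more likely use dense embeddings and a vector store, for example through LangChain. The names `KNOWLEDGE_BASE`, `retrieve` and `build_prompt` are illustrative and are not taken from the paper. The final LLM call is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base (illustrative only); a real system would
# index curated agricultural documents such as extension manuals.
KNOWLEDGE_BASE = [
    "Wheat stripe rust produces yellow stripes of pustules on wheat leaves.",
    "Corn seeds germinate best in warm, well-drained soil.",
    "Rice paddies are often flooded to help suppress weeds.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (stand-in for a vector store)."""
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Assemble a grounded prompt: retrieved context first, then the user question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return (
        "Answer the agricultural question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    prompt = build_prompt("Why do my wheat leaves show yellow stripes?", KNOWLEDGE_BASE, k=1)
    print(prompt)
    # In a full pipeline this prompt would be sent to the fine-tuned LLM (not shown).
```

Replacing the bag-of-words scorer with an embedding model changes only `retrieve`. The prompt assembly, which constrains the model to the retrieved context and so helps limit hallucination, stays the same.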