Journal of Agricultural Big Data, 2021, Vol. 3, Issue (3): 55-61. doi: 10.19788/j.issn.2096-6369.210306

• Special Topic: Agricultural Models

Data Mining of Fishing Vessel Transfer Data Based on a Gradient Boosting Iterative Decision Tree Model

Yide Li1, Feng Lu1,2, Yong Zhu1, Shuo Xu1,2, Lu Sun1

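  1. Institute of Fisheries Engineering, Chinese Academy of Fishery Sciences, Beijing 100141, China
  2. Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao 266237, China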
  • Received: 2021-05-11  Publication date: 2021-09-26  Release date: 2021-12-22
  • Corresponding author: Feng Lu  E-mail: liyd@cafs.ac.cn; lufeng@cafs.ac.cn
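  • About the first author: Yide Li, male, assistant researcher, M.S.; research interests: information systems for fishing vessel management, data mining. E-mail: liyd@cafs.ac.cn
  • Funding:
    Fishery Communication, Navigation and Big Data Innovation Team Project (2020TD84); Major Science and Technology Project of Shandong Province in Support of the Pilot National Laboratory for Marine Science and Technology (Qingdao) (2018SDKJ0103-2); Basic Research Fund of the Institute of Fisheries Engineering, Chinese Academy of Fishery Sciences (2019HY-ZC001-3)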

Data Mining for Fishing Vessel Purchase Based on Gradient Boosting Decision Tree Algorithm

Yide Li1, Feng Lu1,2, Yong Zhu1, Shuo Xu1,2, Lu Sun1

  1. Institute of Fisheries Engineering, Chinese Academy of Fishery Sciences, Beijing 100141, China
  2. Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao 266237, China
  • Received: 2021-05-11  Online: 2021-09-26  Published: 2021-12-22
  • Contact: Feng Lu  E-mail: liyd@cafs.ac.cn; lufeng@cafs.ac.cn

Abstract:

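Fishing vessel transfer is a key operation in the routine management of marine fishing vessels and, among all fishing vessel management operations, the one that involves the most procedural steps and the largest volume of data exchange. Processing and analyzing large volumes of historical transfer data makes it possible to mine the potential determinants of fishing vessel transfers, which is important for protecting fishermen's economic interests and for formulating fishing vessel management policies. Based on the basic vessel data and transfer data in the Chinese Fishery Law Enforcement Command System, and taking Zhejiang Province as a typical case, this study selected the historical transfer records of 5,641 fishing vessels from January 2018 to July 2020 and converted them into numerical form. The gradient boosting iterative decision tree (GBDT) algorithm was used to refine the classifier iteratively, stage by stage, yielding feature classification results and a model training set, and ultimately single-tree and multi-tree models of the likelihood that a fishing vessel will be traded. Fishermen's tendencies when purchasing fishing vessels were analyzed through the weights that the models assign to basic vessel parameters such as vessel age, vessel length, hull material, and fishing type. The results show that the likelihood of being purchased differs considerably across vessel types, and that large length, large tonnage, high age, and the trawl and stow-net fishing types are important determinants of vessel transfer. A comparison of the loss values computed from the loss function for each feature shows that features such as a vessel age of 20 years and large or medium vessel length have loss values more than 15% lower than those of the other features, indicating that classification using the selected features achieves a higher recognition rate. By quantifying fishermen's tendencies in purchasing fishing vessels, this study can help protect fishermen's economic interests as fully as possible during vessel transfers and can support decision-making in the formulation of fishing vessel management policies.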

Keywords: fishing vessel transfer, GBDT algorithm, decision tree, data mining, fishery big data, iterative decision tree, fishing vessel management
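The abstract says the transfer records were "numerically processed" and reports features such as a vessel age of 20 years and large or medium vessel length, but it does not give the encoding. A minimal sketch of one plausible scheme, assuming each record is turned into 0/1 indicator features: the record fields, the 12 m lower bound for the medium length class, and the category lists for hull material and fishing type are all hypothetical, not taken from the paper or from CFLECS.

```python
from dataclasses import dataclass

# Hypothetical thresholds and category lists (not given in the source).
AGE_THRESHOLD_YEARS = 20
MEDIUM_LENGTH_M = 12.0  # assumed lower bound of the "medium" length class
HULL_MATERIALS = ("steel", "wood", "fiberglass")
FISHING_TYPES = ("trawl", "stow_net", "gillnet", "other")

FEATURE_NAMES = (
    ["age>=20", "length>=12m"]
    + ["hull=" + m for m in HULL_MATERIALS]
    + ["type=" + t for t in FISHING_TYPES]
)


@dataclass
class VesselRecord:
    """Hypothetical flattened record; real CFLECS field names are unknown."""
    age_years: int
    length_m: float
    hull_material: str
    fishing_type: str
    transferred: bool  # label: did the vessel change owners in the study window?


def encode(rec: VesselRecord) -> tuple[list[float], int]:
    """Turn one record into a 0/1 feature vector and a 0/1 label."""
    x = [
        1.0 if rec.age_years >= AGE_THRESHOLD_YEARS else 0.0,
        1.0 if rec.length_m >= MEDIUM_LENGTH_M else 0.0,  # medium or large vessel
    ]
    # One-hot hull material; an unlisted material yields all zeros.
    x += [1.0 if rec.hull_material == m else 0.0 for m in HULL_MATERIALS]
    # One-hot fishing type; unlisted types fall into "other".
    ftype = rec.fishing_type if rec.fishing_type in FISHING_TYPES else "other"
    x += [1.0 if ftype == t else 0.0 for t in FISHING_TYPES]
    return x, int(rec.transferred)


x, label = encode(VesselRecord(25, 30.5, "steel", "trawl", True))
print(dict(zip(FEATURE_NAMES, x)), label)
```

Binary indicators keep every split a simple yes/no question ("is the vessel at least 20 years old?"), which matches the way the abstract phrases its most informative features.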

Abstract:

Fishing vessel purchase is a significant and complex process in the daily management of marine fishing fleets, and it generates the largest volume of data among all fishing vessel management operations. By processing and analyzing historical purchase data, the potential determinants of fishing vessel purchases can be identified, which is significant for protecting fishermen's economic interests and for developing fishing vessel management policies. Taking Zhejiang Province as a typical case, we extracted the historical purchase records of 5,641 fishing vessels from January 2018 to July 2020, drawing on the basic vessel data and purchase data in the Chinese Fishery Law Enforcement Command System (CFLECS), and converted them into numerical form. The gradient boosting iterative decision tree (GBDT) algorithm was used to refine the classifier iteratively, stage by stage. This produced feature classification results and a training set, from which single decision tree and multiple decision tree models were built. We used the weights of basic vessel parameters, such as age, length, hull material, and fishing type, to estimate the likelihood of fishing vessel transactions and to analyze fishermen's tendencies when purchasing fishing vessels. The results indicate that vessel age, length, tonnage, and the trawl and stow-net fishing types are the principal determinants of fishing vessel transactions. Trawlers and stow-net vessels can be obtained only through vessel transactions, so the likelihood of being purchased differs greatly across vessel types. Comparing the loss values computed for the individual features shows that features such as a vessel age of 20 years and large or medium vessel length have loss values more than 15% lower than those of other features, which means that classification using the selected features achieves a higher recognition rate.
A quantitative analysis of fishermen's propensity to purchase fishing vessels can therefore help protect fishermen's economic interests to the greatest extent and support decision-making in the formulation of fishing vessel management policy.

Key words: fishing vessel transaction, GBDT algorithm, decision tree, data mining, big data in fishery, Gradient Boosting Decision Tree, fishing vessel management
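The abstracts name GBDT but give no hyperparameters or implementation details. A minimal from-scratch sketch of gradient boosting for a binary "transferred / not transferred" label, assuming depth-1 trees (stumps) over 0/1 features, log loss, one Newton step per leaf, 50 trees, and a learning rate of 0.1. The toy data are invented for illustration, and accumulated split gain stands in for the paper's unspecified feature "weights"; this is not the authors' code.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(y, F):
    """Mean binary cross-entropy; F holds raw scores (log-odds)."""
    eps = 1e-12
    total = 0.0
    for yi, fi in zip(y, F):
        p = min(max(sigmoid(fi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


def sse(values):
    """Sum of squared deviations from the mean."""
    return sum(v * v for v in values) - sum(values) ** 2 / len(values)


def best_split(X, r):
    """Pick the binary feature whose 0/1 split most reduces the squared error of r."""
    base = sse(r)
    best = None
    for j in range(len(X[0])):
        left = [ri for xi, ri in zip(X, r) if xi[j] < 0.5]
        right = [ri for xi, ri in zip(X, r) if xi[j] >= 0.5]
        if not left or not right:
            continue  # feature is constant in this sample
        gain = base - sse(left) - sse(right)
        if best is None or gain > best[1]:
            best = (j, gain)
    return best


def newton_leaf(idx, r, p):
    """Leaf value: sum of negative gradients divided by sum of Hessians."""
    num = sum(r[i] for i in idx)
    den = sum(p[i] * (1.0 - p[i]) for i in idx)
    return num / den if den > 1e-12 else 0.0


def fit_gbdt(X, y, n_trees=50, learning_rate=0.1):
    prior = sum(y) / len(y)  # assumes both classes are present
    f0 = math.log(prior / (1.0 - prior))
    F = [f0] * len(y)
    trees, importance, losses = [], [0.0] * len(X[0]), []
    for _ in range(n_trees):
        p = [sigmoid(fi) for fi in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log loss
        split = best_split(X, r)
        if split is None:
            break
        j, gain = split
        left = [i for i, xi in enumerate(X) if xi[j] < 0.5]
        right = [i for i, xi in enumerate(X) if xi[j] >= 0.5]
        v_left, v_right = newton_leaf(left, r, p), newton_leaf(right, r, p)
        trees.append((j, v_left, v_right))
        importance[j] += gain  # accumulated split gain used as a feature "weight"
        for i in left:
            F[i] += learning_rate * v_left
        for i in right:
            F[i] += learning_rate * v_right
        losses.append(log_loss(y, F))
    model = {"f0": f0, "trees": trees, "lr": learning_rate}
    return model, importance, losses


def predict_proba(model, x):
    F = model["f0"]
    for j, v_left, v_right in model["trees"]:
        F += model["lr"] * (v_right if x[j] >= 0.5 else v_left)
    return sigmoid(F)


# Hypothetical toy data: [age>=20, length>=12m, trawl, stow_net] -> transferred?
FEATURES = ["age>=20", "length>=12m", "trawl", "stow_net"]
X = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0],
     [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
y = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0]

model, importance, losses = fit_gbdt(X, y)
for name, w in sorted(zip(FEATURES, importance), key=lambda t: -t[1]):
    print(f"{name:12s} {w:.3f}")
print("final training log loss:", round(losses[-1], 4))
```

Each round fits one stump to the current residuals and adds a damped correction, so the ensemble is built stage by stage as the abstract describes. Keeping only `model["trees"][0]` corresponds loosely to a single-tree model, and the full list to the multi-tree model; per-round values in `losses` are one way to compare how much each added split lowers the loss.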

Chinese Library Classification (CLC) number:

  • S972.7