
Journal of Agricultural Big Data, 2026, 8(1): 24-35. doi: 10.19788/j.issn.2096-6369.000120
ZHAO XiaoYan1,2,4, ZHOU HuanBin2,3, ZHOU GuoMin2,4,5,6,*, ZHANG JianHua1,2,4,*
Received: 2025-06-18
Accepted: 2025-09-28
Published: 2026-03-26
Online: 2026-04-01
Corresponding author: ZHANG JianHua, E-mail: zhangjianhua@caas.cn. First author: ZHAO XiaoYan, E-mail: 2896216851@qq.com.
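Abstract:
In recent years, gene editing technology has developed rapidly and has become a core tool for basic gene function research and biological breeding. Growth in computing power and big data has driven the application of deep learning to gene editing, where it plays an increasingly prominent role in optimizing the editing process, particularly in improving the efficiency of crop improvement. This article reviews research progress in using deep learning to optimize gene editing, focusing on its applications in enhancing editing efficiency and specificity. It also examines the technical challenges of deeply integrating deep learning with gene editing and discusses future prospects. Combining advanced gene editing technologies with deep learning is expected to further accelerate crop breeding.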
ZHAO XiaoYan, ZHOU HuanBin, ZHOU GuoMin, ZHANG JianHua. Application of Deep Learning in Crop Gene Editing Technology and Research Progress[J]. Journal of Agricultural Big Data, 2026, 8(1): 24-35.
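
Table 1 Representative deep learning-based guide RNA design tools

| Tool name | Main function | Year | Reference |
|---|---|---|---|
| GPP Web Portal | Uses deep learning to predict sgRNA cleavage efficiency and off-target effects; supports CRISPRko/i/a design | 2023 | / |
| DeepHF | LSTM-based prediction of sgRNA editing efficiency in human cells | 2019 | [31] |
| DeepCRISPR | CNN-based prediction of sgRNA activity and off-target effects | 2018 | [32] |
| CRISPR-Net | Recurrent convolutional network that quantifies CRISPR off-target activity, including mismatches and indels | 2020 | [33] |
| SpliceRover | Interpretable convolutional neural network for improved splice-site prediction | 2018 | [34] |
| CRISPRO | Identifies functional protein-coding sequences from genome-editing dense mutagenesis data | 2018 | [35] |
| DeepPE | Predicts the editing efficiency of prime editing guide RNAs | 2020 | [36] |
| DeepSpCas9 | Predicts SpCas9 activity for different gRNAs | 2019 | [37] |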
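
Many of the CNN- and LSTM-based predictors in Table 1 work on a numeric encoding of the target DNA rather than on raw letters, and one-hot encoding is a common choice. The sketch below shows only that preprocessing step, under stated assumptions: a 20-nt SpCas9 spacer immediately followed by a 3-nt NGG PAM, with columns ordered A, C, G, T. The function names are hypothetical, and this is not the exact input format of any listed tool.

```python
"""Minimal sketch: one-hot encoding of an SpCas9 target site for a sequence model.

Assumptions (illustrative only): a 20-nt spacer immediately followed by a
3-nt PAM, and one-hot columns in the order A, C, G, T. The function names are
hypothetical; individual tools in Table 1 may use other windows or encodings.
"""

NUCLEOTIDES = "ACGT"  # column order of the one-hot matrix (an assumption)


def one_hot_encode(seq: str) -> list[list[int]]:
    """Return an L x 4 one-hot matrix for a DNA sequence (one row per base)."""
    matrix = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        row = [0] * len(NUCLEOTIDES)
        row[NUCLEOTIDES.index(base)] = 1  # mark the column for this base
        matrix.append(row)
    return matrix


def has_ngg_pam(target: str, spacer_len: int = 20) -> bool:
    """Check that the 3 nt right after the spacer match the SpCas9 NGG PAM."""
    pam = target[spacer_len:spacer_len + 3].upper()
    return len(pam) == 3 and pam[1:] == "GG"


if __name__ == "__main__":
    site = "GACGCATAAAGATGAGACGC" + "TGG"  # hypothetical spacer + TGG PAM
    print(has_ngg_pam(site))          # True
    print(len(one_hot_encode(site)))  # 23 (one row per nucleotide)
```

The resulting L × 4 matrix is the kind of input a 1D convolutional layer scans along the sequence axis; the PAM check simply discards candidate sites that lack the canonical NGG PAM before any scoring is done.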
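
Table 2 Representative deep learning-based tools related to protein optimization

| Tool name | Main function | Year | Reference |
|---|---|---|---|
| Protein2PAM | Trained on an evolutionary dataset of more than 45,000 CRISPR-Cas examples; predicts and customizes the PAM recognition of Cas proteins to overcome targeting-range limits | 2025 | [43] |
| PRO-PRIME | Language model that incorporates species temperature labels; predicts the effects of single-point mutations on protein stability and activity, achieving a >30% positive-mutation rate in five proteins including LbCas12a | 2024 | [44] |
| AlphaFold2 | Protein structure prediction | 2021 | / |
| DeepChrome | CNN-based prediction of gene expression levels from histone modification data | 2016 | [45] |
| DeepHistone | CNN-based prediction of histone modification sites from sequence and DNase-seq data | 2019 | [46] |
| DeepFIGV | CNN-based prediction of the impact of genetic variants on chromatin accessibility and histone modification | 2019 | [47] |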
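
Table 2 describes PRO-PRIME as predicting how single-point mutations affect protein stability and activity. Independently of the model, such a screen starts from the full set of single-residue substitutions, 19 per position (19L for a protein of length L). The sketch below enumerates those variants and ranks them with a caller-supplied scoring function; `score_fn` is a hypothetical placeholder for a trained model and does not reflect PRO-PRIME's actual interface.

```python
"""Minimal sketch: enumerate and rank single-point mutants of a protein.

`score_fn` is a hypothetical placeholder for a trained model that maps a
mutant sequence to a predicted score (higher = better). It is not the API of
PRO-PRIME or any other tool in Table 2.
"""
from typing import Callable, Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_point_mutants(seq: str) -> Iterator[tuple[str, str]]:
    """Yield (label, mutant) pairs such as ('M1A', 'AKV'), using 1-based positions."""
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the unchanged residue
            yield f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]


def rank_mutants(
    seq: str, score_fn: Callable[[str], float], top_k: int = 5
) -> list[tuple[float, str]]:
    """Score every single-point mutant and return the top_k (score, label) pairs."""
    scored = [(score_fn(mutant), label) for label, mutant in single_point_mutants(seq)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable, highest score first
    return scored[:top_k]


if __name__ == "__main__":
    toy_seq = "MKV"  # hypothetical 3-residue sequence
    print(sum(1 for _ in single_point_mutants(toy_seq)))  # 57 = 19 * 3
    # Toy score (count of tryptophans), used only to exercise the ranking step
    print(rank_mutants(toy_seq, lambda s: s.count("W"), top_k=3))
```

The candidate set grows linearly with length: a 1,000-residue protein already has 19,000 single mutants. A natural workflow is therefore to score all of them in silico and send only the top-ranked variants to experimental testing.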

Table 3 Representative deep learning-based tools for annotating Cas and other functional proteins

| Tool name | Annotatable unit | Year | Reference |
|---|---|---|---|
| CRISPRcasIdentifier | Cas gene | 2020 | […] |
| CASPredict | Cas gene | 2021 | [50] |
| CRISPRcasStack | Cas gene | 2022 | [51] |
| CRISPR-Cas-Docker | Cas gene | 2023 | [48] |
| CRISPRloci | Cas gene (based on CRISPRcasIdentifier) | 2021 | [52] |

References

| No. | Reference |
|---|---|
| [1] | LAI Z S Y, HUANG Z T, SUN J T, et al. The recent progress of CRISPR/Cas genome editing technology and its application in crop improvement. Chinese Science Bulletin, 2022, 67: 1923-1937. |
| [2] | LI J, WU P, CAO Z, et al. Machine learning-based prediction models to guide the selection of Cas9 variants for efficient gene editing. Cell Reports, 2024, 43(2): 113765. doi: 10.1016/j.celrep.2024.113765 |
| [3] | CHEN K, WANG Y, ZHANG R, et al. CRISPR/Cas genome editing and precision plant breeding in agriculture. Annual Review of Plant Biology, 2019, 70: 667-697. doi: 10.1146/annurev-arplant-050718-100049 pmid: 30835493 |
| [4] | ZHANG B. CRISPR/Cas9: A robust genome-editing tool with versatile functions and endless application. International Journal of Molecular Sciences, 2020, 21: 5111. doi: 10.3390/ijms21145111 |
| [5] | WATERMAN D P, HABER J E, SMOLKA M B. Checkpoint responses to DNA double-strand breaks. Annual Review of Biochemistry, 2020, 89: 103-133. doi: 10.1146/annurev-biochem-011520-104722 pmid: 32176524 |
| [6] | MORTON J, DAVIS M W, JORGENSEN E M, et al. Induction and repair of zinc-finger nuclease-targeted double-strand breaks in Caenorhabditis elegans somatic cells. Proceedings of the National Academy of Sciences of the United States of America, 2006, 103(44): 16370-16375. |
| [7] | CARROLL D. Genome engineering with zinc-finger nucleases. Genetics, 2011, 188: 773-778. doi: 10.1534/genetics.111.131433 pmid: 21828278 |
| [8] | JOUNG J K, SANDER J D. TALENs: A widely applicable technology for targeted genome editing. Nature Reviews Molecular Cell Biology, 2013, 14: 49-55. doi: 10.1038/nrm3486 |
| [9] | RICHTER C, CHANG J T, FINERAN P C. Function and regulation of clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR associated (Cas) systems. Viruses, 2012, 4: 2291-2311. doi: 10.3390/v4102291 |
| [10] | ESVELT K M, MALI P, BRAFF J L, et al. Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nature Methods, 2013, 10: 1116-1121. doi: 10.1038/nmeth.2681 pmid: 24076762 |
| [11] | ZHANG X, CHENG J, LIN Y, et al. Editing homologous copies of an essential gene affords crop resistance against two cosmopolitan necrotrophic pathogens. Plant Biotechnology Journal, 2021, 19(11): 2349-2361. doi: 10.1111/pbi.13667 pmid: 34265153 |
| [12] | ZHANG J, ZHOU Z, BAI J, et al. Disruption of MIR396e and MIR396f improves rice yield under nitrogen-deficient conditions. National Science Review, 2019, 7: 102-112. doi: 10.1093/nsr/nwz142 |
| [13] | KOMOR A C, KIM Y B, PACKER M S, et al. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 2016, 533: 420-424. doi: 10.1038/nature17946 |
| [14] | DOMAN J L, RAGURAM A, NEWBY G A, et al. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nature Biotechnology, 2020, 38: 620-628. doi: 10.1038/s41587-020-0414-6 pmid: 32042165 |
| [15] | ZHAO D, LI J, LI S, et al. Glycosylase base editors enable C-to-A and C-to-G base changes. Nature Biotechnology, 2021, 39: 35-40. doi: 10.1038/s41587-020-0592-2 |
| [16] | CHEN P J, LIU D R. Prime editing for precise and highly versatile genome manipulation. Nature Reviews Genetics, 2022, 24: 161-177. doi: 10.1038/s41576-022-00541-1 pmid: 36344749 |
| [17] | SHAH M A, JIARUI K, RUOFU T, et al. CRISPR/Cas9 mediated knockout of the OsbHLH024 transcription factor improves salt stress resistance in rice (Oryza sativa L.). Plants, 2022, 11(9): 1184. doi: 10.3390/plants11091184 |
| [18] | LI S, ZHANG Y, LIU Y, et al. The E3 ligase TaGW2 mediates transcription factor TaARR12 degradation to promote drought resistance in wheat. The Plant Cell, 2024, 36(3): 605-625. doi: 10.1093/plcell/koad307 |
| [19] | LIU L, GALLAGHER J, AREVALO E D, et al. Enhancing grain-yield-related traits by CRISPR-Cas9 promoter editing of maize CLE genes. Nature Plants, 2021, 7(3): 287-294. doi: 10.1038/s41477-021-00858-5 pmid: 33619356 |
| [20] | WANG Y X, LIU X Q, ZHENG X X, et al. Creation of aromatic maize by CRISPR/Cas. Journal of Integrative Plant Biology, 2021, 63(9): 1664-1670. doi: 10.1111/jipb.13105 |
| [21] | LI H L, WANG L B, DAI Y Z, et al. Synergetic interaction between neighbouring platinum monomers in CO2 hydrogenation. Nature Nanotechnology, 2018, 13(5): 411-417. doi: 10.1038/s41565-018-0089-z |
| [22] | LIU T F, JI J, CHENG Y Y, et al. CRISPR/Cas9-mediated editing of GmTAP1 confers enhanced resistance to Phytophthora sojae in soybean. Journal of Integrative Plant Biology, 2023, 65(7): 1609-1612. doi: 10.1111/jipb.v65.7 |
| [23] | CHAO C W, CUI Y, LIN Y, et al. Identification of salt tolerance-associated presence-absence variations in the OsMADS56 gene through the integration of DEGs dataset and eQTL analysis. The New Phytologist, 2024, 243(3): 833-838. doi: 10.1111/nph.v243.3 |
| [24] | WANG L, NIE R, YU Z, et al. An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nature Machine Intelligence, 2020, 2(11): 693-703. doi: 10.1038/s42256-020-00244-4 |
| [25] | GUO J, ZENG L, CHEN H, et al. CRISPR/Cas9-mediated targeted mutagenesis of BnaCOL9 advances the flowering time of Brassica napus L. International Journal of Molecular Sciences, 2022, 23(23): 14944. doi: 10.3390/ijms232314944 |
| [26] | WANG Y L, CHUAI G H, YAN J F, et al. In silico CRISPR-based sgRNA design. Chinese Journal of Biotechnology, 2017, 33(10): 1744-1756. |
| [27] | JINEK M, CHYLINSKI K, FONFARA I, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 2012, 337(6096): 816-821. doi: 10.1126/science.1225829 pmid: 22745249 |
| [28] | MEJÍA-GUERRA M K, BUCKLER E S. A k-mer grammar analysis to uncover maize regulatory architecture. BMC Plant Biology, 2019, 19(1): 1-17. doi: 10.1186/s12870-018-1600-2 |
| [29] | FENG Z Y, ZHANG B T, DING W, et al. Efficient genome editing in plants using a CRISPR/Cas system. Cell Research, 2013, 23: 1229-1232. doi: 10.1038/cr.2013.114 pmid: 23958582 |
| [30] | ZHANG H, YAN J F, LU Z K, et al. Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities. Cell Discovery, 2023, 9(1): 48. doi: 10.1038/s41421-023-00549-9 pmid: 37193681 |
| [31] | WANG D Q, ZHANG C D, WANG B, et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nature Communications, 2019, 10(1): 4284. doi: 10.1038/s41467-019-12281-8 |
| [32] | CHUAI G, MA H, YAN J, et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning. Genome Biology, 2018, 19(1): 1-18. doi: 10.1186/s13059-017-1381-1 |
| [33] | LIN J C, ZHANG Z L, ZHANG S, et al. CRISPR-Net: a recurrent convolutional network quantifies CRISPR off-target activities with mismatches and indels. Advanced Science, 2020, 7(13): 1903562. doi: 10.1002/advs.v7.13 |
| [34] | ZUALLAERT J, GODIN F, KIM M, et al. SpliceRover: interpretable convolutional neural networks for improved splice site prediction. Bioinformatics, 2018, 34(24): 4180-4188. doi: 10.1093/bioinformatics/bty497 pmid: 29931149 |
| [35] | SCHOONENBERG V A C, COLE M A, YAO Q M, et al. CRISPRO: identification of functional protein coding sequences based on genome editing dense mutagenesis. Genome Biology, 2018, 19(1): 169. doi: 10.1186/s13059-018-1563-5 pmid: 30340514 |
| [36] | KIM H K, YU G, PARK J, et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nature Biotechnology, 2020, 39(2): 198-206. doi: 10.1038/s41587-020-0677-y |
| [37] | KIM H K, KIM Y, LEE S, et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Science Advances, 2019, 5(11): eaax9249. doi: 10.1126/sciadv.aax9249 |
| [38] | CHEN Q C, CHUAI G, ZHANG H H, et al. Genome-wide CRISPR off-target prediction and optimization using RNA-DNA interaction fingerprints. Nature Communications, 2023, 14(1): 7521. doi: 10.1038/s41467-023-42695-4 pmid: 37980345 |
| [39] | KANG L Q, TAN P, HONG L. Enzyme engineering in the age of artificial intelligence. Synthetic Biology Journal, 2023, 4(3): 524-534. doi: 10.12211/2096-8280.2023-009 |
| [40] | THEAN D G L, CHU H Y, FONG J H C, et al. Machine learning-coupled combinatorial mutagenesis enables resource-efficient engineering of CRISPR-Cas9 genome editor activities. Nature Communications, 2022, 13(1): 2219. doi: 10.1038/s41467-022-29874-5 pmid: 35468907 |
| [41] | KIM N, KIM H K, LEE S, et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nature Biotechnology, 2020, 38: 1328-1336. doi: 10.1038/s41587-020-0537-9 |
| [42] | CONCORDET J P, HAEUSSLER M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Research, 2018, 46(W1): W242-W245. |
| [43] | NAYFACH S, BHATNAGAR A, NOVICHKOV A, et al. Engineering of CRISPR-Cas PAM recognition using deep learning of vast evolutionary data. bioRxiv, 2025. |
| [44] | JIANG F, LI M, DONG J, et al. A general temperature-guided language model to design proteins of enhanced stability and activity. Science Advances, 2024, 10(48). |
| [45] | SINGH R, LANCHANTIN J, ROBINS G, et al. DeepChrome: deep-learning for predicting gene expression from histone modifications. Bioinformatics, 2016, 32(17): i639-i648. |
| [46] | YIN Q, WU M, LIU Q, et al. DeepHistone: a deep learning approach to predicting histone modifications. BMC Genomics, 2019, 20: 11-23. |
| [47] | HOFFMAN G E, BENDL J, GIRDHAR K, et al. Functional interpretation of genetic variants using deep learning predicts impact on chromatin accessibility and histone modification. Nucleic Acids Research, 2019, 47(20): 10597-10611. doi: 10.1093/nar/gkz808 pmid: 31544924 |
| [48] | PARK H M, WON J, PARK Y, et al. CRISPR-Cas-Docker: web-based in silico docking and machine learning-based classification of crRNAs with Cas proteins. BMC Bioinformatics, 2023, 24(1): 167. doi: 10.1186/s12859-023-05296-y |
| [49] | NETHERY M A, KORVINK M, MAKAROVA K S, et al. CRISPRclassify: repeat-based classification of CRISPR loci. The CRISPR Journal, 2021, 4(4): 558-574. doi: 10.1089/crispr.2021.0021 |
| [50] | YANG S S, HUANG J, HE B F. CASPredict: a web service for identifying Cas proteins. PeerJ, 2021, 9: e11887. |
| [51] | ZHANG T J, JIA Y R, LI H F, et al. CRISPRCasStack: a stacking strategy-based ensemble learning framework for accurate identification of Cas proteins. Briefings in Bioinformatics, 2022, 23(5): bbac335. |
| [52] | ALKHNBASHI O S, MITROFANOV A, BONIDIA R, et al. CRISPRloci: comprehensive and accurate annotation of CRISPR-Cas systems. Nucleic Acids Research, 2021, 49(W1): W125-W130. doi: 10.1093/nar/gkab456 |
| [53] | KIM N, CHOI S, KIM S, et al. Deep learning models to predict the editing efficiencies and outcomes of diverse base editors. Nature Biotechnology, 2023, 42(3): 484-497. doi: 10.1038/s41587-023-01792-x pmid: 37188916 |
| [54] | YUAN T, WU L, LI S, et al. Deep learning models incorporating endogenous factors beyond DNA sequences improve the prediction accuracy of base editing outcomes. Cell Discovery, 2024, 10(1): 20. doi: 10.1038/s41421-023-00624-1 pmid: 38378648 |
| [55] | HAWKINS-HOOKER A, DEPARDIEU F, BAUR S, et al. Generating functional protein variants with variational autoencoders. PLoS Computational Biology, 2021, 17(2): e1008736. doi: 10.1371/journal.pcbi.1008736 |
| [56] | DAWID G, SIMON M, IRENE R G, et al. LATE - a novel sensitive cell-based assay for the study of CRISPR/Cas9-related long-term adverse treatment effects. Molecular Therapy - Methods & Clinical Development, 2021, 22: 249-262. |
| [57] | TAN Y Y, CHU A H Y, BAO S Y, et al. Rationally engineered Staphylococcus aureus Cas9 nucleases with high genome-wide specificity. Proceedings of the National Academy of Sciences of the United States of America, 2019, 116(42): 20969-20976. |
| [58] | TIEU V, SOTILLO E, BJELAJAC R J, et al. A versatile CRISPR-Cas13d platform for multiplexed transcriptomic regulation and metabolic engineering in primary human T cells. Cell, 2024, 187(5): 1278-1295.e20. doi: 10.1016/j.cell.2024.01.035 |