Journal of Agricultural Big Data, 2025, Vol. 7, Issue (1): 96-106. doi: 10.19788/j.issn.2096-6369.100039
LIU Hong1, DOU JingWen1, WANG Yue1, LIAO Yong1, LIU XiaoLei1,2, LI XinYun1,2, ZHAO ShuHong1,2, FU YuHua1,2,*
Received: 2024-06-06
Accepted: 2024-09-13
Online: 2025-03-26
Published: 2025-02-05
Contact: FU YuHua
LIU Hong, DOU JingWen, WANG Yue, LIAO Yong, LIU XiaoLei, LI XinYun, ZHAO ShuHong, FU YuHua. A Multi-Omics Dataset for Functional Gene Mining in Animals[J]. Journal of Agricultural Big Data, 2025, 7(1): 96-106.
Table 1
Overview of the multi-omics dataset
Species | WGS | RNA | ChIP | ATAC | Literature | Tissues | Variants (M) | Bases (TB) |
---|---|---|---|---|---|---|---|---|
Ailuropoda melanoleuca | 58 | 133 | 0 | 0 | 2534 | 21 | 12.42 | 2.49 |
Anas platyrhynchos | 1162 | 819 | 0 | 4 | 1408 | 30 | 44.27 | 18.38 |
Anser cygnoides | 283 | 134 | 0 | 8 | 130 | 10 | 22.67 | 5.31 |
Balaenoptera musculus | 1 | 2 | 0 | 0 | 629 | 3 | 6.17 | 0.13 |
Bos taurus | 983 | 3995 | 216 | 158 | 291242 | 85 | 52.71 | 76.89 |
Camelus dromedarius | 38 | 28 | 0 | 0 | 4197 | 10 | 10.17 | 1.7 |
Canis lupus familiaris | 2116 | 2581 | 95 | 9 | 225467 | 126 | 47.57 | 134.04 |
Capra hircus | 961 | 1355 | 0 | 5 | 1015 | 60 | 65.66 | 64.24 |
Equus asinus | 189 | 61 | 8 | 0 | 53280 | 14 | 16 | 2.66 |
Equus caballus | 538 | 2192 | 135 | 18 | 58089 | 95 | 35.71 | 42.64 |
Felis catus | 311 | 180 | 0 | 0 | 92331 | 46 | 79.49 | 25.91 |
Gallus gallus | 1108 | 4098 | 533 | 161 | 108208 | 111 | 37.16 | 53.26 |
Loxodonta africana | 11 | 23 | 0 | 0 | 567 | 6 | 14.3 | 1.55 |
Macaca mulatta | 696 | 7318 | 222 | 149 | 37963 | 127 | 107.18 | 129.66 |
Mus musculus | 80 | 8983 | 2499 | 544 | 1644283 | 132 | 16.47 | 77.99 |
Oryctolagus cuniculus | 49 | 1424 | 67 | 12 | 234595 | 45 | 106.05 | 12.33 |
Ovis aries | 877 | 2682 | 90 | 8 | 8715 | 75 | 71.91 | 61.83 |
Panthera leo | 41 | 2 | 0 | 0 | 5439 | 6 | 13.93 | 1.14 |
Panthera tigris | 8 | 2 | 0 | 0 | 914 | 4 | 12.11 | 0.99 |
Sus scrofa | 1311 | 8626 | 647 | 130 | 23092 | 218 | 95.2 | 132.92 |
Ursus thibetanus | 14 | 0 | 0 | 0 | 139 | 1 | 10.44 | 0.4 |
ALL | 10835 | 44638 | 4512 | 1206 | 2794237 | 256 | 877.6 | 846.46 |
Table 2
Annotation results of protein-coding genes for 21 species in different databases
Species | Genes | Swiss-Prot | KEGG | GO | Pfam | InterPro | KOG |
---|---|---|---|---|---|---|---|
Ailuropoda melanoleuca | 24463 | 20295 | 15775 | 14847 | 18915 | 19596 | 18246 |
Anas platyrhynchos | 18491 | 15871 | 12099 | 11811 | 15235 | 15779 | 14433 |
Anser cygnoides | 19449 | 14734 | 11593 | 11032 | 14010 | 14524 | 13405 |
Balaenoptera musculus | 22592 | 19038 | 14746 | 13724 | 17599 | 18245 | 16980 |
Bos taurus | 27608 | 21224 | 16772 | 15410 | 19968 | 20612 | 19096 |
Camelus dromedarius | 22445 | 18878 | 14801 | 13762 | 17604 | 18195 | 16862 |
Canis lupus familiaris | 30952 | 19760 | 15442 | 14255 | 18483 | 19104 | 17837 |
Capra hircus | 27272 | 20923 | 16621 | 15502 | 19620 | 20291 | 18872 |
Equus asinus | 22929 | 19425 | 15053 | 14114 | 18043 | 18685 | 17453 |
Equus caballus | 30372 | 20367 | 16038 | 14696 | 19292 | 19851 | 18334 |
Felis catus | 29551 | 19019 | 15183 | 13989 | 17858 | 18452 | 17053 |
Gallus gallus | 24357 | 15920 | 12269 | 11799 | 15252 | 15842 | 14465 |
Loxodonta africana | 23246 | 19793 | 15739 | 14685 | 18491 | 19081 | 17938 |
Macaca mulatta | 35433 | 21036 | 15102 | 13646 | 18250 | 18802 | 19161 |
Mus musculus | 55417 | 21373 | 17288 | 15889 | 20606 | 21156 | 18964 |
Oryctolagus cuniculus | 29588 | 19997 | 14793 | 14064 | 18576 | 19250 | 18082 |
Ovis aries | 26479 | 19959 | 15808 | 14758 | 18724 | 19338 | 17955 |
Panthera leo | 22744 | 19054 | 15091 | 13893 | 17831 | 18438 | 17056 |
Panthera tigris | 22120 | 17231 | 13084 | 12454 | 15867 | 16491 | 15450 |
Sus scrofa | 31909 | 20041 | 15562 | 14436 | 18871 | 19453 | 18202 |
Ursus thibetanus | 23211 | 19560 | 15553 | 14234 | 18380 | 18942 | 17525 |
ALL | 570628 | 403498 | 314412 | 293000 | 377475 | 390127 | 363369 |
Table 3
Summary of genomic variation datasets
Species | Individuals | Breeds | Groups | Mean depth | Variants (M) |
---|---|---|---|---|---|
Ailuropoda melanoleuca | 58 | 2 | 4 | 7.04 | 12.42 |
Anas platyrhynchos | 1162 | 27 | 4 | 6.86 | 44.27 |
Anser cygnoides | 283 | 8 | 5 | 9.55 | 22.67 |
Balaenoptera musculus | 1 | 1 | 1 | 40.72 | 6.17 |
Bos taurus | 983 | 36 | 10 | 17.01 | 52.71 |
Camelus dromedarius | 38 | 11 | 11 | 15.27 | 10.17 |
Canis lupus familiaris | 2116 | 301 | 11 | 20.63 | 47.57 |
Capra hircus | 961 | 108 | 6 | 15.32 | 65.66 |
Equus asinus | 189 | 5 | 5 | 3.62 | 16 |
Equus caballus | 538 | 61 | 7 | 18.21 | 35.71 |
Felis catus | 311 | 54 | 9 | 26.84 | 79.49 |
Gallus gallus | 1108 | 148 | 10 | 14.92 | 37.16 |
Loxodonta africana | 11 | 3 | 3 | 19.82 | 14.3 |
Macaca mulatta | 696 | 9 | 6 | 26.86 | 107.18 |
Mus musculus | 80 | 15 | 10 | 29.99 | 16.47 |
Oryctolagus cuniculus | 49 | 14 | 4 | 11.29 | 106.05 |
Ovis aries | 877 | 149 | 6 | 13.35 | 71.91 |
Panthera leo | 41 | 4 | 3 | 9.28 | 13.93 |
Panthera tigris | 8 | 2 | 2 | 33.18 | 12.11 |
Sus scrofa | 1311 | 65 | 9 | 14.89 | 95.2 |
Ursus thibetanus | 14 | 2 | 2 | 8.50 | 10.44 |
ALL | 10835 | 1025 | 128 | 17.30 | 877.60 |
Table 4
Summary of tissue classification for transcriptome samples
Species | Adipose | Bone | Vascular | Embryo | Immune | Muscle | Nervous | Organ | Other |
---|---|---|---|---|---|---|---|---|---|
Ailuropoda melanoleuca | 0 | 0 | 80 | 0 | 5 | 2 | 2 | 39 | 5 |
Anas platyrhynchos | 42 | 6 | 129 | 36 | 55 | 134 | 12 | 357 | 48 |
Anser cygnoides | 12 | 0 | 0 | 0 | 3 | 6 | 0 | 104 | 9 |
Balaenoptera musculus | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Bos taurus | 94 | 43 | 689 | 222 | 464 | 365 | 74 | 1626 | 418 |
Camelus dromedarius | 0 | 0 | 12 | 0 | 0 | 0 | 10 | 0 | 6 |
Canis lupus familiaris | 36 | 29 | 261 | 14 | 145 | 78 | 139 | 940 | 939 |
Capra hircus | 17 | 6 | 136 | 12 | 63 | 140 | 42 | 641 | 298 |
Equus asinus | 0 | 0 | 12 | 3 | 1 | 6 | 2 | 32 | 5 |
Equus caballus | 1 | 82 | 751 | 196 | 23 | 234 | 75 | 377 | 453 |
Felis catus | 9 | 1 | 5 | 4 | 38 | 4 | 26 | 50 | 43 |
Gallus gallus | 167 | 22 | 306 | 453 | 650 | 269 | 346 | 1272 | 613 |
Loxodonta africana | 0 | 0 | 19 | 1 | 0 | 0 | 0 | 0 | 3 |
Macaca mulatta | 143 | 16 | 3569 | 529 | 422 | 94 | 1553 | 791 | 201 |
Mus musculus | 636 | 286 | 706 | 443 | 705 | 653 | 2450 | 1836 | 1268 |
Oryctolagus cuniculus | 61 | 0 | 354 | 138 | 12 | 19 | 151 | 446 | 243 |
Ovis aries | 242 | 0 | 319 | 57 | 83 | 366 | 126 | 1057 | 432 |
Panthera leo | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
Panthera tigris | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
Sus scrofa | 395 | 39 | 1251 | 719 | 415 | 1663 | 573 | 3051 | 520 |
Ursus thibetanus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |