Journal of Agricultural Big Data
Variant Site Dataset of 99 Durio zibethinus Germplasm Resources
Received date: 2024-06-27
Accepted date: 2024-08-13
Online published: 2025-06-23
Durian has high economic and nutritional value, yet China's durian industry is highly dependent on imports. The durian industry in Hainan Province is still in its infancy, characterized by limited acreage, low yield, complete reliance on introduced varieties, a lack of self-sufficiency, and insufficient supporting cultivation techniques. The result is a stark contrast between strong market demand and a weak industry, so the collection, identification, and evaluation of durian germplasm resources are urgently needed. In this study, DNA was extracted from 99 durian germplasm resources, libraries were constructed, and second-generation whole-genome sequencing was performed. The sequencing data were then subjected to bioinformatic analyses, including quality control, variant site discovery and annotation, and population evolution analysis. Sequencing produced 1.62 Tb of data and yielded 54,974,697 variant sites, comprising SNPs, insertions (INS), and deletions (DEL), with SNPs the most prevalent. On average, one variant site occurs every 13 bases in the durian genome. The variant sites are located mainly in intergenic regions, with fewer in gene exons and introns. The 99 durian resources can be divided into three subgroups. Linkage disequilibrium (LD) decays to half its maximum value within only 0.1-0.2 kb, indicating rich genetic diversity. This study provides genome sequencing data and variant site information for 99 durian germplasm resources, offering fundamental data support for research on durian genetics, breeding methods, and breeding theory, and will aid the selection and breeding of durian varieties in Hainan and worldwide.
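As an illustration of the LD statistic quoted above, the following minimal Python sketch estimates the distance at which mean r² falls to half its maximum, using linear interpolation between binned values. The function name `half_decay_distance` and the sample curve are hypothetical and invented for illustration; they are not values from this dataset, and the authors' actual procedure may differ.

```python
def half_decay_distance(distances_kb, mean_r2):
    """Return the distance (kb) at which mean r^2 first falls to half its maximum.

    Uses linear interpolation between the two bins that bracket the
    half-maximum value. Returns None if the curve never decays that far.
    """
    if len(distances_kb) != len(mean_r2) or not distances_kb:
        raise ValueError("inputs must be non-empty and of equal length")

    peak = max(mean_r2)
    if peak <= 0:
        return None  # no measurable LD, so no half-decay point
    peak_idx = mean_r2.index(peak)
    target = peak / 2.0

    # Walk outward from the peak until the curve reaches the target.
    for i in range(peak_idx + 1, len(mean_r2)):
        if mean_r2[i] <= target:
            d0, d1 = distances_kb[i - 1], distances_kb[i]
            r0, r1 = mean_r2[i - 1], mean_r2[i]
            # r0 is above the target and r1 is at or below it, so r0 != r1.
            frac = (r0 - target) / (r0 - r1)
            return d0 + frac * (d1 - d0)
    return None


# Invented example curve (NOT data from this study): pairwise distance
# bins in kb and the mean r^2 observed in each bin.
example_dist_kb = [0.0, 0.1, 0.2, 0.5, 1.0]
example_r2 = [0.60, 0.40, 0.20, 0.10, 0.05]

example_half_decay = half_decay_distance(example_dist_kb, example_r2)
print(f"half-decay distance: {example_half_decay:.3f} kb")
```

In practice the binned distance and r² values would come from an LD decay tool run on the VCF file; a short half-decay distance such as the 0.1-0.2 kb reported here means that LD breaks down over very short physical distances.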
Data summary:
Items | Description |
---|---|
Name of dataset | Variant Site Dataset of 99 Durio zibethinus Germplasm Resources |
Specific subject area | Agronomy, biology |
Research topic | Genetic variation of durian germplasm resources |
Time range | 2022 - 2023 |
Temporal resolution | one year |
Geographical scope | Sanya City, Hainan Province, China |
Data types and technical formats | XLSX, VCF |
Dataset structure | This dataset consists of one table and one VCF file, primarily including the quality control results of WGS sequencing data, alignment information, and variant site information. |
Volume of dataset | 143.36 GB |
Data accessibility | CSTR: 17058.11.sciencedb.agriculture.00077; https://cstr.cn/17058.11.sciencedb.agriculture.00077; DOI: 10.57760/sciencedb.agriculture.00077 |
Financial support | Nanfan Special Project of the Chinese Academy of Agricultural Sciences (SWAQ09); The Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-RIP-02). |
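Because the variant sites are distributed as a VCF file, a reader may want to tally SNPs, insertions, and deletions directly from it. The minimal sketch below does this with the Python standard library only. The classification rule (comparing REF and ALT allele lengths) is a simplifying assumption rather than the authors' method, the function names are hypothetical, and the example records are invented. The sketch counts ALT alleles, so a multi-allelic site contributes more than once and the totals need not match per-site counts.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Classify one REF/ALT pair by allele length (a simplifying assumption)."""
    if alt.startswith("<") or alt == "*":
        return "OTHER"  # symbolic or spanning-deletion alleles
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "INS"
    if len(alt) < len(ref):
        return "DEL"
    return "OTHER"  # equal-length multi-base substitutions


def count_variant_types(vcf_lines):
    """Count variant types over an iterable of uncompressed VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # VCF columns 4 (REF) and 5 (ALT)
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            counts[classify_variant(ref, alt)] += 1
    return counts


# Invented records for illustration only (NOT taken from the dataset).
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t101\t.\tA\tG\t50\tPASS\t.",
    "chr1\t250\t.\tC\tCTT\t50\tPASS\t.",
    "chr1\t900\t.\tGAT\tG\t50\tPASS\t.",
    "chr2\t77\t.\tT\tC,TA\t50\tPASS\t.",
]

example_counts = count_variant_types(example_vcf)
print(dict(example_counts))
```

For a real file, an open text handle can be passed in place of the list, for example `count_variant_types(open(path))`, or a handle from `gzip.open(path, "rt")` if the VCF is gzip-compressed; `path` here is a placeholder for the downloaded file.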
Key words: durian; variant sites; SNP; population evolution
JI XiaoHao, ZHENG DaoJun, XIE ShengHua, SHI Meng, ZHONG YiWang, WANG YingYing, WANG XiaoDi, LIU FengZhi, FENG XueJie, WANG HaiBo. Variant Site Dataset of 99 Durio zibethinus Germplasm Resources[J]. Journal of Agricultural Big Data, 2025, 7(2): 227-237. DOI: 10.19788/j.issn.2096-6369.100040