Variant Site Dataset of 99 Durio zibethinus Germplasm Resources
Received date: 2024-06-27
Accepted date: 2024-08-13
Online published: 2025-06-23
Funding
Nanfan Special Project of the Chinese Academy of Agricultural Sciences (SWAQ09); Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-RIP-02)
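JI Xiaohao, ZHENG Daojun, XIE Shenghua, SHI Meng, ZHONG Yiwang, WANG Yingying, WANG Xiaodi, LIU Fengzhi, FENG Xuejie, WANG Haibo. Variant Site Dataset of 99 Durio zibethinus Germplasm Resources[J]. Journal of Agricultural Big Data, 2025, 7(2): 227-237. DOI: 10.19788/j.issn.2096-6369.100040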
Durian has high economic and nutritional value. China's durian supply depends heavily on imports, and the durian industry in Hainan Province is still in its infancy: planted area is small, yields are low, varieties are obtained entirely through introduction rather than independent breeding, and supporting cultivation techniques are lacking. As a result, strong market demand contrasts sharply with a weak domestic industry, and the collection, identification, and evaluation of durian germplasm resources are urgently needed. In this study, DNA was extracted from 99 durian germplasm resources, libraries were constructed, and second-generation whole-genome sequencing was performed. The sequencing data then underwent quality control, variant site discovery and annotation, and population evolution analysis. Sequencing produced 1.62 Tb of data in total, from which 54,974,697 variant sites were identified. These comprise three variant types, single-nucleotide polymorphisms (SNPs), insertions (INS), and deletions (DEL), with SNPs predominating. On average, the durian genome carries one variant site per 13 bases. Most variant sites lie in intergenic regions, with fewer in gene exons and introns. The 99 durian resources can be divided into three subgroups. The distance over which the linkage disequilibrium (LD) coefficient decays to half its maximum value is only 0.1-0.2 kb, indicating rich genetic diversity. The genome sequencing data and variant site information for these 99 germplasm resources provide basic data support for research on durian genetics, breeding methods, and breeding theory, and will aid the selection and breeding of durian varieties in Hainan and worldwide.
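The half-decay distance quoted in the abstract can be illustrated with a minimal sketch. It assumes the LD decay curve is available as binned pairs of distance (kb) and mean r², and it interpolates linearly between the two bins that bracket half of the maximum. The function name `half_decay_distance` and the sample values are hypothetical and are not taken from the dataset or from the authors' workflow.

```python
from typing import Optional, Sequence


def half_decay_distance(dist_kb: Sequence[float], r2: Sequence[float]) -> Optional[float]:
    """Distance (kb) at which mean r^2 first falls to half of its maximum.

    Assumes dist_kb is sorted in ascending order and r2[i] is the mean r^2
    of the bin at dist_kb[i]. Returns None if the curve never reaches half
    of its maximum within the supplied range.
    """
    if len(dist_kb) != len(r2) or len(r2) == 0:
        raise ValueError("dist_kb and r2 must be non-empty and of equal length")

    peak_idx = max(range(len(r2)), key=lambda i: r2[i])
    half = r2[peak_idx] / 2.0

    for i in range(peak_idx + 1, len(r2)):
        if r2[i] <= half:
            d0, d1 = dist_kb[i - 1], dist_kb[i]
            y0, y1 = r2[i - 1], r2[i]
            if y0 == y1:  # flat segment (only possible when the maximum is 0)
                return d1
            # Linear interpolation between the two bracketing bins.
            return d0 + (y0 - half) * (d1 - d0) / (y0 - y1)
    return None


if __name__ == "__main__":
    # Illustrative values only, not measurements from the durian dataset.
    distances = [0.0, 0.1, 0.2, 0.5]
    mean_r2 = [0.8, 0.5, 0.3, 0.1]
    print(half_decay_distance(distances, mean_r2))
```

Linear interpolation is one simple choice. A fitted decay model would give a smoother estimate, but it needs assumptions about the curve shape that the abstract does not state.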
Data summary:
Items | Description |
---|---|
Name of dataset | Variant Site Dataset of 99 Durio zibethinus Germplasm Resources |
Specific subject area | Agronomy, biology |
Research topic | Genetic variation of durian germplasm resources |
Time range | 2022-2023 |
Temporal resolution | 1 year |
Geographical scope | Sanya City, Hainan Province, China |
Data types and technical formats | XLSX, VCF |
Dataset structure | This dataset consists of one table and one VCF file, primarily including the quality control results of WGS sequencing data, alignment information, and variant site information. |
Volume of dataset | 143.36 GB |
Data accessibility | CSTR: 17058.11.sciencedb.agriculture.00077 (https://cstr.cn/17058.11.sciencedb.agriculture.00077); DOI: 10.57760/sciencedb.agriculture.00077 |
Financial support | Nanfan Special Project of the Chinese Academy of Agricultural Sciences (SWAQ09); Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-RIP-02) |
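
Because the dataset ships its variant sites as a single VCF file and totals 143.36 GB, a reader will probably want to stream the file instead of loading it into memory. The sketch below tallies SNP, INS, and DEL alleles from VCF text lines using only the REF and ALT columns. The function names, the file name in the comment, and the rule of counting each ALT allele separately are assumptions made for illustration; the authors' own classification, which counts sites, may differ.

```python
import gzip
from collections import Counter
from typing import Iterable


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "OTHER"  # symbolic or breakend alleles are not sized here
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "INS"
    if len(alt) < len(ref):
        return "DEL"
    return "OTHER"  # equal-length multi-base substitutions


def count_variant_types(lines: Iterable[str]) -> Counter:
    """Stream VCF text lines and count alleles per variant type."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            if alt in (".", "*"):
                continue  # missing allele or spanning-deletion placeholder
            counts[classify_allele(ref, alt)] += 1
    return counts


def count_from_file(path: str) -> Counter:
    """Open a plain or gzip-compressed VCF and count its variant types."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_variant_types(handle)


# Hypothetical usage; the actual file name in the repository is not given here:
# print(count_from_file("durian_99.vcf.gz"))
```

Reading line by line keeps memory use flat regardless of file size, which matters more here than parsing speed.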
Key words: durian; variant sites; SNP; population evolution