Journal of Agricultural Big Data ›› 2025, Vol. 7 ›› Issue (1): 96-106. doi: 10.19788/j.issn.2096-6369.100039

• Data Resources •

A Multi-Omics Dataset for Functional Gene Mining in Animals

LIU Hong1, DOU JingWen1, WANG Yue1, LIAO Yong1, LIU XiaoLei1,2, LI XinYun1,2, ZHAO ShuHong1,2, FU YuHua1,2,*

  1. Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science & Technology, Huazhong Agricultural University, Wuhan 430070, China
  2. Hubei Hongshan Laboratory, Wuhan 430070, China
  • Received: 2024-06-06 Accepted: 2024-09-13 Published: 2025-03-26 Online: 2025-02-05
  • Corresponding author: FU YuHua, E-mail: yhfu@mail.hzau.edu.cn
  • About the author: LIU Hong, E-mail: hong_liu2@webmail.hzau.edu.cn
  • Funding: National Natural Science Foundation of China General Program (32272841); Hubei International Science and Technology Cooperation Project (2022EHB055)

Abstract:

Single-omics data alone cannot fully reveal the complex molecular mechanisms by which genes regulate traits. Integrating biological omics data of different types and levels is therefore important for understanding the complex molecular networks within organisms. This dataset provides individual-level omics data (WGS, RNA-Seq, ChIP-Seq, and ATAC-Seq) for 61,191 individuals from 21 animal species, together with genome annotation information, with an effective data size of 2.8 TB. The dataset also includes gene and phenotype entity recognition data obtained with deep learning algorithms. Overall, this multi-omics dataset can be used for gene discovery and functional validation for agriculturally important traits, and it offers a valuable resource for cross-species comparative studies. It also supports the construction of models for identifying key genes associated with economic traits in animals, as well as related algorithm research.

Data summary:

Dataset name: A Multi-Omics Dataset for Functional Gene Mining in Animals
Specific subject area: Agronomy
Research topic: Animal multi-omics dataset
Time range: 2000-2022
Data types and technical formats: .txt, .vcf, .ped, .map, .bed, .bim, .fam
Dataset structure: The dataset consists of five parts:
  • Functional annotation information for 403,216 genes across 21 species.
  • Genomic variation data for 10,835 individuals from 21 species, comprising 877.59 million variants.
  • Gene expression matrix data for 44,638 individuals from 21 species.
  • Epigenetic signal matrix data for 5,718 individuals from 21 species, covering 124 markers such as H3K27ac.
  • Pre-labeled gene and phenotype data from 2,794,237 articles covering 21 species.
Volume of dataset: 2.8 TB
Key indicators in dataset: Gene functional annotation, genomic variation information, gene expression matrices, epigenetic signal matrices, pre-labeled gene and phenotype data
Data accessibility: https://cstr.cn/17058.11.sciencedb.agriculture.00024; https://doi.org/10.57760/sciencedb.agriculture.00024; PUBLIC, CC BY-NC 4.0
Financial support: National Natural Science Foundation of China General Program (32272841); Hubei International Science and Technology Cooperation Project (2022EHB055)
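
As a rough illustration of how the figures in the data summary fit together, the Python sketch below does two things. It sums the per-layer individual counts listed above and compares the result with the 61,191 individuals stated in the abstract. It also counts individuals in PLINK .fam-formatted text, one of the listed formats, which has six whitespace-separated columns per sample. The function names and the three-sample example content are hypothetical and are not taken from the dataset; the actual file names and layout in the repository are not described here.

```python
import io

# Individual counts per omics layer, copied from the data summary.
INDIVIDUALS_PER_LAYER = {
    "genomic_variation": 10_835,
    "gene_expression": 44_638,
    "epigenetic_signal": 5_718,
}


def total_individuals(counts):
    """Sum the individual counts across all omics layers."""
    return sum(counts.values())


def count_fam_individuals(handle):
    """Count unique (family ID, individual ID) pairs in PLINK .fam text.

    A .fam line has six whitespace-separated fields: family ID,
    individual ID, father ID, mother ID, sex code, phenotype value.
    """
    seen = set()
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(
                f"line {line_number}: expected 6 fields, got {len(fields)}"
            )
        seen.add((fields[0], fields[1]))
    return len(seen)


# Hypothetical three-sample .fam content, for illustration only.
EXAMPLE_FAM = (
    "fam1 pig001 0 0 1 -9\n"
    "fam1 pig002 0 0 2 -9\n"
    "fam2 pig003 0 0 0 -9\n"
)

if __name__ == "__main__":
    # 10,835 + 44,638 + 5,718 individuals across the three layers.
    print(total_individuals(INDIVIDUALS_PER_LAYER))  # 61191
    print(count_fam_individuals(io.StringIO(EXAMPLE_FAM)))  # 3
```

To use the same function on a file downloaded from the repository, an open file handle can be passed in place of the `io.StringIO` object.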

Key words: multi-omics data, cross-species, functional gene mining, individual level, deep learning