Journal of Agricultural Big Data, 2025, Vol. 7, Issue (1): 96-106. doi: 10.19788/j.issn.2096-6369.100039

A Multi-Omics Dataset for Functional Gene Mining in Animals

LIU Hong1, DOU JingWen1, WANG Yue1, LIAO Yong1, LIU XiaoLei1,2, LI XinYun1,2, ZHAO ShuHong1,2, FU YuHua1,2,*

    1. Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science & Technology, Huazhong Agricultural University, Wuhan 430070, China
    2. Hubei Hongshan Laboratory, Wuhan 430070, China
  • Received: 2024-06-06 Accepted: 2024-09-13 Online: 2025-03-26 Published: 2025-02-05
  • Contact: FU YuHua

Abstract:

Single-omics data alone cannot fully reveal the complex molecular mechanisms by which genes regulate traits. Integrating biological omics data of different types and levels is therefore important for understanding the complex molecular networks within organisms. This dataset provides individual-level omics data (WGS, RNA-Seq, ChIP-Seq, and ATAC-Seq) and genome annotation information for 61,191 individuals from 21 animal species, with an effective data size of 2.8 TB. It also includes gene and phenotype entity recognition data obtained with deep learning algorithms. This multi-omics dataset can be used for gene discovery and functional validation for agriculturally important traits, and it is a resource for cross-species comparative studies. It also supports building models that identify key genes associated with economic traits in animals, as well as related algorithm research.

Data summary:

Item: Description
Dataset name: A Multi-Omics Dataset for Functional Gene Mining in Animals
Specific subject area: Agronomy
Research topic: Animal Multi-Omics Dataset
Time range: 2000-2022
Data types and technical formats: .txt, .vcf, .ped, .map, .bed, .bim, .fam
Dataset structure: The dataset consists of five parts:
  (1) Functional annotation information for 403,216 genes across 21 species.
  (2) Genomic variation data for 10,835 individuals from 21 species, encompassing 877.59 million variants.
  (3) Gene expression matrix data for 44,638 individuals from 21 species.
  (4) Epigenetic signal matrix data for 5,718 individuals from 21 species, including 124 markers such as H3K27ac.
  (5) Pre-labeled gene and phenotype data from 2,794,237 articles covering 21 species.
Volume of dataset: 2.8 TB
Key index in dataset: Gene functional annotation, genomic variation information, gene expression matrices, epigenetic signal matrices, pre-labeled gene and phenotype data
Data accessibility: https://cstr.cn/17058.11.sciencedb.agriculture.00024
  https://doi.org/10.57760/sciencedb.agriculture.00024
  PUBLIC, CC BY-NC 4.0
Financial support: National Natural Science Foundation of China General Program (32272841); Hubei International Science and Technology Cooperation Project (2022EHB055)
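
The individual counts listed for the three individual-level parts of the dataset (genomic variation, gene expression, epigenetic signal) can be checked against the 61,191 individuals quoted in the abstract. The short Python sketch below does that arithmetic. It assumes the three parts cover non-overlapping individuals, which the summary does not state explicitly; the variable names are illustrative and are not part of the dataset.

```python
# Individual counts per data type, copied from the data summary above.
# Assumption: no individual is counted in more than one part.
individuals_per_part = {
    "genomic_variation": 10_835,
    "gene_expression": 44_638,
    "epigenetic_signal": 5_718,
}

# Total number of individuals reported in the abstract.
reported_total = 61_191

computed_total = sum(individuals_per_part.values())

# Print the computed total and whether it matches the reported one.
print(computed_total, computed_total == reported_total)
```

Under that assumption the three counts sum to the reported total, which suggests the total in the abstract counts each individual once per data type.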

Key words: multi-omics data, cross-species, functional gene mining, individual level, deep learning