Journal Article Details
BMC Genomics
Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
Research Article
Junyi Qi [1], Geng Chen [1], Jian Luo [1], Pengzhan Hu [1], Mingyao Liu [1], Tieliu Shi [2], Leming Shi [3], Ruiyuan Li [4]
[1] Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Science, East China Normal University, 200241, Shanghai, China;Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Science, East China Normal University, 200241, Shanghai, China;Shanghai Information Center for Life Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Science, 200031, Shanghai, China;National Center for Toxicological Research, US Food and Drug Administration, 72079, Jefferson, Arkansas, USA;National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, 430070, Wuhan, China;
Keywords: Human Reference Genome; RefSeq Gene; Macaque Genome; Transcript Contigs; Celera Genome
DOI: 10.1186/1471-2164-12-590
Received 2011-08-07, accepted 2011-12-02, published 2011
Source: Springer
【 Abstract 】

Background: A complete and accurate human reference genome is important for functional genomics research. Consequently, gaps in the reference genome and individual-specific sequences can significantly affect a wide range of studies.

Results: We used two RNA-Seq datasets, from human brain tissues and 10 mixed cell lines, to investigate the completeness of the human reference genome. First, we demonstrated that, of the previously identified ~5 Mb of Asian and ~5 Mb of African novel sequences that are absent from the human reference genome (NCBI build 36), ~211 kb and ~201 kb, respectively, could be transcribed. Our results suggest that many of those transcribed regions are not specific to Asian and African individuals but are also present in Caucasians. We then found that 104 RefSeq genes that cannot be aligned to NCBI build 37 are expressed at more than 0.1 RPKM in brain and cell lines. Of these, 55 are conserved across human, chimpanzee and macaque, suggesting that a significant number of functional human genes are still absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes or EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR.

Conclusion: Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the reference genome and curating the missing genes. Our study also shows the importance of de novo transcriptome assembly. A transcriptome-based comparison between the reference genome and other related human genomes provides an alternative way to refine the human reference genome.

【 License 】

© Chen et al; licensee BioMed Central Ltd. 2011. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
