BMC Bioinformatics
Geminivirus data warehouse: a database enriched with machine learning approaches
Database
Thales F. M. Carvalho1, Fabio R. Cerqueira2, Jose Cleydson F. Silva3, Renildes L. F. Fontes4, Fabyano F. Silva5, Marcos F. Basso6, Roberto R. Sobrinho6, Welison A. Pereira6, Maximiller Dal-Bianco6, Michihito Deguchi6, Otávio J. B. Brustolini6, Anésia A. Santos7, Elizabeth P. B. Fontes8, Francisco Murilo Zerbini9, Pedro M. P. Vidigal1,10
Affiliations:
  • Departamento de Informática, Universidade Federal de Viçosa, Viçosa, Brazil
  • Departamento de Engenharia de Produção, Universidade Federal Fluminense, Petrópolis, Rio de Janeiro, Brazil
  • National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil
  • Departamento de Solos, Universidade Federal de Viçosa, Viçosa, Brazil
  • Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Brazil
  • Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, Brazil
  • Departamento de Bioquímica e Biologia Molecular, Universidade Federal de Viçosa, Viçosa, Brazil
  • Departamento de Fitopatologia, Universidade Federal de Viçosa, Viçosa, MG, Brazil
  • Núcleo de Biomoléculas, Universidade Federal de Viçosa, Viçosa, MG, Brazil
Keywords: Machine learning; Random Forest; Knowledge discovery; Data mining; Data Warehouse; Geminivirus
DOI: 10.1186/s12859-017-1646-4
Received: 2016-12-23; accepted: 2017-04-25; published: 2017
Source: Springer
【 Abstract 】

Background
The Geminiviridae family comprises single-stranded DNA viruses with twinned, quasi-isometric virions. These viruses infect a wide range of dicotyledonous and monocotyledonous plants and cause significant economic losses worldwide. Geminiviruses are divided into nine genera according to their insect vector, host range, genome organization, and phylogeny. With rolling-circle amplification combined with high-throughput sequencing, thousands of full-length geminivirus and satellite genome sequences have been amplified, sequenced, and deposited in public databases. This has raised important challenges: how to classify, store, and analyze such massive datasets, and how to extract information and new knowledge from them. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in genomics, transcriptomics, proteomics, and metabolomics.

Results
Here, we describe the development of a data warehouse enriched with ML approaches, named geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high-precision information, demarcate species, and build classifiers for the genera and open reading frames (ORFs) of geminivirus genomes.

Conclusions
Using data mining techniques such as ETL (Extract, Transform, Load) to feed the database, together with machine learning algorithms for knowledge extraction, allowed us to build a database with high-quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple, user-friendly environment for information retrieval and knowledge discovery related to geminiviruses.
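The Conclusions say the database is fed through an ETL (Extract, Transform, Load) process. The following is a minimal sketch of that pattern under stated assumptions, not the authors' pipeline. Records are assumed to arrive as FASTA text with hypothetical headers of the form `accession|genus`. The transform step uppercases each sequence and computes its length and GC content. The load step writes rows into an in-memory SQLite table named `genome`, with SQLite standing in for the warehouse backend. The table name, the columns, and the functions `extract`, `transform`, `load`, and `run_etl` are all illustrative.

```python
import sqlite3
from typing import Iterable, Iterator

# Hypothetical input: FASTA records whose headers look like ">accession|genus".
SAMPLE_FASTA = """>AB000001|Begomovirus
acgtggcc
ATTA
>AB000002|Mastrevirus
GGCCnnAT
"""

Row = tuple[str, str, str, int, float]


def extract(fasta_text: str) -> Iterator[tuple[str, str]]:
    """Extract: yield (header, raw_sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def transform(header: str, raw_seq: str) -> Row:
    """Transform: split the (assumed) header, normalize case, compute features."""
    accession, genus = header.split("|", 1)
    seq = raw_seq.upper()
    # GC content over unambiguous bases only; ambiguous symbols such as 'N' are excluded.
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
    return accession, genus, seq, len(seq), round(gc, 4)


def load(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Load: insert transformed rows into a hypothetical 'genome' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genome ("
        "accession TEXT PRIMARY KEY, genus TEXT, sequence TEXT, "
        "length INTEGER, gc REAL)"
    )
    # INSERT OR REPLACE keyed on accession makes re-running the load idempotent.
    conn.executemany("INSERT OR REPLACE INTO genome VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(fasta_text: str) -> sqlite3.Connection:
    """Chain the three stages lazily and return the populated database."""
    conn = sqlite3.connect(":memory:")
    load(conn, (transform(h, s) for h, s in extract(fasta_text)))
    return conn


if __name__ == "__main__":
    db = run_etl(SAMPLE_FASTA)
    query = "SELECT accession, genus, length, gc FROM genome ORDER BY accession"
    for row in db.execute(query):
        print(row)
    # ('AB000001', 'Begomovirus', 12, 0.5)
    # ('AB000002', 'Mastrevirus', 8, 0.6667)
```

Because the stages are chained as generators, records stream from parsing to insertion one at a time. A real warehouse would likely add validation, deduplication, and annotation lookups inside the transform step.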

【 License 】

CC BY
© The Author(s). 2017
