Journal Article Details
BMC Bioinformatics
Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis
Methodology Article
Francisco J Valverde-Albacete [1], Jose M González-Calabozo [1], Carmen Peláez-Moreno [1]
[1] Department of Signal Theory and Communications, University Carlos III Madrid, Avda. Universidad, 30, Leganés (Madrid), Spain.
Keywords: Biclustering; Gene expression data; Formal concept analysis; Exploratory data analysis; Gene set enrichment; Knowledge discovery; Data mining
DOI: 10.1186/s12859-016-1234-z
Received: 2016-02-01; accepted: 2016-09-01; published: 2016
Source: Springer
【 Abstract 】

Background: Gene Expression Data (GED) analysis poses a great challenge to the scientific community, one that can be framed within the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice for this task, but its unsupervised nature makes assessing its results problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA).

Results: We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process, providing support for continuous human interaction with the data with the aim of improving hypothesis abduction and assessment. We focus on adapting the interpretation and visualization of the EDA output to human cognition. First, we give biclustering a proper theoretical foundation using Lattice Theory and provide a set of analysis tools built around $\mathcal{K}$-Formal Concept Analysis ($\mathcal{K}$-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression, and applying thresholds, we obtain different sequences of hierarchical biclusterings for gene under- and over-expression. Consequently, we provide a method with interleaved analysis steps and visualization devices, so that the sequences of lattices for a particular experiment summarize the researcher's view of the data.
This also allows us to define measures of persistence and robustness with which to assess the biclusters. Second, the resulting biclusters are used to index external omics databases such as Gene Ontology (GO), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real dataset, confirming previously published results.

Conclusions: The GED analysis problem is transformed into the exploration of a sequence of lattices, enabling the visualization of the hierarchical structure of the biclusters at a given level of granularity. The ability of FCA-based biclustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene across the different biclusters it appears in, and to look for relevant biclusters (by observing their genes and their persistence) in order to infer, for instance, hypotheses on their function.
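The abstract describes biclustering as building concept lattices from a real-valued expression matrix at a sequence of thresholds, separately for under- and over-expression. The sketch below is not $\mathcal{K}$-FCA itself: it swaps in plain binary FCA on a thresholded matrix, the simplest instance of the idea, where each formal concept (a maximal set of genes sharing a maximal set of conditions) is read as a bicluster. The function names, the toy matrix, and the `persistence` count (how many thresholds a gene set survives as a bicluster extent) are hypothetical illustrations, not the measures defined in the article.

```python
from collections import Counter


def binarize(expr, threshold, over=True):
    """Turn a real-valued gene x condition matrix into a binary formal context.

    over=True:  gene g has condition c when expr[g][c] >= threshold.
    over=False: gene g has condition c when expr[g][c] <= threshold.
    """
    def keep(v):
        return v >= threshold if over else v <= threshold

    return {g: frozenset(c for c, v in row.items() if keep(v))
            for g, row in expr.items()}


def concepts(context, attributes):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    Every intent is an intersection of object rows, or the full attribute set
    (the intent of the empty extent), so we close the family of intents under
    intersection one row at a time. Exponential in the worst case; fine for a
    handful of conditions.
    """
    intents = {frozenset(attributes)}
    for row in context.values():
        intents |= {row & intent for intent in intents}
    return [(frozenset(g for g, r in context.items() if intent <= r), intent)
            for intent in intents]


def biclusters(expr, attributes, threshold, over=True):
    """Non-trivial concepts (non-empty extent and intent), read as biclusters."""
    context = binarize(expr, threshold, over)
    return [(ext, itt) for ext, itt in concepts(context, attributes) if ext and itt]


def persistence(expr, attributes, thresholds, over=True):
    """HYPOTHETICAL persistence score: the number of thresholds at which a gene
    set appears as a bicluster extent. Not the measure defined in the article."""
    counts = Counter()
    for t in thresholds:
        counts.update({ext for ext, _ in biclusters(expr, attributes, t, over)})
    return counts


# Toy data (made up): 3 genes x 3 conditions, e.g. log-ratios.
expr = {
    "g1": {"c1": 2.1, "c2": 1.8, "c3": 0.2},
    "g2": {"c1": 1.9, "c2": 0.4, "c3": 1.5},
    "g3": {"c1": 0.1, "c2": 2.2, "c3": 1.7},
}
conds = ["c1", "c2", "c3"]

for t in (1.0, 2.0):
    for ext, itt in sorted(biclusters(expr, conds, t), key=lambda b: sorted(b[0])):
        print(t, sorted(ext), sorted(itt))
print(persistence(expr, conds, [1.0, 2.0])[frozenset({"g1"})])  # 2
```

Raising the threshold shrinks the context, so the lattice at 2.0 keeps only the strongest biclusters; walking the thresholds in order gives the kind of lattice sequence the abstract refers to, here in its simplest binary form.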

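The abstract assesses each bicluster by indexing an external resource such as GO and computing enrichment p-values, but does not state which test is used. A common choice, assumed here, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): with $N$ genes in the universe, $K$ of them annotated with a term, and a bicluster of $n$ genes of which $k$ carry the term, $p = \sum_{i=k}^{\min(K,n)} \binom{K}{i}\binom{N-K}{n-i} / \binom{N}{n}$. The helper names and the gene sets below are hypothetical; if SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same tail.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def enrichment_pvalue(bicluster, annotated, universe):
    """One-sided over-representation p-value of one annotation term
    (e.g. a GO term) among the genes of one bicluster."""
    universe = set(universe)
    bicluster = set(bicluster) & universe
    annotated = set(annotated) & universe
    k = len(bicluster & annotated)
    return hypergeom_sf(k, len(universe), len(annotated), len(bicluster))


# Made-up example: 20 genes, a term annotating g0..g4, and a bicluster of
# 4 genes, 3 of which carry the term.
universe = [f"g{i}" for i in range(20)]
term_genes = universe[:5]
bicluster = ["g0", "g1", "g2", "g10"]
p = enrichment_pvalue(bicluster, term_genes, universe)
print(round(p, 4))  # 0.032
```

When many terms are tested against many biclusters, the raw p-values would normally need a multiple-testing correction (for example Benjamini-Hochberg); the abstract does not say which correction, if any, is applied.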
【 License 】

CC BY   
© The Author(s) 2016
