Thesis Details
Data Mining in Tree-Based Models and Large-Scale Contingency Tables
Kim, Seoung Bum ; Industrial and Systems Engineering
University: Georgia Institute of Technology
Department: Industrial and Systems Engineering
Keywords: Cross validation; Tree-Based Models; Data mining; Protein structure; Contingency tables
Others: https://smartech.gatech.edu/bitstream/1853/6825/1/kim_seoungbum_200505_phd.pdf
United States | English
Source: SMARTech Repository
【 Abstract 】

This thesis is composed of two parts. The first part pertains to tree-based models; the second deals with multiple testing in large-scale contingency tables.

Tree-based models have gained enormous popularity in statistical modeling and data mining. We propose a novel tree-pruning algorithm, the frontier-based tree-pruning algorithm (FBP). Its computational complexity is of an order comparable to that of cost-complexity pruning (CCP), and it provides a full spectrum of information about tree pruning. A numerical study on real data sets reveals a surprise: in the complexity-penalization approach, most of the tree sizes are inadmissible. FBP facilitates a more faithful implementation of cross validation, which is favored by simulations.

One of the most common test procedures using two-way contingency tables is the test of independence between two categorizations. Current test procedures, such as the chi-square and likelihood ratio tests, assess overall independence but provide limited information about the nature of the association in contingency tables. We propose an approach for testing the independence of categories in individual cells of contingency tables based on a multiple testing framework. We then employ the proposed method to identify the patterns of pairwise associations between amino acids involved in beta-sheet bridges of proteins. We identify a number of amino acid pairs that exhibit either strong or weak association. These patterns provide useful information for algorithms that predict secondary and tertiary structures of proteins.
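
The idea of testing individual cells of a contingency table within a multiple testing framework can be illustrated with a small sketch. The abstract does not state which cell statistic or error-rate criterion the thesis uses, so the choices below are assumptions made for illustration only: adjusted standardized residuals with a normal approximation, followed by the Benjamini-Hochberg step-up procedure. The function name `cellwise_tests` and the toy counts are hypothetical and do not come from the thesis.

```python
import math


def cellwise_tests(table, alpha=0.05):
    """Flag cells of a two-way table that deviate from independence.

    Assumptions (not taken from the thesis): adjusted standardized
    residuals, two-sided normal p-values, Benjamini-Hochberg control of
    the false discovery rate at level alpha. The table needs at least
    two rows and two columns, all with positive totals.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)

    results = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_tot[i] * col_tot[j] / n
            # Variance of (observed - expected) for the adjusted residual.
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            z = (observed - expected) / math.sqrt(variance)
            # Two-sided p-value from the standard normal distribution.
            p = math.erfc(abs(z) / math.sqrt(2))
            results.append({"cell": (i, j), "z": z, "p": p})

    # Benjamini-Hochberg: find the largest rank k with p_(k) <= alpha * k / m
    # and reject every hypothesis ranked at or below k.
    m = len(results)
    ordered = sorted(results, key=lambda r: r["p"])
    cutoff = 0
    for k, r in enumerate(ordered, start=1):
        if r["p"] <= alpha * k / m:
            cutoff = k
    for k, r in enumerate(ordered, start=1):
        r["reject"] = k <= cutoff
    return results


if __name__ == "__main__":
    # Made-up counts; rows and columns stand in for category labels.
    toy = [[30, 10, 10], [10, 30, 10], [10, 10, 30]]
    for r in cellwise_tests(toy):
        print(r["cell"], round(r["z"], 2), r["reject"])
```

In this sketch, a rejected cell with a positive residual indicates a pair that occurs more often than independence would predict, and a negative residual indicates a pair that occurs less often. In the protein application described above, the rows and columns would correspond to amino acid types.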

【 Preview 】
Attachments
Files Size Format View
Data Mining in Tree-Based Models and Large-Scale Contingency Tables 1110KB PDF download