Individualized modeling and multi-modality data integration have grown explosively in recent years and have many important applications in biomedical research, personalized education, and marketing. Conventional statistical models usually fail to capture significant variation due to subject-specific effects and the heterogeneity of data from multiple sources. Consequently, it has become critical to incorporate the heterogeneous characteristics of individuals and modalities in order to explore the data structure efficiently and enhance prediction power. In this thesis, we address three challenging issues: mixture modeling for longitudinal data, individualized variable selection, and multi-modality tensor learning with an application in medical imaging analysis.

In the first part of the thesis, we develop a model-based subgrouping method for longitudinal data. Specifically, we propose an unbiased estimating equation approach for a two-component mixture model with correlated response data. In contrast to most existing longitudinal data clustering methods, the proposed model allows each individual's subgroup membership to change over time. Furthermore, we incorporate a correlation structure on the unobservable latent indicator variables. Another advantage of our approach is that it does not require any information about the joint likelihood function for each subject. The proposed model is shown to yield more efficient parameter estimators for both the mixing proportions and the component densities. In addition, by utilizing within-subject serial correlations, the proposed approach enhances classification power compared to existing methods, especially for boundary observations.

In the second part of the thesis, we propose an individualized variable selection approach that selects different relevant variables for different individuals. The conventional homogeneous model, which assumes that all subjects share the same effects of certain predictors, may wash out important information due to heterogeneous variation. For example, in personalized medicine, some individuals could respond positively to a treatment while others respond negatively, so the population average effect could be close to zero. We construct a separation penalty with multi-directional shrinkages, including shrinkage toward zero, which helps individualized modeling distinguish strong signals from noisy ones. As a byproduct, the proposed model identifies subgroups within which individuals share similar effects, and thus improves estimation efficiency and personalized prediction accuracy. Finite-sample simulation studies and an application to HIV longitudinal data demonstrate the efficiency and prediction power of the new approach compared to a variety of existing penalization models.

In the third part of the thesis, we are interested in employing medical imaging data for diagnosis. This work is motivated by breast cancer imaging data produced by a multimodality multiphoton optical imaging technique. We develop an innovative multilayer tensor learning method that predicts disease status effectively by utilizing subject-wise imaging information. In particular, we propose an individualized multilayer model that combines a high-order tensor decomposition shared across the population with an additional layer of individual imaging structure shared by multiple modalities. One major advantage of our approach is that, by integrating multimodality imaging data, we are able to capture the spatial information of microvesicles observed in certain modalities of optical imaging. Our simulation studies and real data analysis both indicate that the proposed multilayer learning method improves prediction accuracy significantly compared to existing competitive statistical and machine learning methods.
Individualized learning and integration for multi-modality data