Large-scale complex data, which play an important role in information technology and biomedical research, have drawn great attention in recent years. In this thesis, we address three challenging issues: sufficient dimension reduction for longitudinal data, nonignorable missing data with refreshment samples, and large-scale recommender systems.

In the first part of this thesis, we incorporate correlation structure in sufficient dimension reduction for longitudinal data. Existing sufficient dimension reduction approaches that assume independence may lead to a substantial loss of efficiency. We apply the quadratic inference function to incorporate the correlation information and apply the transformation method to recover the central subspace. The proposed estimators are shown to be consistent and more efficient than those obtained under the independence assumption. In addition, the estimated central subspace is efficient when the correlation information is taken into account. We compare the proposed method with other dimension reduction approaches through simulation studies, and apply the new approach to an environmental health study.

In the second part of this thesis, we address nonignorable missing data, which occur frequently in longitudinal studies and can cause biased estimates. Refreshment samples, which recruit new subjects in subsequent waves from the original population, could mitigate the bias. We introduce a mixed-effects estimating equation approach that enables one to incorporate refreshment samples and recover missing information. We show that the proposed method achieves consistency and asymptotic normality for fixed-effect estimation under shared-parameter models, and we extend it to a more general nonignorable-missing framework. Our finite-sample simulation studies show the effectiveness and robustness of the proposed method under different missing mechanisms. In addition, we apply our method to longitudinal election poll survey data with refreshment samples from the 2007-2008 Associated Press–Yahoo! News election poll.

In the third part of this thesis, we develop a novel recommender system that tracks users' preferences and recommends items of interest effectively. We propose a group-specific method that utilizes dependency information from users and items sharing similar characteristics under the singular value decomposition framework. The new approach is effective for the "cold-start" problem, where information on new users and new items is not available from the existing data collection. One advantage of the proposed model is that it can incorporate information from the missing mechanism and group-specific features through clustering based on variables associated with missing patterns. In addition, we propose a new algorithm that embeds a back-fitting algorithm into alternating least squares, which avoids operations on large matrices and large memory storage, and therefore makes scalable computing feasible. Our simulation studies and MovieLens data analysis both indicate that the proposed group-specific method improves prediction accuracy significantly compared to existing competitive recommender system approaches.
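As a rough illustration of the alternating least squares idea mentioned above, the sketch below fits a plain low-rank factorization to the observed entries of a rating matrix. It is a generic, ridge-regularized ALS and not the group-specific algorithm of the thesis: the group effects and the embedded back-fitting step are not shown. The function names `als_factorize` and `observed_rmse`, the synthetic data, and all parameter values are assumptions made for this example.

```python
import numpy as np


def als_factorize(R, mask, rank=2, lam=0.1, n_iters=50, seed=0):
    """Generic ridge-regularized ALS on observed entries only (illustrative)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    ridge = lam * np.eye(rank)
    for _ in range(n_iters):
        # Update each user's factors with the item factors held fixed.
        for u in range(n_users):
            obs = mask[u]
            if obs.any():
                Qo = Q[obs]
                P[u] = np.linalg.solve(Qo.T @ Qo + ridge, Qo.T @ R[u, obs])
        # Update each item's factors with the user factors held fixed.
        for i in range(n_items):
            obs = mask[:, i]
            if obs.any():
                Po = P[obs]
                Q[i] = np.linalg.solve(Po.T @ Po + ridge, Po.T @ R[obs, i])
    return P, Q


def observed_rmse(R, mask, P, Q):
    """Root mean squared error over the observed entries."""
    resid = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(resid ** 2)))


# Synthetic rank-2 ratings with roughly 60% of entries observed (made-up data).
rng = np.random.default_rng(1)
R = rng.normal(size=(30, 2)) @ rng.normal(size=(20, 2)).T
mask = rng.random(R.shape) < 0.6

P0, Q0 = als_factorize(R, mask, n_iters=0)  # random initialization only
P, Q = als_factorize(R, mask, n_iters=50)
rmse_before = observed_rmse(R, mask, P0, Q0)
rmse_after = observed_rmse(R, mask, P, Q)
print(f"observed RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
```

Each update solves only a small `rank` by `rank` linear system per user or item, which is the sense in which alternating schemes of this kind avoid operations on the full rating matrix.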
Dimension reduction and efficient recommender system for large-scale complex data