Dissertation Details
Scalable Semi-parametric Methods in Biostatistics
Deng, Detian; Lessler, Justin T.
Johns Hopkins University
Keywords: latent class model; variational inference; etiology estimation; log linear model; accelerated failure time model; proportional hazards model; generalized method of moments; precision medicine; Biostatistics
Others: https://jscholarship.library.jhu.edu/bitstream/handle/1774.2/59165/DENG-DISSERTATION-2018.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: JOHNS HOPKINS DSpace Repository
【 Abstract 】

Individualized health, or precision medicine, is an emerging approach to disease prevention and treatment guided by the individual characteristics of each person: genome, medical imaging, family history, environment, and lifestyle. Achieving this goal requires efficient and scalable statistical technologies to decipher the connection between this information and health outcomes. In this thesis, we present statistical methods in support of the goal of individualized health.

In Part I, the primary goal is to provide flexible and efficient estimation of the latent etiology distribution given imperfect measurements. We parameterize the latent etiologic state as a multivariate binary variable, where each binary node represents the presence or absence of an etiologic agent. The multivariate binary measurements are assumed to be conditionally independent given the latent state. Their relation is parameterized by the true positive rates and false positive rates of the measurements. External information on the true positive rates, extracted from previous literature, is summarized by Beta prior distributions and used to improve model identifiability. Experts' knowledge of the competition mechanism among etiologic agents is translated into a sparse correlation structure of the latent state. A scalable Markov chain Monte Carlo algorithm is proposed for approximating the exact posterior distribution. A variational Bayesian algorithm is also developed for faster and even more scalable estimation in large-scale problems. We demonstrate the model using data from the motivating Pneumonia Etiology Research for Child Health (PERCH) study, which aims to provide a comprehensive estimate of the etiology distribution of childhood pneumonia in developing countries.

In Part II, the key objective is to improve the efficiency of survival regression estimators by incorporating external information on population-level survival rates. The accelerated failure time (AFT) model and the Cox proportional hazards model are considered. For each model, the first estimating equation is based on the benchmark semi-parametric estimator (the partial-likelihood estimator for Cox and the log-rank estimator for AFT); additional estimating equations are then formed from the auxiliary survival information. The estimating equations are transformed by applying the functional delta method to a set of over-identifying moment conditions. Finally, parameter estimation and model diagnostics are carried out following the standard generalized method of moments (GMM) framework. We show that the new GMM-based estimators are asymptotically and empirically more efficient than the benchmark estimators. These new estimators are applied to a recent retrospective study on the prognosis of pancreatic cancer.
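
As an illustration of the Part I measurement model, the following is a minimal Python sketch of the conditional-independence likelihood parameterized by true positive and false positive rates, with the posterior over latent states computed by exact enumeration. It is not the thesis's algorithm: the rates are fixed hypothetical numbers rather than draws from Beta priors, the prior is restricted to single-agent states as a simple stand-in for a sparse latent structure, and the function names `measurement_likelihood` and `latent_posterior` are invented for this example.

```python
def measurement_likelihood(m, latent, tpr, fpr):
    """P(M = m | L = latent), assuming measurements are conditionally
    independent given the latent state."""
    p = 1.0
    for m_j, l_j, t_j, f_j in zip(m, latent, tpr, fpr):
        # A positive result occurs with probability TPR if the agent is
        # present and FPR if it is absent.
        p_pos = t_j if l_j == 1 else f_j
        p *= p_pos if m_j == 1 else 1.0 - p_pos
    return p


def latent_posterior(m, prior, tpr, fpr):
    """Posterior over latent states by exact enumeration.
    Feasible only when the number of candidate states is small."""
    joint = {
        state: weight * measurement_likelihood(m, state, tpr, fpr)
        for state, weight in prior.items()
    }
    total = sum(joint.values())
    return {state: value / total for state, value in joint.items()}


# Hypothetical rates for three etiologic agents (not taken from PERCH).
tpr = [0.9, 0.8, 0.7]
fpr = [0.1, 0.2, 0.1]

# Assumed prior: exactly one agent is present, each equally likely.
states = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
prior = {state: 1.0 / len(states) for state in states}

# One observed measurement vector: only the first test is positive.
post = latent_posterior((1, 0, 0), prior, tpr, fpr)
for state, prob in post.items():
    print(state, round(prob, 4))
```

Exact enumeration grows exponentially with the number of agents, which is the scaling problem that sampling-based and variational approximations are meant to address.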

【 Preview 】
Attachments
Files Size Format View
Scalable Semi-parametric Methods in Biostatistics 12131KB PDF download