Thesis details
Investigating statistical approaches to handling missing data in the context of the Gateshead Millennium Study
HA Statistics
Gordon, Claire Ann; McColl, John
University: University of Glasgow
Department: School of Mathematics and Statistics
Keywords: Missing Data, Missing Data Mechanisms, Complete Case Analysis, EM Algorithm, Hot-deck Imputation, Multiple Imputation, Gateshead Millennium Study
Others: http://theses.gla.ac.uk/2312/1/2010gordonmscr..pdf
Source: University of Glasgow
【 Abstract 】

A commonly occurring problem in all kinds of studies is that of missing data. These missing values can occur for a number of reasons, including equipment malfunctions and, more typically, subjects recruited to a study not participating fully. In particular, in a longitudinal study, one or more of the repeated measurements on a subject might be missing. The way in which missing values are dealt with depends on the data analyst's experience with statistical techniques. The most common way in which data analysts proceed is to use the complete case analysis method, i.e. removing cases with missing values for any of the variables and running the analysis on the remaining cases. Although this method is very straightforward to implement and is used by the vast majority of data analysts, it can lead to biased results unless data are missing completely at random. Complete case analysis can dramatically reduce the sample size of the study, as only those cases for which all variables are measured are included in the analysis. Therefore the complete case analysis method is "not generally recommended" (Diggle et al., 2002). Alternative approaches to the complete case analysis method involve filling in (or imputing) values for the incomplete cases, making "more efficient use of the available data" (Schafer, 1997).
The purpose of this thesis is to compare and contrast the results obtained from analysing the relationship between growth and feeding behaviour in the first year of life using the complete case analysis and three imputation methods: single hot-decking, multiple hot-decking and the EM algorithm. The data used in this research come from the Gateshead Millennium Study, a prospective study of a cohort of just over 1,000 babies. In practical terms, the purpose of the work is to confirm the conclusions from the published complete-case analysis.
It is of more theoretical interest to determine which imputation method is the most appropriate for dealing with missing data in this study.
Chapter 1 provides an introduction to the problem of missing data and how they may arise and a description of the Gateshead Millennium Study data, to which all the missing data methods will be applied. It concludes by giving the aims of this thesis.
Chapter 2 provides an in-depth review of various missing data approaches and indicates which characteristics of the missing data have to be considered in order to determine which of these approaches can be employed to deal with the missing values. Also in Chapter 2, various aspects of the Gateshead Millennium Study data are reviewed. Measures of growth and feeding behaviour in the first year of life are described as these are important variables in the published analysis.
Chapter 3 assesses how complete the Gateshead Millennium Study data are by producing a detailed description of each of the questions in each of the questionnaires. This is achieved by examining the Wave Non-response, Section Non-response and Item Non-response for each of the six questionnaires.
Chapter 4 recreates the results from the complete case analyses for the relationship between development of growth and feeding in the first year of life which have already been performed and published in the paper "How Does Maternal and Child Feeding Behaviour Relate to Weight Gain and Failure to Thrive? Data From a Prospective Birth Cohort" (Wright et al., 2006a). This chapter also gives insight as to whether or not it is appropriate to assume that the missing data mechanism is MCAR and therefore whether or not it is reasonable to believe the results obtained from the complete case analysis.
Chapter 5 focusses on the various methods used to impute the missing values in the Gateshead Millennium Study data. This chapter begins by considering the EM Algorithm. It gives details of how the EM Algorithm was performed and the results obtained. In addition to the EM Algorithm, this chapter also considers the procedures and results for Single Imputation and Multiple Imputation by hot-decking. This chapter concludes by comparing the results of these methods to one another and also to the complete case analysis results from Chapter 4.
Finally, Chapter 6 provides a summary of the results from the various missing data methods applied and discusses various alternative methods which could also have been performed.
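
As an illustration only, the following minimal Python sketch contrasts complete case analysis with hot-deck imputation on a small made-up data set. It is not the procedure used in the thesis: the record fields (`weight_gain`, `feeding_score`), the values, and the helper names `complete_cases` and `hot_deck_impute` are all hypothetical. The donor rule, a random draw from every observed value of the same variable, is an assumption chosen for brevity, and the abstract does not say how donors were selected.

```python
import random

# Toy records: None marks a missing value. All values are made up for illustration.
records = [
    {"id": 1, "weight_gain": 0.4, "feeding_score": 3},
    {"id": 2, "weight_gain": None, "feeding_score": 5},
    {"id": 3, "weight_gain": -0.2, "feeding_score": None},
    {"id": 4, "weight_gain": 0.1, "feeding_score": 4},
    {"id": 5, "weight_gain": 0.7, "feeding_score": 2},
]
VARIABLES = ["weight_gain", "feeding_score"]


def complete_cases(rows, variables):
    """Keep only the rows in which every listed variable is observed."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def hot_deck_impute(rows, variables, rng):
    """Fill each missing value with the observed value of a randomly chosen donor.

    Assumption: donors are drawn from all observed values of the same variable.
    """
    imputed = [dict(r) for r in rows]  # copy so the input rows stay unchanged
    for v in variables:
        donors = [r[v] for r in rows if r[v] is not None]
        for r in imputed:
            if r[v] is None:
                r[v] = rng.choice(donors)
    return imputed


# Complete case analysis: incomplete rows are dropped, shrinking the sample.
cc = complete_cases(records, VARIABLES)

# Single hot-decking: one imputed data set, same size as the original.
single = hot_deck_impute(records, VARIABLES, random.Random(0))

# Multiple hot-decking: several imputed data sets from independent random draws;
# each would be analysed separately and the results combined afterwards.
multiple = [hot_deck_impute(records, VARIABLES, random.Random(seed)) for seed in range(5)]

print(len(records), len(cc), len(single), len(multiple))
```

In this toy example the complete case data set keeps only the fully observed records, while each hot-deck data set keeps every record, which is the sample-size trade-off the abstract describes.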
