Thesis Details
Data mining and analysis of lung cancer data.
Guoxin Tang
University: University of Louisville
Department: Mathematics
Keywords: Data mining; Text mining; Lung cancer; Health care; Predictive modeling
Others: https://ir.library.louisville.edu/cgi/viewcontent.cgi?article=2417&context=etd
United States | English
Source: The University of Louisville's Institutional Repository
【 Abstract 】

Lung cancer is the leading cause of cancer death in the United States and worldwide, with more than 1.3 million deaths worldwide per year. However, because effective diagnostic tools are lacking, more than half of all cases are diagnosed at an advanced stage, when surgical resection is unlikely to be feasible. The main purpose of this study is to examine the relationship between patient outcomes and the conditions of patients undergoing different treatments for lung cancer, and to develop models to predict lung cancer mortality. The study identifies the demographic, financial, and clinical factors related to the diagnosis of or mortality from lung cancer, to help physicians and patients in their decision-making. We combined Text Miner and cluster analysis to identify the claims data for lung cancer and to determine the categories of diagnosis, treatment procedures, and medication treatments for those patients. The claims data were also used to define severity levels and treatment categories. Compared with using diagnosis codes directly, the combination of text mining and cluster analysis is more efficient and captures more useful information for further analysis. To analyze lung cancer mortality, we also found survival analysis appropriate for preprocessing the data to study the relationship between a predictor variable of interest and the time to an event. The proportional hazard model examined the effects of different treatment clusters using a hazard ratio; the proportional effect of a treatment cluster (treatment procedure or medication treatment) may vary with time. A decision tree was built to generate rules for identifying high-risk lung cancer cases among the regular inpatient population. Two primary data sets were used in this study: the Nationwide Inpatient Sample (NIS) and the Thomson MedStat MarketScan data.
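As an illustration of the survival-analysis step described above, the following is a minimal sketch of the Kaplan-Meier product-limit estimator, which relates follow-up time to the probability of surviving past that time while accounting for censored patients. It is not the thesis's implementation: the function name `kaplan_meier` and the toy durations and event flags are hypothetical, chosen only to show the calculation.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of at-risk patients surviving t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients at time t both leave the risk set afterwards.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (months of follow-up), not taken from NIS or MarketScan.
durations = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 0, 0]

curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(f"t={time:>2}  S(t)={prob:.3f}")
```

Patients censored at a given time are still counted as at risk at that time, which is the usual convention; the estimate only changes at times where a death is observed.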
Kernel density estimation was used on the NIS data to visualize the relationship between age, length of stay, diagnosis categories, total cost, and lung cancer. The Kaplan-Meier method and the Cox proportional hazard model were applied to the MedStat data to examine in more detail the relationship between these factors and the target variable. Time series and predictive modeling were used to predict total cost for hospital decision-making, to predict lung cancer mortality from historical data, and to generate rules for identifying a diagnosis of lung cancer. Older patients are more likely to have lung cancers that would lead to

【 Preview 】
Attachments
Files Size Format View
Data mining and analysis of lung cancer data. 21554KB PDF download