Journal article

[Abstract]

Background: Alzheimer's disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in data breadth and sample size. Here we examine the clinical heterogeneity of Alzheimer's disease patients using electronic health records (EHR), applying multiple clustering methods to identify and characterise disease subgroups that are clinically actionable.

Methods: We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted a range of comorbidities, symptoms and demographic features as patient features. We evaluated four clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) for clustering Alzheimer's disease patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets.

Results: We identified 7,913 AD patients, with a mean age of 82 and 66.2% female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters in k-means, kernel k-means, affinity propagation and latent class analysis respectively. K-means was found to produce the most consistent results based on four evaluative measures. One cluster appeared consistently in three of the four methods. It was composed predominantly of female patients with younger disease onset (43% between ages 42–73) who were diagnosed with depression and anxiety and who progressed more quickly than the average across other clusters.

Conclusion: Each clustering approach produced substantially different clusters, and k-means performed best of the four methods on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods potentially suggests the presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of the results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.
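To make the idea of evaluating a clustering method by its stability more concrete, the sketch below clusters a small synthetic set of binary patient features with a plain k-means and scores how well repeated runs agree with each other. It is an illustration only and not the authors' pipeline: the synthetic `patients` data, the helper names (`kmeans`, `rand_index`, `stability`) and the choice of the Rand index across random restarts as the stability measure are all assumptions made for this example.

```python
import random
from itertools import combinations

def nearest(point, centers):
    """Index of the centre closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))

def kmeans(points, k, seed, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def stability(points, k, seeds):
    """Mean pairwise Rand index between k-means runs started from different seeds."""
    runs = [kmeans(points, k, s) for s in seeds]
    scores = [rand_index(x, y) for x, y in combinations(runs, 2)]
    return sum(scores) / len(scores)

# Hypothetical data: two noisy binary "comorbidity profiles", 20 patients each.
rng = random.Random(42)

def make_patient(profile, flip=0.1):
    """Copy a profile, flipping each binary feature with probability `flip`."""
    return [bit ^ (rng.random() < flip) for bit in profile]

patients = ([make_patient([1, 1, 0, 0]) for _ in range(20)]
            + [make_patient([0, 0, 1, 1]) for _ in range(20)])

for k in (2, 3, 4):
    print(k, round(stability(patients, k, seeds=range(5)), 3))
```

A score near 1 means that different restarts recover essentially the same partition. In a fuller comparison the same score could be computed for each clustering method and each candidate number of clusters, alongside the other criteria named in the abstract.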

[License]

CC BY

BMC Medical Informatics and Decision Making
Identifying and evaluating clinical subtypes of Alzheimer’s disease in care electronic health records using unsupervised machine learning

Daniel C. Alexander¹ Frederik Barkhof² Nonie Alexander³ Spiros Denaxas⁴
[1] Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK; Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK; UCL Institute of Neurology, University College London, London, UK; Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands; Institute of Health Informatics, University College London, London, UK; Health Data Research UK, London, UK; Institute of Health Informatics, University College London, London, UK; Health Data Research UK, London, UK; Alan Turing Institute, London, UK
Keywords: Clustering; EHR; Alzheimer's disease; Subtyping; K-means
DOI : 10.1186/s12911-021-01693-6
Source: Springer