BMC Medical Research Methodology
A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB
Alice Kongsted [3], Rikke K Jensen [1], Peter Kent [2]
[1] Research Department, Spine Centre of Southern Denmark, Hospital Lillebaelt, Institute of Regional Health Services Research, University of Southern Denmark, Middelfart, Denmark; School of Sports Science and Clinical Biomechanics, University of Southern Denmark, Campusvej 55, Odense M 5230, Denmark; Nordic Institute of Chiropractic and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark
Keywords: SMS; MRI; Reproducibility; Head-to-head comparison; Latent Class Analysis; Cluster analysis
Other identifier: 1090789; DOI: 10.1186/1471-2288-14-113
Received: 2013-09-06; accepted: 2014-09-24; published: 2014
Abstract
Background
There are various methodological approaches to identifying clinically important subgroups. One method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). However, there are few head-to-head comparisons that can inform the choice of clustering method for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA).
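The abstract does not describe how LCA works internally. As a rough illustration only, the sketch below fits a basic latent class model to binary indicators (for example, presence or absence of an MRI finding at a disc level) by expectation-maximisation (EM), assuming the indicators are conditionally independent within each class. The helpers `e_step` and `fit_lca` and the simulated data are hypothetical; this is a textbook EM sketch, not the algorithm implemented in Latent Gold, SNOB or SPSS TwoStep.

```python
import math
import random


def e_step(data, weights, item_probs):
    """Posterior class probabilities for each case and the total log-likelihood."""
    posteriors, log_lik = [], 0.0
    for row in data:
        # log(weight_k) + sum_j log P(x_j | class k), for each class k.
        logs = []
        for w, probs in zip(weights, item_probs):
            s = math.log(w)
            for x, p in zip(row, probs):
                s += math.log(p if x else 1.0 - p)
            logs.append(s)
        m = max(logs)  # log-sum-exp trick for numerical stability
        total = sum(math.exp(v - m) for v in logs)
        log_lik += m + math.log(total)
        posteriors.append([math.exp(v - m) / total for v in logs])
    return posteriors, log_lik


def fit_lca(data, n_classes, n_iter=200, seed=0, eps=1e-6):
    """Fit a latent class model to 0/1 indicators by EM (hypothetical helper)."""
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting item probabilities break the symmetry between classes.
    item_probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
                  for _ in range(n_classes)]
    for _ in range(n_iter):
        posteriors, _ = e_step(data, weights, item_probs)
        # M-step: class weights and item probabilities are posterior-weighted means.
        for k in range(n_classes):
            nk = max(sum(post[k] for post in posteriors), eps)
            weights[k] = nk / len(data)
            for j in range(n_items):
                endorsed = sum(post[k] * row[j] for post, row in zip(posteriors, data))
                # Clamp away from 0 and 1 so the logs in the E-step stay finite.
                item_probs[k][j] = min(max(endorsed / nk, eps), 1.0 - eps)
    # Final E-step so posteriors and log-likelihood match the returned parameters.
    posteriors, log_lik = e_step(data, weights, item_probs)
    return weights, item_probs, posteriors, log_lik


# Usage: 200 simulated cases in two well-separated subgroups of 100 each.
rng = random.Random(1)
data = []
for i in range(200):
    probs = [0.9, 0.9, 0.1, 0.1] if i < 100 else [0.1, 0.1, 0.9, 0.9]
    data.append([1 if rng.random() < p else 0 for p in probs])
weights, item_probs, posteriors, log_lik = fit_lca(data, n_classes=2)
```

Real LCA software typically adds multiple random starts, convergence checks and support for non-binary indicators; those are omitted here for brevity.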
Methods
The performance of these three methods was compared (i) quantitatively, using the number of subgroups detected, the probability of classifying individuals into subgroups, and the reproducibility of results; and (ii) qualitatively, using subjective judgements about each program's ease of use and the interpretability of its presentation of results.
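The abstract does not define its quantitative measures, so the following is only a sketch under assumptions. Programs differ in how they choose the number of subgroups: SPSS TwoStep can select it automatically using BIC or AIC, Latent Gold reports BIC among other fit criteria, and SNOB uses minimum message length (MML). For a latent class model with $K$ classes and $J$ binary indicators, $\mathrm{BIC} = -2\ln L + p\ln n$ with $p = (K-1) + KJ$ free parameters. Here "classification probability" is assumed to mean the mean posterior probability of the assigned class among cases assigned to it; the paper may define it differently. The function names `lca_bic` and `classification_summary` are hypothetical.

```python
import math


def lca_bic(log_lik, n_cases, n_classes, n_items):
    """BIC for a latent class model with binary indicators (lower is better)."""
    # Free parameters: (K - 1) class weights + K * J item probabilities.
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * log_lik + n_params * math.log(n_cases)


def classification_summary(posteriors):
    """Modal class assignment plus mean posterior probability per assigned class."""
    n_classes = len(posteriors[0])
    assigned = [max(range(n_classes), key=lambda k: row[k]) for row in posteriors]
    mean_prob = {}
    for k in range(n_classes):
        probs = [row[k] for row, a in zip(posteriors, assigned) if a == k]
        if probs:  # skip classes that receive no cases
            mean_prob[k] = sum(probs) / len(probs)
    return assigned, mean_prob


# Usage with made-up posterior probabilities for four cases.
posteriors = [[0.95, 0.05], [0.80, 0.20], [0.10, 0.90], [0.30, 0.70]]
assigned, mean_prob = classification_summary(posteriors)
bic = lca_bic(log_lik=-100.0, n_cases=100, n_classes=2, n_items=3)
```

Here the first two cases are assigned to class 0 and the last two to class 1, with mean posterior probabilities of about 0.875 and 0.8. In practice BIC would be computed for models with 1, 2, 3, ... classes and the lowest value preferred.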
We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected weekly for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed to test the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known.
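When subgroup membership is known, as in the artificial datasets, classification can be checked directly. Because cluster labels are arbitrary (cluster 0 in one solution may correspond to subgroup 2 in the truth), a common approach is to relabel the found clusters using the one-to-one mapping that agrees best with the known membership. The sketch below does this by brute force; `matched_accuracy` is a hypothetical helper, and the abstract does not state that the authors used exactly this measure.

```python
from itertools import permutations


def matched_accuracy(true_labels, found_labels):
    """Share of cases correctly classified after the best relabelling of clusters.

    Every one-to-one mapping of found clusters onto candidate labels is tried,
    which is only practical for a handful of subgroups.
    """
    found = list(dict.fromkeys(found_labels))
    candidates = list(dict.fromkeys(list(true_labels) + found))
    best = 0
    for perm in permutations(candidates, len(found)):
        mapping = dict(zip(found, perm))
        hits = sum(mapping[f] == t for f, t in zip(found_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)


# Usage: three known subgroups; one case from subgroup 1 is misclassified.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = [2, 2, 2, 0, 0, 1, 1, 1, 1]
accuracy = matched_accuracy(truth, clusters)  # 8 of 9 cases correct
```

For more than a few subgroups, the Hungarian algorithm would replace the brute-force search over permutations.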
Results
In the real clinical datasets, the three methods differed in the number of subgroups detected, in the certainty with which individuals were classified into those subgroups, in ease of use, and in the interpretability of their presentation of findings; all three produced perfectly reproducible findings. The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups.
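The abstract does not say how reproducibility was quantified. One standard, label-invariant way to compare the classifications from two runs of a program on the same data is the adjusted Rand index (ARI), which equals 1 when the two partitions are identical up to relabelling and is close to 0 for chance-level agreement. The sketch below implements the usual Hubert and Arabie formula; `adjusted_rand_index` is a hypothetical helper, not something reported by the authors.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions (needs at least 2 cases)."""
    n = len(labels_a)
    # Pairs of cases placed together in both partitions (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions (e.g. a single cluster)
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Usage: a re-run that reproduces the partition with permuted labels scores 1.0.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 2, 2, 0, 0]
print(adjusted_rand_index(run_1, run_2))  # 1.0
```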
Conclusions
Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets, but we recognise that different clustering methods may suit other types of data and clinical research questions.
License
© 2014 Kent et al.; licensee BioMed Central Ltd.
Figures
Figures 1 to 6 (images not reproduced).