BMC Medical Imaging
Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool
Allan Hanbury [1], Abdel Aziz Taha [1]
[1] TU Wien, Institute of Software Technology and Interactive Systems, Favoritenstrasse 9-11, Vienna A-1040, Austria
Keywords: Metric selection; Medical volume segmentation; Evaluation tool; Evaluation metrics
Others: 1222834; DOI: 10.1186/s12880-015-0068-x
Received 2014-12-15; accepted 2015-07-09; published 2015
【 Abstract 】
Background
Medical image segmentation is an important image processing step. Comparing images to evaluate the quality of segmentation is an essential part of measuring progress in this research area. Challenges in evaluating medical segmentation include metric selection, the use of multiple definitions for certain metrics in the literature, inefficient implementations of metric calculation that cause difficulties with large volumes, and the lack of support for fuzzy segmentation in existing metrics.
Results
First, we present an overview of 20 evaluation metrics selected on the basis of a comprehensive literature review. For fuzzy segmentation, which expresses the degree of membership of each voxel in multiple classes, fuzzy definitions of all metrics are provided. We discuss metric properties to provide a guide for selecting evaluation metrics. Finally, we propose an efficient evaluation tool implementing the 20 selected metrics. The tool is optimized for speed and memory use, even when the image size is extremely large, as in whole-body MRI or CT volume segmentation. An implementation of this tool is available as an open source project.
Conclusion
We propose an efficient evaluation tool for 3D medical image segmentation using 20 evaluation metrics and provide guidelines for selecting a subset of these metrics that is suitable for the data and the segmentation task.
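To make the idea of a fuzzy metric definition concrete, the following is a minimal sketch of one overlap metric, the Dice coefficient, written so that it accepts either binary labels or fuzzy memberships in [0, 1]. It assumes the fuzzy intersection is taken as the voxel-wise minimum, which is one common choice and not necessarily the definition used by the tool described above. The function name `fuzzy_dice` and the flattened-list input are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def fuzzy_dice(seg: Sequence[float], ref: Sequence[float]) -> float:
    """Dice overlap between two flattened volumes of memberships in [0, 1].

    With values restricted to 0 and 1 this reduces to the usual binary Dice.
    """
    if len(seg) != len(ref):
        raise ValueError("segmentations must have the same number of voxels")

    # Assumption: fuzzy intersection is the voxel-wise minimum.
    intersection = sum(min(a, b) for a, b in zip(seg, ref))
    # Fuzzy cardinality of each segmentation is the sum of its memberships.
    total = sum(seg) + sum(ref)

    if total == 0:
        # Both segmentations are empty; treated here as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Binary case: one shared foreground voxel, three foreground voxels total.
    print(fuzzy_dice([1, 1, 0, 0], [1, 0, 0, 0]))
    # Fuzzy case: partial memberships per voxel.
    print(fuzzy_dice([0.5, 0.5], [0.5, 1.0]))
```

Because the intersection and cardinalities are plain sums over voxels, a function of this shape can be evaluated in a single pass without holding more than the two input volumes in memory.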
【 License 】
2015 Taha and Hanbury.