Healthcare
Integrated Natural Language Processing and Machine Learning Models for Standardizing Radiotherapy Structure Names
Kevin Ivey1  Preetam Ghosh2  Khajamoinuddin Syed2  William Sleeman IV2  Michael Hagan3  Rishabh Kapoor3  Jatinder Palta3
[1] Department of Computer Science, University of Virginia, Charlottesville, VA 22904, USA; [2] Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA; [3] Department of Radiation Oncology, Virginia Commonwealth University, Richmond, VA 23298, USA
Keywords: radiotherapy structure names; nomenclature standardization; quality assurance; machine learning; natural language processing; text categorization
DOI: 10.3390/healthcare8020120
Source: DOAJ
【 Abstract 】
The lack of standardized structure names in radiotherapy (RT) data limits interoperability, data sharing, and the ability to perform big data analysis. To standardize radiotherapy structure names, we developed an integrated natural language processing (NLP) and machine learning (ML) based system that maps physician-given structure names to the American Association of Physicists in Medicine (AAPM) Task Group 263 (TG-263) standard names. The dataset consists of 794 prostate and 754 lung cancer patients across the 40 radiation therapy centers managed by the Veterans Health Administration (VA). Additionally, data from the Radiation Oncology department at Virginia Commonwealth University (VCU) were collected to serve as a test set. Domain experts identified nine prostate and ten lung organ-at-risk (OAR) structures as anatomically significant and manually labeled them according to the TG-263 standard; the remaining structures were labeled Non_OAR. We experimented with six classification algorithms and three feature vector methods, and the final model was built with the fastText algorithm. Multiple validation techniques were used to assess the robustness of the proposed methodology. The macro-averaged F
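As a rough illustration of the mapping task described in the abstract, the sketch below assigns free-text structure names to standard labels using character n-gram features and a nearest-centroid rule, with a fallback to Non_OAR. It is not the fastText model the authors used; the classifier, the similarity threshold, the function names, and the handful of training names are all hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter
import math


def char_ngrams(name, n_min=2, n_max=3):
    """Count character n-grams of a lowercased, boundary-marked name."""
    text = "<" + name.lower().replace(" ", "_") + ">"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(examples):
    """Sum the n-gram counts of all training names sharing a label."""
    centroids = {}
    for name, label in examples:
        centroids.setdefault(label, Counter()).update(char_ngrams(name))
    return centroids


def predict(name, centroids, threshold=0.3):
    """Return the most similar label, or Non_OAR below the threshold."""
    feats = char_ngrams(name)
    label, score = max(
        ((lab, cosine(feats, cen)) for lab, cen in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= threshold else "Non_OAR"


# Hypothetical physician-given names paired with illustrative standard labels.
examples = [
    ("bladder", "Bladder"),
    ("Bladder wall", "Bladder"),
    ("rectum", "Rectum"),
    ("Rectal", "Rectum"),
    ("spinal cord", "SpinalCord"),
    ("cord", "SpinalCord"),
]
centroids = fit_centroids(examples)

for raw_name in ["BLADDER", "Rectum_1", "PTV_7000"]:
    print(raw_name, "->", predict(raw_name, centroids))
```

Character n-grams are used here because physician-given names vary mostly by case, abbreviation, and suffixes, which subword features tolerate reasonably well; a real system would need far more labeled data and proper validation.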
【 License 】
Unknown