APSIPA Transactions on Signal and Information Processing
Combining acoustic signals and medical records to improve pathological voice classification
Chi-Te Wang [2,3,4]; Shih-Hau Fang [1]; Ji-Ying Chen [2,4]; Yu Tsao [5]; Feng-Chuan Lin [3,4]
[1], [2] Department of Electrical Engineering, Yuan Ze University, and MOST Joint Research Center for AI Technology and All Vista Healthcare Innovation Center, Taoyuan, Taiwan; [3] Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital, New Taipei City, Taiwan; [4] Department of Special Education, University of Taipei, Taipei, Taiwan; [5] Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
Keywords: Pathological voice; Diseases classification; Acoustic signal; Medical record; Artificial intelligence
DOI: 10.1017/ATSIP.2019.7
Subject classification: Computer science (general)
Source: Cambridge University Press
【 Abstract 】
This study proposes two multimodal frameworks that classify pathological voice samples by combining acoustic signals with medical records. In the first framework, acoustic signals are transformed into static supervectors via Gaussian mixture models; a deep neural network (DNN) then combines the supervectors with the medical record and classifies the voice signals. In the second framework, acoustic features and medical data are each processed by a separate first-stage DNN; a second-stage DNN then combines the outputs of the first-stage DNNs and performs classification. Voice samples were recorded at a voice clinic of a tertiary teaching hospital and cover three common categories of vocal disease: glottic neoplasm, phonotraumatic lesions, and vocal paralysis. Experimental results demonstrated that the proposed framework yields significant improvements in accuracy and unweighted average recall (UAR) of 2.02–10.32% and 2.48–17.31%, respectively, compared with systems that use only acoustic signals or only medical records. The proposed algorithm also provides higher accuracy and UAR than traditional feature-based and model-based combination methods.
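The two reported metrics differ in how they treat class imbalance: accuracy counts every sample equally, whereas UAR is the plain mean of the per-class recalls, so each disease category carries the same weight regardless of how many samples it has. The following is a minimal sketch of both definitions, not the authors' evaluation code; the function names and the toy labels are hypothetical and chosen only to illustrate the gap between the two metrics.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally."""
    hits = defaultdict(int)    # correctly classified samples per true class
    totals = defaultdict(int)  # total samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the study).
y_true = ["phonotrauma"] * 6 + ["neoplasm"] * 2 + ["paralysis"] * 2
y_pred = (["phonotrauma"] * 6
          + ["phonotrauma", "neoplasm"]
          + ["phonotrauma", "paralysis"])

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"UAR      = {unweighted_average_recall(y_true, y_pred):.3f}")
```

In this toy case a classifier biased toward the majority class still reaches an accuracy of 0.8, while its UAR is only about 0.667, which is why reporting both is informative when the disease categories are unevenly represented.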
【 License 】
CC BY