Conference Paper Details
2019 International Conference on Advanced Electronic Materials, Computers and Materials Engineering
Intelligent Assessment of 95598 Speech Transcription Text Quality Based on Topic Model
Radio electronics; Computer science; Materials science
Song, Bochuan^1 ; Wu, Peng^1 ; Zhang, Qiang^1 ; Chai, Bo^1 ; Gao, Yuanbo^1 ; Yang, He^1
Artificial Intelligence on Electric Power System State Grid Corporation Joint Laboratory (GEIRI), Global Energy Interconnection Research Institute, Beijing 102211, China^1
Keywords: Data preprocessing; Latent Dirichlet allocation; Natural language processing; Power system management; Speech transcriptions; Subsequent data analysis; Topic distributions; Unsupervised clustering
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/563/4/042001/pdf
DOI  :  10.1088/1757-899X/563/4/042001
Source: IOP
【 Abstract 】

The quality of speech transcripts is of great significance to power system management and is an important basis for subsequent data analysis. Based on the characteristics of customer-service speech transcription texts, this paper proposes a method for analyzing the transcribed texts that combines manual processing with a latent Dirichlet allocation (LDA) topic model. First, the State Grid's work order data are preprocessed. The LDA topic model then computes the topic distribution of the texts, with the number of topics set to 100. Next, the documents are clustered without supervision using the k-means method, and the similarity between documents is obtained. Finally, the quality of the data is analyzed by combining manual labeling and manual evaluation. This paper is the first to label and identify the State Grid's work order analysis data, a pioneering effort in applying natural language processing to the power grid domain.
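
The pipeline described in the abstract (topic distributions from LDA, then k-means clustering and document similarity) might look roughly like the following minimal sketch. It is not the authors' implementation: the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the use of cosine similarity, the function names, and the toy English documents are all assumptions made for illustration. The toy run uses 2 topics instead of the paper's 100, and real work order text would first need the preprocessing step (for example, word segmentation), which is not shown.

```python
import math
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assign = []                                            # topic of every token
    for d, doc in enumerate(docs):                         # random initial assignment
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cosine(a, b):
    """Cosine similarity between two topic distributions."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical, already-tokenized transcripts (not data from the paper).
docs = [
    "power outage street repair crew".split(),
    "outage repair crew power line".split(),
    "bill payment account balance refund".split(),
    "account bill refund payment query".split(),
]
theta = lda_gibbs(docs, n_topics=2)       # the paper sets the topic count to 100
labels = kmeans(theta, k=2)
print(labels, round(cosine(theta[0], theta[1]), 3))
```

In practice a library implementation of LDA and k-means would normally replace the hand-written functions; they are spelled out here only so the sketch runs with the standard library alone.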
