We explore automated discovery of topically coherent segments in speech or text sequences. We give two new discriminative topic segmentation algorithms which employ a new measure of text similarity based on word co-occurrence. Both algorithms function by finding extrema in the similarity signal over the text, with the latter algorithm using a compact support-vector-based description of a window of text or speech observations in word similarity space to overcome noise introduced by speech recognition errors and off-topic content. In experiments over speech and text news streams, we show that these algorithms outperform previous methods. We observe that topic segmentation of speech recognizer output is a more difficult problem than that of text streams; however, we demonstrate that by using a lattice of competing hypotheses rather than just the one-best hypothesis as input to the segmentation algorithm, the performance
Discriminative Topic Segmentation of Text and Speech