| Using DEDICOM for completely unsupervised part-of-speech tagging. | |
| Chew, Peter A. ; Bader, Brett William ; Rozovskaya, Alla (University of Illinois, Urbana, IL) | |
| Keywords: SPEECH; D CODES; IDENTIFICATION SYSTEMS; Cognitive science; Speech-Mathematical models; Speech processing systems | |
| DOI: 10.2172/978915; RP-ID: SAND2009-0842; PID: OSTI ID: 978915; Others: TRN: US201011%%3 | |
| Subject classification: Social sciences, humanities and arts (general) | |
| United States|English | |
| Source: SciTech Connect | |
【 Abstract 】
A standard and widespread approach to part-of-speech tagging is based on Hidden Markov Models (HMMs). An alternative approach, pioneered by Schuetze (1993), induces parts of speech from scratch using singular value decomposition (SVD). We introduce DEDICOM as an alternative to SVD for part-of-speech induction. DEDICOM retains the advantages of SVD in that it is completely unsupervised: no prior knowledge is required to induce either the tagset or the associations of terms with tags. However, unlike SVD, it is also fully compatible with the HMM framework, in that it can be used to estimate emission- and transition-probability matrices which can then be used as the input for an HMM. We apply the DEDICOM method to the CONLL corpus (CONLL 2000) and compare the output of DEDICOM to the part-of-speech tags given in the corpus, and find that the correlation (almost 0.5) is quite high. Using DEDICOM, we also estimate part-of-speech ambiguity for each term, and find that these estimates correlate highly with part-of-speech ambiguity as measured in the original corpus (around 0.88). Finally, we show how the output of DEDICOM can be evaluated and compared against the more familiar output of supervised HMM-based tagging.
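The abstract does not spell out the decomposition, so the following is only a minimal sketch of the general DEDICOM idea: an asymmetric square matrix X is approximated as A R Aᵀ, where A relates terms to a small number of latent classes and R relates those classes to one another. It assumes X is a term-by-term bigram count matrix built from a toy token list, and it replaces the authors' fitting procedure with a simple one-shot approximation (A from the leading left singular vectors of [X, Xᵀ], then R by least squares). The names `bigram_matrix`, `dedicom_one_shot`, and `relative_error` and the toy sentence data are hypothetical, chosen for illustration only.

```python
import numpy as np

def bigram_matrix(tokens):
    """Return (vocab, X), where X[i, j] counts how often term i is followed by term j."""
    vocab = sorted(set(tokens))
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)))
    for left, right in zip(tokens, tokens[1:]):
        X[index[left], index[right]] += 1.0
    return vocab, X

def dedicom_one_shot(X, k):
    """Approximate X by A @ R @ A.T with A (n x k) orthonormal and R (k x k).

    Assumption: this is NOT the fitting procedure from the report. A is taken
    from the leading left singular vectors of [X, X.T]; for that fixed
    orthonormal A, R = A.T @ X @ A is the least-squares optimum.
    """
    U, _, _ = np.linalg.svd(np.hstack([X, X.T]), full_matrices=False)
    A = U[:, :k]
    R = A.T @ X @ A
    return A, R

def relative_error(X, A, R):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - A @ R @ A.T) / np.linalg.norm(X)

# Hypothetical toy corpus, not taken from the CONLL data.
tokens = "the dog sees the cat . the cat sees a dog . a dog chases the cat .".split()
vocab, X = bigram_matrix(tokens)
A, R = dedicom_one_shot(X, k=3)
print(A.shape, R.shape, relative_error(X, A, R))
```

In this sketch the rows of `A` play the role of term-to-class loadings and `R` the role of class-to-class transitions. Because the SVD-based `A` can contain negative entries, turning these factors into emission and transition probabilities for an HMM would need a constrained fit and a normalization step that the abstract does not describe.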
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| RO201705170002947LZ | 109KB | PDF | |