Conference Paper Details
Eighth International Conference on Language Resources and Evaluation
Morphosyntactic Analysis of the CHILDES and TalkBank Corpora
Brian MacWhinney
Others  :  http://www.lrec-conf.org/proceedings/lrec2012/pdf/616_Paper.pdf
PID  :  52059
Source: CEUR
【 Abstract 】

This paper describes the construction and usage of the MOR and GRASP programs for part-of-speech tagging and syntactic dependency analysis of the corpora in the CHILDES and TalkBank databases. We have written MOR grammars for 11 languages and GRASP analyses for three. For English data, the MOR tagger reaches 98% accuracy on adult corpora and 97% accuracy on child language corpora. The paper discusses the construction of MOR lexicons, with an emphasis on compounds and special conversational forms. The shape of the rules for controlling allomorphy and morpheme concatenation is discussed. The analysis of bilingual corpora is illustrated in the context of the Cantonese-English bilingual corpora. Methods for preparing data for MOR analysis and for developing MOR grammars are discussed. We believe that recent computational work using this system is leading to significant advances in child language acquisition theory and, more generally, in theories of grammar identification.
