Journal Article Details
ETRI Journal
An Algorithm for Predicting the Relation between Lemmas and Corpus Size
Keywords: prediction; corpus compiling; computational linguistics; NLP; piecewise curve-fitting; corpus size
DOI: 10.4218/etrij.00.0100.0203
【 Abstract 】
Much research on natural language processing (NLP), computational linguistics, and lexicography has relied on linguistic corpora. In recent years, many organizations around the world have been constructing their own large corpora to achieve corpus