Glossa
Estimating child linguistic experience from historical corpora
Jordan Kodner [1]
[1] University of Pennsylvania, Philadelphia, PA
Keywords: child language acquisition; corpus linguistics; historical linguistics; English; Latin; Spanish; Proto-Germanic; paradigm saturation
DOI: 10.5334/gjgl.926
Source: DOAJ
Abstract
Child language acquisition is often identified as one of the primary drivers of language change, but the lack of historical child data presents a challenge for empirically investigating its effect. In this work, I examine the relationship between lexicons extracted from modern child-directed speech and those drawn from modern and historical literary corpora, in order to better understand when language acquisition can be modeled over historical and non-child corpora as it is over child corpora. Among items with high token frequency, morphophonological and syntactic-semantic patterns occur at similar type frequencies in these corpora. Furthermore, when a learning algorithm is applied to lexicons sampled from these sources, it consistently achieves the same learning outcomes in each. With appropriate care and pre-processing, modern and historical text corpora are effectively interchangeable with child-directed speech corpora for the purpose of estimating child lexical experience, opening a path for modeling language acquisition where child-directed corpora are not available.
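The comparison described in the abstract, type frequencies of a pattern among high token frequency items, can be illustrated with a minimal sketch. This is not the paper's code or procedure: the function names, the cutoff `n`, the `-ed` suffix test, and the toy token lists are all assumptions invented for illustration.

```python
from collections import Counter


def top_types(tokens, n):
    """Return the n word types with the highest token frequency."""
    return [word for word, _ in Counter(tokens).most_common(n)]


def pattern_type_share(tokens, n, follows_pattern):
    """Share of the top-n types that follow a pattern (a type frequency ratio)."""
    types = top_types(tokens, n)
    if not types:
        return 0.0
    return sum(1 for word in types if follows_pattern(word)) / len(types)


def ends_in_ed(word):
    """Hypothetical stand-in for a morphological pattern test."""
    return word.endswith("ed")


# Toy token lists, invented for illustration only (not real corpus data).
child_directed = (
    ["walked"] * 3 + ["went"] * 3 + ["played"] * 2 + ["jumped"] * 2 + ["saw"]
)
literary = (
    ["walked"] * 3 + ["saw"] * 3 + ["departed"] * 2 + ["went"] * 2 + ["played"]
)

# Compare the pattern's type share among the 4 most frequent types per source.
cds_share = pattern_type_share(child_directed, 4, ends_in_ed)
lit_share = pattern_type_share(literary, 4, ends_in_ed)
print(f"child-directed: {cds_share:.2f}, literary: {lit_share:.2f}")
```

With real corpora, the token lists would come from lemmatized and pre-processed text, and similar shares across sources at the same cutoff would be the kind of evidence the abstract refers to.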
License
Unknown