The International Arab Journal of Information Technology
New Language Models for Spelling Correction
Article
Saida Laaroussi [1], Si Lhoussain Aouragh [2], Abdellah Yousfi [3], Mohamed Nejja [4], Hicham Geddah [5], Said Ouatik El Alaoui [1]
[1] IT, Logistics and Mathematics, Ibn Tofail University; IT and Decision Support System, Mohamed V University; Department of Economics and Management, Mohamed V University; Department of Software Engineering, Mohamed V University; Department of Computer Science, Mohamed V University
Keywords: Spelling correction; contextual correction; n-gram language models; edit distance; NLP
DOI: 10.34028/iajit/19/6/12
Subject classification: Computer Science (General)
Source: Zarqa University
【 Abstract 】
Correcting spelling errors based on context is a significant problem in Natural Language Processing (NLP) applications. Most work that introduces context into the spelling correction process uses n-gram language models. However, these models fail in several cases to give adequate probabilities for the suggested solutions of a misspelled word in a given context. To resolve this issue, we propose two new language models inspired by stochastic language models combined with edit distance. The first phase finds the words of the lexicon that are orthographically close to the erroneous word, and the second phase ranks and limits these suggestions. We have applied the new approach to the Arabic language, taking into account its specificity of having strong contextual connections between distant words in a sentence. To evaluate our approach, we have developed textual data processing applications, namely the extraction of distant transition dictionaries. The correction accuracy obtained exceeds 98% for the first 10 suggestions. Compared to n-gram language models, our approach has the advantage of simplifying the parameters to be estimated while achieving higher correction accuracy. Hence the need to use such an approach.
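The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the two proposed language models, so the ranking step below substitutes a plain add-one-smoothed bigram score conditioned on the preceding word. The function names (`edit_distance`, `candidates`, `rank`), the distance threshold of 2, and the toy English corpus are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rows of dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def candidates(word, lexicon, max_dist=2):
    """Phase 1: lexicon words orthographically close to the erroneous word.

    The threshold max_dist=2 is an assumed value, not taken from the paper.
    """
    return [w for w in lexicon if edit_distance(word, w) <= max_dist]


def rank(prev_word, cands, bigrams, unigrams, top_k=10):
    """Phase 2: rank candidates by context and keep the first top_k.

    Uses an add-one-smoothed bigram probability as a stand-in for the
    paper's language models, which the abstract does not define.
    """
    vocab = len(unigrams)

    def score(w):
        return (bigrams[(prev_word, w)] + 1) / (unigrams[prev_word] + vocab)

    return sorted(cands, key=score, reverse=True)[:top_k]


# Toy corpus (hypothetical data) used to estimate the counts.
corpus = "the cat sat on the mat the cat ate the rat".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

# Correct the misspelling "cet" occurring after the word "the".
suggestions = rank("the", candidates("cet", unigrams), bigrams, unigrams)
print(suggestions)
```

A bigram score only sees the immediately preceding word. The abstract's point about strong contextual connections between distant words in Arabic sentences is precisely what such a local model misses, which is the gap the proposed models are meant to address.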
【 License 】
Unknown