IST-Africa Week Conference

A Statistical Approach to Error Correction for isiZulu Spellcheckers

Abstract

Spellcheckers have become important as text-based communication has increased, both at work and, through social media, in society at large. There is, however, very little support for spellchecking in agglutinating Sub-Saharan African (Bantu) languages. While error detection has been shown to yield acceptable results for at least isiZulu, error correction has not yet been investigated. The aim of this paper is to address the spelling correction problem with a statistical approach that provides candidate corrections for misspelled isiZulu words (non-word errors). The error corrector uses trigrams learned from a corpus, their probabilities, minimum edit distance, and additional optimisations. The corrector was evaluated on the four types of non-word errors (substitutions, insertions, deletions, and transpositions). For error correction it achieved 89% language recall, 84% error recall, 85% language precision, and 88% error precision. Its suggestions had an overall accuracy of 95% and a relevance of 61%, and it performed best on transposition errors. The error corrector has been added to an existing open source isiZulu error detector. This facilitates uptake and fills a feature gap, with benefits for isiZulu speakers and learners and for bootstrapping spellcheckers for related languages.
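
The combination of trigram probabilities and minimum edit distance described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say whether the trigrams are character-level or word-level, so character trigrams are assumed here, the smoothing and length normalisation are illustrative choices, and the "additional optimisations" are not reproduced. All function names (`train`, `score`, `edits1`, `suggest`) and the toy word list are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def trigrams(word):
    """Character trigrams of a word, padded with boundary markers (assumed scheme)."""
    padded = f"^^{word}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(corpus_words):
    """Count character trigrams over a list of correctly spelled words."""
    counts = Counter()
    for word in corpus_words:
        counts.update(trigrams(word))
    return counts


def score(word, counts):
    """Mean log-probability of the word's trigrams with add-one smoothing."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen trigrams
    grams = trigrams(word)
    log_p = sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams)
    return log_p / len(grams)


def edits1(word):
    """All strings at edit distance one: one deletion, transposition,
    substitution, or insertion applied to the input."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, counts, n=5):
    """Rank edit-distance-one candidates by trigram score, best first."""
    candidates = edits1(word) - {word}
    return sorted(candidates, key=lambda c: score(c, counts), reverse=True)[:n]


if __name__ == "__main__":
    # Toy training list for illustration only; a real corrector needs a large corpus.
    model = train(["ngiyabonga", "siyabonga", "sawubona", "umuntu", "abantu"])
    print(suggest("ngiyabogna", model))
```

Because the ranking uses only trigram statistics, a candidate can score well without being a real word; a practical corrector would presumably filter or re-rank candidates, which is where the detector and the optimisations mentioned in the abstract would come in.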
