Journal: International Journal of Computer Processing of Oriental Languages

Stochastic Korean Word-Spacing with Smoothing Using Korean Spelling Checker



Abstract

Word-spacing errors are among the most frequent errors in Korean. They create ambiguity in the lexical interpretation of parts of speech, or render the sentences that contain them incomprehensible, so resolving them is crucial in Korean language processing applications. In this paper, we propose a stochastic Korean word-spacing system with smoothing using Korean Spelling Checker, which is equally robust for both inner data and external data. To cope with various word-spacing problems, this study (a) presents a simple stochastic word-spacing system with only two parameters, namely the odds favoring the inner-spacing of a given syllable bigram and relative word frequencies, and (b) endeavors to (ⅰ) remove noise from the training data and (ⅱ) diminish training-data dependency by dynamically creating a candidate word with a longest-radix-selecting algorithm. The system thus becomes robust against unseen words and performs similarly on inner and external data: it obtained precisions of 98.47% and 97.78% in word-unit correction on the inner test data and the balanced external test data, respectively.
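To make the two parameters concrete, the following is a minimal sketch and not the authors' system. It assumes the odds of a space falling between two adjacent syllables and the relative word frequencies are both estimated by simple counting over correctly spaced text, and that they are combined by summing logs in a dynamic-programming search. The names `train`, `log_odds_space`, and `respace`, the add-alpha smoothing, and the per-syllable penalty for unseen words are all hypothetical choices for illustration. The paper's noise removal, its smoothing with Korean Spelling Checker, and its longest-radix-selecting algorithm are not reproduced here.

```python
import math
from collections import Counter


def train(spaced_sentences):
    """Count words and syllable-bigram spacing events in correctly spaced text."""
    word_counts = Counter()
    spaced = Counter()    # bigram (a, b) observed with a space between a and b
    unspaced = Counter()  # bigram (a, b) observed inside a single word
    for sentence in spaced_sentences:
        words = sentence.split()
        word_counts.update(words)
        for word in words:
            for a, b in zip(word, word[1:]):
                unspaced[(a, b)] += 1
        for left, right in zip(words, words[1:]):
            spaced[(left[-1], right[0])] += 1
    return word_counts, spaced, unspaced


def log_odds_space(a, b, spaced, unspaced, alpha=1.0):
    """Add-alpha smoothed log odds that a space belongs between syllables a and b."""
    return math.log((spaced[(a, b)] + alpha) / (unspaced[(a, b)] + alpha))


def respace(text, model, max_len=8, unseen=1e-6):
    """Re-space text by maximising log relative word frequency plus boundary log odds."""
    word_counts, spaced, unspaced = model
    total = sum(word_counts.values())
    text = "".join(text.split())  # discard whatever spacing the input had
    n = len(text)
    # best[i] = (best score for text[:i], start index of the last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_counts:
                rel_freq = word_counts[word] / total
            else:
                # Assumed fallback: a fixed penalty per syllable of an unseen word.
                rel_freq = unseen ** len(word)
            score = best[start][0] + math.log(rel_freq)
            if start > 0:
                score += log_odds_space(text[start - 1], text[start], spaced, unspaced)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers to recover the chosen words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return " ".join(reversed(words))


# Toy training data, invented for this example.
model = train(["나는 학교에 간다", "나는 집에 간다", "너는 학교에 간다"])
print(respace("나는학교에간다", model))  # 나는 학교에 간다
```

In this sketch a word that never appeared in training is penalised heavily. That is the training-data dependency the abstract says the system reduces by creating candidate words dynamically.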
