Venue: International Conference on Computational Linguistics

Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners


Abstract

Word ordering errors (WOEs) are the most frequent type of sentence-level grammatical error made by non-native learners of Chinese. Learners of Chinese as a foreign language often place one or more characters in the wrong positions in a sentence, which produces incorrect words or ungrammatical sentences. Moreover, Chinese sentences have no explicit word boundaries, which makes WOE detection and correction more challenging. In this paper, we propose methods to detect and correct WOEs in Chinese sentences. WOE detection models based on conditional random fields (CRFs) identify the sentence segments that contain WOEs. The features explored are segment pointwise mutual information (PMI), inter-segment PMI difference, a language model, the tag of the previous segment, and a CRF bigram template. The words in segments containing WOEs are then reordered to generate candidates that may have the correct word order, and models based on Ranking SVM rank these candidates and suggest the most appropriate corrections. The training and test sets are selected from the HSK Dynamic Composition Corpus created by Beijing Language and Culture University. In addition to the HSK WOE dataset, the Google Chinese Web 5-gram corpus is used to learn features for WOE detection and correction. The best model achieves an accuracy of 0.834 in detecting WOEs in sentence segments. On average, the correct word ordering is ranked 4.8 among 184.48 candidates.
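
The abstract lists segment PMI and inter-segment PMI difference among the detection features but does not define them. As a rough illustration only, the sketch below computes standard pointwise mutual information, PMI(x, y) = log2(P(x y) / (P(x) P(y))), between adjacent words from unigram and bigram counts. These are the kind of statistics that an n-gram resource such as the Google Chinese Web 5-gram corpus provides. The toy corpus, the function names (`count_ngrams`, `pmi`, `junction_pmis`) and the "weakest junction" reading are hypothetical and are not the authors' definitions.

```python
import math
from collections import Counter


def count_ngrams(sentences):
    """Collect unigram and bigram counts from pre-segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """Standard PMI: log2(P(w1 w2) / (P(w1) * P(w2))); -inf if the bigram is unseen."""
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    joint = bigrams[(w1, w2)] / n_bi  # Counter returns 0 for unseen pairs
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)))


def junction_pmis(words, unigrams, bigrams):
    """PMI at every boundary between adjacent words (hypothetical feature)."""
    return [pmi(a, b, unigrams, bigrams) for a, b in zip(words, words[1:])]


# Toy, made-up reference corpus standing in for real n-gram counts.
corpus = [
    ["我", "喜欢", "学习", "中文"],
    ["我", "喜欢", "中文"],
    ["他", "学习", "中文"],
]
uni, bi = count_ngrams(corpus)
print(min(junction_pmis(["我", "喜欢", "学习", "中文"], uni, bi)))  # ≈ 1.84, i.e. log2(25/7)
print(min(junction_pmis(["我", "学习", "喜欢", "中文"], uni, bi)))  # -inf: unseen bigram
```

A misplaced word tends to create an adjacent pair that is rare or unseen in the reference counts, which is why a low junction PMI is a plausible WOE signal. A real system would also need smoothing instead of returning -inf for unseen bigrams.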
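
The abstract describes candidate generation only as reordering the words in a segment flagged as containing a WOE. The minimal brute-force sketch below rests on two assumptions: the segment is already split into words, and every distinct permutation other than the learner's original order counts as a candidate. The function name `reorder_candidates` is hypothetical.

```python
from itertools import permutations


def reorder_candidates(segment_words):
    """Return every distinct reordering of a flagged segment except the original."""
    original = tuple(segment_words)
    # dict.fromkeys drops duplicate orderings (from repeated words) while keeping order.
    unique = dict.fromkeys(permutations(original))
    return [list(p) for p in unique if p != original]


segment = ["学习", "喜欢", "中文"]  # hypothetical flagged segment
cands = reorder_candidates(segment)
print(len(cands))  # 5, i.e. 3! - 1
```

A segment of n words yields up to n! orderings (120 for five words, 720 for six), so exhaustive enumeration only suits short segments. The abstract does not say whether the authors prune the candidate set.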

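Ranking SVM learns a linear scoring function from pairwise preferences. For each flagged segment, the correct ordering should score higher than every other candidate. The sketch below minimizes the standard L2-regularized pairwise hinge loss with plain stochastic subgradient descent rather than a dedicated solver such as SVMrank. It then reports the rank of the correct ordering, the same kind of quantity as the abstract's average rank of 4.8. The abstract does not specify the ranking features. The two features used here (a language-model log-probability and the weakest adjacent-word PMI), all the numbers, and the names `train_ranking_svm` and `rank_of_gold` are hypothetical.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranking_svm(groups, epochs=200, lr=0.01, lam=0.001, seed=0):
    """Pairwise Ranking SVM trained by stochastic subgradient descent.

    groups: list of (candidate_feature_vectors, gold_index), one per segment.
    Minimizes lam/2 * ||w||^2 + mean over pairs of max(0, 1 - w . (x_gold - x_other)).
    """
    rng = random.Random(seed)
    dim = len(groups[0][0][0])
    w = [0.0] * dim
    # Pairwise transform: one difference vector per (gold, other) candidate pair.
    pairs = [
        [g - o for g, o in zip(feats[gold], feats[j])]
        for feats, gold in groups
        for j in range(len(feats))
        if j != gold
    ]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for diff in pairs:
            margin = dot(w, diff)
            for k in range(dim):
                # Subgradient of the regularizer plus the hinge term when it is active.
                grad = lam * w[k] - (diff[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w


def rank_of_gold(w, feats, gold):
    """1-based rank of the gold candidate when candidates are sorted by score."""
    gold_score = dot(w, feats[gold])
    return 1 + sum(1 for j, x in enumerate(feats) if j != gold and dot(w, x) > gold_score)


# Hypothetical feature vectors: [language-model log-prob, weakest junction PMI].
train = [
    ([[-12.0, 1.8], [-15.0, -3.0], [-14.0, 0.5]], 0),
    ([[-20.0, -1.0], [-16.0, 2.2], [-18.5, 0.1]], 1),
]
w = train_ranking_svm(train)
ranks = [rank_of_gold(w, feats, gold) for feats, gold in train]
print(ranks, sum(ranks) / len(ranks))  # [1, 1] 1.0
```

Averaging `rank_of_gold` over all test segments gives a figure directly comparable to the reported average rank. Ties are counted in the gold candidate's favor here, which is one of several reasonable conventions.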