International Atlantic Web Intelligence Conference

On Correction of Semantic Errors in Natural Language Texts with a Dictionary of Literal Paronyms

Abstract

Due to the open nature of the Web, search engines must be able to process erroneous texts meaningfully, which includes automatic error detection and correction. One widespread type of error in Internet texts is the malapropism, i.e., a semantic error that replaces a word with another existing word similar in letter composition and/or sound but semantically incompatible with the context. Methods for the detection and correction of malapropisms have been proposed recently. Any such method relies on a generator of correction candidates (paronyms), i.e., real words that are similar to the suspicious word encountered in the text and have the same grammatical properties. Literal paronyms are words at a distance of a few editing operations from a given word. We argue that a dictionary of literal paronyms should be compiled beforehand and that its units should be grammeme names. For Spanish, such grammemes are (1) singulars and plurals of nouns; (2) adjectives plus participles; (3) verbs in the infinitive; (4) gerunds plus adverbs; (5) personal verb forms. Based on existing Spanish electronic dictionaries, we have compiled a dictionary of one-letter-distant literal paronyms. The dictionary contains a few tens of thousands of entries, with each entry averaging approximately three paronyms. We calculate the gain in the number of candidate search operations achievable through the proposed dictionary and give illustrative examples of correcting one-letter malapropisms using our dictionary.
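
The idea of compiling one-letter-distant paronyms ahead of time, separately for each grammatical class, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tiny `LEXICON`, and the class labels are hypothetical, and "one editing operation" is assumed here to mean a single-letter substitution, insertion, or deletion (transpositions are not covered). Words are assumed not to contain the `*` character.

```python
from collections import defaultdict
from itertools import combinations


def wildcard_keys(word):
    """Yield pattern keys; two distinct words share a key only if they are one edit apart."""
    for i in range(len(word)):
        yield word[:i] + "*" + word[i + 1:]  # letter at position i replaced
    for i in range(len(word) + 1):
        yield word[:i] + "*" + word[i:]      # a letter inserted at position i


def build_paronym_dictionary(lexicon):
    """Map each word to the sorted same-class words one edit away from it."""
    result = {}
    for word_class, words in lexicon.items():
        unique_words = set(words)
        # Group words by shared wildcard key.
        buckets = defaultdict(set)
        for word in unique_words:
            for key in wildcard_keys(word):
                buckets[key].add(word)
        # Every pair of distinct words in one bucket is a paronym pair.
        neighbours = defaultdict(set)
        for bucket in buckets.values():
            for a, b in combinations(bucket, 2):
                neighbours[a].add(b)
                neighbours[b].add(a)
        result[word_class] = {w: sorted(neighbours[w]) for w in sorted(unique_words)}
    return result


# Hypothetical toy lexicon; a real one would come from an electronic dictionary.
LEXICON = {
    "nouns": ["casa", "cosa", "caza", "cama", "masa", "mesa"],
    "infinitives": ["casar", "cazar", "pasar", "pesar"],
}

PARONYMS = build_paronym_dictionary(LEXICON)
print(PARONYMS["nouns"]["casa"])
print(PARONYMS["infinitives"]["pasar"])
```

With such a table in place, finding correction candidates for a suspicious word becomes a single lookup within its grammatical class instead of generating and testing every one-letter variant at run time. Note that "casa" and "casar" are one letter apart but are not paired, because they belong to different classes.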
