International Atlantic Web Intelligence Conference (AWIC 2004); 2004-05-16 to 2004-05-19; Cancun, MX

On Correction of Semantic Errors in Natural Language Texts with a Dictionary of Literal Paronyms


Abstract

Due to the open nature of the Web, search engines must include means for meaningfully processing incorrect texts, including automatic error detection and correction. One widespread type of error in Internet texts is the malapropism, i.e., a semantic error that replaces a word with another existing word similar in letter composition and/or sound but semantically incompatible with the context. Methods for detecting and correcting malapropisms have been proposed recently. Any such method relies on a generator of correction candidates: paronyms, i.e., real words that are similar to the suspicious word encountered in the text and have the same grammatical properties. Literal paronyms are words at a distance of a few editing operations from a given word. We argue that a dictionary of literal paronyms should be compiled beforehand and that its units should be grammeme names. For Spanish, such grammemes are (1) singulars and plurals of nouns; (2) adjectives plus participles; (3) verbs in the infinitive; (4) gerunds plus adverbs; (5) personal verb forms. Based on existing Spanish electronic dictionaries, we have compiled a dictionary of one-letter-distant literal paronyms. The dictionary contains a few tens of thousands of entries, with each entry averaging approximately three paronyms. We calculate the gain in the number of candidate search operations achievable with the proposed dictionary and give illustrative examples of correcting one-letter malapropisms using our dictionary.
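To make the idea of a precompiled dictionary of one-letter-distant literal paronyms concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that "one letter distant" means Levenshtein distance 1 (a single substitution, insertion, or deletion), uses a hypothetical Spanish alphabet, and uses a tiny hypothetical lexicon whose class labels (`noun_sg`, `noun_pl`, `verb_inf`) only stand in for the grammeme groups listed above. Paronyms are kept only when they belong to the same class as the source word, which mirrors the requirement that candidates share grammatical properties. Generating candidates at run time for a word of length n over an alphabet A means testing up to n(2|A| − 1) + |A| strings against the lexicon. The precompiled dictionary replaces that search with a single lookup.

```python
from collections import defaultdict

# Hypothetical alphabet for illustration: basic Latin letters plus ñ and
# accented vowels. The inventory actually used by the authors is not stated.
ALPHABET = "abcdefghijklmnopqrstuvwxyzñáéíóúü"

# Hypothetical toy lexicon: word form -> grammatical class label.
# Simplification: each form has one class; a real lexicon would need
# one entry per (form, class) pair for ambiguous forms.
SAMPLE_LEXICON = {
    "casa": "noun_sg", "caza": "noun_sg", "cosa": "noun_sg", "masa": "noun_sg",
    "casas": "noun_pl", "cosas": "noun_pl",
    "casar": "verb_inf", "cazar": "verb_inf", "pasar": "verb_inf",
}


def edits1(word: str) -> set[str]:
    """Return all distinct strings at Levenshtein distance 1 from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    substitutes = {a + c + b[1:] for a, b in splits if b
                   for c in ALPHABET if c != b[0]}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | substitutes | inserts


def build_paronym_dictionary(lexicon: dict[str, str]) -> dict[str, list[str]]:
    """Precompute, for each word, the same-class words one edit away."""
    by_class: dict[str, set[str]] = defaultdict(set)
    for word, cls in lexicon.items():
        by_class[cls].add(word)
    dictionary = {}
    for word, cls in lexicon.items():
        # The expensive candidate generation happens once, offline.
        paronyms = sorted(w for w in edits1(word) if w in by_class[cls])
        if paronyms:
            dictionary[word] = paronyms
    return dictionary


def correction_candidates(word: str, dictionary: dict[str, list[str]]) -> list[str]:
    """At correction time, candidates come from a single dictionary lookup."""
    return dictionary.get(word, [])


if __name__ == "__main__":
    paronyms = build_paronym_dictionary(SAMPLE_LEXICON)
    # "voy a cazar a mi hija" ('I will hunt my daughter') is a classic
    # malapropism for "casar" ('marry off'); the lookup proposes it.
    print(correction_candidates("cazar", paronyms))
    print("strings tested on the fly:", len(edits1("cazar")))
```

In this toy lexicon `casar` and `casas` are also one letter apart, but they fall into different classes, so neither is proposed for the other. Deciding which candidate actually fits the context is a separate step that the sketch does not attempt.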
