Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval

Abstract

In contrast with their monolingual counterparts, little attention has been paid to the effects that misspelled queries have on the performance of Cross-Language Information Retrieval (CLIR) systems. The present work makes a first attempt to fill this gap by extending our previous work on monolingual retrieval in order to study the impact that the progressive addition of misspellings to input queries has, this time, on the output of CLIR systems. Two approaches for dealing with this problem are analyzed in this paper. The first is the use of automatic spelling correction techniques, for which we consider two algorithms: one for the correction of isolated words and another for correction based on the linguistic context of the misspelled word. The second approach is the use of character n-grams both as index terms and as translation units, seeking to take advantage of their inherent robustness and language independence. All these approaches have been tested on a Spanish-to-English CLIR system, that is, Spanish queries on English documents. Real, user-generated spelling errors have been used under a methodology that allows us to study the effectiveness of the different approaches and their behavior when confronted with different error rates. The results show that classic word-based approaches are highly sensitive to misspelled queries, although spelling correction techniques can mitigate these negative effects. The use of character n-grams, on the other hand, provides great robustness against misspellings.
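The robustness of character n-grams mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `char_ngrams`, the choice of n = 4, and the example words are assumptions made only for illustration, and the use of n-grams as translation units is not shown. The idea is that a misspelled query word no longer matches the correct word as a whole, yet still shares several of its n-grams.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split each whitespace-separated word into overlapping character n-grams.

    Words no longer than n characters are kept whole.
    The default n = 4 is an illustrative assumption, not a value from the paper.
    """
    grams: list[str] = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Hypothetical example: a Spanish query word and a transposition misspelling of it.
correct = "informacion"
misspelled = "infromacion"

# Word-level matching fails outright on the misspelled form.
word_match = correct == misspelled

# N-gram matching still finds the parts of the word the error did not touch.
shared = set(char_ngrams(correct)) & set(char_ngrams(misspelled))

print(word_match)
print(sorted(shared))
```

Because the error only disturbs the n-grams that overlap it, the remaining n-grams can still be matched against an n-gram index, which is the intuition behind the robustness claim.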

Bibliographic details

  • Source
    Information Processing & Management | 2016, Issue 4 | pp. 646-657 | 12 pages
  • Author affiliations

    Grupo LYS, Departamento de Computacion, Facultade de Informatica, Universidade da Coruna, Campus de Elvina, 15071 - A Coruna, Spain;

    Grupo LYS, Departamento de Computacion, Facultade de Informatica, Universidade da Coruna, Campus de Elvina, 15071 - A Coruna, Spain;

    Grupo LYS, Departamento de Computacion, Facultade de Informatica, Universidade da Coruna, Campus de Elvina, 15071 - A Coruna, Spain, Grupo COLE, Departamento de Informatica, E.S. de Enxenaria Informatica, Universidade de Vigo, Campus As Lagoas, 32004 - Ourense, Spain;

    Grupo COLE, Departamento de Informatica, E.S. de Enxenaria Informatica, Universidade de Vigo, Campus As Lagoas, 32004 - Ourense, Spain;

  • Indexing information
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    Misspelled queries; Cross-Language Information Retrieval; Machine translation; Spelling correction; Character n-grams;

