On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks

Abstract

Cross-Language Information Retrieval draws on techniques from both Machine Translation and Information Retrieval, but applies them in a context with characteristics of its own. This study aims to extend what is known about the effectiveness and applicability, in that field, of non-classical translation mechanisms that work at the character n-gram level. To that end, an n-gram based system of this type has been developed. It requires only a bilingual machine-readable dictionary of n-grams, automatically generated from parallel corpora, which is used to translate queries that have first been split into n-grams in the source language. n-Gramming then serves as an approximate string matching technique for monolingual text retrieval over the set of n-grammed documents in the target language. The tests were performed on CLEF collections for seven European languages, with English as the target language. After an initial tuning phase to determine the most effective way to apply the approach, the results obtained, close to the upper baseline, not only confirm the consistency across languages of character n-gram based approaches of this kind, but also provide further evidence of their validity and applicability, which are not tied to a given implementation.
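
The pipeline described in the abstract (n-gram the query, translate it through a bilingual n-gram dictionary, then match it against n-grammed target documents) can be illustrated with a small sketch. This is not the authors' system: the n-gram size of 4, the toy dictionary `TOY_DICT` and its weights, and the weighted-overlap scoring are all assumptions chosen for illustration, since the abstract does not specify the alignment method, the weighting, or the retrieval model.

```python
from collections import Counter

def char_ngrams(text, n=4):
    """Split each word into overlapping character n-grams.
    Words no longer than n are kept whole (an assumption of this sketch)."""
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

# Hypothetical toy dictionary: source n-gram -> [(target n-gram, weight)].
# In the paper such a dictionary is generated automatically from parallel corpora.
TOY_DICT = {
    "lech": [("milk", 1.0)],
    "eche": [("milk", 0.5)],
}

def pseudo_translate(query, dictionary, n=4):
    """Map each source-language n-gram of the query to weighted target n-grams."""
    translated = Counter()
    for gram in char_ngrams(query, n):
        for target_gram, weight in dictionary.get(gram, []):
            translated[target_gram] += weight
    return translated

def rank(documents, translated_query, n=4):
    """Rank documents by weighted overlap with the translated n-grams
    (a simple stand-in for a real retrieval model)."""
    scores = []
    for doc_id, doc in enumerate(documents):
        doc_grams = Counter(char_ngrams(doc, n))
        score = sum(w * doc_grams[g] for g, w in translated_query.items())
        scores.append((doc_id, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

docs = ["fresh milk and bread", "a red car"]
ranking = rank(docs, pseudo_translate("leche", TOY_DICT))
```

Here the Spanish query "leche" never passes through a word-level translation: its n-grams are mapped directly to target-language n-grams, and the same n-gramming is applied to the English documents so that matching happens at the sub-word level.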

Bibliographic record

  • Source
    Computer Speech and Language | 2016, Issue 3 | pp. 136-164 | 29 pages
  • Author affiliations

    Grupo LYS, Departamento de Computacion, Facultade de Informatica, Universidade da Coruna, Campus de A Coruna, 15071 A Coruna, Spain;

    Grupo COLE, Departamento de Informatica, Escola Superior de Enxenaria Informatica, Universidade de Vigo, Campus de As Lagoas, 32004 Ourense, Spain;

    Grupo LYS, Departamento de Computacion, Facultade de Informatica, Universidade da Coruna, Campus de A Coruna, 15071 A Coruna, Spain;

    Research Institute of Information and Language Processing, University of Wolverhampton, Stafford St., Wolverhampton WV1 1NA, United Kingdom;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Cross-Language Information Retrieval; Character n-grams; Alignment algorithms for Machine Translation;
