Information Processing & Management

Does dictionary based bilingual retrieval work in a non-normalized index?

Abstract

Many operational IR indexes are non-normalized, i.e., no lemmatization, stemming, or similar techniques were applied during indexing. This poses a challenge for dictionary-based cross-language information retrieval (CLIR), because dictionary translations are mostly lemmas, whereas a non-normalized index contains inflected word forms. In this study, we address this challenge by testing two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms of a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for retrieving inflected word forms, especially in highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both approaches performed quite well, but the results varied by language pair. S-gramming and FCG performed about equally well in all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
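
To make the s-gramming idea concrete, the following is a minimal Python sketch of approximate matching between a translated lemma and the inflected forms in an index. It is not the authors' implementation: the function names (`s_grams`, `similarity`, `best_matches`), the choice of skip lengths 0 and 1, and the use of Jaccard similarity are all assumptions made for illustration, and the abstract does not specify these details.

```python
def s_grams(word, skips=(0, 1)):
    """Return the set of character digrams of `word`, one class per skip length.

    skip=0 gives ordinary adjacent digrams (plain n-gramming with n=2);
    skip=1 pairs characters that have one character between them.
    Each gram is tagged with its skip length so that classes do not mix.
    """
    grams = set()
    for skip in skips:
        step = skip + 1
        for i in range(len(word) - step):
            grams.add((skip, word[i] + word[i + step]))
    return grams


def similarity(a, b, skips=(0, 1)):
    """Jaccard similarity of the two words' s-gram sets (an assumed measure)."""
    ga, gb = s_grams(a, skips), s_grams(b, skips)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def best_matches(query_term, index_terms, k=3, skips=(0, 1)):
    """Rank index terms by s-gram similarity to the query term, keep the top k."""
    ranked = sorted(
        index_terms,
        key=lambda term: similarity(query_term, term, skips),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical index vocabulary; "talossa" is an inflected form of "talo".
    vocabulary = ["talossa", "auto", "kissa"]
    print(best_matches("talo", vocabulary, k=1))
```

In this sketch the lemma `talo` shares all of its s-grams with the inflected form `talossa` and none with the other two words, so the inflected form ranks first even though the index contains no normalized entry for the lemma.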