International Journal of Computer Processing of Oriental Languages

Combination Approaches in Korean Information Retrieval: Words vs. n-grams, and Query Translation vs. Document Translation


Abstract

Asian language information retrieval has been a challenge for information retrieval researchers, since Asian languages have characteristics different from those of Indo-European languages. For example, Chinese and Japanese lack word delimiters, and Korean does not allow spaces between words within its syntactic unit called the Eojeol. In addition, these languages employ large character sets originating from ideographic Chinese characters. Although much research has been conducted on these Asian languages in order to adapt or confirm existing information retrieval solutions that were developed primarily for English, only a few Korean-related works have been reported internationally, and most of them were done on small-scale document collections. This study therefore presents large-scale retrieval evaluations on Korean to serve as a benchmark for further Korean information retrieval research. In particular, this article investigates two issues regarding Korean: word-based vs. n-gram-based retrieval, and query translation vs. document translation. Our monolingual experiments confirmed that, in Korean, n-gram-based and word-based retrieval show different retrieval characteristics for many queries, and that, with the probabilistic model, their fusion achieves better performance than either one alone. The cross-lingual experiments showed the same pattern for query translation vs. document translation. In addition, we observed that naive document translation performs slightly better than naive query translation, because the former performs query structuring similar to the Pirkola method.
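The abstract does not say which n, tokenization, or fusion formula the study used, so the following is only a minimal sketch under stated assumptions. It illustrates the two ideas being compared: an n-gram index unit built from overlapping character bigrams taken inside each space-delimited Eojeol, and fusion of a word-based run with an n-gram-based run. The fusion step here uses CombSUM over min-max normalized scores, a common choice that is assumed, not taken from the article. All function names (`char_ngrams`, `min_max`, `comb_sum`) and the toy score dictionaries are hypothetical.

```python
from collections import defaultdict


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Overlapping character n-grams computed separately for each
    space-delimited unit (Eojeol), so no n-gram spans a space.
    Assumption: bigrams over Hangul syllables; the paper's n is not stated here."""
    grams: list[str] = []
    for eojeol in text.split():
        if len(eojeol) < n:
            grams.append(eojeol)  # keep units shorter than n whole
        else:
            grams.extend(eojeol[i:i + n] for i in range(len(eojeol) - n + 1))
    return grams


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one run's scores to [0, 1] so different runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(*runs: dict[str, float]) -> list[tuple[str, float]]:
    """CombSUM fusion (assumed method): add the normalized scores of each run.
    A document absent from a run contributes 0 for that run."""
    fused: defaultdict[str, float] = defaultdict(float)
    for run in runs:
        for doc, s in min_max(run).items():
            fused[doc] += s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(char_ngrams("정보검색 시스템"))  # bigrams within each Eojeol
    word_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}    # hypothetical word-based scores
    ngram_run = {"d2": 20.0, "d3": 18.0, "d4": 5.0}  # hypothetical n-gram-based scores
    print(comb_sum(word_run, ngram_run))
```

In this toy example, `d2` ranks first after fusion even though the word-based run alone puts `d1` on top. This is the kind of complementary behavior the abstract describes when the two retrieval methods disagree on many queries. The same fusion function could combine a query-translation run with a document-translation run.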
