Innovations in Parallel Corpus Search Tools

Abstract

Recent years have seen increased interest in, and availability of, parallel corpora. Large corpora from international organizations (e.g. the European Union, the United Nations, the European Patent Office) or from multilingual Internet sites (e.g. OpenSubtitles) are now easily available. They are used not only for statistical machine translation but also for online search by various user groups. This paper gives an overview of these different uses and of the different types of search systems. In the past, parallel corpus search systems were based on sentence-aligned corpora. We argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but none supports the full query functionality that has been developed for parallel treebanks. We propose to develop such a system for efficiently searching large parallel corpora with a powerful query language.
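Some online query systems already use word alignment to sort translation variants. The following minimal Python sketch illustrates that idea under stated assumptions. The corpus is a hypothetical toy list of English-German sentence pairs, and each pair's word alignment is given as a set of (source index, target index) links. `translation_variants` is an illustrative helper, not the API of any existing search tool. A purely sentence-aligned corpus could only return whole matching sentence pairs. With word links, hits can be grouped and ranked by the target word or words actually aligned to the query.

```python
from collections import Counter

# Hypothetical toy word-aligned English-German corpus. Each entry holds the
# source tokens, the target tokens and a set of (source_index, target_index)
# links such as an automatic word aligner might produce.
CORPUS = [
    (["the", "house", "is", "big"], ["das", "Haus", "ist", "groß"],
     {(0, 0), (1, 1), (2, 2), (3, 3)}),
    (["my", "house"], ["mein", "Haus"], {(0, 0), (1, 1)}),
    (["the", "house", "was", "sold"], ["das", "Gebäude", "wurde", "verkauft"],
     {(0, 0), (1, 1), (2, 2), (3, 3)}),
    (["a", "house"], ["ein", "Haus"], {(0, 0), (1, 1)}),
    # One German compound token is linked to two English words.
    (["the", "summer", "house"], ["das", "Sommerhaus"],
     {(0, 0), (1, 1), (2, 1)}),
]


def translation_variants(corpus, query):
    """Return (variant, frequency) pairs for `query`, most frequent first.

    A variant is the sequence of target tokens aligned to one occurrence of
    the query word. Occurrences without any alignment link are skipped.
    """
    counts = Counter()
    for src, tgt, links in corpus:
        for i, token in enumerate(src):
            if token.lower() != query.lower():
                continue
            # Target positions linked to this source position, left to right,
            # joined into a single variant string.
            targets = sorted(j for (s, j) in links if s == i)
            if targets:
                counts[" ".join(tgt[j] for j in targets)] += 1
    return counts.most_common()


if __name__ == "__main__":
    for variant, freq in translation_variants(CORPUS, "house"):
        print(f"{freq}\t{variant}")
```

On this toy data, the query "house" ranks "Haus" (3 hits) above "Gebäude" and "Sommerhaus" (1 hit each). The last pair shows the benefit of per-word links. The hit is attributed to the compound "Sommerhaus" rather than to the whole sentence.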
