International Conference on Language Resources and Evaluation (LREC)

Annotated Corpora for Word Alignment Between Japanese and English and its Evaluation with MAP-based Word Aligner


Abstract

This paper presents two annotated corpora for word alignment between Japanese and English, built on top of the IWSLT-2006 and NTCIR-8 corpora. The IWSLT-2006 corpus is in the travel conversation domain, while the NTCIR-8 corpus is in the patent domain. We annotated the first 500 sentence pairs of the IWSLT-2006 corpus and the first 100 sentence pairs of the NTCIR-8 corpus. After describing the annotation guidelines, we present two evaluation algorithms that use such hand-annotated corpora: one is well known among word alignment researchers, while the other is novel and is intended to evaluate the MAP-based word aligner of Okita et al. (2010b).
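The abstract does not name the well-known evaluation algorithm. A common choice for scoring an aligner against hand-annotated data is the Alignment Error Rate (AER), where gold links are split into sure links S and possible links P (with S ⊆ P), and a predicted alignment A is scored as precision = |A ∩ P| / |A|, recall = |A ∩ S| / |S|, and AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The sketch below assumes that scheme; the function name `alignment_scores` and the representation of a link as a (Japanese token index, English token index) pair are illustrative assumptions, not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical link representation: (Japanese token index, English token index).
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link], sure: Set[Link], possible: Set[Link]
) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) for one predicted alignment.

    Assumes an AER-style evaluation with sure (S) and possible (P) gold links.
    """
    # Sure links are by definition also possible links, so enforce S ⊆ P.
    possible = possible | sure
    a_and_s = len(predicted & sure)
    a_and_p = len(predicted & possible)

    # Precision is measured against possible links, recall against sure links.
    precision = a_and_p / len(predicted) if predicted else 0.0
    recall = a_and_s / len(sure) if sure else 0.0

    denom = len(predicted) + len(sure)
    # If both the prediction and the sure set are empty, report an AER of 0.0.
    aer = 1.0 - (a_and_s + a_and_p) / denom if denom else 0.0
    return precision, recall, aer


if __name__ == "__main__":
    sure = {(0, 0), (1, 2)}
    possible = {(2, 1)}
    predicted = {(0, 0), (2, 1), (3, 3)}
    print(alignment_scores(predicted, sure, possible))
```

In this toy example one predicted link is sure, one is only possible, and one is wrong, so precision is 2/3, recall is 1/2, and AER is 1 − 3/5 = 0.4. Because corpus-level AER is usually computed by summing the set sizes over all sentence pairs before dividing, a real evaluation would accumulate the counts rather than average per-sentence scores.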