Conference: Moratuwa Engineering Research Conference

Automatic creation of a word aligned Sinhala-Tamil parallel corpus

Abstract

A parallel corpus aligned at both the sentence and word levels is an important prerequisite for statistical machine translation. However, creating such a corpus manually is time-consuming and requires experts fluent in both languages. This paper presents the first-ever empirical evaluation carried out to identify the best unsupervised word alignment technique for Sinhala and Tamil. It also presents a novel approach that combines the output of individual aligners and outperforms any of these aligners used alone. The evaluation used sentence-aligned parallel text from annual reports and letters of Sri Lankan Government institutions, and order papers from the Parliament of Sri Lanka.
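
The abstract does not say how the outputs of the individual aligners are combined. As a rough illustration only, the sketch below shows one generic way to merge word alignments: majority voting over alignment links. The function name `combine_alignments`, the `min_votes` parameter, and the toy aligner outputs are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# An alignment link pairs a source word index with a target word index.
Link = Tuple[int, int]


def combine_alignments(outputs: Iterable[Set[Link]], min_votes: int) -> Set[Link]:
    """Keep every link proposed by at least `min_votes` aligners.

    This is an assumed voting scheme for illustration, not the
    combination method evaluated in the paper.
    """
    votes = Counter(link for links in outputs for link in links)
    return {link for link, count in votes.items() if count >= min_votes}


# Hypothetical outputs of three aligners for one sentence pair.
aligner_a = {(0, 0), (1, 2), (2, 1)}
aligner_b = {(0, 0), (1, 2), (2, 3)}
aligner_c = {(0, 0), (1, 1), (2, 1)}

combined = combine_alignments([aligner_a, aligner_b, aligner_c], min_votes=2)
print(sorted(combined))
```

With `min_votes=1` this reduces to the union of all links, and with `min_votes` equal to the number of aligners it reduces to their intersection, so the threshold trades recall against precision.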