Local Context Selection for Aligning Sentences in Parallel Corpora

Abstract

This paper presents a novel, language-independent, context-based sentence alignment technique for parallel corpora. The problem of aligning sentences can be viewed as finding translations of sentences chosen from different sources. Unlike current approaches, which rely on pre-defined features and models, our algorithm employs features derived from the distributional properties of sentences and does not use any language-dependent knowledge. We make use of the context of sentences and introduce the notion of Zipfian word vectors, which effectively model the distributional properties of a given sentence. We take the context to be the frame in which the reasoning about sentence alignment is done. We examine alternatives for local context models and demonstrate that our context-based sentence alignment algorithm performs better than prominent sentence alignment techniques. For a pair of sets of sentences, our system dynamically selects the local context that maximizes the correlation. We evaluate the performance of our system on two measures: sentence alignment accuracy and sentence alignment coverage. We compare our system with commonly used sentence alignment systems and show that it performs 1.1951 to 1.5404 times better in reducing the error rate in alignment accuracy and coverage.
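The abstract does not define Zipfian word vectors or the correlation measure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a sentence is represented by counting its words per log2 corpus-frequency bin, that context windows are combined by element-wise summation, and that Pearson correlation is the score being maximized. The names `zipfian_vector`, `pearson`, `best_local_context`, `num_bins`, and `max_window` are hypothetical.

```python
import math
from collections import Counter

def zipfian_vector(sentence, corpus_counts, num_bins=8):
    """Count the sentence's words per log2-frequency bin (assumed definition)."""
    vec = [0] * num_bins
    for word in sentence.lower().split():
        freq = corpus_counts.get(word, 1)  # unseen words are treated as frequency 1
        bin_index = min(int(math.log2(freq)), num_bins - 1)
        vec[bin_index] += 1
    return vec

def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def sum_window(vectors, center, width):
    """Element-wise sum of the vectors within `width` positions of `center`."""
    window = vectors[max(0, center - width): center + width + 1]
    return [sum(column) for column in zip(*window)]

def best_local_context(src_vecs, tgt_vecs, i, j, max_window=2):
    """Try window widths 0..max_window around (i, j); keep the best-correlated one."""
    best_width, best_r = 0, float("-inf")
    for width in range(max_window + 1):
        r = pearson(sum_window(src_vecs, i, width), sum_window(tgt_vecs, j, width))
        if r > best_r:
            best_width, best_r = width, r
    return best_width, best_r

# Usage: frequency counts come from each side of the parallel corpus separately,
# so no language-specific resource is involved.
source = ["the cat sat", "the dog ran fast", "a bird"]
counts = Counter(word for sentence in source for word in sentence.split())
vectors = [zipfian_vector(sentence, counts, num_bins=4) for sentence in source]
```

Because the vectors depend only on frequency ranks, the same code applies to either language of the corpus, which is the sense in which such features are language-independent.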