Bean Soup Translation: Flexible, Linguistically-motivated Syntax for Machine Translation.

Abstract

Machine translation (MT) systems attempt to translate texts from one language into another by translating words from a source language and rearranging them into fluent utterances in a target language. When the two languages organize concepts in very different ways, knowledge of their general sentence structure, or syntax, is crucial. The syntax of the target language is particularly useful, because it provides a means of testing whether the reorderings that a system might try are grammatically licensed. This thesis presents two novel syntactic techniques that aid in producing correct and grammatical translations. The first technique controls target language reordering using syntactic categories that span multiple words. The second technique complements the first by assessing the well-formedness of sequences formed by these reorderings using the same syntactic categories. These innovations are implemented in the context of statistical phrase-based machine translation [Zens et al., 2002; Koehn et al., 2003], which is the prevailing modern translation paradigm.

The main contribution of this thesis is to use the flexible syntax of Combinatory Categorial Grammar [CCG, Steedman, 2000] as the basis for deriving syntactic constituent labels for target strings in phrase-based systems, providing CCG labels for many target strings that traditional syntactic theories struggle to describe. These CCG labels are used to train novel syntax-based reordering and language models, which efficiently describe translation reordering patterns, as well as assess the grammaticality of target translations. The models are easily incorporated into phrase-based systems with minimal disruption to existing technology and achieve superior automatic metric scores and human evaluation ratings over a strong phrase-based baseline, as well as over syntax-based techniques that do not use CCG.
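
The abstract's claim that CCG can label target strings that traditional theories struggle to describe can be illustrated with a minimal sketch. It is not the thesis's implementation; the class and function names (`Atom`, `Func`, `type_raise`, `forward_compose`, and so on) are hypothetical and chosen only for illustration. It uses the standard CCG rules of application, forward composition, and type-raising to assign the single category S/NP to a subject followed by a transitive verb (for example "Mary likes"), a string that is not a constituent in a conventional phrase-structure analysis.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic CCG category such as S or NP."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Func:
    """A function category: result/arg seeks arg to the right, result\\arg to the left."""
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Cat = Union[Atom, Func]


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, Func) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(right, Func) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward composition: X/Y  Y/Z  =>  X/Z."""
    if (isinstance(left, Func) and left.slash == "/"
            and isinstance(right, Func) and right.slash == "/"
            and left.arg == right.result):
        return Func(left.result, "/", right.arg)
    return None


def type_raise(cat: Cat, target: Cat) -> Cat:
    """Forward type-raising: X  =>  T/(T\\X)."""
    return Func(target, "/", Func(target, "\\", cat))


S, NP = Atom("S"), Atom("NP")
VP = Func(S, "\\", NP)        # S\NP: a verb phrase seeking its subject
TV = Func(VP, "/", NP)        # (S\NP)/NP: a transitive verb

subject = type_raise(NP, S)                  # S/(S\NP)
partial = forward_compose(subject, TV)       # subject + verb, object still missing
sentence = forward_apply(partial, NP)        # supplying the object completes S

print(partial, sentence)
```

Because the subject-plus-verb string receives an ordinary category of its own, a phrase-based system could in principle attach a syntactic label to such a target phrase even though it does not correspond to a subtree in a conventional parse.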

Bibliographic record

  • Author

    Mehay, Dennis Nolan

  • Author affiliation

    The Ohio State University

  • Degree-granting institution: The Ohio State University
  • Subject: Language Linguistics; Artificial Intelligence
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 172 p.
  • Total pages: 172
  • Original format: PDF
  • Language of text: English
