
Lost and Found in Translation: Cross-Lingual Question Answering with Result Translation.

Abstract

Using cross-lingual question answering (CLQA), users can find information in languages that they do not know. In this thesis, we consider the broader problem of CLQA with result translation, where answers retrieved by a CLQA system must be translated back to the user's language by a machine translation (MT) system. This task is challenging because answers must be both relevant to the question and adequately translated in order to be correct. In this work, we show that integrating the MT closely with cross-lingual retrieval can improve result relevance, and we further demonstrate that automatically correcting errors in the MT output can improve the adequacy of translated results.

To understand the task better, we undertake detailed error analyses examining the impact of MT errors on CLQA with result translation. We identify which MT errors are most detrimental to the task and how different cross-lingual information retrieval (CLIR) systems respond to different kinds of MT errors. We describe two main types of CLQA errors caused by MT errors: lost in retrieval errors, where relevant results are not returned, and lost in translation errors, where relevant results are perceived as irrelevant due to inadequate MT.

To address the lost in retrieval errors, we introduce two novel models for cross-lingual information retrieval that combine complementary source-language and target-language information from MT. We show empirically that these hybrid, bilingual models outperform both monolingual models and a prior hybrid model.

Even once relevant results are retrieved, if they are not translated adequately, users will not understand that they are relevant. Rather than improving a specific MT system, we take a more general approach that can be applied to the output of any MT system. Our adequacy-oriented automatic post-editors (APEs) use resources from the CLQA context and information from the MT system to automatically detect and correct phrase-level errors in MT at query time, focusing on the errors that are most likely to impact CLQA: deleted or missing content words and mistranslated named entities. Human evaluations show that these adequacy-oriented APEs can successfully adapt task-agnostic MT systems to the needs of the CLQA task.

Since there is no existing test data for translingual QA or IR tasks, we create a translingual information retrieval (TLIR) evaluation corpus. Furthermore, we develop an analysis framework for isolating the impact of MT errors on CLIR and on result understanding, as well as evaluating the whole TLIR task. We use the TLIR corpus to carry out a task-embedded MT evaluation, which shows that our CLIR models address lost in retrieval errors, resulting in higher TLIR recall, and that the APEs successfully correct many lost in translation errors, leading to more adequately translated results.
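As an illustration of the kind of hybrid, bilingual scoring the abstract refers to, the minimal sketch below interpolates a source-language relevance score (the original query against a source-language view of each document) with a target-language score (a machine-translated query against the target-language document). The toy term-overlap scorer, the interpolation weight, and the example data are assumptions for illustration only, not the models developed in the thesis.

# A minimal sketch (not the thesis's actual model) of a hybrid bilingual
# retrieval score: each candidate document is scored both in the source
# language and in the target language, and the two scores are interpolated.
from collections import Counter
from math import log

def term_overlap_score(query_terms, doc_terms):
    """Toy relevance score: sum of log(1 + tf) over query terms found in the document."""
    tf = Counter(doc_terms)
    return sum(log(1 + tf[t]) for t in query_terms)

def hybrid_score(source_query, target_query, doc, lam=0.5):
    """Interpolate a source-language and a target-language relevance score."""
    s_score = term_overlap_score(source_query, doc["source_terms"])
    t_score = term_overlap_score(target_query, doc["target_terms"])
    return lam * s_score + (1.0 - lam) * t_score

# Hypothetical example: an English query, its MT into the document language,
# and documents represented in both languages (e.g. via MT of the documents).
source_query = ["earthquake", "damage", "report"]
target_query = ["terremoto", "daños", "informe"]          # MT of the query
documents = [
    {"id": "d1",
     "source_terms": ["earthquake", "damage", "in", "the", "region"],
     "target_terms": ["terremoto", "daños", "en", "la", "región"]},
    {"id": "d2",
     "source_terms": ["weather", "report", "for", "the", "week"],
     "target_terms": ["informe", "del", "tiempo", "semanal"]},
]

ranked = sorted(documents,
                key=lambda d: hybrid_score(source_query, target_query, d),
                reverse=True)
for doc in ranked:
    print(doc["id"], round(hybrid_score(source_query, target_query, doc), 3))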
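The adequacy-oriented post-editing idea can likewise be pictured with a small sketch: given a tokenized source sentence, its MT output, and a bilingual lexicon, flag source content words whose translation is absent from the output and re-insert a lexicon translation. The stopword list, the toy lexicon, and the append-at-the-end repair are illustrative assumptions; the APEs evaluated in the thesis draw on richer resources from the CLQA context and the MT system itself.

# A minimal sketch (assumptions only) of an adequacy-oriented automatic
# post-editor: detect source content words that appear to be missing from
# the MT output and append a lexicon translation.

STOPWORDS = {"el", "la", "de", "en", "y", "un", "una", "the", "a", "of", "in"}

# Hypothetical source-to-English lexicon used only for this illustration.
LEXICON = {
    "terremoto": "earthquake",
    "daños": "damage",
    "informe": "report",
}

def find_missing_content_words(source_tokens, mt_tokens):
    """Source content words whose lexicon translation is absent from the MT output."""
    mt_vocab = {t.lower() for t in mt_tokens}
    missing = []
    for tok in source_tokens:
        tok_l = tok.lower()
        if tok_l in STOPWORDS or tok_l not in LEXICON:
            continue
        if LEXICON[tok_l].lower() not in mt_vocab:
            missing.append(tok_l)
    return missing

def post_edit(source_tokens, mt_tokens):
    """Append translations of missing content words to the MT output."""
    repaired = list(mt_tokens)
    for tok in find_missing_content_words(source_tokens, mt_tokens):
        repaired.append(LEXICON[tok])
    return repaired

source = ["el", "terremoto", "causó", "daños"]   # "the earthquake caused damage"
mt_out = ["the", "earthquake", "caused"]         # MT dropped the word "damage"
print(" ".join(post_edit(source, mt_out)))       # -> "the earthquake caused damage"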

Bibliographic Details

  • Author

    Parton, Kristen.

  • Author's Institution

    Columbia University.

  • Degree Grantor: Columbia University.
  • Discipline: Computer Science.
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 231 p.
  • Total Pages: 231
  • Format: PDF
  • Language: eng
  • CLC Classification:
  • Keywords:
