Active Learning and Crowdsourcing for Machine Translation in Low Resource Scenarios.

Abstract

Corpus-based approaches to automatic translation, such as Example-Based and Statistical Machine Translation systems, use large amounts of human-created parallel data to train mathematical models for automatic language translation. Generating parallel data at scale for new language pairs requires intensive human effort and the availability of fluent bilinguals or expert translators. Therefore, it becomes immensely difficult and expensive to provide state-of-the-art Machine Translation (MT) systems for rare languages.

In this thesis, we explore active learning to reduce costs and make the best use of human resources for building low-resource MT systems. Active learning approaches help us identify sentences that, if translated, have the potential to provide maximal improvement to an existing system. We then apply active learning to other relevant tasks in MT, such as word alignment, classifying monolingual text by topic, and extracting comparable corpora from the web. In all these tasks we reduce the annotated data required by the underlying supervised learning models. We also extend the traditional active learning approach of optimizing selection for a single annotation to handle cases of multiple-type annotations, and show a further reduction of costs in building low-resource MT systems.

Finally, as part of this thesis, we have implemented a new framework, Active Crowd Translation (ACT), a cost-sensitive active learning setup for building MT systems for low-resource language pairs. Our framework will provide a suitable platform for involving human translators dispersed around the world, in a timely and sparing fashion, for rapid building of translation systems. We first explore the ACT paradigm with expert translators and then generalize to full-scale crowdsourcing with non-expert bilingual speakers. In the case of Machine Translation, although crowdsourcing services like Amazon's Mechanical Turk have opened doors to tap human potential, they guarantee neither translation expertise nor extended availability of translators. We address several challenges in eliciting quality translations from an unvetted crowd of bilingual speakers.

Bibliographic record

Author: Ambati, Vamshi
Author affiliation: Carnegie Mellon University
Degree-granting institution: Carnegie Mellon University
Discipline: Computer Science
Degree: Ph.D.
Year: 2012
Pages: 148
Original format: PDF
Language of text: English

Similar literature

1. A Joint Back-Translation and Transfer Learning Method for Low-Resource Neural Machine Translation [J]. Gong-Xu Luo, Ya-Ting Yang, Rui Dong. Mathematical Problems in Engineering: Theory, Methods and Applications. 2020, No. 1
2. Spanish-Turkish Low-Resource Machine Translation: Unsupervised Learning vs Round-Tripping [J]. Tianyi Xu, Ozge Ilkim Ozbek, Shannon Marks. American Journal of Artificial Intelligence. 2020, No. 2
3. Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages [J]. Saurav Jha, Akhilesh Sudhakar, Anil Kumar Singh. Journal of Language Modelling. 2019, No. 2
4. Experiences in Resource Generation for Machine Translation through Crowdsourcing [C]. Anoop Kunchukuttan, Shourya Roy, Pratik Patel. International conference on language resources and evaluation. 2012
5. Non-Traditional Resources and Improved Tools for Low-Resource Machine Translation [D]. Pourdamghani, Nima. 2019
6. Pseudotext Injection and Advance Filtering of Low-Resource Corpus for Neural Machine Translation [O]. Michael Adjeisah, Guohua Liu, Douglas Omwenga Nyabuga. 2021
7. Hierarchical Transfer Learning Architecture for Low-Resource Neural Machine Translation [O]. Gongxu Luo, Yating Yang, Yang Yuan. 2019
