10th Workshop on Statistical Machine Translation, 2015

Statistical Machine Translation with Automatic Identification of Translationese

Abstract

Translated texts (in any language) differ so markedly from original ones that text classification techniques can tell them apart. Previous work has shown that awareness of these differences can significantly improve statistical machine translation. Those results, however, required meta-information on the ontological status of texts (original or translated), which is typically unavailable. In this work we show that the predictions of translationese classifiers are as good as meta-information. First, when a monolingual corpus in the target language is given for constructing a language model, predicting the translated portions of the corpus and using only them for the language model is as good as using the entire corpus. Second, identifying the portions of a parallel corpus that are translated in the direction of the translation task, and using only them for the translation model, is as good as using the entire corpus. We present results from several language pairs and various data sets, indicating that these results are robust and general.
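The corpus-selection step described in the abstract can be illustrated with a minimal sketch: split a monolingual corpus into chunks, ask a classifier whether each chunk looks translated, and keep only the chunks predicted as translated for language-model training. Everything named here is an assumption for illustration: `split_into_chunks`, `toy_is_translated`, `select_predicted_translated`, the marker-word list, and the threshold are hypothetical and are not the classifier or features used in the paper, which the abstract does not specify.

```python
from typing import Callable, List

# Hypothetical marker words, used only to make the toy classifier runnable.
# A real translationese classifier would be trained on labeled data.
COHESIVE_MARKERS = {"however", "moreover", "therefore", "thus", "furthermore"}


def split_into_chunks(tokens: List[str], size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def toy_is_translated(chunk: List[str], threshold: float = 0.05) -> bool:
    """Toy stand-in classifier: flag a chunk when its marker-word rate
    reaches `threshold`. The threshold is an arbitrary illustrative value."""
    if not chunk:
        return False
    hits = sum(1 for token in chunk if token.lower() in COHESIVE_MARKERS)
    return hits / len(chunk) >= threshold


def select_predicted_translated(
    tokens: List[str],
    size: int,
    classify: Callable[[List[str]], bool],
) -> List[List[str]]:
    """Keep only the chunks that `classify` predicts to be translated."""
    return [chunk for chunk in split_into_chunks(tokens, size) if classify(chunk)]


corpus = "however the cat sat thus on the mat it slept".split()
selected = select_predicted_translated(corpus, 5, toy_is_translated)
print(selected)
```

The classifier is passed in as a function so that the selection step stays independent of how predictions are made; the selected chunks would then be handed to whatever language-model toolkit is in use.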