Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

Towards Integrated Classification Lexicon for Handling Unknown Words in Chinese-Vietnamese Neural Machine Translation

Abstract

In Neural Machine Translation (NMT), vocabulary limitations mean that unknown words cannot be translated properly, which degrades the performance of the translation system. For resource-scarce NMT trained on a small-scale corpus, the effect is amplified. The traditional remedy of enlarging the corpus is not applicable, because parallel corpora are difficult to obtain in a resource-scarce setting; however, external knowledge, bilingual lexicons, and other resources are easy to obtain and use. Therefore, we propose a classification-lexicon approach for processing unknown words in the Chinese-Vietnamese NMT task. Specifically, unknown Chinese-Vietnamese words are classified into three types, and the corresponding classification lexicons are constructed by word alignment, Wikipedia extraction, and rule-based methods, respectively. After translation, the unknown words are restored from the lexicons in a post-processing step. Experimental results on Chinese-Vietnamese, English-Vietnamese, and Mongolian-Chinese translation show that our approach significantly improves the accuracy and performance of NMT, especially in a resource-scarce setting.
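
The post-processing step described in the abstract could look roughly like the following minimal Python sketch. The abstract does not say how unknown words are located or matched, so the `<unk>` placeholder, the target-to-source `alignment` map, the lexicon lookup order, and the copy-the-source-word fallback are all assumptions made here, and the function name and toy lexicon entries are hypothetical.

```python
from typing import Dict, List, Sequence

UNK = "<unk>"  # assumed placeholder token emitted by the NMT system


def restore_unknown_words(
    source_tokens: Sequence[str],
    target_tokens: Sequence[str],
    alignment: Dict[int, int],
    lexicons: Sequence[Dict[str, str]],
) -> List[str]:
    """Replace each UNK in the target with a lexicon entry for its aligned source word.

    `alignment` maps a target position to a source position (assumed to come
    from attention weights or an external aligner). `lexicons` are tried in order.
    """
    restored: List[str] = []
    for tgt_idx, token in enumerate(target_tokens):
        if token != UNK:
            restored.append(token)
            continue
        src_idx = alignment.get(tgt_idx)
        if src_idx is None or not 0 <= src_idx < len(source_tokens):
            restored.append(token)  # no usable alignment: keep the placeholder
            continue
        source_word = source_tokens[src_idx]
        # First lexicon containing the word wins; otherwise copy the source word.
        translation = next(
            (lex[source_word] for lex in lexicons if source_word in lex),
            source_word,
        )
        restored.append(translation)
    return restored


# Toy data, invented for illustration only. The three dictionaries stand in
# for lexicons built by word alignment, Wikipedia extraction, and rules.
alignment_lexicon = {"昆明": "Côn Minh"}
wikipedia_lexicon = {"云南": "Vân Nam"}
rule_lexicon: Dict[str, str] = {}

source = ["我", "住", "在", "云南", "昆明"]
target = ["Tôi", "sống", "ở", UNK, ",", UNK]
alignment = {3: 4, 5: 3}  # target index -> source index

print(restore_unknown_words(
    source, target, alignment,
    [alignment_lexicon, wikipedia_lexicon, rule_lexicon],
))
```

Keeping the lexicons as an ordered list makes the priority between them explicit and easy to change; whether the original system prioritizes them this way is not stated in the abstract.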

Bibliographic details

  • Source
  • Author affiliations

    Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Neural machine translation; classification lexicon; resource-scarce; unknown words;

