3rd Workshop on Representation Learning for NLP, 2018

Multilingual seq2seq training with similarity loss for cross-lingual document classification



Abstract

In this paper we continue the line of work in which neural machine translation training is used to produce joint cross-lingual fixed-dimensional sentence embeddings. Within this framework we introduce a simple method of adding a loss term to the learning objective that penalizes the distance between representations of bilingually aligned sentences. We evaluate cross-lingual transfer using two approaches: cross-lingual similarity search on an aligned corpus (Europarl) and cross-lingual document classification on a recently published benchmark Reuters corpus. We find that the similarity loss significantly improves performance on both. Our cross-lingual transfer performance is competitive with the state of the art, even though there is potential for further improvement by investing in a stronger in-language baseline. Our results cover a set of six European languages.
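The core idea described above, augmenting the usual translation loss with a penalty on the distance between the embeddings of an aligned sentence pair, can be sketched as follows. The abstract does not specify the distance function or the weighting, so the squared L2 distance and the weight `lam` below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def similarity_loss(src_repr: np.ndarray, tgt_repr: np.ndarray) -> float:
    """Penalty on the distance between the fixed-dimensional embeddings
    of a bilingually aligned sentence pair.

    Squared L2 distance is one plausible choice; the paper's actual
    distance function is not specified in this abstract."""
    return float(np.sum((src_repr - tgt_repr) ** 2))

def combined_loss(nmt_loss: float,
                  src_repr: np.ndarray,
                  tgt_repr: np.ndarray,
                  lam: float = 1.0) -> float:
    """Total training objective: the standard seq2seq translation loss
    plus the weighted similarity penalty (lam is a hypothetical
    interpolation weight)."""
    return nmt_loss + lam * similarity_loss(src_repr, tgt_repr)
```

When the two embeddings coincide, the penalty vanishes and the objective reduces to the plain translation loss; the further the aligned representations drift apart, the larger the added cost, which pushes the encoder toward a shared cross-lingual embedding space.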

