3rd Workshop on Representation Learning for NLP, 2018

Multilingual seq2seq training with similarity loss for cross-lingual document classification



Abstract

In this paper we continue the line of work in which neural machine translation training is used to produce joint cross-lingual fixed-dimensional sentence embeddings. Within this framework we introduce a simple method of adding a loss term to the learning objective that penalizes the distance between representations of bilingually aligned sentences. We evaluate cross-lingual transfer using two approaches: cross-lingual similarity search on an aligned corpus (Europarl) and cross-lingual document classification on a recently published benchmark Reuters corpus. We find that the similarity loss significantly improves performance on both. Our cross-lingual transfer performance is competitive with the state of the art, even though there is potential for further improvement by investing in a stronger in-language baseline. Our results cover a set of six European languages.
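The core idea described above, augmenting the usual translation loss with a penalty on the distance between the embeddings of an aligned sentence pair, can be sketched as follows. The abstract does not specify the distance function or the weighting, so the squared L2 distance and the weight `lam` below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def similarity_loss(src_repr: np.ndarray, tgt_repr: np.ndarray) -> float:
    """Penalty on the distance between the fixed-dimensional embeddings
    of a bilingually aligned sentence pair.

    Squared L2 distance is one plausible choice; the paper's actual
    distance function is not specified in this abstract."""
    return float(np.sum((src_repr - tgt_repr) ** 2))

def combined_loss(nmt_loss: float,
                  src_repr: np.ndarray,
                  tgt_repr: np.ndarray,
                  lam: float = 1.0) -> float:
    """Total training objective: the standard seq2seq translation loss
    plus the weighted similarity penalty (lam is a hypothetical
    interpolation weight)."""
    return nmt_loss + lam * similarity_loss(src_repr, tgt_repr)
```

When the two embeddings coincide, the penalty vanishes and the objective reduces to the plain translation loss; the further the aligned representations drift apart, the larger the added cost, which pushes the encoder toward a shared cross-lingual embedding space.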

