Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification

Abstract

In many languages, the scarcity of resources poses numerous challenges for textual analysis tasks. Text classification is one such standard task, hindered by the limited availability of label information in low-resource languages. Transferring knowledge (i.e., label information) from high-resource to low-resource languages might improve text classification compared to other approaches such as machine translation. We introduce BRAVE (Bilingual paRAgraph VEctors), a model that learns bilingual distributed representations (i.e., embeddings) of words without word alignments, from either sentence-aligned parallel or label-aligned non-parallel document corpora, to support cross-language text classification. Empirical analysis shows that classification models trained with our bilingual embeddings outperform other state-of-the-art systems on three different cross-language text classification tasks.

Bibliographic details

Source
Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies | 2016 | pp. 692-702 | 11 pages
Authors
Aditya Mogadala; Achim Rettinger
Original format: PDF
