Venue: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification


Abstract

In many languages, the scarcity of resources poses numerous challenges for textual analysis tasks. Text classification is one such standard task, hindered by the limited availability of label information in low-resource languages. Transferring knowledge (i.e., label information) from high-resource to low-resource languages might improve text classification compared with other approaches such as machine translation. We introduce BRAVE (Bilingual paRAgraph VEctors), a model that learns bilingual distributed representations (i.e., embeddings) of words without word alignments, from either sentence-aligned parallel or label-aligned non-parallel document corpora, to support cross-language text classification. Empirical analysis shows that classification models trained with our bilingual embeddings outperform other state-of-the-art systems on three different cross-language text classification tasks.
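To make the transfer setting concrete, the following is a minimal sketch of cross-language classification over a shared embedding space. It is not the BRAVE model: the two-dimensional vectors are hand-made for illustration, the vocabulary and labels are invented, and the nearest-centroid classifier is an assumption chosen for brevity. All names (`EMBEDDINGS`, `embed`, `train_centroids`, `predict`) are hypothetical.

```python
# Toy illustration only: hand-made 2-d "bilingual" vectors, not learned by BRAVE.
from collections import defaultdict

# Hypothetical shared embedding space: translation pairs sit close together.
EMBEDDINGS = {
    "en": {"goal": (1.0, 0.1), "match": (0.9, 0.2),
           "bank": (0.1, 1.0), "loan": (0.2, 0.9)},
    "de": {"tor": (0.95, 0.15), "spiel": (0.9, 0.1),
           "bank": (0.15, 0.95), "kredit": (0.1, 0.9)},
}


def embed(tokens, lang):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [EMBEDDINGS[lang][t] for t in tokens if t in EMBEDDINGS[lang]]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))


def train_centroids(labelled_docs, lang):
    """Nearest-centroid classifier: one mean document vector per label."""
    grouped = defaultdict(list)
    for tokens, label in labelled_docs:
        grouped[label].append(embed(tokens, lang))
    return {
        label: tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))
        for label, vecs in grouped.items()
    }


def predict(tokens, lang, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    doc = embed(tokens, lang)
    return min(
        centroids,
        key=lambda label: sum((d - c) ** 2 for d, c in zip(doc, centroids[label])),
    )


# Labels exist only for English (the high-resource side in this toy setup).
english_train = [(["goal", "match"], "sports"), (["bank", "loan"], "finance")]
centroids = train_centroids(english_train, "en")

# German documents are classified without any German labels.
print(predict(["tor", "spiel"], "de", centroids))
print(predict(["kredit", "bank"], "de", centroids))
```

Because both languages share one vector space, a classifier fitted on English document vectors can be applied directly to German document vectors. In practice the embeddings would be learned from parallel or label-aligned corpora rather than written by hand.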