ACM Transactions on Asian Language Information Processing

Improving Vector Space Word Representations Via Kernel Canonical Correlation Analysis



Abstract

Cross-lingual word embeddings represent the vocabularies of two or more languages in one common continuous vector space and are widely used in various natural language processing tasks. A state-of-the-art way to generate cross-lingual word embeddings is to learn a linear mapping, under the assumption that the vector representations of similar words in different languages are related by a linear relationship. However, this assumption does not always hold, especially for substantially different languages. We therefore propose to use kernel canonical correlation analysis to capture a non-linear relationship between the word embeddings of two languages. By extensively evaluating the learned embeddings on three tasks (word similarity, cross-lingual dictionary induction, and cross-lingual document classification) across five language pairs, we demonstrate that the proposed approach outperforms previous linear methods on all three tasks, especially for language pairs with substantial typological differences.
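The nonlinear mapping step described in the abstract can be illustrated with a small regularized kernel CCA routine over paired word vectors from a seed bilingual dictionary. This is a minimal sketch, not the authors' implementation: the RBF kernel, the regularization scheme, and all function names and hyperparameters (rbf_kernel, kcca, project, gamma, reg) are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist


def rbf_kernel(A, B, gamma=0.1):
    """RBF (Gaussian) kernel matrix between the rows of A and the rows of B."""
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))


def kcca(X, Y, n_components=2, gamma=0.1, reg=1e-3):
    """Regularized kernel CCA on paired samples X, Y (one row per dictionary pair).

    Returns dual coefficients (alpha, beta); new words are mapped into the
    shared space through their kernel vectors against the training words.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kx = H @ rbf_kernel(X, X, gamma) @ H         # centered Gram matrix, source language
    Ky = H @ rbf_kernel(Y, Y, gamma) @ H         # centered Gram matrix, target language

    # Generalized eigenproblem for regularized KCCA:
    #   [ 0     KxKy ] w = rho [ (Kx + reg*I)^2        0        ] w
    #   [ KyKx   0   ]         [       0         (Ky + reg*I)^2 ]
    Z = np.zeros((n, n))
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    Rx = Kx + reg * np.eye(n)
    Ry = Ky + reg * np.eye(n)
    B = np.block([[Rx @ Rx, Z], [Z, Ry @ Ry]])

    vals, vecs = eigh(A, B)                      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]  # keep directions with largest correlation
    alpha, beta = vecs[:n, top], vecs[n:, top]
    return alpha, beta


def project(X_new, X_train, alpha, gamma=0.1):
    """Map new source-language vectors into the shared space (test-kernel
    centering is omitted to keep the sketch short)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

In such a setup, alpha and beta would be learned from the paired rows of X and Y given by the seed dictionary, after which project maps out-of-dictionary source words into the shared space, where nearest-neighbor search can serve dictionary induction or cross-lingual document classification.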
