ACM Transactions on Asian Language Information Processing

Improving Vector Space Word Representations Via Kernel Canonical Correlation Analysis



Abstract

Cross-lingual word embeddings represent the vocabularies of two or more languages in one common continuous vector space and are widely used in various natural language processing tasks. A state-of-the-art way to generate cross-lingual word embeddings is to learn a linear mapping, under the assumption that the vector representations of similar words in different languages are related by a linear relationship. However, this assumption does not always hold, especially for substantially different languages. We therefore propose to use kernel canonical correlation analysis to capture a non-linear relationship between the word embeddings of two languages. By extensively evaluating the learned embeddings on three tasks (word similarity, cross-lingual dictionary induction, and cross-lingual document classification) across five language pairs, we demonstrate that the proposed approach outperforms previous linear methods on all three tasks, especially for language pairs with substantial typological differences.
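The nonlinear mapping step described in the abstract can be illustrated with a small regularized kernel CCA routine over paired word vectors from a seed bilingual dictionary. This is a minimal sketch, not the authors' implementation: the RBF kernel, the regularization scheme, and all function names and hyperparameters (rbf_kernel, kcca, project, gamma, reg) are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist


def rbf_kernel(A, B, gamma=0.1):
    """RBF (Gaussian) kernel matrix between the rows of A and the rows of B."""
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))


def kcca(X, Y, n_components=2, gamma=0.1, reg=1e-3):
    """Regularized kernel CCA on paired samples X, Y (one row per dictionary pair).

    Returns dual coefficients (alpha, beta); new words are mapped into the
    shared space through their kernel vectors against the training words.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kx = H @ rbf_kernel(X, X, gamma) @ H         # centered Gram matrix, source language
    Ky = H @ rbf_kernel(Y, Y, gamma) @ H         # centered Gram matrix, target language

    # Generalized eigenproblem for regularized KCCA:
    #   [ 0     KxKy ] w = rho [ (Kx + reg*I)^2        0        ] w
    #   [ KyKx   0   ]         [       0         (Ky + reg*I)^2 ]
    Z = np.zeros((n, n))
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    Rx = Kx + reg * np.eye(n)
    Ry = Ky + reg * np.eye(n)
    B = np.block([[Rx @ Rx, Z], [Z, Ry @ Ry]])

    vals, vecs = eigh(A, B)                      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]  # keep directions with largest correlation
    alpha, beta = vecs[:n, top], vecs[n:, top]
    return alpha, beta


def project(X_new, X_train, alpha, gamma=0.1):
    """Map new source-language vectors into the shared space (test-kernel
    centering is omitted to keep the sketch short)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

In such a setup, alpha and beta would be learned from the paired rows of X and Y given by the seed dictionary, after which project maps out-of-dictionary source words into the shared space, where nearest-neighbor search can serve dictionary induction or cross-lingual document classification.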
