Published in: 中文信息学报 (Journal of Chinese Information Processing)

A Chinese-English Cross-Lingual Word Embedding Method Based on Matrix Factorization of Pointwise Relevance Measures

     

Abstract

This paper studies word embedding methods based on matrix factorization, proposes a unified model to describe them, and applies that model to Chinese-English cross-lingual word embeddings. Using a sentence-aligned bilingual corpus as the knowledge source, it proposes a method for identifying cross-lingual relevant words and two ways of computing a pointwise relevance measure: cross-lingual co-occurrence counts and cross-lingual pointwise mutual information (PMI). A separate objective function is designed for each measure to learn the Chinese-English cross-lingual word embeddings. Experiments vary the objective function, the corpus, and the vector dimension. In Chinese-English cross-lingual document classification, the best model uses cross-lingual co-occurrence counts as the relevance measure and reaches 87.04% accuracy. In Chinese-English cross-lingual word similarity, models that use cross-lingual PMI perform better, and on English-English word similarity they score slightly higher than mainstream English word embeddings.
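The two relevance measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's objective functions or its rule for choosing cross-lingual relevant words, so several things below are assumptions rather than the authors' method: co-occurrence is counted whenever a Chinese word and an English word appear in the same aligned sentence pair, PMI is clipped at zero (positive PMI), and a truncated SVD stands in for the learned factorization. The function names `cross_lingual_counts`, `ppmi`, and `factorize` and the toy corpus are hypothetical.

```python
import numpy as np


def cross_lingual_counts(pairs):
    """Count Chinese/English word pairs that share an aligned sentence pair.

    Assumption: every word in a Chinese sentence is treated as relevant to
    every word in its aligned English sentence.
    """
    zh_vocab = sorted({w for zh, _ in pairs for w in zh})
    en_vocab = sorted({w for _, en in pairs for w in en})
    zh_index = {w: i for i, w in enumerate(zh_vocab)}
    en_index = {w: i for i, w in enumerate(en_vocab)}
    counts = np.zeros((len(zh_vocab), len(en_vocab)))
    for zh, en in pairs:
        for z in set(zh):
            for e in set(en):
                counts[zh_index[z], en_index[e]] += 1.0
    return counts, zh_index, en_index


def ppmi(counts):
    """Positive PMI: max(0, log(P(z, e) / (P(z) * P(e))))."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs give log(0); treat as 0
    return np.maximum(pmi, 0.0)


def factorize(matrix, dim):
    """Truncated SVD (a stand-in, not the paper's objective).

    Returns Chinese and English vectors whose dot products approximate
    the entries of `matrix`.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    root = np.sqrt(s[:dim])
    return u[:, :dim] * root, vt[:dim].T * root


# Toy sentence-aligned corpus (made up for illustration).
pairs = [
    (["猫", "喝", "牛奶"], ["cat", "drinks", "milk"]),
    (["狗", "喝", "水"], ["dog", "drinks", "water"]),
    (["猫", "追", "狗"], ["cat", "chases", "dog"]),
]

counts, zh_index, en_index = cross_lingual_counts(pairs)
relevance = ppmi(counts)
zh_vecs, en_vecs = factorize(relevance, dim=2)

# Score a Chinese/English word pair by the dot product of its two vectors.
score = zh_vecs[zh_index["猫"]] @ en_vecs[en_index["cat"]]
```

Passing `counts` instead of `relevance` to `factorize` gives the co-occurrence-count variant. Both languages end up in one vector space because the rows (Chinese words) and columns (English words) of the same matrix are factorized jointly.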
