Journal: 《计算机工程与科学》 (Computer Engineering & Science)

Short Text Representation Learning Based on Semantic Feature Space Context

         

Abstract

Text representation is a fundamental task in natural language processing. To address the high dimensionality and sparsity of traditional short-text representations, this paper proposes SFCR, a short-text representation learning method based on semantic feature space context. Because the initial feature space is very high-dimensional, the method first computes the mutual information and co-occurrence relations between terms to obtain an initial similarity, clusters the terms on that basis, and uses the cluster centers to represent the reduced semantic feature space. Then, using the context information of terms within the resulting clusters, three similarity measures are designed to compute the similarity between the terms of the text to be represented and the feature terms in the feature space. These similarities form a text mapping matrix, which is used to learn the short-text representation. Experimental results show that the proposed method reflects the semantic information of short texts well and yields reasonable and effective representations.
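
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' SFCR implementation: the abstract does not specify the clustering algorithm or the three similarity measures, so the code below assumes normalized pointwise mutual information over document-level co-occurrence as the initial similarity, a greedy single-pass clustering with a hypothetical `threshold` parameter, and a single stand-in similarity (the best match between a token and any term in a cluster). All function names and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def term_similarity(docs):
    """Initial term similarity: normalized PMI from document-level co-occurrence.

    Assumption: the paper's exact combination of mutual information and
    co-occurrence is not given in the abstract; NPMI is used as a stand-in.
    """
    n = len(docs)
    term_df, pair_df = Counter(), Counter()
    for doc in docs:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    sim = {}
    for (a, b), c in pair_df.items():
        p_ab = c / n
        pmi = math.log(p_ab / ((term_df[a] / n) * (term_df[b] / n)))
        if pmi > 0:  # pmi > 0 implies p_ab < 1, so the division is safe
            sim[(a, b)] = sim[(b, a)] = pmi / -math.log(p_ab)
    return sim, term_df


def cluster_terms(sim, term_df, threshold):
    """Greedy single-pass clustering (an assumed, simplified scheme).

    Terms are visited from most to least frequent; a term joins the most
    similar existing center if the similarity reaches `threshold`,
    otherwise it becomes a new center.
    """
    centers, clusters = [], {}
    for term, _ in sorted(term_df.items(), key=lambda kv: (-kv[1], kv[0])):
        best = max(centers, key=lambda c: sim.get((term, c), 0.0), default=None)
        if best is not None and sim.get((term, best), 0.0) >= threshold:
            clusters[best].append(term)
        else:
            centers.append(term)
            clusters[term] = [term]
    return centers, clusters


def represent(doc, centers, clusters, sim):
    """Map a tokenized short text to a dense vector over the cluster centers.

    Each entry averages, over the tokens, the best similarity between the
    token and any term in that cluster (1.0 if the token is a member).
    """
    vec = []
    for center in centers:
        members = clusters[center]
        total = 0.0
        for tok in doc:
            if tok in members:
                total += 1.0
            else:
                total += max(sim.get((tok, m), 0.0) for m in members)
        vec.append(total / len(doc) if doc else 0.0)
    return vec


# Toy corpus (illustrative only).
docs = [
    ["apple", "banana", "fruit"],
    ["apple", "fruit", "juice"],
    ["banana", "fruit"],
    ["python", "code", "bug"],
    ["python", "code"],
    ["code", "bug", "fix"],
]
sim, term_df = term_similarity(docs)
centers, clusters = cluster_terms(sim, term_df, threshold=0.3)
vec = represent(["apple", "juice"], centers, clusters, sim)
```

On this toy corpus the eight distinct terms collapse into two clusters, so each short text is represented by a dense two-dimensional vector rather than a sparse eight-dimensional bag of words, which is the dimensionality reduction the abstract describes.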
