A Chinese Character Embedding Method Based on the Inherent Attributes of Chinese Characters

Journal of Chinese Information Processing (《中文信息学报》)

Abstract

Chinese short texts are becoming increasingly important in today's rapidly developing Internet applications, and mining valuable information from massive volumes of short text messages has become an important and challenging problem in Chinese natural language processing. However, traditional methods designed for long texts often perform poorly on short texts, fundamentally because short Chinese messages are sparse in both syntax and semantics. To address this, this paper proposes a Chinese character embedding method based on the stroke attributes of Chinese characters and combines it with deep learning to compute the similarity of short text messages. The method combines the structural composition (word-building) and Pinyin attributes of Chinese characters to map each character to a vector of only 32 dimensions, and then uses a convolutional neural network to extract semantics and compute similarity. Experimental results show that, compared with existing short text similarity methods, the proposed method achieves considerable improvements in both algorithm performance and accuracy.
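The abstract does not give the feature layout or the network architecture, so the following is only a rough sketch of the general idea: build a fixed 32-dimensional vector for each character from structural and Pinyin attributes, pool the character vectors into a text vector, and compare texts by cosine similarity. Everything specific here is an assumption made for illustration: the toy `LEXICON`, the split of the 32 dimensions (1 for stroke count, 4 for structure, 5 for tone, 11 each for the Pinyin initial and final), and the names `char_vector`, `text_vector`, and `similarity`. The sliding-window average followed by max pooling is a simple untrained stand-in for the convolutional neural network mentioned in the abstract, not the paper's model.

```python
import math

DIM = 32
STRUCTURES = ["single", "left-right", "top-bottom", "enclosure"]  # 4 dims (assumed)
TONES = [1, 2, 3, 4, 5]   # 5 dims; 5 stands for the neutral tone (assumed)
BUCKETS = 11              # dims for the Pinyin initial, and again for the final

# Hypothetical toy lexicon: char -> (stroke count, structure, initial, final, tone)
LEXICON = {
    "你": (7, "left-right", "n", "i", 3),
    "好": (6, "left-right", "h", "ao", 3),
    "妳": (8, "left-right", "n", "i", 3),
    "吗": (6, "left-right", "m", "a", 5),
    "天": (4, "single", "t", "ian", 1),
}

def bucket(s):
    """Deterministically map a Pinyin fragment to one of BUCKETS slots."""
    return sum(ord(c) for c in s) % BUCKETS

def char_vector(ch):
    """Return a 32-dim attribute vector; unknown characters map to all zeros."""
    vec = [0.0] * DIM
    if ch not in LEXICON:
        return vec
    strokes, structure, initial, final, tone = LEXICON[ch]
    vec[0] = min(strokes, 30) / 30.0              # dim 0: scaled stroke count
    vec[1 + STRUCTURES.index(structure)] = 1.0    # dims 1-4: structure one-hot
    vec[5 + TONES.index(tone)] = 1.0              # dims 5-9: tone one-hot
    vec[10 + bucket(initial)] = 1.0               # dims 10-20: initial bucket
    vec[21 + bucket(final)] = 1.0                 # dims 21-31: final bucket
    return vec

def text_vector(text, window=2):
    """Average character vectors over sliding windows, then max-pool over positions."""
    chars = [char_vector(c) for c in text]
    if not chars:
        return [0.0] * DIM
    if len(chars) < window:
        windows = [chars]
    else:
        windows = [chars[i:i + window] for i in range(len(chars) - window + 1)]
    averaged = [[sum(col) / len(w) for col in zip(*w)] for w in windows]
    return [max(col) for col in zip(*averaged)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity(text_a, text_b):
    return cosine(text_vector(text_a), text_vector(text_b))
```

In this toy setup, `similarity("你好", "妳好")` comes out higher than `similarity("你好", "天吗")`, because 你 and 妳 share structure, tone, and Pinyin and differ only in stroke count. A real system would draw the attributes from a full character dictionary and learn the convolution filters from data.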
