Journal: 《情报学报》

Construction of a Semantic Kernel Based on the Co-occurrence Latent Semantic Vector Space Model

         

Abstract

Knowledge discovery for resource aggregation in digital libraries depends on an effective representation of knowledge. As a classic text representation model, the vector space model (VSM) and its derivatives play an important role in information retrieval and knowledge discovery, but they still have shortcomings. The co-occurrence latent semantic vector space model (CLSVSM), a newer text representation model, markedly improves text clustering accuracy compared with VSM. However, for large text collections the co-occurrence matrix is often high-dimensional, which makes the model computationally expensive. This paper therefore constructs a semantic kernel (CLSVSM_K) on top of CLSVSM, following the idea of latent semantic analysis (LSA). CLSVSM_K not only reduces the dimension of the co-occurrence matrix but also merges synonymous information among text feature words. The semantic kernel model is applied to topic clustering of the literature. Experimental results show that the method effectively reduces the dimension of the feature word space and the computational complexity, improves the performance of the clustering algorithm, and improves the accuracy of topic clustering. Applying the model should help with the organization of digital library information resources, knowledge discovery, and knowledge optimization.
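The abstract does not give the CLSVSM weighting or the exact kernel definition, so the following is only a rough sketch of the general LSA idea it refers to: build a term co-occurrence matrix, keep its top-k singular vectors, and compare documents in the reduced space. The function names, the use of raw document-level co-occurrence counts, the toy data, and the choice of k = 2 are all assumptions made for illustration, not the paper's method.

```python
import numpy as np

def cooccurrence_matrix(doc_term):
    """Term-by-term counts: entry (i, j) is the number of documents
    that contain both term i and term j (an assumed, simple definition)."""
    presence = (doc_term > 0).astype(float)
    return presence.T @ presence

def latent_projection(cooc, k):
    """Top-k left singular vectors of the co-occurrence matrix,
    used here as an LSA-style projection onto k latent dimensions."""
    u, _, _ = np.linalg.svd(cooc, full_matrices=False)
    return u[:, :k]

def kernel_similarity(doc_term, proj):
    """Cosine similarity between documents after projecting them
    into the k-dimensional latent space."""
    reduced = doc_term @ proj
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    unit = reduced / norms
    return unit @ unit.T

# Toy corpus (hypothetical): 4 documents over 5 feature words.
doc_term = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 1, 1, 2],
], dtype=float)

cooc = cooccurrence_matrix(doc_term)   # 5 x 5
proj = latent_projection(cooc, k=2)    # 5 x 2
sim = kernel_similarity(doc_term, proj)  # 4 x 4 document similarities
```

In this toy setup the similarity matrix could be passed to any clustering routine; documents that share co-occurring terms end up closer in the reduced space than documents that do not.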
