Journal: 《计算机技术与发展》 (Computer Technology and Development)

Text Clustering Based on Feature Space

Abstract

Text clustering is a specific application of clustering algorithms. With the growth of the Internet, it is now widely used in fields such as information retrieval and intelligent search engines. A text clustering pipeline consists mainly of text preprocessing and the clustering algorithm itself, so this work improves on both. Traditional text preprocessing adopts the vector space model (VSM), which ignores the semantic similarity and correlation between words and therefore yields very low clustering accuracy. To address this, a text clustering method based on feature space is proposed. The method constructs a substitute-word lexicon from the feature space of the document collection, derives each document's topic from that lexicon, and then replaces words in the document according to the topic and its corresponding domain dictionary. In addition, traditional text clustering uses the K-means algorithm, which requires the value of K to be specified manually. An improved K-means algorithm based on K-value optimization is therefore proposed. Experimental results show that the proposed text clustering method and the improved K-means algorithm significantly improve the intelligence and accuracy of text clustering.
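The VSM limitation described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `SUBSTITUTIONS` lexicon, the `normalize` and `cosine` helpers, and the two sample documents are all hypothetical, and a hand-written synonym map stands in for the lexicon that the paper derives from the feature space. The sketch only shows why mapping related words to a shared term raises the bag-of-words similarity of two documents.

```python
import math
from collections import Counter

# Hypothetical substitute-word lexicon (assumption): the paper builds its
# lexicon from the document collection's feature space; these entries are
# made up purely for illustration.
SUBSTITUTIONS = {"automobile": "car", "vehicle": "car", "physician": "doctor"}


def normalize(tokens, lexicon):
    """Replace each token with its canonical term when the lexicon has one."""
    return [lexicon.get(token, token) for token in tokens]


def cosine(tokens_a, tokens_b):
    """Cosine similarity of two token lists under a raw term-count VSM."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = "the automobile needs repair".split()
doc2 = "the car needs repair".split()

# Plain VSM treats "automobile" and "car" as unrelated dimensions.
before = cosine(doc1, doc2)

# After substitution both documents use the same term.
after = cosine(normalize(doc1, SUBSTITUTIONS), normalize(doc2, SUBSTITUTIONS))

print(before, after)
```

In this toy case the similarity rises once both documents share the canonical term. In the paper the substitution is driven by each document's topic and the corresponding domain dictionary rather than by a fixed global map.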
