Text Clustering Based on Feature Space

黄建宇; 周爱武; 肖云; 谭天诚

Abstract

Text clustering is a specific application of clustering algorithms. With the development of the Internet, it has become increasingly widely used in fields such as information retrieval and intelligent search engines. A text clustering pipeline consists mainly of text preprocessing and the clustering algorithm itself, so improvements can be made on both fronts. Traditional text preprocessing adopts the vector space model (VSM), which ignores the semantic similarity and correlation between words and therefore leads to low clustering accuracy. To address this, a text clustering method based on feature space is proposed. The method constructs a substitute-word library from the feature space of the document collection, derives the topic of each document from that library, and then replaces words in the document according to the topic and its corresponding domain dictionary. Traditional text clustering also relies on the K-means algorithm, which requires the value of K to be specified manually. An improved K-means algorithm based on K-value optimization is therefore presented. The experimental results show that the proposed text clustering method and the improved K-means algorithm significantly improve the intelligence and precision of text clustering.
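The VSM limitation described above, and the effect of dictionary-based word replacement, can be illustrated with a minimal sketch. It is not the authors' implementation: the replacement table `REPLACEMENTS`, the helper names, and the two sample sentences are all hypothetical, and the table is written by hand here rather than derived from a feature space and a domain dictionary as in the paper.

```python
import math
from collections import Counter

# Hypothetical replacement table mapping domain synonyms to one canonical term.
# The paper builds its table from a feature space and a domain dictionary;
# this hand-written mapping is only a stand-in for illustration.
REPLACEMENTS = {"automobile": "car", "vehicle": "car", "physician": "doctor"}


def bag_of_words(tokens):
    """Term-frequency vector of a tokenized document (plain VSM)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(tokens, table=REPLACEMENTS):
    """Replace each token by its canonical term when the table has one."""
    return [table.get(t, t) for t in tokens]


doc1 = "the automobile engine failed".split()
doc2 = "the car motor failed".split()

# Plain VSM treats "automobile" and "car" as unrelated dimensions.
raw_sim = cosine(bag_of_words(doc1), bag_of_words(doc2))
# After replacement both documents share the canonical term "car".
norm_sim = cosine(bag_of_words(normalize(doc1)), bag_of_words(normalize(doc2)))

print(f"raw VSM similarity:     {raw_sim:.2f}")   # 0.50
print(f"after word replacement: {norm_sim:.2f}")  # 0.75
```

In this toy case the two documents share only "the" and "failed" under plain VSM, so their cosine similarity is 0.50. Mapping "automobile" to "car" adds a third shared term and raises it to 0.75, which is the kind of gain the replacement step is meant to deliver before clustering.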

Bibliographic Information

Source
Computer Technology and Development (《计算机技术与发展》) | 2017, Issue 9 | pp. 75-77, 81 | 4 pages
Authors
黄建宇; 周爱武; 肖云; 谭天诚
Affiliation

School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601 (all four authors)

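Original format: PDF
Language of text: Chinese
CLC classification: algorithm theory
Keywords
HowNet; domain dictionary; topic; sememe; clustering; K-value optimization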
