Text clustering based on the vector space model (VSM) ignores the similarity drift caused by sparse text keywords, the semantic information among keywords, and the relationships between dimensions, so text similarity is computed inaccurately. This article improves the TF-IDF similarity calculation method and proposes a new clustering method that combines an estimation of distribution algorithm (EDA) with a tabu search algorithm. The method fuses the fast convergence of EDA with the ability of tabu search to escape local optima. The text is first preprocessed and then clustered with EDA and tabu search, so clustering is fast and is prevented from converging to a local optimum. Test results show that the algorithm is effective.
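To illustrate the sparse-keyword problem described above, the following is a minimal sketch of the standard TF-IDF cosine similarity on a VSM representation. It shows only the baseline, not the improved measure or the EDA and tabu search clustering proposed in the article. The function names, the toy documents, and the particular weighting (relative term frequency times ln(N/df)) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: relative term frequency times ln(N / df).
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: documents 0 and 1 are about the same topic
# but share no keywords; documents 0 and 2 share two keywords.
docs = [
    ["car", "engine", "repair"],
    ["automobile", "motor", "maintenance"],
    ["car", "engine", "price"],
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])  # related topic, no shared keywords
sim_overlap = cosine(vecs[0], vecs[2])  # shared keywords
print(sim_related, sim_overlap)
```

In this sketch the first two documents receive a similarity of zero although they are semantically related, because each keyword is treated as an independent dimension. This is the kind of inaccuracy the abstract attributes to VSM-based similarity.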