The International Arab Journal of Information Technology

Preceding Document Clustering by Graph Mining Based Maximal Frequent Termsets Preservation

Abstract

This paper presents an approach to clustering documents. It introduces a novel graph mining based algorithm that finds the frequent termsets present in a document set. The document set is first mapped onto a bipartite graph. Based on the output of our algorithm, the document set is then modified to reduce its dimensionality. Finally, the Bisecting K-means algorithm is run over the modified document set to obtain a set of meaningful clusters. It is shown that the proposed approach, Clustering preceded by Graph Mining based Maximal Frequent Termsets Preservation (CGFTP), produces better-quality clusters than some classical document clustering algorithms. It is also shown that the resulting clusters are easy to interpret. Cluster quality is measured by the F-measure.
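
The abstract does not spell out the graph mining algorithm, so what follows is only a minimal sketch of the idea under stated assumptions. Documents are token lists. The bipartite graph is stored as a map from each term node to the set of document nodes it links to, and the support of a termset is taken to be the number of documents adjacent to all of its terms. The search is a plain level-wise (Apriori-style) stand-in, not the authors' CGFTP mining algorithm. The reduction step is a hypothetical rule that keeps only the terms belonging to some maximal frequent termset. The names `build_bipartite`, `support`, `maximal_frequent_termsets`, `preserve_terms` and the `min_support` parameter are illustrative and do not come from the paper.

```python
from itertools import combinations


def build_bipartite(docs):
    """Bipartite document-term graph stored as term node -> set of adjacent document nodes.

    docs: list of token lists; document i becomes document node i.
    """
    term_to_docs = {}
    for doc_id, tokens in enumerate(docs):
        for term in set(tokens):
            term_to_docs.setdefault(term, set()).add(doc_id)
    return term_to_docs


def support(termset, term_to_docs):
    """Document nodes adjacent to every term node in the termset (their common neighbours)."""
    return set.intersection(*(term_to_docs[t] for t in termset))


def maximal_frequent_termsets(term_to_docs, min_support):
    """Level-wise (Apriori-style) search, a stand-in for the paper's graph mining step.

    A termset is frequent if at least min_support documents contain all of its terms;
    it is maximal if no frequent proper superset exists.
    """
    level = [frozenset([t]) for t, d in term_to_docs.items() if len(d) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        # Join pairs of k-termsets that differ in exactly one term into (k+1)-candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        level = [c for c in candidates if len(support(c, term_to_docs)) >= min_support]
    return [s for s in frequent if not any(s < other for other in frequent)]


def preserve_terms(docs, maximal_sets):
    """Hypothetical reduction: keep only terms that occur in some maximal frequent termset."""
    keep = set().union(*maximal_sets)
    return [[t for t in tokens if t in keep] for tokens in docs]


if __name__ == "__main__":
    docs = [
        ["graph", "mining", "cluster"],
        ["graph", "mining", "termset"],
        ["graph", "mining", "cluster", "noise"],
        ["kmeans", "cluster"],
    ]
    maximal = maximal_frequent_termsets(build_bipartite(docs), min_support=2)
    print(sorted(sorted(s) for s in maximal))  # [['cluster', 'graph', 'mining']]
    print(preserve_terms(docs, maximal))
```

In the toy run, `graph`, `mining` and `cluster` co-occur in two documents, so with `min_support=2` they form the only maximal frequent termset. The rarer terms `termset`, `noise` and `kmeans` are then dropped from every document. Because support is a common-neighbour count in the bipartite graph, it shrinks as terms are added. This anti-monotonicity is what makes the level-wise join complete.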
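
The final clustering step can be sketched as a minimal, dependency-free Bisecting K-means. It assumes that each document of the reduced set has already been turned into a non-empty numeric vector, for example term frequencies over the preserved terms. Several details are assumptions, because the abstract does not specify them:

- the sketch bisects the cluster with the largest sum of squared errors (SSE), although picking the largest cluster is another common rule;
- it uses Euclidean distance;
- it seeds each 2-means split deterministically with the first point and the point farthest from it;
- the function names are illustrative.

Document clustering implementations often use cosine similarity and several random trial splits instead.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def _sse(points):
    """Sum of squared distances of the points to their centroid."""
    c = _centroid(points)
    return sum(_sq_dist(p, c) for p in points)


def two_means(points, max_iter=20):
    """Split points in two with Lloyd's 2-means and a deterministic seed.

    Seed (an assumption): the first point and the point farthest from it.
    Returns (left, right); right is empty if the points cannot be separated.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: _sq_dist(p, c0))
    left, right = list(points), []
    for _ in range(max_iter):
        new_left = [p for p in points if _sq_dist(p, c0) <= _sq_dist(p, c1)]
        new_right = [p for p in points if _sq_dist(p, c0) > _sq_dist(p, c1)]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split, or assignments stopped changing
        left, right = new_left, new_right
        c0, c1 = _centroid(left), _centroid(right)
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=_sse)
        target = clusters.pop()  # cluster with the largest SSE
        left, right = two_means(target)
        if not right:
            clusters.append(target)  # nothing left that can be split
            break
        clusters.extend([left, right])
    return clusters
```

For example, `bisecting_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)], 3)` separates the three visually obvious groups. The first split peels off the two points near x = 20, and the second split divides the remaining six points.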
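
The abstract reports cluster quality as an F-measure but does not give the formula. The sketch below assumes the definition commonly used in document clustering. For a reference class c and a cluster k, it sets P = n_ck / n_k and R = n_ck / n_c, then F(c, k) = 2PR / (P + R). The overall score is the sum over classes of (n_c / n) * max over k of F(c, k). The function name `clustering_f_measure` is illustrative.

```python
from collections import Counter


def clustering_f_measure(labels, clusters):
    """Overall F-measure of a clustering against reference classes.

    labels[i]  : reference class of document i
    clusters[i]: cluster id assigned to document i
    Each class is matched with the cluster that maximises F(c, k) = 2PR / (P + R),
    where P = n_ck / n_k and R = n_ck / n_c; scores are weighted by class size.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # n_ck for every (class, cluster) pair
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue  # F(c, k) is 0 when the class and cluster share no documents
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total
```

For reference labels a, a, b, b and cluster assignments 0, 0, 0, 1, class a scores 0.8 and class b scores 2/3, which gives an overall F-measure of 0.4 + 1/3 ≈ 0.733. A clustering that matches the classes exactly scores 1.0.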