Conference proceedings: Advances in intelligent web mastering-3

Comparison of Selected Methods for Document Clustering


Abstract

Seventeen cluster analysis techniques proposed for document clustering are compared in terms of internal and external clustering quality measures and computing time. They combine three basic methods (direct, repeated bisection, and agglomerative) with five criterion functions for assessing a solution (two intra-cluster, one inter-cluster, and two composite ones), all implemented in the CLUTO software package. For the agglomerative method, single-linkage and complete-linkage clustering were also applied as criterion functions, which gives 3 × 5 + 2 = 17 techniques in total. The 20 Newsgroups collection, a binary vector representation of e-mail messages, was used to compare the methods. The document clustering experiments showed that the direct method gives the best results in terms of entropy and purity, while the repeated bisection (divisive) method was the fastest.
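Entropy and purity are external measures: they compare the clusters produced by an algorithm with known class labels (for 20 Newsgroups, the newsgroup of each message). The abstract does not define them, so the sketch below assumes the common definitions: purity is the fraction of documents that belong to the majority class of their cluster (higher is better), and entropy is the size-weighted average of per-cluster class entropies, normalized by log q for q classes so that it lies in [0, 1] (lower is better). The function names `contingency`, `purity`, and `entropy` are hypothetical illustrations, not CLUTO's API.

```python
import math
from collections import Counter


def contingency(clusters, classes):
    """Map each cluster id to a Counter of class label -> document count."""
    table = {}
    for c, k in zip(clusters, classes):
        table.setdefault(c, Counter())[k] += 1
    return table


def purity(clusters, classes):
    """Overall purity: (1/n) * sum over clusters of the largest class count."""
    table = contingency(clusters, classes)
    n = len(classes)
    return sum(max(counts.values()) for counts in table.values()) / n


def entropy(clusters, classes):
    """Overall entropy: size-weighted mean of per-cluster class entropies,
    each normalized by log(q), where q is the number of distinct classes."""
    table = contingency(clusters, classes)
    n = len(classes)
    q = len(set(classes))
    if q < 2:
        return 0.0  # a single class carries no uncertainty
    total = 0.0
    for counts in table.values():
        n_j = sum(counts.values())
        # Counter holds only positive counts, so log() is always defined.
        e_j = -sum((c / n_j) * math.log(c / n_j) for c in counts.values())
        total += (n_j / n) * (e_j / math.log(q))
    return total


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]
    classes = ["a", "a", "b", "b", "b", "b"]
    print(purity(clusters, classes))   # 5/6: cluster 0 has majority 2, cluster 1 has 3
    print(entropy(clusters, classes))  # only the mixed cluster 0 contributes
```

A perfect clustering, where every cluster contains a single class, has purity 1 and entropy 0. Purity alone can be inflated by producing many small clusters, which is one reason both measures are usually reported together.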
