Scalable clustering with adaptive instance sampling

Abstract

The computation time of most clustering algorithms grows with the number of attributes and instances. Scalability is therefore a critical issue for the data mining community, which has made efforts to make clustering more efficient. One way to handle this issue is to cluster a subset of the instances rather than all of them. This paper proposes an algorithm that performs clustering efficiently by combining two ideas: it uses the nested partitions method to address the noisy performance estimates that arise when only a subset of instances is used, and it adjusts the sample rate at each iteration. Compared with NPCLUSTER, the resulting Adaptive NPCLUSTER algorithm achieved better similarity on small datasets and worse similarity on large datasets, but it required less computation time.
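
The abstract does not spell out how the sample rate is adjusted, so the following is only a minimal sketch of the general idea of clustering on a subset whose size changes between iterations. It is not the Adaptive NPCLUSTER algorithm and does not use the nested partitions method: it substitutes a plain k-means-style update, and the function name `adaptive_sample_kmeans`, its parameters, and the rule "grow the sample when progress stalls" are all assumptions made for illustration.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def adaptive_sample_kmeans(points, k, start_rate=0.1, growth=2.0,
                           max_iter=50, tol=1e-6, seed=0):
    """k-means-style loop that clusters a random subset each iteration.

    Assumed rule (not from the paper): keep the sample rate while the
    centers are still settling, and grow it when progress stalls.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    rate, prev_shift = start_rate, math.inf
    for _ in range(max_iter):
        n = min(len(points), max(k, int(rate * len(points))))
        sample = rng.sample(points, n)
        # Group the sampled points by their nearest current center.
        groups = [[] for _ in range(k)]
        for p in sample:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if rate >= 1.0 and shift < tol:
            break  # converged on the full data set
        if shift >= prev_shift:
            rate = min(1.0, rate * growth)  # progress stalled: use more instances
        prev_shift = shift
    # Label every instance once, using the final centers.
    labels = [nearest(p, centers) for p in points]
    return centers, labels, rate
```

Early iterations touch only a fraction of the instances, which is where a time saving would come from; the cost is that centers estimated from a small sample are noisy, which is the trade-off between similarity and computation time that the abstract reports.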