Rough ISODATA Algorithm

Abstract

Cluster analysis is a branch of data mining that plays a vital role in uncovering hidden information in databases. Clustering algorithms help medical researchers identify natural subgroups in a data set. Many clustering algorithms are available in the literature, the most popular being k-means clustering. Although k-means is widely used, applying it requires knowing in advance the number of clusters present in the data set. Several solutions to this limitation are available in the literature. The k-means method also creates a disjoint and exhaustive partition of the data set, yet in some situations one encounters objects that belong to more than one cluster. This paper proposes a clustering algorithm that produces rough clusters automatically, without requiring the user to supply the number of clusters as input. The efficiency of the algorithm in detecting the number of clusters present in a data set is studied on several real-life data sets. Further, a nonparametric statistical analysis of the experimental results is carried out, with the help of a rough version of the Davies-Bouldin index, to analyze the efficiency of the proposed algorithm in automatically detecting the number of clusters.
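
The contrast the abstract draws between a disjoint k-means partition and rough clusters can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes the distance-ratio rule commonly used in rough k-means, in which each cluster has a lower approximation (objects that certainly belong to it) and an upper approximation (objects that possibly belong to it). The function name `rough_assign` and the `threshold` value of 1.3 are hypothetical choices made for illustration.

```python
import numpy as np


def rough_assign(points, centroids, threshold=1.3):
    """Assign points to rough clusters (illustrative sketch, not the paper's method).

    Returns two boolean matrices of shape (n_points, n_clusters):
    lower[i, j] is True if point i certainly belongs to cluster j,
    upper[i, j] is True if point i possibly belongs to cluster j.
    """
    # Euclidean distance from every point to every centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

    # Distance from each point to its nearest centroid.
    d_min = dists.min(axis=1)

    # Assumed rule: a point is in the upper approximation of every cluster
    # whose centroid is at most `threshold` times as far as the nearest one.
    upper = dists <= threshold * d_min[:, None]

    # A point is in a lower approximation only if exactly one cluster qualifies.
    unambiguous = upper.sum(axis=1) == 1
    lower = upper & unambiguous[:, None]
    return lower, upper


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 0.0]])
    cents = np.array([[0.0, 0.0], [10.0, 0.0]])
    lower, upper = rough_assign(pts, cents)
    print("lower approximations:\n", lower)
    print("upper approximations:\n", upper)
```

In this toy input the third point is equidistant from both centroids, so it lands in both upper approximations and in neither lower approximation, which is the kind of overlapping membership a disjoint k-means partition cannot express.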