Arabian Journal for Science and Engineering. Section A, Sciences

An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining



Abstract

Mining information and knowledge from large databases has become a demanding task. Finding the similarity between different attributes in a synthetic dataset is a challenging problem in data retrieval applications. Existing works propose clustering techniques for this purpose, such as k-means, fuzzy c-means, and fuzzy k-means. These techniques have drawbacks, including high overhead, less effective results, computational complexity, high time consumption, and memory utilization. To overcome these drawbacks, a similarity-based categorical data clustering technique is proposed. Here, the inter-attribute and intra-attribute similarities are calculated simultaneously and integrated to improve performance. The dataset is loaded as input and preprocessed to remove noise. Once the data are noise free, the similarity between the elements is computed; then the most relevant attributes are selected and the insignificant attributes are discarded. The support and confidence measures are estimated by applying association rule mining for resource planning. The similarity-based K-medoids clustering technique is then used to cluster the attributes based on the Euclidean distance, which reduces the overhead. Finally, the bee colony (BC) optimization technique is used to select the optimal features for further use. In the experiments, the proposed clustering system is evaluated with respect to clustering accuracy, execution time (s), error rate, convergence time (s), and adjusted Rand index (ARI). The results indicate that the proposed technique performs better than the other techniques.
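
To make the clustering step concrete, here is a minimal sketch of plain K-medoids with Euclidean distance. It is not the paper's method: the attribute-similarity weighting, association rule mining, and bee colony feature selection are omitted, and the function names (`assign`, `k_medoids`), the farthest-first initialisation, and the toy data are all assumptions made for illustration.

```python
import math


def assign(dist, medoids):
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(points, k, max_iter=100):
    """Alternating K-medoids: assign points, then re-pick each cluster's medoid."""
    n = len(points)
    # Pairwise Euclidean distances, computed once up front.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Farthest-first initialisation (an assumption; the abstract does not
    # say how the initial medoids are chosen).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(min(members,
                                   key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # converged: no medoid changed
        medoids = new_medoids

    return medoids, assign(dist, medoids)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(points, k=2)
print(medoids, labels)
```

Unlike k-means, each cluster centre here is always an actual data point, so the same loop works with any precomputed dissimilarity matrix, including one built from categorical attribute similarities.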
