Journal: International Journal of Engineering Research and Applications

A High Dimensional Clustering Scheme for Data Classification


Abstract

Data mining is the extraction of knowledge, or the discovery of hidden patterns, from large data sets; the data may take different forms and come from different sources. Data mining systems are used in many research domains, including health, stock market analysis, supermarket retail, and weather forecasting. These systems rely on computer-oriented algorithms, which can be categorized as supervised or unsupervised. Classification and prediction algorithms are supervised, while clustering algorithms are unsupervised. Clustering groups similar data objects into the same cluster; its main requirement is that the distance between objects within a cluster be as small as possible and the distance between objects in different clusters be large. K-means is one of the most common clustering algorithms. It is easy to use and efficient, but it has weaknesses that stem from random or inappropriate selection of the initial centroids, so it needs improvement. The proposed work attempts to improve k-means by using a genetic algorithm to select the initial cluster centroids.
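The idea of seeding k-means with a genetic algorithm can be sketched as follows. The abstract does not describe the chromosome encoding, fitness function, or genetic operators used in the paper, so everything here is an assumption made for illustration: a chromosome is a list of k indices into the data set, its cost is the sum of squared distances from each point to its nearest candidate centroid, and the operators are tournament selection, uniform crossover, and random-reset mutation with two-member elitism. The function names (`ga_initial_centroids`, `kmeans`, `sse`) and all parameter defaults are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def ga_initial_centroids(points, k, pop_size=20, generations=30,
                         mutation_rate=0.1, seed=0):
    """Pick k data points as initial centroids with a simple GA (assumed design)."""
    rng = random.Random(seed)
    n = len(points)

    def cost(chrom):
        # Lower is better: SSE when the chosen points act as centroids.
        return sse(points, [points[i] for i in chrom])

    # Each chromosome is a list of k distinct point indices.
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        next_gen = population[:2]  # elitism: carry over the two best
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random chromosomes.
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: occasionally replace a gene with a random index.
            child = [rng.randrange(n) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=cost)
    return [points[i] for i in best]


def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        # Recompute each centroid as its cluster mean; keep the old
        # centroid if a cluster ends up empty.
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    start = ga_initial_centroids(data, k=2)
    print(sorted(kmeans(data, start)))
```

Restricting candidate centroids to existing data points keeps the search space finite and the chromosome short; a real-valued encoding of centroid coordinates would be an equally plausible choice.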
