Journal: Intelligent Data Analysis

Evaluation of data analytics based clustering algorithms for knowledge mining in a student engagement data

Abstract

Applying data-analytics-based algorithms to knowledge mining in student datasets is an important strategy for improving learning outcomes and student success, and for supporting strategic decision making in higher education institutions. However, the widely used clustering algorithms are highly data dependent, so it is pertinent to identify the most effective algorithm for knowledge mining in a student engagement dataset. This study evaluates the performance of five well-known clustering algorithms for this purpose. The k-means algorithm was benchmarked with 22 distance functions using the Silhouette index, Dunn's index and partition entropy as internal validity metrics. The hierarchical clustering algorithm was benchmarked with the cophenetic correlation coefficient computed for different combinations of distance and linkage functions. The fuzzy c-means algorithm was benchmarked with the partition entropy, partition coefficient, Silhouette index and modified partition coefficient. The k-nearest neighbor algorithm was applied to determine the optimum epsilon value for density-based spatial clustering of applications with noise (DBSCAN). The expectation-maximization algorithm was run with its default parameter settings. The overall ranking of the clustering algorithms was based on cluster potentiality using median deviation statistics. The results show that the well-known k-means algorithm has the highest cluster potentiality, demonstrating its effectiveness for knowledge mining in a student engagement dataset.
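Several of the validity measures named in the abstract have short standard definitions, and a small sketch may make them concrete. The Python below is not the authors' implementation: the function names and the toy data are hypothetical, only Euclidean distance is used (the study compares 22 distance functions), and Dunn's index, the cophenetic correlation coefficient and the cluster potentiality ranking are left out. It shows the partition coefficient, partition entropy and modified partition coefficient for a fuzzy membership matrix, the Silhouette index for a hard labeling, and the sorted k-th nearest neighbor distances that are commonly inspected to choose a DBSCAN epsilon.

```python
import math

def partition_coefficient(u):
    """Mean squared membership: 1/c for a maximally fuzzy partition, 1 for a crisp one."""
    return sum(m * m for row in u for m in row) / len(u)

def partition_entropy(u):
    """Mean membership entropy: 0 for a crisp partition, log(c) for a maximally fuzzy one."""
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / len(u)

def modified_partition_coefficient(u):
    """Partition coefficient rescaled to [0, 1] so it does not depend on c (needs c >= 2)."""
    c = len(u[0])
    return 1 - c / (c - 1) * (1 - partition_coefficient(u))

def silhouette_index(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)                                # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor (excluding itself)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return sorted(out)

# Toy data invented for illustration: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]
# Hypothetical fuzzy memberships (each row sums to 1), as fuzzy c-means would produce.
memberships = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1],
               [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]

if __name__ == "__main__":
    print("PC :", partition_coefficient(memberships))
    print("PE :", partition_entropy(memberships))
    print("MPC:", modified_partition_coefficient(memberships))
    print("Silhouette:", silhouette_index(points, labels))
    print("2-distances:", k_distances(points, 2))
```

Higher partition coefficient and Silhouette values and lower partition entropy indicate a cleaner partition. For DBSCAN, a common heuristic is to plot the sorted k-distances and take epsilon near the point where the curve bends sharply upward.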
