Journal: Application Research of Computers (《计算机应用研究》)

A feature selection method based on nearest and farthest neighbors and mutual information

         

Abstract

As data volumes grow, feature selection has become an active research topic in machine learning and data mining. This paper proposes a feature selection algorithm based on nearest and farthest neighbors (NFFS). The algorithm rests on the premise that a data point belongs to the same cluster as its nearest neighbors and to a different cluster from its farthest neighbors; computing per-feature distances to the nearest and farthest neighbors yields an indicator of feature importance. On this basis, a mutual information criterion is applied to remove redundancy among features, and the Gradient boosting method is introduced to tune the model parameters, which improves classification accuracy. Classification experiments on UCI data sets show that the algorithm finds a better feature subset and yields some improvement in classification accuracy.
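The abstract does not give the exact NFFS formulas, so the following is only a minimal sketch of the two selection steps under stated assumptions. It assumes a Relief-style score (a feature's mean gap to the farthest neighbors minus its mean gap to the nearest neighbors) and a greedy filter that skips any feature whose mutual information with an already selected feature reaches a threshold. The function names, the equal-width binning, and the parameters `k`, `bins`, and `mi_threshold` are all hypothetical, and the Gradient boosting tuning step is omitted.

```python
import math
from collections import Counter


def neighbor_scores(X, k=1):
    """Assumed score: mean gap to the k farthest points minus mean gap to the k nearest."""
    n, d = len(X), len(X[0])
    scores = [0.0] * d
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        near, far = others[:k], others[-k:]
        for f in range(d):
            near_gap = sum(abs(X[i][f] - X[j][f]) for j in near) / k
            far_gap = sum(abs(X[i][f] - X[j][f]) for j in far) / k
            scores[f] += (far_gap - near_gap) / n
    return scores


def discretize(column, bins=4):
    """Equal-width binning (an assumption) so mutual information can be counted."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # constant column: avoid division by zero
    return [min(int((v - lo) / width), bins - 1) for v in column]


def mutual_information(a, b):
    """Mutual information in nats between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (pa[x] * pb[y]))
        for (x, y), c in pab.items()
    )


def select_features(X, n_keep, mi_threshold=0.5, k=1, bins=4):
    """Greedy selection: best score first, skipping features redundant with chosen ones."""
    scores = neighbor_scores(X, k)
    columns = [discretize([row[f] for row in X], bins) for f in range(len(X[0]))]
    selected = []
    for f in sorted(range(len(scores)), key=lambda f: -scores[f]):
        if all(mutual_information(columns[f], columns[g]) < mi_threshold
               for g in selected):
            selected.append(f)
        if len(selected) == n_keep:
            break
    return selected


# Toy data: feature 1 duplicates feature 0; feature 2 does not follow the clusters.
toy = [
    [0.0, 0.0, 0.0], [0.1, 0.1, 1.0], [0.2, 0.2, 0.0],
    [5.0, 5.0, 1.0], [5.1, 5.1, 0.0], [5.2, 5.2, 1.0],
]
print(select_features(toy, n_keep=2))  # -> [0, 2]
```

On the toy data the duplicate feature 1 is dropped because its mutual information with feature 0 exceeds the threshold. In practice the features would likely need to be rescaled before computing distances, and the selected subset would then be passed to a classifier.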
