Journal: International journal of data mining, modelling and management

Hybrid feature selection methods for high-dimensional multi-class datasets


Abstract

Hybrid methods are important for feature selection when classifying high-dimensional datasets. In this paper, we propose two hybrid methods that combine filter-based feature selection, a genetic algorithm, and sequential random search. The first method hybridises information gain and a genetic algorithm: the features are first ranked by information gain, and a user-defined number of top-ranked features is selected. A genetic algorithm is then applied to these selected features to find the optimal feature subset, using two types of fitness function, one single-objective and one multi-objective. The second method hybridises information gain and sequential random K-nearest neighbour (SRKNN). Here too, information gain is used to rank the features, and a user-defined number of top-ranked features is selected. A set of binary populations (each covering all the user-selected features) is generated, and a sequential search is applied to each population to maximise classification accuracy. Both methods are applied to 21 high-dimensional multi-class datasets. The results show that the first method performs well on some datasets and the second performs well on others. The results obtained by the proposed methods are also compared with results reported for other methods.
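
The first hybrid described in the abstract (information gain ranking followed by a genetic algorithm) might be sketched roughly as follows. This is only an illustration under several assumptions that do not come from the paper: features are discrete, the single-objective fitness is leave-one-out 1-nearest-neighbour accuracy with Hamming distance, and all function names, operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and hyperparameters are hypothetical. The multi-objective fitness and the SRKNN method are not covered.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """H(labels) - H(labels | column) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_information_gain(X, y, k):
    """Filter step: indices of the k features with the highest information gain."""
    n_features = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k]


def loo_1nn_accuracy(X, y, features):
    """Assumed fitness: leave-one-out 1-NN accuracy using Hamming distance."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        best_j, best_d = None, None
        for j, other in enumerate(X):
            if i == j:
                continue
            d = sum(row[f] != other[f] for f in features)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        if y[best_j] == y[i]:
            correct += 1
    return correct / len(X)


def genetic_search(X, y, candidates, pop_size=12, generations=15,
                   mutation_rate=0.1, seed=0):
    """Wrapper step: evolve binary masks over the candidate features.

    Requires at least two candidate features (for one-point crossover).
    """
    rng = random.Random(seed)

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return loo_1nn_accuracy(X, y, decode(mask))

    population = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
    return decode(best), fitness(best)


# Toy data (made up for illustration): feature 0 equals the class label,
# feature 1 is constant, feature 2 is unrelated to the label.
y = [0, 0, 1, 1, 2, 2]
X = [[0, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 1], [2, 0, 0], [2, 0, 1]]

candidates = top_k_by_information_gain(X, y, k=3)
selected, accuracy = genetic_search(X, y, candidates)
print("ranked candidates:", candidates)
print("selected subset:", selected, "fitness:", accuracy)
```

On real high-dimensional data, the filter step keeps the genetic algorithm's search space small, since each mask has only as many bits as there are retained features. Continuous features would need discretising, or a different information gain estimator, before this sketch applies.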
