Journal: International Journal of Data Science and Analytics
Redundant features removal for unsupervised spectral feature selection algorithms: an empirical study based on nonparametric sparse feature graph

Abstract

For existing unsupervised spectral feature selection algorithms, performance is determined by the quality of the eigenvectors. These eigenvectors are calculated from the Laplacian matrix of a similarity graph built from the samples. When these algorithms are applied to high-dimensional data, a chicken-and-egg problem arises: "the success of feature selection depends on the quality of indication vectors which are related to the structure of data. But the purpose of feature selection is to give more accurate data structure." To alleviate this problem, we propose a graph-based approach that reduces the dimension of the data by automatically searching for and removing redundant features. A sparse graph is generated on the feature side and is used to learn the redundancy relationships among features. We name this novel graph the sparse feature graph (SFG). To avoid relying on inaccurate distance information among high-dimensional vectors, the construction of the SFG does not use the pairwise relationships among samples, which means that the structure information of the data is not used. The proposed algorithm is also nonparametric, as it makes no assumption about the data distribution. We treat the proposed redundant feature removal algorithm as a data preprocessing step for popular existing unsupervised spectral feature selection algorithms such as multi-cluster feature selection (MCFS), which require accurate cluster structure information based on samples. Experimental results on benchmark datasets show that the proposed SFG and redundant feature removal algorithm consistently improve the performance of those unsupervised spectral feature selection algorithms.
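
The eigenvector step that the abstract refers to can be sketched as follows. This is only a minimal illustration of computing Laplacian eigenvectors from a sample similarity graph, not the authors' SFG construction, which the abstract does not specify. The function name, the k-nearest-neighbour graph, the heat-kernel weights, and the parameters `n_neighbors`, `n_vectors`, and `sigma` are all assumptions made for the example.

```python
import numpy as np


def laplacian_eigenvectors(X, n_neighbors=5, n_vectors=2, sigma=1.0):
    """Illustrative helper (hypothetical name): smallest eigenpairs of the
    unnormalised Laplacian of a kNN similarity graph over the rows of X.

    X has shape (n_samples, n_features).
    """
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Heat-kernel similarities (an assumed choice of weighting).
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Keep each sample's n_neighbors most similar samples, then symmetrise.
    idx = np.argsort(-W, axis=1)[:, :n_neighbors]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    mask = mask | mask.T
    W = np.where(mask, W, 0.0)

    # Unnormalised graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    vals, vecs = np.linalg.eigh(L)
    return vals[:n_vectors], vecs[:, :n_vectors]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic clusters of five samples each.
    X = np.vstack([rng.normal(0.0, 0.1, (5, 3)),
                   rng.normal(10.0, 0.1, (5, 3))])
    vals, vecs = laplacian_eigenvectors(X, n_neighbors=3, n_vectors=2)
    print(vals)
    print(vecs.shape)
```

Because the distances feeding this graph are computed over all features, noisy or redundant dimensions degrade the eigenvectors, which is the dependency the abstract describes as a chicken-and-egg problem.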
