International Journal of Data Science and Analytics

Redundant features removal for unsupervised spectral feature selection algorithms: an empirical study based on nonparametric sparse feature graph


Abstract

For existing unsupervised spectral feature selection algorithms, the quality of the eigenvectors determines performance. These eigenvectors are computed from the Laplacian matrix of a similarity graph built from the samples. When applying these algorithms to high-dimensional data, we face a chicken-and-egg problem: "the success of feature selection depends on the quality of indication vectors which are related to the structure of data. But the purpose of feature selection is to give more accurate data structure." To alleviate this problem, we propose a graph-based approach that reduces the dimension of the data by automatically searching for and removing redundant features. A sparse graph is generated on the feature side and is used to learn the redundancy relationships among features. We name this novel graph the sparse feature graph (SFG). To avoid relying on inaccurate distance information among high-dimensional vectors, the construction of SFG does not use the pairwise relationships among samples, which means the structure information of the data is not used. Our proposed algorithm is also nonparametric, as it makes no assumption about the data distribution. We treat the proposed redundant feature removal algorithm as a data preprocessing step for existing popular unsupervised spectral feature selection algorithms such as multi-cluster feature selection (MCFS), which require accurate sample-based cluster structure information. Our experimental results on benchmark datasets show that the proposed SFG and redundant feature removal algorithm consistently improve the performance of those unsupervised spectral feature selection algorithms.
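The abstract does not say how the SFG is constructed or how redundant features are identified from it, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes each feature column is sparsely approximated by a few other feature columns using greedy orthogonal matching pursuit (a stand-in technique chosen here), and that a feature counts as redundant when its reconstruction residual is small and the features it depends on are still kept. The function names and the `max_atoms` and `resid_tol` parameters are hypothetical. In line with the abstract, only feature columns are compared and no pairwise sample distances are computed.

```python
import numpy as np


def sparse_code_omp(D, y, max_atoms=3, tol=1e-6):
    """Greedy orthogonal matching pursuit: approximate y with few columns of D."""
    residual = y.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -1.0  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Refit all selected columns jointly, then update the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef, float(np.linalg.norm(residual))


def build_feature_graph(X, max_atoms=3):
    """Return (W, resid): W[i, j] is the weight of feature j in the sparse
    reconstruction of feature i; resid[i] is the reconstruction error."""
    n_features = X.shape[1]
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0  # leave constant features as zero vectors
    F = Xc / norms  # centred, unit-length feature vectors
    W = np.zeros((n_features, n_features))
    resid = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        coef, resid[i] = sparse_code_omp(F[:, others], F[:, i], max_atoms)
        W[i, others] = coef
    return W, resid


def drop_redundant(X, max_atoms=3, resid_tol=0.05):
    """Return indices of features kept after removing redundant ones."""
    W, resid = build_feature_graph(X, max_atoms)
    keep = np.ones(X.shape[1], dtype=bool)
    for i in np.argsort(resid):  # best-reconstructed features first
        neighbours = np.flatnonzero(W[i])
        # Drop i only if the features that explain it are all still kept,
        # so that two mutually redundant features are never both removed.
        if resid[i] <= resid_tol and neighbours.size and keep[neighbours].all():
            keep[i] = False
    return np.flatnonzero(keep)
```

Under these assumptions, `drop_redundant(X)` takes a samples-by-features array and returns the column indices to pass on to a downstream selector such as MCFS, for example `X[:, drop_redundant(X)]`.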
