Conference: Database Systems for Advanced Applications

Distance Based Feature Selection for Clustering Microarray Data


Abstract

In microarray data analysis, clustering is a fundamental task for separating genes into biologically functional groups or for classifying tissues and phenotypes. With recent gene expression microarray technologies, the expression levels of thousands of genes (features) can be measured simultaneously in a single experiment. The large number of genes, many of them noisy, makes cluster analysis highly complex. This challenge has raised the demand for feature selection, an effective dimensionality reduction technique that removes noisy features. In this paper we propose a novel filter method for feature selection. The suggested method, called ClosestFS, is based on a distance measure. For each feature, the distance is evaluated by computing its impact on the histogram for the whole data. Our experimental results show that the clustering quality (evaluated by several widely used measures) of the K-means algorithm with ClosestFS as a pre-processing step is significantly better than that of plain K-means.
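
The abstract does not give the ClosestFS formula, so the following is only a rough sketch of what a histogram-based filter score could look like, not the authors' method. It assumes, purely for illustration, that "the histogram for the whole data" is the histogram of pairwise Euclidean distances between samples, that a feature's impact is the L1 difference between that histogram with and without the feature, and that the features with the largest impact are kept. The names `histogram_impact_scores` and `select_features` and the parameter `n_bins` are hypothetical.

```python
import numpy as np


def histogram_impact_scores(X, n_bins=10):
    """Score each feature by how much removing it changes the histogram
    of pairwise sample distances (an assumed stand-in, not ClosestFS)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape

    # Squared per-feature differences for every unordered pair of samples.
    i, j = np.triu_indices(n_samples, k=1)
    sq_diff = (X[i] - X[j]) ** 2            # shape: (n_pairs, n_features)
    total = sq_diff.sum(axis=1)             # squared distance on all features

    # Shared bin edges so the histograms are directly comparable.
    edges = np.linspace(0.0, np.sqrt(total.max()), n_bins + 1)

    def normalized_hist(sq_dists):
        counts, _ = np.histogram(np.sqrt(sq_dists), bins=edges)
        return counts / counts.sum()

    base = normalized_hist(total)
    scores = np.empty(n_features)
    for f in range(n_features):
        # Distances with feature f left out; clip tiny negative round-off.
        reduced = np.maximum(total - sq_diff[:, f], 0.0)
        scores[f] = np.abs(base - normalized_hist(reduced)).sum()
    return scores


def select_features(X, k, n_bins=10):
    """Return the indices of the k highest-scoring features and all scores.
    Keeping the highest scores is an assumption of this sketch."""
    scores = histogram_impact_scores(X, n_bins=n_bins)
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return keep, scores
```

In a pipeline like the one the abstract describes, the reduced matrix `X[:, keep]` would then be passed to K-means in place of the full expression matrix.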
