International Conference on Computational Science and Technology

Gene selection for high dimensional data using k-means clustering algorithm and statistical approach

Abstract

Microarray technology can measure the expression of thousands of genes, which helps biologists study and classify cancer cells. However, such high-dimensional data contain a large number of genes to be examined relative to a small sample size. Selecting relevant genes is therefore a challenging issue in microarray data analysis and has been a central research focus. This study proposes using the k-means clustering algorithm to group the relevant genes. Several statistical techniques, such as the Fisher criterion, the Golub signal-to-noise ratio, the Mann-Whitney rank test, and the t-test, are used to decide whether the clusters are well separated from one another. Genes with high discriminative scores are then used to train a k-NN classifier. The experimental results show that the proposed gene selection methods are able to identify differentially expressed genes with an ROC score of 0.86.
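The abstract does not give formulas, so the following is only a minimal sketch of how three of the named scores might be computed for a two-class problem. It assumes the common textbook forms of the Golub signal-to-noise ratio, the Fisher criterion, and Welch's t statistic, which may differ from what the paper uses. The function names, the `rank_genes` helper, and the toy data are hypothetical. The k-means grouping, the Mann-Whitney test, and the k-NN classifier are not shown.

```python
from math import sqrt
from statistics import mean, stdev, variance


def golub_snr(a, b):
    """Golub signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def fisher_criterion(a, b):
    """Fisher criterion: (mean_a - mean_b)^2 / (var_a + var_b)."""
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))


def welch_t(a, b):
    """Welch's t statistic (assumes unequal variances; an assumption here)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expression, labels, score=golub_snr):
    """Return gene names sorted by descending absolute score.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of 0/1 class labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(score(class1, class0))
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, made up for illustration: 3 genes measured on 6 samples.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],  # strongly differential
    "gene_b": [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],  # nearly flat
    "gene_c": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],  # moderately differential
}
ranking = rank_genes(expression, labels)
print(ranking)
```

Passing `score=fisher_criterion` or `score=welch_t` to `rank_genes` swaps the criterion without changing the rest of the sketch. In a pipeline like the one described, the top-ranked genes would then feed the classifier.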
