Source: Journal of Bioinformatics and Computational Biology

SEMIPARAMETRIC CLUSTERING METHOD FOR MICROARRAY DATA ANALYSIS


Abstract

Clustering is a major tool for analyzing microarray gene expression data. Existing clustering methods fall mainly into two categories: parametric and nonparametric. Parametric methods generally assume a mixture of parametric subdistributions. They perform well when the mixture distribution approximately fits the true data-generating mechanism, but poorly when the deviation between the two is non-negligible. Nonparametric methods, which usually make no distributional assumptions, are robust but pay for that robustness with a loss of efficiency. To exploit the known mixture form for efficiency while dropping assumptions about the unknown subdistributions for robustness, we propose a semiparametric clustering method. The proposed approach retains the form of a parametric mixture but makes no assumptions about the subdistributions, which are estimated nonparametrically with constraints imposed only on their modes. An expectation-maximization (EM) algorithm with a classification step is used to cluster the data, and a modified Bayesian information criterion (BIC) guides the choice of the optimal number of clusters. Simulation studies assess the performance and robustness of the proposed method, and the results show that it yields a reasonable partition of the data. As an illustration, the method is applied to a real microarray data set to cluster genes.
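
The workflow the abstract outlines (fit a mixture by EM, assign each observation to a cluster, pick the number of clusters by an information criterion) can be illustrated with a minimal sketch. It is not the authors' method: it assumes Gaussian subdistributions as a purely parametric stand-in for the nonparametric estimates, and it uses the standard BIC, since the abstract does not give the form of the modified criterion. The function names `fit_mixture`, `bic` and `choose_k` are hypothetical.

```python
import numpy as np

def log_joint(x, weights, means, variances):
    # log(weight_j * N(x_i | mean_j, var_j)) for every point i and component j.
    return (np.log(weights)
            - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)

def fit_mixture(x, k, n_iter=500, tol=1e-9):
    """EM for a 1-D Gaussian mixture (assumed stand-in, not semiparametric)."""
    # Deterministic start: means spread over the data quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var() / k + 1e-3)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        lj = log_joint(x, weights, means, variances)
        log_norm = np.logaddexp.reduce(lj, axis=1)
        ll = log_norm.sum()
        if ll - prev_ll < tol:  # stop once the log-likelihood stalls
            break
        prev_ll = ll
        # E-step: posterior membership probabilities.
        resp = np.exp(lj - log_norm[:, None])
        # M-step: weighted updates; the small floor keeps variances positive.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = resp.T @ x / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-3
    # Classification step: assign each point to its most probable component.
    lj = log_joint(x, weights, means, variances)
    ll = np.logaddexp.reduce(lj, axis=1).sum()
    return ll, lj.argmax(axis=1)

def bic(ll, k, n):
    # Standard BIC; a k-component 1-D Gaussian mixture has 3k - 1 free parameters.
    return -2.0 * ll + (3 * k - 1) * np.log(n)

def choose_k(x, max_k=4):
    """Return the k in 1..max_k with the smallest BIC, plus its labels."""
    fits = {k: fit_mixture(x, k) for k in range(1, max_k + 1)}
    scores = {k: bic(ll, k, x.size) for k, (ll, _) in fits.items()}
    best = min(scores, key=scores.get)
    return best, fits[best][1], scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-5, 1, 150), rng.normal(5, 1, 150)])
    best_k, labels, scores = choose_k(x)
    print(best_k, scores)
```

Replacing the Gaussian densities in `log_joint` with nonparametric density estimates is where a semiparametric variant would differ; how the paper constrains the modes is not specified in the abstract, so it is left out here.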
