Journal of Bioinformatics and Computational Biology

NOISE-ROBUST SOFT CLUSTERING OF GENE EXPRESSION TIME-COURSE DATA

Abstract

Clustering is an important tool in microarray data analysis. This unsupervised learning technique is commonly used to reveal structures hidden in large gene expression data sets. The vast majority of clustering algorithms applied so far produce hard partitions of the data, i.e. each gene is assigned to exactly one cluster. Hard clustering is favourable if clusters are well separated. However, this is generally not the case for microarray time-course data, where gene clusters frequently overlap. Additionally, hard clustering algorithms are often highly sensitive to noise. To overcome the limitations of hard clustering, we applied soft clustering, which offers several advantages for researchers. First, it generates accessible internal cluster structures, i.e. it indicates how well corresponding clusters represent genes. This can be used for a more targeted search for regulatory elements. Second, the overall relation between clusters, and thus a global clustering structure, can be defined. Additionally, soft clustering is more robust to noise, so a priori pre-filtering of genes can be avoided. This prevents the exclusion of biologically relevant genes from the data analysis. Soft clustering was implemented here using the fuzzy c-means algorithm. Procedures to find optimal clustering parameters were developed. A software package for soft clustering has been developed based on the open-source statistical language R. The package, called Mfuzz, is freely available.
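To make the idea of soft cluster membership concrete, the following is a minimal pure-Python sketch of the standard fuzzy c-means iteration. It is not the Mfuzz implementation. The function names, the toy expression profiles, the Euclidean distance, the fuzzifier value m = 2 and the explicitly supplied starting centres are all assumptions chosen for illustration.

```python
def _memberships(points, centers, m):
    """Membership of every point in every cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in centers]
        if min(dists) == 0.0:
            # The point sits exactly on one or more centres: share full
            # membership among those centres and avoid dividing by zero.
            hits = [d == 0.0 for d in dists]
            rows.append([1.0 / sum(hits) if h else 0.0 for h in hits])
        else:
            inv = [d ** -exponent for d in dists]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=100, tol=1e-6):
    """Alternate membership and centre updates until the centres stop moving."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            # Each centre is the mean of all points weighted by u ** m.
            w = [row[k] ** m for row in u]
            w_sum = sum(w)
            new_centers.append(
                [sum(wi * p[i] for wi, p in zip(w, points)) / w_sum for i in range(dim)]
            )
        shift = max(
            abs(a - b) for old, new in zip(centers, new_centers) for a, b in zip(old, new)
        )
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, m)


# Hypothetical toy profiles over three time points (not data from the paper).
profiles = [
    [0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [-0.1, 0.9, 1.9],  # rising
    [2.0, 1.0, 0.0], [2.1, 1.1, 0.1], [1.9, 0.9, -0.1],  # falling
    [1.0, 1.0, 1.0],                                      # flat, between both groups
]
# Starting centres are passed explicitly so the example is deterministic.
centers, memberships = fuzzy_c_means(profiles, [profiles[0], profiles[3]])
for profile, row in zip(profiles, memberships):
    print(profile, [round(v, 3) for v in row])
```

In this toy run the rising and falling profiles receive membership close to 1 in their own cluster, while the flat profile is split about evenly between the two. That graded value is what a hard partition discards, and it is the quantity the abstract refers to when it says soft clustering indicates how well a cluster represents a gene.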