Journal: BMC Bioinformatics

Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions


Abstract

Background: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, a well-recognized mechanism of epigenetic gene silencing that often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner.

Results: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations showing that the method is more reliable than competing nonparametric clustering approaches and at least as reliable as conventional mixture model methods. We also show that the proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate the method on normal tissue samples and show that the clusters are associated with tissue type as well as age.

Conclusion: Our proposed recursively partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.
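To make the idea of recursively partitioning a beta mixture concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the algorithm from the paper: the function names (`rpmm_sketch`, `two_class_em`, `fit_beta`), the weighted method-of-moments update, the BIC stopping rule, the hard assignment of samples to child nodes, and the simulated data are all hypothetical choices made for this example. Each node fits a two-class mixture of independent per-locus beta distributions by EM and keeps the split only if it lowers BIC relative to a single class.

```python
import math
import random

EPS = 1e-6  # keeps values and proportions strictly inside (0, 1)

def beta_logpdf(y, a, b):
    """Log-density of Beta(a, b) at y in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

def fit_beta(values, weights):
    """Weighted method-of-moments estimate of (a, b); an approximation to the MLE."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    var = min(max(var, EPS), 0.99 * mean * (1 - mean))  # keep scale positive
    scale = mean * (1 - mean) / var - 1
    return mean * scale, (1 - mean) * scale

def fit_class(data, weights):
    """Fit one beta distribution per locus (column) of the data."""
    return [fit_beta([row[j] for row in data], weights)
            for j in range(len(data[0]))]

def sample_loglik(row, params):
    """Log-likelihood of one sample, treating loci as independent."""
    return sum(beta_logpdf(y, a, b) for y, (a, b) in zip(row, params))

def two_class_em(data, n_iter=50):
    """Two-class beta mixture by EM; returns (class-1 responsibilities, log-likelihood)."""
    means = [sum(row) / len(row) for row in data]
    cut = sorted(means)[len(means) // 2]
    resp = [0.9 if m >= cut else 0.1 for m in means]  # soft median-split start
    loglik = float("-inf")
    for _ in range(n_iter):
        # M-step: per-class, per-locus beta parameters and mixing proportion
        p1 = fit_class(data, [max(r, 1e-9) for r in resp])
        p2 = fit_class(data, [max(1 - r, 1e-9) for r in resp])
        pi = min(max(sum(resp) / len(resp), EPS), 1 - EPS)
        # E-step: responsibilities via a numerically stable log-sum-exp
        loglik, resp = 0.0, []
        for row in data:
            l1 = math.log(pi) + sample_loglik(row, p1)
            l2 = math.log(1 - pi) + sample_loglik(row, p2)
            top = max(l1, l2)
            total = top + math.log(math.exp(l1 - top) + math.exp(l2 - top))
            loglik += total
            resp.append(math.exp(l1 - total))
    return resp, loglik

def bic(loglik, n_params, n):
    return -2 * loglik + n_params * math.log(n)

def rpmm_sketch(data, idx=None, min_size=5, max_depth=4):
    """Recursively split samples; returns a list of clusters (lists of row indices)."""
    idx = list(range(len(data))) if idx is None else idx
    sub = [data[i] for i in idx]
    n, n_loci = len(sub), len(sub[0])
    if n < 2 * min_size or max_depth == 0:
        return [idx]
    one = fit_class(sub, [1.0] * n)
    ll_one = sum(sample_loglik(row, one) for row in sub)
    resp, ll_two = two_class_em(sub)
    left = [i for i, r in zip(idx, resp) if r >= 0.5]
    right = [i for i, r in zip(idx, resp) if r < 0.5]
    keep_split = (bic(ll_two, 4 * n_loci + 1, n) < bic(ll_one, 2 * n_loci, n)
                  and min(len(left), len(right)) >= min_size)
    if not keep_split:
        return [idx]
    return (rpmm_sketch(data, left, min_size, max_depth - 1)
            + rpmm_sketch(data, right, min_size, max_depth - 1))

def simulate(seed=0):
    """Hypothetical data: 20 low-methylation and 20 high-methylation samples, 5 loci."""
    rng = random.Random(seed)
    clamp = lambda y: min(max(y, EPS), 1 - EPS)
    low = [[clamp(rng.betavariate(2, 10)) for _ in range(5)] for _ in range(20)]
    high = [[clamp(rng.betavariate(10, 2)) for _ in range(5)] for _ in range(20)]
    return low + high

clusters = rpmm_sketch(simulate())
```

Because each node only ever fits a one-class and a two-class model, the number of clusters does not have to be fixed in advance; it emerges from where the recursion stops. The stopping rule and the moment-based update here are simplifications chosen for brevity, and a faithful implementation should follow the full paper.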
