Source: Statistical Applications in Genetics and Molecular Biology (U.S. National Institutes of Health literature collection)

Recursively partitioned mixture model clustering of DNA methylation data using biologically informed correlation structures

Abstract

DNA methylation is a well-recognized epigenetic mechanism and the subject of a growing body of literature, much of it focused on identifying and studying profiles of DNA methylation and their association with human diseases and exposures. In recent years, a number of unsupervised clustering algorithms, both parametric and non-parametric, have been proposed for clustering large-scale DNA methylation data. However, most of these approaches do not incorporate known biological relationships among the measured features, and in some cases they rely on unrealistic assumptions regarding the nature of DNA methylation. Here, we propose a modified version of a recursively partitioned mixture model (RPMM) that integrates information about the proximity of CpG loci within the genome to inform the correlation structures on which subsequent clustering analysis is based. Using simulations and four methylation data sets, we demonstrate that integrating biologically informative correlation structures within RPMM improves goodness-of-fit, clustering consistency, and the ability to detect biologically meaningful clusters compared to methods that ignore such correlation. Integrating biologically informed correlation structures to enhance modeling techniques is motivated by the rapid increase in resolution of DNA methylation microarrays and the increasing understanding of the biology of this epigenetic mechanism.
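
The abstract does not state the functional form of the proximity-based correlation structure. As a purely illustrative sketch, and not the authors' method, the following Python assumes that correlation between two CpG loci decays exponentially with their genomic distance and is zero across chromosomes. The function name `proximity_correlation` and the `length_scale` parameter are hypothetical.

```python
import math


def proximity_correlation(positions, chromosomes, length_scale=1000.0):
    """Illustrative correlation matrix for CpG loci (assumed form).

    Assumption: corr(i, j) = exp(-|pos_i - pos_j| / length_scale) for loci
    on the same chromosome, and 0 for loci on different chromosomes.
    This decay form is NOT taken from the paper.
    """
    if len(positions) != len(chromosomes):
        raise ValueError("positions and chromosomes must have equal length")
    n = len(positions)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if chromosomes[i] == chromosomes[j]:
                distance = abs(positions[i] - positions[j])
                corr[i][j] = math.exp(-distance / length_scale)
    return corr


# Hypothetical loci: two close together on chr1, one on chr2.
example = proximity_correlation(
    positions=[1000, 1200, 5000],
    chromosomes=["chr1", "chr1", "chr2"],
)
for row in example:
    print([round(value, 3) for value in row])
```

Under this assumed form, nearby loci on the same chromosome receive a high correlation, while loci on different chromosomes are treated as independent, so the matrix is block diagonal by chromosome.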