Journal: Journal of Bioinformatics and Computational Biology

DETECTING COPY NUMBER VARIATIONS FROM ARRAY CGH DATA BASED ON A CONDITIONAL RANDOM FIELD MODEL



Abstract

Array comparative genomic hybridization (aCGH) allows identification of copy number alterations across genomes. The key computational challenge in analyzing copy number variations (CNVs) using aCGH data or other similar data generated by a variety of array technologies is the detection of segment boundaries of copy number changes and inference of the copy number state for each segment. We have developed a novel statistical model based on the framework of conditional random fields (CRFs) that can effectively combine data smoothing, segmentation and copy number state decoding into one unified framework. Our approach (termed CRF-CNV) provides great flexibility in defining meaningful feature functions. Therefore, it can effectively integrate local spatial information of arbitrary sizes into the model. For model parameter estimation, we have adopted the conjugate gradient (CG) method for likelihood optimization and developed efficient forward/backward algorithms within the CG framework. The method is evaluated using real data with known copy numbers as well as simulated data with realistic assumptions, and compared with two popular publicly available programs. Experimental results have demonstrated that CRF-CNV outperforms a Bayesian Hidden Markov Model-based approach on both datasets in terms of copy number assignments. Compared with a non-parametric approach, CRF-CNV has achieved much greater precision while maintaining the same level of recall on the real data, and their performance on the simulated data is comparable.
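The abstract names forward/backward algorithms over a linear chain of clones but gives no formulas, so the following is only a minimal sketch of that general technique, not the CRF-CNV implementation. The three-state set, the state means, the weights `W_EMIT` and `W_STAY`, and the window-mean feature are all assumptions made for illustration. The sketch computes the log partition function and per-clone posterior marginals, which are the quantities a gradient-based optimizer of a CRF likelihood typically needs.

```python
import math
from itertools import product

# Hypothetical states and parameters, chosen only for illustration.
STATES = ("loss", "neutral", "gain")
STATE_MEANS = (-0.5, 0.0, 0.5)  # assumed typical log2 ratio per state
W_EMIT = 8.0                    # assumed weight of the data feature
W_STAY = 2.0                    # assumed reward for keeping the same state


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def window_mean(ratios, i, half_width):
    """Mean log2 ratio over clones i-half_width..i+half_width (clipped)."""
    lo, hi = max(0, i - half_width), min(len(ratios), i + half_width + 1)
    return sum(ratios[lo:hi]) / (hi - lo)


def node_scores(ratios, half_width=1):
    """Per-clone state scores: a window mean nearer a state mean scores higher."""
    return [[-W_EMIT * (window_mean(ratios, i, half_width) - mu) ** 2
             for mu in STATE_MEANS]
            for i in range(len(ratios))]


def edge_score(prev, cur):
    """Transition score between neighbouring clones."""
    return W_STAY if prev == cur else 0.0


def forward_backward(node):
    """Return (log Z, posterior marginals) for a linear chain, in log space."""
    n, k = len(node), len(STATES)
    alpha = [node[0][:]]
    for i in range(1, n):
        alpha.append([
            node[i][s] + logsumexp([alpha[i - 1][p] + edge_score(p, s)
                                    for p in range(k)])
            for s in range(k)])
    beta = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [
            logsumexp([edge_score(s, q) + node[i + 1][q] + beta[i + 1][q]
                       for q in range(k)])
            for s in range(k)]
    log_z = logsumexp(alpha[-1])
    marginals = [[math.exp(alpha[i][s] + beta[i][s] - log_z)
                  for s in range(k)] for i in range(n)]
    return log_z, marginals


def brute_force_log_z(node):
    """Enumerate every labeling; only feasible for very short sequences."""
    n, k = len(node), len(STATES)
    totals = []
    for path in product(range(k), repeat=n):
        score = sum(node[i][path[i]] for i in range(n))
        score += sum(edge_score(path[i - 1], path[i]) for i in range(1, n))
        totals.append(score)
    return logsumexp(totals)


# Made-up log2 ratios for six clones, used only to exercise the functions.
ratios = [0.02, -0.05, 0.48, 0.55, 0.51, 0.03]
log_z, marginals = forward_backward(node_scores(ratios))
calls = [STATES[max(range(len(STATES)), key=row.__getitem__)]
         for row in marginals]
```

Here `calls` holds the most probable state per clone under the posterior marginals. Decoding a single best joint labeling would instead use a Viterbi-style recursion, which replaces `logsumexp` with `max` in the forward pass.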
