Journal of Bioinformatics and Computational Biology

DETECTING COPY NUMBER VARIATIONS FROM ARRAY CGH DATA BASED ON A CONDITIONAL RANDOM FIELD MODEL



Abstract

Array comparative genomic hybridization (aCGH) allows identification of copy number alterations across genomes. The key computational challenge in analyzing copy number variations (CNVs) using aCGH data, or other similar data generated by a variety of array technologies, is the detection of segment boundaries of copy number changes and the inference of the copy number state for each segment. We have developed a novel statistical model based on the framework of conditional random fields (CRFs) that can effectively combine data smoothing, segmentation and copy number state decoding into one unified framework. Our approach (termed CRF-CNV) provides great flexibility in defining meaningful feature functions; therefore, it can effectively integrate local spatial information of arbitrary sizes into the model. For model parameter estimation, we have adopted the conjugate gradient (CG) method for likelihood optimization and developed efficient forward/backward algorithms within the CG framework. The method is evaluated using real data with known copy numbers as well as simulated data with realistic assumptions, and compared with two popular publicly available programs. Experimental results have demonstrated that CRF-CNV outperforms a Bayesian hidden Markov model-based approach on both datasets in terms of copy number assignments. Compared to a non-parametric approach, CRF-CNV has achieved much greater precision while maintaining the same level of recall on the real data, and their performance on the simulated data is comparable.
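The abstract does not spell out the inference step, so the following is only a minimal sketch, assuming a linear-chain CRF over discrete copy number states. Three states (loss, normal, gain) with assumed log2-ratio means of -1.0, 0.0 and 0.585 are hypothetical choices, as is the node feature `window_mean`, a moving average over a window of half-width `w` standing in for "local spatial information of arbitrary sizes". The weights are set by hand rather than learned. The sketch shows the log-space forward/backward recursions that yield the partition function and per-probe state posteriors; in CRF training, such posteriors supply the expected feature counts in the log-likelihood gradient that an optimizer such as conjugate gradient would consume. It is not the CRF-CNV implementation.

```python
import math
from typing import List, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def window_mean(x: Sequence[float], t: int, w: int) -> float:
    """Hypothetical local-spatial feature: mean log2 ratio of probes t-w..t+w (clipped at the ends)."""
    lo, hi = max(0, t - w), min(len(x), t + w + 1)
    return sum(x[lo:hi]) / (hi - lo)


def node_scores(x: Sequence[float], means: Sequence[float], w: int, weight: float) -> List[List[float]]:
    """Log-potential of each state at each probe: -weight * (window mean - assumed state mean)^2."""
    return [[-weight * (window_mean(x, t, w) - mu) ** 2 for mu in means] for t in range(len(x))]


def forward_backward(node: List[List[float]], trans: List[List[float]]) -> Tuple[float, List[List[float]]]:
    """Log-space forward/backward for a linear-chain CRF.

    node[t][s]  : log-potential of state s at probe t
    trans[r][s] : log-potential of moving from state r to state s
    Returns (log Z, per-probe posterior marginals P(y_t = s | x)).
    """
    T, K = len(node), len(node[0])
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[0.0] * K for _ in range(T)]  # beta[T-1][s] = log 1 = 0

    # Forward pass: alpha[t][s] = log-sum of scores of all prefixes ending in state s at probe t.
    alpha[0] = list(node[0])
    for t in range(1, T):
        for s in range(K):
            alpha[t][s] = node[t][s] + logsumexp([alpha[t - 1][r] + trans[r][s] for r in range(K)])

    # Backward pass: beta[t][s] = log-sum of scores of all suffixes following state s at probe t.
    for t in range(T - 2, -1, -1):
        for s in range(K):
            beta[t][s] = logsumexp([trans[s][r] + node[t + 1][r] + beta[t + 1][r] for r in range(K)])

    log_z = logsumexp(alpha[T - 1])
    marginals = [[math.exp(alpha[t][s] + beta[t][s] - log_z) for s in range(K)] for t in range(T)]
    return log_z, marginals


if __name__ == "__main__":
    # Toy log2 ratios with a hypothetical single-copy gain at probes 3-5.
    x = [0.02, -0.05, 0.01, 0.55, 0.61, 0.58, 0.03, -0.02]
    # Assumed loss / normal / gain means, roughly log2(1/2), log2(2/2), log2(3/2).
    means = [-1.0, 0.0, 0.585]
    # Hand-set transition scores: staying is free, switching state costs 3 (log scale).
    trans = [[0.0 if r == s else -3.0 for s in range(3)] for r in range(3)]
    node = node_scores(x, means, w=1, weight=20.0)
    log_z, marg = forward_backward(node, trans)
    states = [max(range(3), key=lambda s: row[s]) for row in marg]
    print("log Z =", round(log_z, 4))
    print("posterior-argmax states:", states)
```

Taking the per-probe argmax of the posteriors is posterior decoding; Viterbi decoding would instead return the single highest-scoring path. Runs of consecutive probes sharing a state then form the segments. Learning the transition weights would additionally require pairwise marginals, which this sketch omits.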