Journal: BMC Genomics

Collective judgment predicts disease-associated single nucleotide variants


Abstract

Background

In recent years the number of human genetic variants deposited in publicly available databases has been increasing exponentially. The latest version of dbSNP, for example, contains ~50 million validated Single Nucleotide Variants (SNVs). SNVs make up most of human variation and are often the primary causes of disease. Non-synonymous SNVs (nsSNVs) result in single amino acid substitutions and may affect protein function, often causing disease. Although several methods for detecting nsSNV effects have already been developed, the steady increase in annotated data offers an opportunity to improve prediction accuracy.

Results

Here we present a new approach for the detection of disease-associated nsSNVs (Meta-SNP) that integrates four existing methods: PANTHER, PhD-SNP, SIFT and SNAP. We first tested the accuracy of each method using a dataset of 35,766 disease-annotated mutations from 8,667 proteins extracted from the SwissVar database. The four methods reached overall accuracies of 64%-76% with a Matthews correlation coefficient (MCC) of 0.38-0.53. We then used the outputs of these methods to develop a machine-learning-based approach (Meta-SNP) that discriminates between disease-associated and polymorphic variants. In testing, the combined method reached 79% overall accuracy and 0.59 MCC, which is ~3% higher accuracy and ~0.05 higher correlation than the best-performing single method. Moreover, for the hardest-to-define subset of nsSNVs, i.e. variants for which half of the predictors disagreed with the other half, Meta-SNP attained 8% higher accuracy than the best predictor.

Conclusions

We find that the Meta-SNP algorithm achieves better performance than the best single predictor. This result suggests that the methods used for the prediction of variant-disease associations are orthogonal, encoding different biologically relevant relationships. Careful combination of predictions from various resources is therefore a good strategy for the selection of high-reliability predictions. Indeed, for the subset of nsSNVs where all predictors were in agreement (46% of all nsSNVs in the set), our method reached 87% overall accuracy and 0.73 MCC.
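
The abstract reports two metrics (overall accuracy and MCC) and two subsets defined by how the four predictors vote (all in agreement, or split two against two). The sketch below illustrates how such quantities could be computed. It is not the Meta-SNP implementation: the function names, the 1 = disease / 0 = polymorphism encoding, and the assumption that each predictor's output has already been reduced to a binary call are all hypothetical choices made for illustration.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = disease, 0 = polymorphism)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Overall accuracy: fraction of variants classified correctly."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def agreement_class(votes):
    """Classify one variant by how an even number of binary predictor calls agree.

    Returns 'unanimous' (all calls equal), 'split' (half vs. half),
    or 'majority' (anything in between, e.g. 3 vs. 1 for four predictors).
    """
    positives = sum(votes)
    if positives in (0, len(votes)):
        return "unanimous"
    if 2 * positives == len(votes):
        return "split"
    return "majority"
```

With per-variant votes from the four methods, `agreement_class` could be used to partition a test set so that `accuracy` and `mcc` are reported separately for the unanimous and split subsets, mirroring the breakdown described in the abstract.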