Journal: Human Heredity

Multiple subsampling of dense SNP data localizes disease genes with increased precision.

Abstract

BACKGROUND/AIMS: Current linkage studies detect and localize trait loci using genotypes sampled at hundreds of thousands of single nucleotide polymorphisms (SNPs). Such data should provide precise estimates of trait location once linkage has been established. However, correlations between nearby SNPs can distort the information about trait location. Three approaches have traditionally been used to address this problem: (1) ignore the correlation; (2) approximate the correlation; or (3) analyze a single, approximately uncorrelated subset of the original dense data.

METHODS: Here, we examine and test a simple and efficient estimator of trait location that averages location estimates across random subsamples of the original dense data. Based on pairwise estimates of correlation, we ensure that the SNPs within each subsample are approximately uncorrelated. In addition, we use the nonparametric bootstrap procedure to compute narrow, high-resolution candidate gene regions (i.e. confidence intervals for the true trait location).

RESULTS: Using simulated data, we show that the three existing approaches to dense SNP linkage analysis (described above) can yield biased and/or inefficient estimation, depending on the underlying correlation structure. With respect to mean squared error, our estimator outperforms the third approach, and is at least as good as, and usually better than, the first and second approaches. In an analysis of 15 hypertension families genotyped at approximately 500,000 SNPs, our estimator reduced the length of the candidate gene region by 47.5% relative to the third approach.

CONCLUSION: The method we developed will be an important tool for constructing high-resolution candidate gene regions that could ultimately aid in targeting regions for sequencing projects.
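
The procedure described under METHODS (average the location estimates from many random, approximately uncorrelated SNP subsamples, then bootstrap to obtain a confidence interval) can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract gives no algorithmic detail, so several pieces are assumptions made only for illustration: the greedy thinning rule in `thin_snps`, the r^2 threshold, the `toy_location` function standing in for a real linkage analysis, the choice to resample families in the bootstrap, the percentile interval, and all of the synthetic data.

```python
import numpy as np

def thin_snps(r2, threshold, rng):
    """Visit SNPs in random order and keep one only if its pairwise r^2
    with every SNP already kept is below the threshold (assumed rule)."""
    chosen = []
    for j in rng.permutation(r2.shape[0]):
        if not chosen or np.all(r2[j, chosen] < threshold):
            chosen.append(int(j))
    return np.sort(np.array(chosen, dtype=int))

def toy_location(scores, positions, snps):
    """Toy stand-in for a linkage analysis (hypothetical): return the map
    position of the SNP whose score, summed over families, is largest."""
    total = scores[:, snps].sum(axis=0)
    return positions[snps[np.argmax(total)]]

def subsample_average(scores, positions, r2, n_subsamples, threshold, rng):
    """Average the location estimates obtained from random thinned subsamples."""
    estimates = [
        toy_location(scores, positions, thin_snps(r2, threshold, rng))
        for _ in range(n_subsamples)
    ]
    return float(np.mean(estimates))

def bootstrap_interval(scores, positions, r2, n_boot, n_subsamples,
                       threshold, rng, level=0.95):
    """Percentile bootstrap interval; resampling whole families with
    replacement is an assumption, not something the abstract states."""
    n_fam = scores.shape[0]
    reps = []
    for _ in range(n_boot):
        fam = rng.integers(0, n_fam, size=n_fam)
        reps.append(subsample_average(scores[fam], positions, r2,
                                      n_subsamples, threshold, rng))
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(reps, [alpha, 1.0 - alpha])
    return float(lo), float(hi)

# Synthetic demonstration data (entirely made up).
rng = np.random.default_rng(0)
n_fam, n_snp = 15, 200
positions = np.linspace(0.0, 10.0, n_snp)  # hypothetical map positions
# Pairwise r^2 that decays with the distance between SNPs.
r2 = np.exp(-np.abs(positions[:, None] - positions[None, :]) / 0.2)
# Per-family scores with a signal peak near position 6.0 plus noise.
signal = np.exp(-0.5 * ((positions - 6.0) / 1.0) ** 2)
scores = signal + rng.normal(0.0, 0.5, size=(n_fam, n_snp))

estimate = subsample_average(scores, positions, r2, 50, 0.1, rng)
interval = bootstrap_interval(scores, positions, r2, 50, 10, 0.1, rng)
print("averaged location estimate:", estimate)
print("bootstrap interval:", interval)
```

In a real analysis, `toy_location` would be replaced by whatever linkage method produces a location estimate from a set of SNPs, and `r2` would come from pairwise correlation estimates on the genotype data rather than from a distance-decay formula.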