
Bosco: Boosting Corrections for Genome-Wide Association Studies With Imbalanced Samples

           

Abstract

In genome-wide association studies (GWAS), the acquired sequence data may be imbalanced: abundant control samples versus a limited number of case samples. This sample imbalance is particularly serious when investigating rare diseases, or common diseases in rare populations. Conventional GWAS methods may suffer from severe statistical bias toward the majority group, leading to a loss of power in uncovering true suspicious loci. We introduce a boosting correction method, termed Bosco, to deal with this imbalance problem. Bosco is motivated by boosting theory in machine learning and is implemented in a coarse-to-fine learning framework: the coarse step assigns importance scores to all samples in the majority group, and the fine step calculates P-values by a weighted logistic regression. On simulated data sets, we demonstrate that the proposed method can dramatically improve discovery power even on extremely imbalanced data sets, while keeping false positives well controlled. Bosco is also applied to a genome-scale gastric cancer data set to conduct a genome-wide analysis. Our method replicates previously reported findings (from the likelihood ratio test) with high statistical significance and shows the ability to identify new suspicious SNPs.
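The fine step described in the abstract, a weighted logistic regression that yields a P-value per SNP, can be illustrated with a minimal sketch. The abstract does not say how the coarse step computes its importance scores, so `coarse_scores` below is a hypothetical stand-in that simply gives every control the weight n_case / n_control; it is not Bosco's scoring rule. The function names, the single-SNP additive model, and the Wald test are likewise assumptions made for illustration.

```python
import math

def coarse_scores(y):
    """Hypothetical stand-in for the coarse step (NOT the paper's rule):
    cases keep weight 1, every control gets n_case / n_control."""
    n_case = sum(y)
    n_control = len(y) - n_case
    return [1.0 if yi == 1 else n_case / n_control for yi in y]

def weighted_logistic_fit(x, y, w, iters=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton's method on the
    weighted log-likelihood. Returns (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)          # weighted residual (gradient term)
            v = wi * p * (1.0 - p)     # weighted variance (information term)
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * d = gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    se_b1 = math.sqrt(h00 / det)  # slope entry of the inverse information matrix
    return b0, b1, se_b1

def wald_p_value(beta, se):
    """Two-sided P-value of the Wald statistic under a standard normal."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

# Toy, hand-made data: genotype coded 0/1/2, 10 cases versus 100 controls.
genotype = [2, 2, 1, 1, 1, 1, 1, 0, 0, 0] + [0] * 60 + [1] * 35 + [2] * 5
status = [1] * 10 + [0] * 100

weights = coarse_scores(status)
b0, b1, se = weighted_logistic_fit(genotype, status, weights)
print("slope:", b1, "P-value:", wald_p_value(b1, se))
```

Uniform down-weighting of controls only rebalances the two groups; per-sample importance scores, as the abstract describes, would replace the constant control weight with a value specific to each control. Standard errors from a weighted likelihood like this one are approximate and should be read as part of the sketch, not as a calibrated test.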

Bibliographic record

  • Source
    NanoBioscience, IEEE Transactions on | 2017, Issue 1 | pp. 69-77 | 9 pages
  • Author affiliations

    Tsinghua National Laboratory for Information Science and Technology, Beijing Key Laboratory of Multi-dimension and Multi-scale Computational Photography, Tsinghua University, Beijing, China;

    Tsinghua National Laboratory for Information Science and Technology, Beijing Key Laboratory of Multi-dimension and Multi-scale Computational Photography, Tsinghua University, Beijing, China;

    Department of Biomedical Engineering, Boston University, Boston, MA, USA;

    Tsinghua National Laboratory for Information Science and Technology, Beijing Key Laboratory of Multi-dimension and Multi-scale Computational Photography, Tsinghua University, Beijing, China;

    Tsinghua National Laboratory for Information Science and Technology, Beijing Key Laboratory of Multi-dimension and Multi-scale Computational Photography, Tsinghua University, Beijing, China;

  • Original format: PDF
  • Text language: eng
  • Keywords

    Genomics; Boosting; Bioinformatics; Diseases; Nanobioscience; Additives;

