Unified sparse regression models for sequence variants association analysis.

Abstract

Joint adjustment for cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in existing sparse regression models. We developed a unified sparse regression (USR) method that incorporates prior information and jointly adjusts for cryptic relatedness, population structure, and other environmental covariates. USR models cryptic relatedness as a random effect and population structure as a fixed effect, and uses weighted penalties to incorporate prior knowledge. In extensive simulations, the USR algorithm discovered more true causal variants while maintaining a lower false discovery rate than several commonly used feature selection methods. It can detect rare and common variants with almost equal efficiency.

After further investigating and assessing the oracle property of the USR method, we propose a unified test (uFineMap) for accurately localizing causal loci and a unified test (uHDSet) for identifying high-dimensional sparse associations in deep sequencing genomic data of multi-ethnic individuals. These novel tests are based on scaled sparse linear mixed regressions with Lp (0 < p < 1) norm regularization. Under extensive simulated scenarios, the proposed tests appropriately controlled the Type I error rate and appeared more powerful than several prominent existing methods (famSKAT and Gemma).

In addition, we incorporate the idea of Generalized Linear Mixed Models (GLMMs) to extend the USR model to non-Gaussian phenotype data. The generalized USR method also includes structured regularization (i.e., group L1 norm and sparse group L1 norm). The algorithm is applicable to a wide range of genetic association analyses and can incorporate the effect of a group of SNPs or genes in an integrative way. It can also be used as a variable screening method to reduce the number of variables across a wide range of high-dimensional data with complex group structure.
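The abstract does not give the USR algorithm itself, so the following is only a minimal sketch of the general idea it describes: a random effect for relatedness, unpenalized fixed effects for population structure, and a weighted L1 penalty carrying prior knowledge. Everything here is an assumption made for illustration: the function names (`whiten`, `residualize`, `weighted_lasso`, `usr_sketch`) are hypothetical, the variance components `sigma_g2` and `sigma_e2` are treated as known rather than estimated, the kinship matrix is simulated, and plain coordinate descent stands in for whatever solver the dissertation uses.

```python
import numpy as np

def whiten(K, sigma_g2, sigma_e2):
    """Return V^{-1/2} for V = sigma_g2*K + sigma_e2*I, with K symmetric PSD."""
    evals, evecs = np.linalg.eigh(K)
    d = sigma_g2 * np.clip(evals, 0.0, None) + sigma_e2
    return (evecs / np.sqrt(d)) @ evecs.T

def residualize(A, Z):
    """Project the columns of Z out of A by least squares (unpenalized effects)."""
    coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
    return A - Z @ coef

def weighted_lasso(X, y, lam, weights, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam * sum_j w_j*|b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-threshold with a feature-specific penalty lam * w_j.
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return beta

def usr_sketch(y, G, PCs, K, lam, weights, sigma_g2=0.5, sigma_e2=0.5):
    """Whiten by the assumed mixed-model covariance, remove fixed effects, then select."""
    W = whiten(K, sigma_g2, sigma_e2)
    Z = W @ np.column_stack([np.ones(len(y)), PCs])  # intercept + structure covariates
    y_t = residualize(W @ y, Z)
    G_t = residualize(W @ G, Z)
    return weighted_lasso(G_t, y_t, lam, weights)

# Simulated toy data (all values hypothetical).
rng = np.random.default_rng(0)
n, p = 120, 30
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotype dosages 0/1/2
PCs = rng.normal(size=(n, 2))                        # stand-in for ancestry PCs
A = rng.normal(size=(n, n))
K = A @ A.T / n                                      # stand-in for a kinship matrix
beta_true = np.zeros(p)
beta_true[[3, 17]] = [1.5, -1.5]
y = G @ beta_true + PCs @ np.array([0.5, -0.5]) + rng.normal(scale=0.5, size=n)

weights = np.ones(p)
weights[[3, 17]] = 0.5   # smaller weight = weaker penalty for variants with prior support
beta_hat = usr_sketch(y, G, PCs, K, lam=0.1, weights=weights)
print("selected variant indices:", np.flatnonzero(beta_hat))
```

In this sketch, a weight below 1 lowers the penalty on a variant that prior information favors, so it enters the model more readily. Projecting out the whitened covariates is equivalent to fitting them without a penalty alongside the penalized genotype effects.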

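Bibliographic details

  • Author

    Cao, Shaolong

  • Author affiliation

    Tulane University School of Science and Engineering

  • Degree-granting institution: Tulane University School of Science and Engineering
  • Subject: Biostatistics; Genetics; Bioinformatics
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 112 p.
  • Total pages: 112
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Physical chemistry (theoretical chemistry), chemical physics
  • Keywords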
