Weak Signal Detection with Applications in High-throughput Genomic Data Analysis

Abstract

Variable selection in high-dimensional data analysis has been widely studied in many scientific areas. For example, in a genome-wide association study (GWAS), the goal is to identify important variables (i.e., single nucleotide polymorphisms; SNPs) from a large number of variables that are potentially associated with a target phenotype. Among the large number of SNPs, usually only a small proportion carry valuable information; we call these signals. Identifying signals can help researchers find potential causal factors, make predictions, and better understand the scientific problem. However, many signals are only weakly associated with the target response, and identifying such weak signals is challenging in high-dimensional studies.

In this dissertation, we first develop a data-driven variable screening procedure, the adaptive false negative control (AFNC) procedure, for weak signal detection. The AFNC procedure can efficiently eliminate unimportant variables while retaining a high proportion of signals under the block-structured dependence that is widely observed in genomic data analysis. The proposed procedure is applied to eQTL data analysis from the International HapMap Project as well as to signal detection in human height analysis.

Based on AFNC, we also propose a two-stage method for eQTL studies. In the first stage, we perform SNP-wise screening by AFNC to eliminate the SNPs that are not associated with any gene. In the second stage, with the SNPs retained by false negative control, we use multiple response regression to model the joint effect of SNPs and account for correlation among gene expression values.
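The two-stage idea in the abstract (screen SNPs first, then fit a joint multi-response model on the survivors) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the AFNC threshold rule, so stage 1 here uses a plain top-`keep` cut on marginal correlations as a stand-in, and stage 2 uses ordinary multi-response least squares. The function names `screen_snps` and `fit_multiresponse`, the `keep` parameter, and the simulated data are hypothetical and not taken from the dissertation.

```python
import numpy as np

def screen_snps(X, Y, keep):
    """Stage 1 stand-in: score each SNP by its largest absolute marginal
    correlation with any gene and keep the `keep` highest-scoring SNPs.
    NOTE: this fixed-size cut is NOT the AFNC rule, which is data-driven."""
    x_std = X.std(axis=0)
    y_std = Y.std(axis=0)
    x_std[x_std == 0] = 1.0          # guard against constant columns
    y_std[y_std == 0] = 1.0
    Xc = (X - X.mean(axis=0)) / x_std
    Yc = (Y - Y.mean(axis=0)) / y_std
    corr = Xc.T @ Yc / X.shape[0]    # (n_snps, n_genes) correlation matrix
    score = np.abs(corr).max(axis=1)
    top = np.argsort(score)[::-1][:keep]
    return np.sort(top)              # retained SNP indices, ascending

def fit_multiresponse(X, Y, idx):
    """Stage 2 stand-in: least-squares regression of all gene expression
    columns on the retained SNPs, with an intercept."""
    design = np.column_stack([np.ones(X.shape[0]), X[:, idx]])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef                      # shape (1 + len(idx), n_genes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 200, 500, 3            # samples, SNPs, genes (simulated)
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    B_true = np.zeros((p, q))
    B_true[:3, :] = 1.0              # three simulated signal SNPs
    Y = X @ B_true + rng.normal(size=(n, q))
    idx = screen_snps(X, Y, keep=20)
    coef = fit_multiresponse(X, Y, idx)
    print("retained SNPs:", idx)
    print("coefficient matrix shape:", coef.shape)
```

Plain least squares in stage 2 does not model correlation among gene expression values; the dissertation's multiple response regression is described as accounting for it, so a faithful version would replace that step.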

Bibliographic record

  • Author: Zhang, Teng
  • Author affiliation: North Carolina State University
  • Degree-granting institution: North Carolina State University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 104
  • Original format: PDF
  • Language of text: English
