Estimation and hypothesis testing with additive kernel machines for high-dimensional data.



Abstract

Advances in high-throughput biotechnology have culminated in the development of large-scale, population-based studies for identifying genomic features (e.g., genes, SNPs, CpGs) associated with complex diseases and traits. Understanding an individual's genetic disposition for particular traits and diseases can inform the development of individualized risk profiles and treatment regimes, and can simultaneously provide clues about the biological mechanisms underlying complex traits. However, the high dimensionality of the feature space, the limited availability of samples, and our incomplete understanding of how features biologically influence various diseases pose a grand challenge for statisticians. To mitigate some of these challenges, we propose several new methods. First, we develop the additive least square kernel machine (ALSKM) approach for nonparametrically modeling and testing the cumulative effect of a group of features (such as multiple biologically related CpGs) while nonparametrically adjusting for complex, nonlinear covariates. Our proposed methods model both the genomic features and the complex covariates using the kernel machine framework. Second, building on the ALSKM, we develop a novel approach for testing for interactions between two different groups of (biologically related) features. Specifically, we develop a multi-marker test for epistasis, or gene-gene interactions, between two different groups of genomic features. Finally, we again build on the machinery developed under Topics 1 and 2 to develop an approach for testing the association between rare variants and a phenotype in the presence of common variants while accommodating potential interactions between the common and rare variants. By focusing on multi-feature testing, these approaches reduce the dimensionality of the data. Using the kernel machine framework allows for flexible, possibly nonparametric, analysis, which is important given our incomplete understanding of how features influence various traits and diseases.
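
As a rough illustration of the idea of modeling both a feature group and the covariates with kernel machines, the following minimal sketch fits an additive model with two penalized kernel components. It is not the thesis's ALSKM estimator or test: the Gaussian kernel, the fixed penalties `lam_z` and `lam_x`, the simulated data, and all function names are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(A, rho):
    """Gaussian kernel matrix: K[i, j] = exp(-||a_i - a_j||^2 / rho)."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.exp(-np.maximum(d2, 0.0) / rho)

def fit_additive_kernel_machine(y, K_z, K_x, lam_z, lam_x):
    """Fit y = h_z(Z) + h_x(X) + noise with two penalized kernel components.

    Minimizes ||y - K_z a_z - K_x a_x||^2
              + lam_z * a_z' K_z a_z + lam_x * a_x' K_x a_x.
    A stationary point is a_j = r / lam_j, where the residual vector is
    r = (I + K_z / lam_z + K_x / lam_x)^{-1} y.
    """
    n = len(y)
    V = np.eye(n) + K_z / lam_z + K_x / lam_x
    r = np.linalg.solve(V, y)      # residuals y - h_z - h_x
    h_z = K_z @ r / lam_z          # fitted effect of the feature group
    h_x = K_x @ r / lam_x          # fitted effect of the covariates
    return h_z, h_x, r

# Simulated data (hypothetical): 5 features in Z, 2 covariates in X.
rng = np.random.default_rng(0)
n = 60
Z = rng.normal(size=(n, 5))
X = rng.normal(size=(n, 2))
y = np.sin(Z[:, 0]) + X[:, 0] ** 2 + 0.1 * rng.normal(size=n)
y = y - y.mean()                   # center y; the sketch has no intercept

h_z, h_x, r = fit_additive_kernel_machine(
    y, gaussian_kernel(Z, 5.0), gaussian_kernel(X, 2.0), lam_z=1.0, lam_x=1.0
)
```

In practice the penalties and kernel scale parameters would be estimated rather than fixed, and a test of the feature-group effect would be built on top of the fit; neither step is shown here.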

Bibliographic record

  • Author

    Clark, Jennifer J.

  • Author affiliation

    The University of North Carolina at Chapel Hill

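  • Degree-granting institution: The University of North Carolina at Chapel Hill
  • Subject: Biology Genetics; Biology Bioinformatics; Statistics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 133 p.
  • Total pages: 133
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification:
  • Keywords: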
