Pairwise Ranking and Removal Network Analysis of Genome-Wide Gene Expression Data: Theory, a New Algorithm, and Analysis of Data from the Cancer Genome Atlas

Abstract

Advances in genomics technologies have made genome-wide gene expression profiling of cell types and tissue biopsies common practice. Network analysis in its various forms has become an extremely important methodology for analyzing genome-wide gene expression data. Its goal is to identify the biological pathways and other factors responsible for the observed expression. Among the many types of network analysis used for this purpose, methods that apply pairwise ranking and removal algorithms are some of the most commonly applied. Despite their wide usage, guidelines on how to interpret the set of network connections among observed genes output by these methods vary and are often poorly specified. This work addresses some of this ambiguity by formalizing concepts through a theoretical analysis whose goal is to understand when such algorithms can return correct biological inferences. The analysis expounds on some of the proofs and properties of the ARACNE algorithm. ARACNE is one of the oldest network analysis methods in this class, and its structure is simple enough to allow a theoretical treatment. The results point to two cases in which ARACNE could potentially return correct inferences. The first is when measured genes causally regulate the expression of other genes in a tree-structured network. The second is when an unobserved latent process or pathway is responsible for the expression levels of the observed genes. For both cases, a new way of interpreting network output is presented. It ascribes biological meaning to recovered hub genes, which are genes connected to many other genes that are, in turn, not strongly connected to one another. Analyzing hubs is a traditional way of using the output of these algorithms, and these results argue that it provides the best chance of correctly identifying the underlying biology.
Building on this result, a new pairwise ranking and removal network algorithm called MMBOS (vote-based ranking of min-min) is proposed. MMBOS is based on a score computed on order statistics and is designed to have good properties for identifying strongly supported hub structures in gene expression data. On simulated data, it is shown to perform better than ARACNE and similar algorithms in this class. MMBOS is then used to analyze genome-wide gene expression data for 11 cancer types with large sample sizes available from The Cancer Genome Atlas. An analysis of the hub structures shared among cancer types identified several hub genes, including GADD45GIP1, ASXL2, and TPX2. These genes have experimentally validated roles in cancer biology. They also have validated relationships with their connected genes that are consistent with either the causal regulatory or the latent variable interpretation of a network hub. These results are preliminary. The underlying biology must be highly restricted for the network output of pairwise ranking and removal algorithms such as MMBOS to support correct inferences. Even so, there do appear to be cases where algorithms in this class provide useful insights. A perspective on the general usefulness of pairwise ranking and removal algorithms such as ARACNE and MMBOS is discussed, along with thoughts on how this usefulness could be assessed in the future.

Bibliographic details

  • Author

    Sainath Madduri, Abishek

  • Affiliation

    Weill Medical College of Cornell University

  • Degree-granting institution: Weill Medical College of Cornell University
  • Subject: Bioinformatics
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 119
  • Format: PDF
  • Language: English
