Journal: Statistica

LARGE-SCALE SIMULTANEOUS INFERENCE WITH APPLICATIONS TO THE DETECTION OF DIFFERENTIAL EXPRESSION WITH MICROARRAY DATA

Abstract

Often the first step, and indeed the major goal of many microarray studies, is the detection of genes that are differentially expressed across a known number of classes, C_1, ..., C_g. The statistical significance of differential expression can be assessed by performing a test for each gene. When many hypotheses are tested, the probability that at least one type I error (a false positive) is committed increases sharply with the number of hypotheses. In this paper, we focus on the use of a two-component mixture model to handle this multiplicity issue, as proposed initially by McLachlan, Bean, and Ben-Tovim Jones (2006). This model is becoming more widely adopted in the context of microarrays: one component density corresponds to that of the test statistic for genes that are not differentially expressed, and the other to that of the test statistic for genes that are differentially expressed. The values of the adopted test statistic are transformed to z-scores, whose null and non-null distributions are each represented by a single normal distribution. We explain how this two-component normal mixture model can be fitted very quickly via the EM algorithm, started from a point that is completely determined by an initial specification of the proportion π₀ of genes that are not differentially expressed. There is an easy-to-apply procedure for determining suitable initial values of π₀ when the null density is taken to be standard normal (the theoretical null distribution). We also consider how to provide an initial partition of the genes into two groups for the EM algorithm when adopting the theoretical null distribution appears inappropriate and an empirical null distribution needs to be used. We demonstrate the approach on a data set that has been analyzed previously in the bioinformatics literature.
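The abstract does not spell out how the test statistic is mapped to z-scores. A common choice for a t-statistic with ν degrees of freedom, assumed here rather than taken from the paper, is z = Φ⁻¹(F_ν(t)), where F_ν is the t distribution function and Φ the standard normal one; under the null hypothesis z is then exactly standard normal. A minimal sketch with NumPy and SciPy (the function name `t_to_z` and the example statistics are hypothetical):

```python
import numpy as np
from scipy import stats


def t_to_z(t_stats, df):
    """Transform t-statistics to z-scores via z = Phi^{-1}(F_df(t)).

    If t follows a t distribution with `df` degrees of freedom under the
    null hypothesis, the resulting z is exactly N(0, 1) under the null.
    """
    t_stats = np.asarray(t_stats, dtype=float)
    # Lower tail: Phi^{-1}(F(t)) computed directly from the CDF.
    lower = stats.norm.ppf(stats.t.cdf(t_stats, df))
    # Upper tail: the same quantity via survival functions, which keeps
    # precision when F(t) is very close to 1.
    upper = stats.norm.isf(stats.t.sf(t_stats, df))
    return np.where(t_stats < 0, lower, upper)


# Illustrative t-statistics for three hypothetical genes, 8 degrees of freedom.
z = t_to_z([-3.0, 0.0, 2.5], df=8)
print(z)
```

Because the t distribution has heavier tails than the normal, each |z| is smaller than the corresponding |t|, and the two agree as the degrees of freedom grow.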
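Fitting the two-component normal mixture with the theoretical null can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the null component is fixed at N(0, 1); the starting non-null mean and standard deviation are obtained by matching the first two sample moments of z to those of the mixture for the given initial π₀ (one way to make the starting point depend only on π₀); and the initial π₀ = 0.7 is hand-picked rather than produced by the paper's procedure. The helper names, the convergence tolerance, the cutoff 0.1 on the posterior null probability τ₀, and the simulated data are all hypothetical.

```python
import numpy as np
from scipy import stats


def start_from_pi0(z, pi0):
    """Starting values (mu1, sigma1) implied by an initial guess of pi0.

    Assumption for this sketch: equate the sample mean and second moment
    of z to those of pi0*N(0, 1) + (1 - pi0)*N(mu1, sigma1^2).
    """
    z = np.asarray(z, dtype=float)
    mu1 = z.mean() / (1.0 - pi0)
    var1 = (np.mean(z ** 2) - pi0) / (1.0 - pi0) - mu1 ** 2
    if var1 <= 0:
        raise ValueError("initial pi0 too large: implied non-null variance <= 0")
    return mu1, np.sqrt(var1)


def fit_theoretical_null(z, pi0, tol=1e-8, max_iter=5000):
    """EM for pi0*N(0, 1) + (1 - pi0)*N(mu1, sigma1^2) with the null fixed."""
    z = np.asarray(z, dtype=float)
    mu1, s1 = start_from_pi0(z, pi0)
    old_ll = -np.inf
    for _ in range(max_iter):
        # E-step: weighted component densities and posterior null probability.
        f0 = pi0 * stats.norm.pdf(z)
        f1 = (1.0 - pi0) * stats.norm.pdf(z, loc=mu1, scale=s1)
        mix = f0 + f1
        tau0 = f0 / mix
        ll = np.sum(np.log(mix))
        if ll - old_ll < tol:  # stop once the log likelihood stops improving
            break
        old_ll = ll
        # M-step: update pi0 and the non-null normal; N(0, 1) stays fixed.
        w1 = 1.0 - tau0
        pi0 = tau0.mean()
        mu1 = np.sum(w1 * z) / w1.sum()
        s1 = np.sqrt(np.sum(w1 * (z - mu1) ** 2) / w1.sum())
    return pi0, mu1, s1, tau0


# Simulated z-scores (not real data): 80% null N(0, 1), 20% non-null N(2.5, 1).
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, 4000), rng.normal(2.5, 1.0, 1000)])
pi0_hat, mu1_hat, s1_hat, tau0 = fit_theoretical_null(z, pi0=0.7)
flagged = tau0 < 0.1  # hypothetical cutoff on the posterior null probability
print(pi0_hat, mu1_hat, s1_hat, flagged.sum())
```

Moment matching breaks down when the guessed π₀ is larger than the data can support, because the implied non-null variance becomes negative; the sketch raises an error in that case instead of silently clipping the variance.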
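When the theoretical null is not appropriate, the null component N(μ₀, σ₀²) can also be estimated, with the EM algorithm started from an initial partition of the genes into a null group and a non-null group. The sketch below assumes a simple threshold rule (genes with z ≤ 2 start in the null group) purely for illustration; the abstract does not say how the paper forms its partition. The function name, the threshold, and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_empirical_null(z, in_null_group, tol=1e-8, max_iter=5000):
    """EM for pi0*N(mu0, s0^2) + (1 - pi0)*N(mu1, s1^2), all parameters free.

    `in_null_group` is a boolean mask marking the genes initially treated
    as null; it supplies the starting point for the EM algorithm.
    """
    z = np.asarray(z, dtype=float)
    g = np.asarray(in_null_group, dtype=bool)
    # Starting values: proportion, mean and SD of each initial group.
    pi0 = g.mean()
    mu0, s0 = z[g].mean(), z[g].std()
    mu1, s1 = z[~g].mean(), z[~g].std()
    old_ll = -np.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each gene belongs to the null.
        f0 = pi0 * stats.norm.pdf(z, loc=mu0, scale=s0)
        f1 = (1.0 - pi0) * stats.norm.pdf(z, loc=mu1, scale=s1)
        mix = f0 + f1
        tau0 = f0 / mix
        ll = np.sum(np.log(mix))
        if ll - old_ll < tol:
            break
        old_ll = ll
        # M-step: weighted proportion, means and SDs for both components.
        w0, w1 = tau0, 1.0 - tau0
        pi0 = w0.mean()
        mu0 = np.sum(w0 * z) / w0.sum()
        s0 = np.sqrt(np.sum(w0 * (z - mu0) ** 2) / w0.sum())
        mu1 = np.sum(w1 * z) / w1.sum()
        s1 = np.sqrt(np.sum(w1 * (z - mu1) ** 2) / w1.sum())
    return pi0, (mu0, s0), (mu1, s1), tau0


# Simulated z-scores whose null component is deliberately not N(0, 1).
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0.2, 1.1, 4000), rng.normal(3.5, 1.0, 1000)])
# Hypothetical partition rule: genes with z <= 2 start in the null group.
pi0_hat, (mu0_hat, s0_hat), (mu1_hat, s1_hat), tau0 = fit_empirical_null(z, z <= 2.0)
print(pi0_hat, mu0_hat, s0_hat, mu1_hat, s1_hat)
```

Starting from a partition rather than from π₀ alone matters here because, with the null mean and variance free, a single proportion no longer pins down a starting point for all five parameters.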