Journal: Information Theory, IEEE Transactions on

Hub Discovery in Partial Correlation Graphs


Abstract

One of the most important problems in large-scale inference is the identification of variables that are highly dependent on several other variables. When dependence is measured by partial correlations, these variables identify those rows of the partial correlation matrix that have several entries with large magnitudes, i.e., hubs in the associated partial correlation graph. This paper develops theory and algorithms for discovering such hubs from a few observations of these variables. We introduce a hub screening framework in which the user specifies both a minimum (partial) correlation $\rho$ and a minimum degree $\delta$ to screen the vertices. The choice of $\rho$ and $\delta$ can be guided by our mathematical expressions for the phase transition correlation threshold $\rho_{c}$ governing the average number of discoveries. They can also be guided by our asymptotic expressions for familywise discovery rates under the assumption of large number $p$ of variables, fixed number $n$ of multivariate samples, and weak dependence. Under the null hypothesis that the dispersion (covariance) matrix is sparse, these limiting expressions can be used to enforce familywise error constraints and to rank the discoveries in order of increasing statistical significance. For $n \ll p$, the computational complexity of the proposed partial correlation screening method is low, and the method is therefore highly scalable. Thus, it can be applied to significantly larger problems than previous approaches. The theory is applied to discovering hubs in a high-dimensional gene microarray dataset.
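The screening rule described in the abstract (keep a vertex if at least $\delta$ of its partial correlations have magnitude at least $\rho$) can be illustrated with a minimal Python sketch. The function name `hub_screen` is hypothetical, and the estimator is an assumption: partial correlations are computed here from the Moore-Penrose pseudo-inverse of the sample correlation matrix, which stays defined when $n < p$. This is a plain thresholding illustration, not the paper's algorithm, and it does not implement the phase transition threshold $\rho_{c}$ or the familywise error expressions.

```python
import numpy as np


def hub_screen(X, rho, delta):
    """Screen for hubs: variables with >= delta partial correlations of magnitude >= rho.

    X is an (n, p) array of n samples of p non-constant variables.
    Returns (hubs, degree): indices of the discovered hubs and the degree of every vertex.
    """
    # Sample correlation matrix (p x p); rank-deficient when n < p.
    R = np.corrcoef(X, rowvar=False)
    # Assumption: the pseudo-inverse stands in for the inverse correlation matrix.
    omega = np.linalg.pinv(R)
    # Normalize to unit diagonal; the sign flip follows the usual partial correlation convention.
    d = np.sqrt(np.diag(omega))
    P = -omega / np.outer(d, d)
    # An edge joins i and j when |P[i, j]| >= rho; self-loops are excluded.
    edges = np.abs(P) >= rho
    np.fill_diagonal(edges, False)
    degree = edges.sum(axis=1)
    hubs = np.flatnonzero(degree >= delta)
    return hubs, degree
```

Calling `hub_screen(X, rho=0.3, delta=3)` on an `(n, p)` data matrix returns the vertices whose degree in the thresholded graph is at least 3. The sketch forms the full $p \times p$ matrix, so it is meant for illustrating the rule rather than for the very large $p$ the paper targets.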
