Published in: IEEE Transactions on Information Theory

Ensemble Estimation of Generalized Mutual Information With Applications to Genomics

Abstract

Mutual information is a measure of the dependence between random variables that has been used successfully in myriad applications in many fields. Generalized mutual information measures that go beyond classical Shannon mutual information have also received much interest in these applications. We derive the mean squared error convergence rates of kernel density-based plug-in estimators of general mutual information measures between two multidimensional random variables X and Y for two cases: 1) X and Y are continuous; 2) X and Y may have a mixture of discrete and continuous components. Using the derived rates, we propose an ensemble estimator of these information measures called GENIE by taking a weighted sum of the plug-in estimators with varied bandwidths. The resulting ensemble estimators achieve the 1/N parametric mean squared error convergence rate when the conditional densities of the continuous variables are sufficiently smooth. To the best of our knowledge, this is the first nonparametric mutual information estimator known to achieve the parametric convergence rate for the mixture case, which frequently arises in applications (e.g. variable selection in classification). The estimator is simple to implement and it uses the solution to an offline convex optimization problem and simple plug-in estimators. A central limit theorem is also derived for the ensemble estimators and minimax rates are derived for the continuous case. We demonstrate the ensemble estimator for the mixed case on simulated data and apply the proposed estimator to analyze gene relationships in single cell data.
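The weighted-sum construction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the bandwidth schedule `h = l * N**(-1/(2*d))`, the bias terms `l**(i/d)` that the weights are asked to cancel, the choice of `n_terms = d`, and the function names `plugin_mi`, `ensemble_weights`, and `ensemble_mi` are all assumptions made for illustration, and the abstract confirms none of them. The sketch handles only scalar continuous X and Y with a Gaussian product kernel, and it replaces the offline convex optimization problem with a simpler minimum-norm problem under equality constraints, which has a closed-form least-squares solution.

```python
import numpy as np


def plugin_mi(x, y, h):
    """KDE plug-in estimate of Shannon mutual information for scalar x and y.

    Uses a Gaussian product kernel with a shared bandwidth h. The kernel
    normalizing constants cancel in the ratio f_xy / (f_x * f_y).
    """
    kx = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    ky = np.exp(-0.5 * ((y[:, None] - y[None, :]) / h) ** 2)
    fx = kx.mean(axis=1)           # marginal density of X at each sample
    fy = ky.mean(axis=1)           # marginal density of Y at each sample
    fxy = (kx * ky).mean(axis=1)   # joint density at each sample
    return float(np.mean(np.log(fxy / (fx * fy))))


def ensemble_weights(ls, d, n_terms):
    """Minimum-norm weights with sum(w) = 1 and sum(w * l**(i/d)) = 0.

    Assumption: the leading bias terms scale as l**(i/d) for i = 1..n_terms.
    np.linalg.lstsq returns the minimum-norm solution of this
    underdetermined linear system.
    """
    ls = np.asarray(ls, dtype=float)
    rows = [np.ones_like(ls)] + [ls ** (i / d) for i in range(1, n_terms + 1)]
    a = np.vstack(rows)
    b = np.zeros(n_terms + 1)
    b[0] = 1.0
    w, *_ = np.linalg.lstsq(a, b, rcond=None)
    return w


def ensemble_mi(x, y, ls=None):
    """Weighted sum of plug-in estimates over a range of bandwidths."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    d = 2  # total dimension of (X, Y) for scalar inputs
    if ls is None:
        ls = np.linspace(1.0, 3.0, 20)  # hypothetical bandwidth multipliers
    w = ensemble_weights(ls, d=d, n_terms=d)
    # Assumed bandwidth schedule: h(l) = l * N**(-1 / (2 * d)).
    estimates = np.array([plugin_mi(x, y, l * n ** (-1.0 / (2 * d))) for l in ls])
    return float(w @ estimates)
```

A call such as `ensemble_mi(x, y)` on two equal-length one-dimensional sample arrays returns a single estimate. Because the weights depend only on the bandwidth multipliers and the dimension, they can be computed once before any data are seen, which matches the "offline" character the abstract attributes to the optimization step.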
