Computer implemented method for determining all Markov boundaries and its application for discovering multiple maximally accurate and non-redundant predictive models

Abstract

Methods for discovering a Markov boundary from data are among the most important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight into local causal structure. Although a response variable always has a single Markov boundary in faithful distributions, distributions that violate the intersection property of probability theory may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to discover all Markov boundaries from such data. The present invention is a novel computer-implemented generative method (termed TIE*) that can discover all Markov boundaries from a data sample drawn from a distribution. TIE* can be instantiated to discover all and only Markov boundaries independent of the data distribution. TIE* has been tested with simulated and re-simulated data and then applied to (a) identify the set of maximally accurate and non-redundant molecular signatures and (b) discover Markov boundaries in datasets from several application domains including but not limited to: biology, medicine, economics, ecology, digit recognition, text categorization, and computational biology.
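The claim that a violation of the intersection property can produce several Markov boundaries can be illustrated with a small toy distribution. The sketch below is not TIE* and is not taken from the patent: it is a brute-force enumeration over an exactly specified joint distribution, and the model (X2 a deterministic copy of X1, T a noisy copy of X1, X3 irrelevant), the 10% noise level, and all function names are assumptions chosen for illustration. Here T is independent of X2 given X1 and of X1 given X2, yet T is not independent of {X1, X2}, so the intersection property fails and both {X1} and {X2} are Markov boundaries of T.

```python
from itertools import combinations, product
from math import log2

VARS = ("X1", "X2", "X3")  # predictor names; T is stored at index 3


def joint():
    """Exact joint P(X1, X2, X3, T) for a hypothetical toy model."""
    p = {}
    for x1, x3, flip in product((0, 1), repeat=3):
        x2 = x1            # X2 is a deterministic copy of X1
        t = x1 ^ flip      # T equals X1 with 10% label noise
        pr = 0.5 * 0.5 * (0.1 if flip else 0.9)
        key = (x1, x2, x3, t)
        p[key] = p.get(key, 0.0) + pr
    return p


def marginal(p, idx):
    """Marginal distribution over the variables at positions idx."""
    out = {}
    for key, pr in p.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + pr
    return out


def cmi(p, a, b, c):
    """Conditional mutual information I(A; B | C) in bits (index tuples)."""
    pabc = marginal(p, a + b + c)
    pac, pbc, pc = marginal(p, a + c), marginal(p, b + c), marginal(p, c)
    total = 0.0
    for key, pr in pabc.items():
        ka = key[:len(a)]
        kb = key[len(a):len(a) + len(b)]
        kc = key[len(a) + len(b):]
        total += pr * log2(pr * pc[kc] / (pac[ka + kc] * pbc[kb + kc]))
    return total


def markov_boundaries(p, tol=1e-9):
    """Brute force: all minimal sets M with T independent of the rest given M."""
    t = (3,)
    n = len(VARS)
    blankets = []
    for size in range(n + 1):
        for m in combinations(range(n), size):
            rest = tuple(i for i in range(n) if i not in m)
            if cmi(p, t, rest, m) < tol:   # M is a Markov blanket of T
                blankets.append(set(m))
    # A Markov boundary is a blanket with no proper subset that is a blanket.
    minimal = [m for m in blankets if not any(o < m for o in blankets)]
    return [sorted(VARS[i] for i in m) for m in minimal]


if __name__ == "__main__":
    print(markov_boundaries(joint()))
```

Exhaustive enumeration like this is exponential in the number of variables and needs the true joint distribution, so it only serves to make the definition concrete; working from a finite data sample would require statistical independence tests instead of exact information measures.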