Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification


Abstract

Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better-performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely used models for the uncertainty classes: ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers that use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.
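The abstract gives no formulas, so the following is only a rough sketch of the kind of experiment it describes: a discrete (histogram) classifier on Zipf-distributed bins, with estimated bin probabilities pulled toward a prior distribution. Everything here is an assumption made for illustration: the function names, the reversed-Zipf second class, equal class priors, the slightly misspecified prior, and in particular the linear shrinkage weight `lam`, which stands in for the paper's regularized maximum-likelihood estimator rather than reproducing it.

```python
import random

def zipf_pmf(b, alpha):
    """Zipf model on b bins: p_i proportional to i**(-alpha), i = 1..b."""
    weights = [i ** (-alpha) for i in range(1, b + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_counts(pmf, n, rng):
    """Draw n observations from pmf and return per-bin counts."""
    counts = [0] * len(pmf)
    for i in rng.choices(range(len(pmf)), weights=pmf, k=n):
        counts[i] += 1
    return counts

def shrunk_estimate(counts, prior, lam):
    """Blend empirical frequencies with a prior pmf.
    Illustrative shrinkage only, NOT the paper's estimator."""
    n = sum(counts)
    return [(1 - lam) * c / n + lam * q for c, q in zip(counts, prior)]

def classify_bins(est0, est1):
    """Label each bin with the class of larger estimated mass (equal class priors assumed)."""
    return [0 if a >= b else 1 for a, b in zip(est0, est1)]

def true_error(labels, p0, p1):
    """Exact error of a bin labeling under true pmfs p0, p1 with equal class priors."""
    return 0.5 * sum(b if lab == 0 else a for lab, a, b in zip(labels, p0, p1))

# Hypothetical setup: 16 bins, class 1 is the reversed Zipf pmf of class 0.
rng = random.Random(0)
p0 = zipf_pmf(16, 1.0)
p1 = p0[::-1]
# Assumed prior knowledge: a slightly misspecified Zipf exponent.
prior0 = zipf_pmf(16, 0.8)
prior1 = prior0[::-1]
c0 = sample_counts(p0, 10, rng)
c1 = sample_counts(p1, 10, rng)
for lam in (0.0, 0.5, 1.0):
    labels = classify_bins(shrunk_estimate(c0, prior0, lam),
                           shrunk_estimate(c1, prior1, lam))
    print(lam, true_error(labels, p0, p1))
```

With `lam = 0.0` the rule uses the training counts alone, and with `lam = 1.0` it uses the prior alone; intermediate values mix the two, which is the trade-off the abstract's regularization parameters are meant to tune.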
