
Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification


Abstract

Background: Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms is challenging because the number of variables far exceeds the sample size and because multi-type tumors are complex in nature. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high-dimensional, low-sample-size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure uses all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties during learning to enforce solution sparsity.

Results: The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high-dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes.

Conclusions: High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and for defining targets of therapeutic intervention.

Availability: The MATLAB source code is available from http://math.arizona.edu/~hzhang/software.html.
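To make the idea of a soft-thresholding type penalty concrete, the following is a minimal Python sketch of how such shrinkage can set small weights exactly to zero and thereby drop genes from a multi-class rule. It is not the authors' MATLAB code: the abstract does not state the exact penalty forms, so the entrywise L1-style shrinkage, the function names, the weight matrix `W` (rows are genes, columns are tumor classes), and the threshold `lam` are all assumptions made for illustration.

```python
def soft_threshold(w, lam):
    """Shrink w toward zero by lam; any |w| <= lam becomes exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def shrink_weights(W, lam):
    """Apply soft-thresholding entrywise to W (rows = genes, columns = classes)."""
    return [[soft_threshold(w, lam) for w in row] for row in W]


def selected_genes(W):
    """Indices of genes that keep at least one nonzero class weight."""
    return [j for j, row in enumerate(W) if any(w != 0.0 for w in row)]


# Hypothetical weights for 4 genes and 3 tumor classes (illustrative values only).
W = [
    [2.0, -1.5, 0.25],
    [0.25, -0.25, 0.0],
    [-0.5, 0.5, 0.125],
    [0.0, 3.0, -2.0],
]
W_sparse = shrink_weights(W, lam=0.5)
print(selected_genes(W_sparse))  # prints [0, 3]
```

In this toy example, genes 1 and 2 have no weight larger than the threshold in absolute value, so every one of their class weights is shrunk to zero and they no longer contribute to any class score. A larger `lam` would select fewer genes; in practice the threshold would be tuned, for instance by cross-validation.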
