Journal: Statistics and Its Interface

REC: fast sparse regression-based multicategory classification

Abstract

Recent advances in technology enable researchers to gather and store enormous data sets with ultra-high dimensionality. In bioinformatics, microarray and next-generation sequencing technologies can produce data with tens of thousands of predictors of biomarkers. The corresponding sample sizes, on the other hand, are often limited. For classification problems, to predict new observations with high accuracy and to better understand the effect of predictors on classification, it is desirable, and often necessary, to train the classifier with variable selection. In the literature, sparse regularized classification techniques have been popular because they perform classification and variable selection simultaneously. Despite their success, such sparse penalized methods can be computationally slow when the dimension of the problem is ultra-high. To overcome this challenge, we propose a new sparse REgression based multicategory Classifier (REC). Our method uses a simplex to represent the different categories of the classification problem. A major advantage of REC is that the optimization can be decoupled into smaller, independent sparse penalized regression problems and hence solved using parallel computing. Consequently, REC is extraordinarily fast computationally. Moreover, REC is able to provide class-conditional probability estimates. Simulated examples and applications to microarray and next-generation sequencing data suggest that REC is very competitive when compared with several existing methods.
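The abstract does not spell out the REC formulation, so the following is only a rough sketch of the general idea it describes, under stated assumptions: the K classes are coded as the vertices of a regular simplex in K-1 dimensions, each coordinate of that code is regressed on the predictors with an L1 (lasso) penalty, the K-1 regressions are fitted independently in a thread pool, and a new observation is assigned to the vertex best aligned with its fitted vector. The vertex construction, the squared-error loss, the coordinate-descent solver, the prediction rule, and all function names (`simplex_vertices`, `lasso_cd`, `fit_rec_sketch`, `predict`) are illustrative choices, not taken from the paper. The class-conditional probability estimation mentioned in the abstract is not covered.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def simplex_vertices(K):
    """Return K unit vectors in R^(K-1) with equal pairwise angles.

    Assumed coding: a standard regular-simplex construction, used here
    only for illustration.
    """
    W = np.empty((K, K - 1))
    W[0] = (K - 1) ** -0.5
    shift = -(1.0 + np.sqrt(K)) / (K - 1) ** 1.5
    scale = np.sqrt(K / (K - 1))
    for j in range(1, K):
        W[j] = shift
        W[j, j - 1] += scale
    return W


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def fit_rec_sketch(X, y, K, lam, n_iter=200):
    """Fit K-1 independent lasso regressions on simplex-coded labels."""
    W = simplex_vertices(K)
    Y = W[y]  # n x (K-1) matrix of simplex-coded responses
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # The K-1 problems share no parameters, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        cols = list(pool.map(
            lambda q: lasso_cd(Xc, Yc[:, q], lam, n_iter), range(K - 1)))
    B = np.column_stack(cols)  # p x (K-1) coefficient matrix
    return W, B, x_mean, y_mean


def predict(model, X):
    """Assign each row to the simplex vertex with the largest inner product."""
    W, B, x_mean, y_mean = model
    F = (X - x_mean) @ B + y_mean
    return np.argmax(F @ W.T, axis=1)
```

Here `y` is expected to hold integer labels in `0, ..., K-1`. Because every call to `lasso_cd` touches only its own response column, the thread pool could be swapped for a process pool or a cluster scheduler without changing the fitted coefficients, which is the kind of decoupling the abstract credits for REC's speed.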