Journal: Knowledge and Information Systems
Direct estimation of class membership probabilities for multiclass classification using multiple scores

Abstract

Accurate estimation of class membership probability is needed for many applications in data mining and decision-making, to which multiclass classification is often applied. Since existing methods for estimating class membership probability are designed for binary classification, in which only a single score output by a classifier can be used, an approach for multiclass classification requires both a decomposition of the multiclass classifier into binary classifiers and a combination of the estimates obtained from each binary classifier into a target estimate. We propose a simple and general method for directly estimating the class membership probability for any class in multiclass classification without decomposition and combination, using multiple scores not only for the predicted class but also for other proper classes. To make it possible to use multiple scores, we propose to modify or extend representative existing methods. As a non-parametric method, which draws on the idea of the binning method proposed by Zadrozny et al., we create an "accuracy table" by a different method. Moreover, we smooth the accuracies in the table with methods such as the moving average to yield reliable probabilities (accuracies). As a parametric method, we extend Platt's method to apply multiple logistic regression. On two different datasets (open-ended data from Japanese social surveys and the 20 Newsgroups), with both Support Vector Machines and naive Bayes classifiers, we empirically show that the use of multiple scores is effective in the estimation of class membership probabilities in multiclass classification in terms of cross entropy, the reliability diagram, the ROC curve and AUC (area under the ROC curve), and that the proposed smoothing method for the accuracy table works quite well.
Finally, we show empirically that, in terms of MSE (mean squared error), our best proposed method is superior to a multiclass extension of the PAV method proposed by Zadrozny et al. on both the 20 Newsgroups dataset and the Pendigits dataset, but is slightly worse on the Pendigits dataset than the state-of-the-art method, which is a multiclass extension of a combination of boosting and the PAV method.
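
The abstract does not say how the accuracy table is built or smoothed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that each held-out example contributes two scores (the score of the predicted class and the second-best score), that both scores are cut into equal-width bins, and that smoothing is a count-weighted moving average over neighbouring cells. The function names `bin_index`, `build_accuracy_table` and `smoothed_accuracy`, the bin layout and the `radius` parameter are all hypothetical.

```python
def bin_index(score, lo, hi, n_bins):
    """Map a score in [lo, hi] to a bin index in 0..n_bins-1 (clamped at the ends)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    idx = int((score - lo) / (hi - lo) * n_bins)
    return max(0, min(n_bins - 1, idx))


def build_accuracy_table(samples, lo, hi, n_bins):
    """Count correct predictions per (top-score bin, second-score bin) cell.

    samples: iterable of (top_score, second_score, is_correct) taken from
    held-out data. Returns (hits, counts) as n_bins x n_bins nested lists.
    """
    hits = [[0] * n_bins for _ in range(n_bins)]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for top, second, correct in samples:
        i = bin_index(top, lo, hi, n_bins)
        j = bin_index(second, lo, hi, n_bins)
        counts[i][j] += 1
        if correct:
            hits[i][j] += 1
    return hits, counts


def smoothed_accuracy(hits, counts, i, j, radius=1):
    """Assumed smoothing: pool hits and counts over a (2*radius+1)^2 window.

    This is a count-weighted moving average of the cell accuracies.
    Returns None when the window holds no examples.
    """
    n = len(counts)
    h = c = 0
    for a in range(max(0, i - radius), min(n, i + radius + 1)):
        for b in range(max(0, j - radius), min(n, j + radius + 1)):
            h += hits[a][b]
            c += counts[a][b]
    return h / c if c else None


# Toy held-out data (made up for illustration only).
samples = [
    (0.90, 0.10, True),
    (0.80, 0.20, True),
    (0.60, 0.50, False),
    (0.55, 0.50, True),
    (0.30, 0.30, False),
]
hits, counts = build_accuracy_table(samples, 0.0, 1.0, 4)
raw = smoothed_accuracy(hits, counts, 2, 2, radius=0)     # unsmoothed cell accuracy
smooth = smoothed_accuracy(hits, counts, 2, 2, radius=1)  # pooled with neighbouring cells
```

With `radius=0` the function returns the raw accuracy of a single cell. Larger radii borrow evidence from adjacent cells, which is the motivation the abstract gives for smoothing: sparsely populated cells otherwise yield unreliable estimates.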