IEEE Transactions on Neural Networks and Learning Systems

Density-Preserving Sampling: Robust and Efficient Alternative to Cross-Validation for Error Estimation



Abstract

Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and, thus, require multiple repetitions in order to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10 times repeated CV at a fraction of the computations required by CV. This method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers.
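To illustrate the core idea of splitting data into mutually representative subsets, here is a minimal sketch, assuming a kernel-density ranking as a simplified stand-in for the paper's correntropy-based criterion. The helper `density_preserving_folds` and all parameter choices are hypothetical and not the authors' algorithm: points are ranked by estimated local density and dealt round-robin into folds, so each fold spans the full density spectrum of the input data.

```python
# Sketch only: KDE ranking substitutes for the correntropy criterion of DPS.
import numpy as np
from sklearn.neighbors import KernelDensity

def density_preserving_folds(X, n_folds=2, bandwidth=1.0):
    """Assign each sample in X to one of n_folds subsets whose
    density profiles roughly match that of the full dataset."""
    kde = KernelDensity(bandwidth=bandwidth).fit(X)
    log_density = kde.score_samples(X)          # per-sample log-density
    order = np.argsort(log_density)             # sparse -> dense regions
    folds = np.empty(len(X), dtype=int)
    folds[order] = np.arange(len(X)) % n_folds  # round-robin over density ranks
    return folds

# Usage: a single density-preserving split stands in for repeated random CV splits.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
folds = density_preserving_folds(X, n_folds=2)
train, test = X[folds == 0], X[folds == 1]
```

Because each subset covers low- and high-density regions alike, a single split yields a low-variance error estimate, which is the property the paper exploits to avoid repeating the estimation procedure.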


