Journal of Applied Statistics
Subsemble: an ensemble method for combining subset-specific algorithm fits

Abstract

Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive data sets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large data sets. Subsemble partitions the full data set into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be a beneficial tool for small- to moderate-sized data sets, and often has better prediction performance than the underlying algorithm fit just once on the full data set. We also describe how to include Subsemble as a candidate in a SuperLearner library, providing a practical way to evaluate the performance of Subsemble relative to the underlying algorithm fit just once on the full data set.