Prediction-based regularization using data augmented regression

Abstract

The role of regularization is to control fitted model complexity and variance by penalizing (or constraining) models to be in an area of model space that is deemed reasonable, thus facilitating good predictive performance. This is typically achieved by penalizing a parametric or non-parametric representation of the model. In this paper we advocate instead the use of prior knowledge or expectations about the predictions of models for regularization. This has the twofold advantage of allowing a more intuitive interpretation of penalties and priors and explicitly controlling model extrapolation into relevant regions of the feature space. This second point is especially critical in high-dimensional modeling situations, where the curse of dimensionality implies that new prediction points usually require extrapolation. We demonstrate that prediction-based regularization can, in many cases, be stochastically implemented by simply augmenting the dataset with Monte Carlo pseudo-data. We investigate the range of applicability of this implementation. An asymptotic analysis of the performance of Data Augmented Regression (DAR) in parametric and non-parametric linear regression, and in nearest neighbor regression, clarifies the regularizing behavior of DAR. We apply DAR to simulated and real data, and show that it is able to control the variance of extrapolation, while maintaining, and often improving, predictive accuracy.
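
The abstract says prediction-based regularization can often be implemented by augmenting the dataset with Monte Carlo pseudo-data. Below is a minimal sketch of that idea for linear regression. It is not the authors' code. Several choices are assumptions made only for illustration: the hypothetical helper `dar_fit`, standard-normal pseudo-covariates, a prior prediction of zero at every pseudo-point, and a weighting that gives the pseudo-data a combined weight `lam`. Under these assumptions the augmented least-squares objective is ||y - Xb||^2 + (lam/m)·||y* - X*b||^2. Because X*ᵀX*/m tends to the identity, this approaches the ridge penalty lam·||b||^2 as the number of pseudo-points m grows. The example checks that limit numerically.

```python
import numpy as np


def dar_fit(X, y, n_pseudo, lam, prior_pred, rng):
    """Least squares on data augmented with Monte Carlo pseudo-data (hypothetical helper).

    Minimizes ||y - X b||^2 + (lam / n_pseudo) * ||y_star - X_star b||^2,
    where X_star are random pseudo-covariates and y_star = prior_pred(X_star).
    No intercept is fitted.
    """
    _, p = X.shape
    # Assumption: pseudo-covariates drawn from a standard normal over feature space.
    X_star = rng.standard_normal((n_pseudo, p))
    # Pseudo-responses encode the prior expectation about predictions at X_star.
    y_star = prior_pred(X_star)
    # Scale pseudo rows so that, in total, they carry weight lam in the squared loss.
    w = np.sqrt(lam / n_pseudo)
    X_aug = np.vstack([X, w * X_star])
    y_aug = np.concatenate([y, w * y_star])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta


# Synthetic example: 50 observations, 3 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.standard_normal(50)
lam = 5.0

# Prior guess (assumption): predictions should be near zero everywhere.
beta_dar = dar_fit(X, y, n_pseudo=200_000, lam=lam,
                   prior_pred=lambda Z: np.zeros(Z.shape[0]), rng=rng)

# Closed-form ridge estimate that the DAR fit should approach for large n_pseudo.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("DAR:  ", np.round(beta_dar, 3))
print("ridge:", np.round(beta_ridge, 3))
print("OLS:  ", np.round(beta_ols, 3))
```

If `prior_pred` returns a non-constant guess, the pseudo-data pull predictions toward that guess rather than toward zero. In that case the limiting penalty is lam·E[(g(x) - xᵀb)^2] for the prior g, and it is no longer plain ridge. The choice of sampling distribution for the pseudo-covariates controls where in feature space the extrapolation is constrained.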

Bibliographic details

  • Source
    Statistics and Computing | 2012, No. 1 | pp. 237-249 | 13 pages
  • Authors

    Giles Hooker; Saharon Rosset;

  • Affiliations

    Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA;
    School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel;

  • Original format: PDF
  • Language: English
  • Keywords

    regression; nearest-neighbor; extrapolation; machine learning; regularization;
