Prediction-based regularization using data augmented regression

Giles Hooker; Saharon Rosset

Abstract

The role of regularization is to control fitted model complexity and variance by penalizing (or constraining) models to be in an area of model space that is deemed reasonable, thus facilitating good predictive performance. This is typically achieved by penalizing a parametric or non-parametric representation of the model. In this paper we advocate instead the use of prior knowledge or expectations about the predictions of models for regularization. This has the twofold advantage of allowing a more intuitive interpretation of penalties and priors and explicitly controlling model extrapolation into relevant regions of the feature space. This second point is especially critical in high-dimensional modeling situations, where the curse of dimensionality implies that new prediction points usually require extrapolation. We demonstrate that prediction-based regularization can, in many cases, be stochastically implemented by simply augmenting the dataset with Monte Carlo pseudo-data. We investigate the range of applicability of this implementation. An asymptotic analysis of the performance of Data Augmented Regression (DAR) in parametric and non-parametric linear regression, and in nearest neighbor regression, clarifies the regularizing behavior of DAR. We apply DAR to simulated and real data, and show that it is able to control the variance of extrapolation, while maintaining, and often improving, predictive accuracy.
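The abstract's idea of augmenting the dataset with Monte Carlo pseudo-data can be illustrated with a minimal sketch for ordinary linear regression. The abstract does not specify how pseudo-data are generated or weighted, so everything below is an assumption made for illustration: pseudo-covariates are drawn uniformly over the bounding box of the observed features, pseudo-responses are set to a constant prior prediction (the sample mean of the response by default), and the pseudo-points share a total weight `weight` in a weighted least-squares fit. The function name `dar_linear_fit` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def dar_linear_fit(X, y, n_pseudo=200, weight=1.0, prior_value=None, seed=0):
    """Illustrative data-augmented linear regression (assumed setup, not the paper's).

    Returns the coefficient vector [intercept, slope_1, ..., slope_p].
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Assumption: the prior expectation about predictions is a constant.
    if prior_value is None:
        prior_value = y.mean()

    # Assumption: pseudo-covariates are uniform over the observed feature ranges.
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_pseudo = rng.uniform(lo, hi, size=(n_pseudo, p))
    y_pseudo = np.full(n_pseudo, prior_value)

    # Real points get weight 1; pseudo-points share a total mass of `weight`.
    w = np.concatenate([np.ones(n), np.full(n_pseudo, weight / n_pseudo)])

    X_aug = np.vstack([X, X_pseudo])
    y_aug = np.concatenate([y, y_pseudo])

    # Weighted least squares via row scaling by sqrt(weight).
    A = np.column_stack([np.ones(len(y_aug)), X_aug])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_aug * sw, rcond=None)
    return coef
```

Under these assumptions, `weight=0.0` reproduces the ordinary least-squares fit, while a large `weight` pulls the fitted predictions toward the constant prior over the sampled region, which is one way to see the penalty acting on predictions rather than directly on parameters.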


Bibliographic details

Source
Statistics and Computing | 2012, Issue 1 | pp. 237-249 | 13 pages
Authors
Giles Hooker; Saharon Rosset
Author affiliations
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA; School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel
Original format
PDF
Language
English
Keywords
regression; nearest-neighbor; extrapolation; machine learning; regularization

