This paper uses stochastic subsampling of the dataset to provide a frequentist approximation to what is known in the Bayesian framework as the posterior inclusion probability (PIP). The distinct merit of this contribution is that it makes it easier for the many practitioners who are not Bayesian-minded to relate to the way the Bayesian paradigm yields a readily interpretable measure of variable importance. Although the approach is computationally intensive, owing to the need to fit a very large number of models, it is readily applicable to both classification and regression tasks, and it achieves competitive running times thanks to the parallel computing facilities available through cloud and cluster computing. Moreover, the scheme is very general and can therefore be adapted easily to a wide range of statistical prediction tasks. Application of the proposed method to several well-known benchmark datasets shows that it mimics its Bayesian counterpart quite well in the important context of variable selection.
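The core idea described above can be sketched in a few lines: repeatedly subsample the data, run a variable-selection procedure on each subsample, and estimate each variable's inclusion probability as the fraction of subsamples in which it was selected. The sketch below is an illustrative assumption, not the paper's actual algorithm; in particular, the choice of greedy forward selection with BIC as the per-subsample selector, and all function names (`bic`, `forward_select`, `subsample_inclusion`), are hypothetical stand-ins for whatever base learner the paper employs.

```python
import numpy as np

def bic(y, X_sub):
    """BIC of an OLS fit of y on the columns of X_sub (data assumed centered)."""
    n = len(y)
    if X_sub.shape[1] == 0:
        rss = np.sum((y - y.mean()) ** 2)
        k = 1
    else:
        beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
        rss = np.sum((y - X_sub @ beta) ** 2)
        k = X_sub.shape[1] + 1
    return n * np.log(rss / n) + k * np.log(n)

def forward_select(X, y):
    """Greedy forward selection: add variables while BIC keeps decreasing."""
    p = X.shape[1]
    selected, remaining = [], list(range(p))
    current = bic(y, X[:, selected])
    while remaining:
        score, j = min((bic(y, X[:, selected + [j]]), j) for j in remaining)
        if score >= current:
            break
        selected.append(j)
        remaining.remove(j)
        current = score
    return selected

def subsample_inclusion(X, y, B=200, frac=0.5, rng=None):
    """Frequentist PIP proxy: selection frequency over B random subsamples."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    m = int(frac * n)
    counts = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=False)
        for j in forward_select(X[idx], y[idx]):
            counts[j] += 1
    return counts / B  # estimated inclusion probability per variable

# Toy illustration: 3 true signals among 8 candidate predictors.
rng = np.random.default_rng(42)
n, p = 200, 8
X = rng.standard_normal((n, p))
y = X[:, 0] * 3.0 + X[:, 1] * 2.0 - X[:, 2] * 2.0 + rng.standard_normal(n) * 0.5
X = X - X.mean(axis=0)
y = y - y.mean()

pip_hat = subsample_inclusion(X, y, B=100, frac=0.5, rng=0)
print(np.round(pip_hat, 2))  # signal variables should show frequencies near 1
```

Because each of the B model fits is independent of the others, the loop over subsamples parallelizes trivially, which is exactly why the cluster and cloud facilities mentioned above keep the overall cost manageable.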