Abstract: A predictive model trained on a non-randomly selected sample can give biased predictions for the population. This paper discusses when non-random selection is a problem. For applications in which it is, the paper presents a procedure for adjusting the predictions of a random forest to account for non-random sampling of the training data. This adjustment yields more accurate predictions for the population. The paper also warns against using inverse probability weighting to analyze selected samples.