Journal: Quality Control and Applied Statistics

Local case-control sampling: Efficient subsampling in imbalanced data sets

Abstract

As data sets grow enormously, they must be reduced so that the computational cost stays bearable. Working with a reduced data set makes it possible to run more experiments rather than extract more information from a limited number of them, to refit models rather than change conditions, to use computationally intensive methods such as cross-validation, bagging, boosting, and bootstrapping, and to apply more sophisticated statistical methods to the compressed data. All of this makes more efficient use of computational resources. Imbalanced data sets are now common. Imbalance comes in two types: marginal imbalance, as in data sets for predicting click-through rates in online advertising, and conditional imbalance, as in email spam filtering, where mistakes are uncommon. A data set can exhibit either type. The usual way to balance the data is to subsample the original data so that rare events are enriched. This article proposes a data reduction scheme called local case-control sampling for fitting logistic regression models. The approach requires one parallelizable scan over the full data set and yields a potentially much smaller subsample that retains almost half of the information in the original data set. (20 refs.)
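
The single scan described in the abstract can be sketched as follows. This is a minimal illustration, not the article's implementation: the abstract does not state the acceptance rule, so the sketch assumes the rule commonly associated with local case-control sampling, in which a point (x, y) is kept with probability |y - p(x)|, where p(x) comes from a rough pilot logistic model. The names `pilot_probability`, `local_case_control_subsample`, and `make_imbalanced_data`, as well as the pilot coefficients, are hypothetical.

```python
import math
import random


def pilot_probability(x, pilot_coef, pilot_intercept):
    """Hypothetical pilot logistic model: a rough estimate of P(y = 1 | x)."""
    z = pilot_intercept + sum(c * xi for c, xi in zip(pilot_coef, x))
    return 1.0 / (1.0 + math.exp(-z))


def local_case_control_subsample(data, pilot_coef, pilot_intercept, seed=0):
    """One pass over (x, y) pairs, keeping each with probability |y - p(x)|.

    Assumption: this acceptance rule is not given in the abstract.
    Each decision depends only on its own row, so the scan can be split
    across workers (here it runs serially for simplicity).
    """
    rng = random.Random(seed)
    kept = []
    for x, y in data:
        p = pilot_probability(x, pilot_coef, pilot_intercept)
        accept_prob = abs(y - p)  # large when the pilot is "surprised" by y
        if rng.random() < accept_prob:
            kept.append((x, y))
    return kept


def make_imbalanced_data(n, seed=1):
    """Synthetic one-feature data with a rare positive class (illustration only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.gauss(0.0, 1.0),)
        p = 1.0 / (1.0 + math.exp(-(-4.0 + x[0])))
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


if __name__ == "__main__":
    full = make_imbalanced_data(5000)
    sub = local_case_control_subsample(full, pilot_coef=(1.0,), pilot_intercept=-4.0)
    positives = sum(y for _, y in sub)
    print(f"full size: {len(full)}, subsample size: {len(sub)}")
    print(f"positives in subsample: {positives}")
```

Under this rule, points the pilot model already predicts well are mostly discarded, so the subsample is both much smaller and roughly balanced between the two classes. A logistic regression fitted to the subsample would still need to be corrected for the selection (in the usual formulation, by adding the pilot coefficients to the fitted ones); that step is not shown here.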
