...
Journal: Quality Control and Applied Statistics

Statistical Modeling: The Two Cultures

Abstract

Two goals of any data analysis are identified: prediction and information. The paper distinguishes two different approaches (or cultures) to these goals.

In MC1, the data modeling culture, to which 98% of all statisticians are said to belong, the data are assumed to be generated by a given stochastic data model (such as linear regression, logistic regression or the Cox model). Model validation is done through goodness-of-fit tests and examination of residuals.

In MC2, the algorithmic modeling (AM) culture, followed by only 2% of statisticians but common in many other fields, the data mechanism is treated as unknown, and an algorithm that operates on the input vector X is used to predict the response vector Y. This approach can be used both on large, complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. The theory in this field shifts the focus from data models to the properties of algorithms: it characterizes their strength as predictors, their convergence if they are iterative, and what gives them good predictive accuracy. Model validation is done by measuring predictive accuracy.

The paper concludes that MC1 has led to irrelevant theory and questionable scientific conclusions, and has kept statisticians from working on a large range of interesting current problems. If the goal of statistics is to use data to solve problems, then statisticians have to move away from exclusive dependence on data models and adopt a more diverse set of tools. The author also reviews three advances made over the last five years in AM, also called machine learning:

· A1: Rashomon: the multiplicity of good models.
· A2: Occam: the conflict between simplicity and accuracy.
· A3: Bellman: dimensionality, curse or blessing.

The paper also presents AM applications to three data sets to show that AM can produce more information, and more reliable information, about the structure of the relationship between inputs and outputs than data models can. (37 refs.)
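As an illustration that is not taken from the paper, the sketch below contrasts the two validation styles described in the abstract: a data model (a straight-line regression fitted by least squares) and an algorithmic model, both judged by their prediction error on held-out data. The data-generating mechanism (`y = sin(x)` plus Gaussian noise), the choice of k-nearest-neighbour averaging as a simple stand-in for an algorithmic model, and all helper names (`make_data`, `fit_linear`, `fit_knn`, `mse`, `compare`) are hypothetical assumptions made only for this example.

```python
import math
import random


def make_data(n, rng, noise_sd=0.1):
    """Hypothetical synthetic data: y = sin(x) + Gaussian noise, x uniform on [0, 2*pi].
    This mechanism is an assumption for the example; neither model is told about it."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Data-model style: assume y = a + b*x + noise and estimate a, b by least squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=5):
    """Algorithmic style: predict by averaging the k nearest training responses,
    with no assumed functional form for the mechanism."""
    train = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared prediction error of `model` on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def compare(seed=0):
    """Fit both models on a training sample and score them on a separate test sample."""
    rng = random.Random(seed)
    x_train, y_train = make_data(200, rng)
    x_test, y_test = make_data(100, rng)
    linear = fit_linear(x_train, y_train)
    knn = fit_knn(x_train, y_train, k=5)
    # Validation by predictive accuracy: error on data not used for fitting.
    return {
        "linear_test_mse": mse(linear, x_test, y_test),
        "knn_test_mse": mse(knn, x_test, y_test),
    }


if __name__ == "__main__":
    print(compare())
```

Running `compare()` should report a much larger held-out error for the straight-line model, because its assumed linear form does not match the (hypothetical) sine mechanism, while the neighbour-averaging model adapts to it; the exact numbers depend on the seed.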
