Journal: Australian & New Zealand Journal of Statistics

A few statistical principles for data science

Abstract

In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics was traditionally statistics for agriculture in all its forms but now, in Data Science, it means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scan, voice, photograph/video (facial landmarks and facial expressions) and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies it by modelling data as realisations generated from a probability space. Another approach to uncertainty quantification is to find similar data sets, and then use the variability of results between these data sets to capture the uncertainty. Both approaches allow 'error bars' to be put on estimates obtained from the original data set, although the interpretations are different. A third approach, that concentrates on giving a single answer and gives up on uncertainty quantification, could be considered as Data Engineering, although it has staked a claim in the Data Science terrain. This article presents a few (actually nine) statistical principles for data scientists that have helped me, and continue to help me, when I work on complex interdisciplinary projects.
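The two ways of putting 'error bars' on an estimate can be illustrated with a minimal sketch. It is not taken from the article: the function names, the synthetic height data, the choice of the sample mean as the estimate, and the normal-approximation multiplier of 1.96 are all assumptions made for illustration.

```python
import math
import random
import statistics


def model_based_interval(sample, z=1.96):
    """Approach 1 (assumed form): treat the sample as independent draws from a
    probability model and use the standard error of the mean."""
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return estimate, z * std_error


def between_dataset_interval(original, similar_sets, z=1.96):
    """Approach 2 (assumed form): compute the same estimate on similar data
    sets and use the spread of those results as the uncertainty."""
    estimate = statistics.fmean(original)
    spread = statistics.stdev([statistics.fmean(s) for s in similar_sets])
    return estimate, z * spread


# Hypothetical synthetic data: heights in centimetres.
rng = random.Random(0)
original = [rng.gauss(170.0, 10.0) for _ in range(50)]
similar_sets = [[rng.gauss(170.0, 10.0) for _ in range(50)] for _ in range(20)]

for name, (estimate, half_width) in {
    "model-based": model_based_interval(original),
    "between data sets": between_dataset_interval(original, similar_sets),
}.items():
    print(f"{name}: {estimate:.2f} +/- {half_width:.2f}")
```

Both functions return the same point estimate with a half-width attached, but the half-widths answer different questions: the first reflects sampling variability under an assumed model, the second reflects how much the result changes from one comparable data set to another.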
