Journal: The Review of Economic Studies

More Data or Better Data? A Statistical Decision Problem

Abstract

When designing data collection, crucial questions arise regarding how much data to collect and how much effort to expend to enhance the quality of the collected data. To make choice of sample design a coherent subject of study, it is desirable to specify an explicit decision problem. We use the Wald framework of statistical decision theory to study allocation of a budget between two or more sampling processes. These processes all draw random samples from a population of interest and aim to collect data that are informative about the sample realizations of an outcome. They differ in the cost of data collection and the quality of the data obtained. One may incur lower cost per sample member but yield lower data quality than another. Increasing the allocation of budget to a low-cost process yields more data, while increasing the allocation to a high-cost process yields better data. We initially view the concept of "better data" abstractly and then fix attention on two important cases. In both cases, a high-cost sampling process accurately measures the outcome of each sample member. The cases differ in the data yielded by a low-cost process. In one, the low-cost process has non-response and in the other it provides a low-resolution interval measure of each sample member's outcome. In these settings, we study minimax-regret sample design for prediction of a real-valued outcome under square loss; that is, design which minimizes maximum mean square error. The analysis imposes no assumptions that restrict the unobserved outcomes. Hence, the decision maker must cope with both the statistical imprecision of finite samples and the partial identification of the true state of nature.
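
The trade-off between more data and better data in the non-response case can be illustrated with a rough numerical sketch. It is not the paper's minimax-regret solution. It rests on assumptions that do not appear in the abstract: outcomes lie in [0, 1], the response rate of the low-cost process is known, the whole budget goes to a single process, the high-cost predictor is the sample mean, and the low-cost predictor imputes the midpoint 0.5 for each non-respondent. All function and parameter names are hypothetical. Under these assumptions the sample mean has maximum mean square error 1/(4n), and the midpoint-imputation predictor has mean square error at most (1 - p)^2/4 + 1/(4n), where p is the response rate.

```python
def mse_bound_high(n_high: int) -> float:
    """Maximum MSE of the sample mean of n_high fully observed outcomes in [0, 1]."""
    if n_high <= 0:
        # No data: predict 0.5; the squared error is at most 1/4.
        return 0.25
    return 1.0 / (4.0 * n_high)


def mse_bound_low(n_low: int, response_rate: float) -> float:
    """Upper bound on the MSE of the midpoint-imputation predictor.

    Squared bias is at most ((1 - response_rate) / 2) ** 2, since the
    unobserved outcomes may lie anywhere in [0, 1]; variance is at most
    1 / (4 * n_low). This is a bound, not the exact maximum.
    """
    if n_low <= 0:
        return 0.25
    worst_bias = (1.0 - response_rate) / 2.0
    return worst_bias ** 2 + 1.0 / (4.0 * n_low)


def compare_pure_designs(budget: float, cost_low: float, cost_high: float,
                         response_rate: float) -> dict:
    """Spend the whole budget on one process and report sample size and bound."""
    n_low = int(budget // cost_low)
    n_high = int(budget // cost_high)
    return {
        "low": (n_low, mse_bound_low(n_low, response_rate)),
        "high": (n_high, mse_bound_high(n_high)),
    }


if __name__ == "__main__":
    # Hypothetical numbers: the low-cost process is ten times cheaper.
    for rate in (0.80, 0.98):
        print(rate, compare_pure_designs(1000, 1, 10, rate))
```

With these illustrative numbers, the high-cost design has the smaller bound when the response rate is 0.80, and the low-cost design has the smaller bound when it is 0.98. The non-response term (1 - p)^2/4 does not shrink as the sample grows, which is the partial-identification component the abstract refers to.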