Journal: The Review of Economic Studies

More Data or Better Data? A Statistical Decision Problem



Abstract

When designing data collection, crucial questions arise regarding how much data to collect and how much effort to expend to enhance the quality of the collected data. To make the choice of sample design a coherent subject of study, it is desirable to specify an explicit decision problem. We use the Wald framework of statistical decision theory to study the allocation of a budget between two or more sampling processes. These processes all draw random samples from a population of interest and aim to collect data that are informative about the sample realizations of an outcome. They differ in the cost of data collection and the quality of the data obtained. One process may incur a lower cost per sample member but yield lower data quality than another. Increasing the allocation of budget to a low-cost process yields more data, while increasing the allocation to a high-cost process yields better data. We initially view the concept of "better data" abstractly and then fix attention on two important cases. In both cases, a high-cost sampling process accurately measures the outcome of each sample member. The cases differ in the data yielded by a low-cost process. In one, the low-cost process has non-response; in the other, it provides a low-resolution interval measure of each sample member's outcome. In these settings, we study minimax-regret sample design for prediction of a real-valued outcome under square loss; that is, design which minimizes maximum mean square error. The analysis imposes no assumptions that restrict the unobserved outcomes. Hence, the decision maker must cope with both the statistical imprecision of finite samples and the partial identification of the true state of nature.
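
The budget trade-off described in the abstract can be illustrated with a rough numerical sketch. This is not the paper's minimax-regret analysis. It is a toy comparison under assumptions that do not come from the abstract: outcomes lie in [0, 1], the low-cost process has a known non-response fraction `nonresponse`, missing outcomes are imputed at 0.5, and the predictor uses whichever single sample has the smaller worst-case bound. All function and parameter names are hypothetical.

```python
from typing import Iterator, Tuple


def feasible_allocations(budget: int, cost_low: int, cost_high: int) -> Iterator[Tuple[int, int]]:
    """Yield (n_low, n_high): for each affordable n_high, spend the rest on low-cost draws."""
    for n_high in range(budget // cost_high + 1):
        n_low = (budget - n_high * cost_high) // cost_low
        yield n_low, n_high


def bound_high(n_high: int) -> float:
    """Upper bound on the MSE of the sample mean of n_high exact outcomes in [0, 1].

    The variance of a [0, 1] variable is at most 1/4. With no data, guessing 0.5
    has squared error at most 1/4.
    """
    return 0.25 if n_high == 0 else 0.25 / n_high


def bound_low(n_low: int, nonresponse: float) -> float:
    """Upper bound on the MSE when missing outcomes are imputed at 0.5 (assumption).

    Squared bias is at most (nonresponse / 2) ** 2; variance is at most 1 / (4 * n_low).
    """
    return 0.25 if n_low == 0 else (nonresponse / 2) ** 2 + 0.25 / n_low


def best_allocation(budget: int, cost_low: int, cost_high: int, nonresponse: float) -> Tuple[int, int]:
    """Return the (n_low, n_high) pair with the smallest toy worst-case bound."""
    return min(
        feasible_allocations(budget, cost_low, cost_high),
        key=lambda a: min(bound_low(a[0], nonresponse), bound_high(a[1])),
    )


if __name__ == "__main__":
    # Mild non-response: more data wins. Severe non-response: better data wins.
    print(best_allocation(100, 1, 5, nonresponse=0.1))
    print(best_allocation(100, 1, 5, nonresponse=0.4))
```

Because this toy predictor never combines the two samples, it always selects a corner allocation. A full treatment would evaluate the maximum mean square error of a predictor that uses both samples jointly, which may favor splitting the budget.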
