Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test

Rostislav Protassov; David A. van Dyk; Alanna Connors; Vinay L. Kashyap; Aneta Siemiginowska

Abstract

The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, & Bowyer in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ² and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ² distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive scientific results into question.
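
The boundary problem described in the abstract can be seen in a toy simulation. The sketch below is not the paper's spectral-line analysis. It assumes a much simpler setting chosen for illustration: unit-variance Gaussian data with a mean constrained to be non-negative, testing a null value of zero that sits on the edge of the parameter space. The function name `simulate_boundary_lrt`, the sample sizes, and the seed are all hypothetical choices. In this toy case the LRT statistic is exactly zero in about half of the null simulations, so applying the nominal χ² (1 degree of freedom) 5% threshold should yield a false positive rate near 2.5% rather than 5%.

```python
import numpy as np

# 95th percentile of the chi-square distribution with 1 degree of freedom,
# i.e. the nominal 5% critical value for a one-parameter LRT.
CHI2_1_CRIT_05 = 3.841458820694124


def simulate_boundary_lrt(n_sims=200_000, n_obs=50, seed=0):
    """Simulate LRT statistics under H0: mu = 0 against H1: mu > 0.

    Toy model (an assumption for illustration): x_i ~ N(mu, 1) with the
    constraint mu >= 0, so the null value lies on the parameter boundary.
    """
    rng = np.random.default_rng(seed)
    # Under H0 the sample mean of n_obs draws is N(0, 1/n_obs); sample it directly.
    xbar = rng.normal(0.0, 1.0 / np.sqrt(n_obs), size=n_sims)
    # Constrained maximum likelihood estimate: negative means are clipped to 0.
    mu_hat = np.maximum(xbar, 0.0)
    # 2 * [loglik(mu_hat) - loglik(0)] = n_obs * (2*mu_hat*xbar - mu_hat**2),
    # which reduces to n_obs * mu_hat**2 for the clipped estimate.
    return n_obs * mu_hat**2


stats = simulate_boundary_lrt()
false_positive_rate = np.mean(stats > CHI2_1_CRIT_05)
frac_zero = np.mean(stats == 0.0)

print(f"fraction of simulations with LRT statistic exactly 0: {frac_zero:.3f}")
print(f"false positive rate at the nominal 5% chi-square cut: {false_positive_rate:.4f}")
```

Even in this simple case the nominal reference distribution is miscalibrated. The line and source detection problems discussed in the abstract can involve further complications that this sketch does not attempt to reproduce.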

