
Optimal prediction of the number of unseen species


Abstract

Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher et al. [Fisher RA, Corbet AS, Williams CB (1943) J Animal Ecol 12(1): 42-58], uses n samples to predict the number U of hitherto unseen species that would be observed if t · n new samples were collected. Of considerable interest is the largest ratio t between the number of new and existing samples for which U can be accurately predicted. In seminal works, Good and Toulmin [Good I, Toulmin G (1956) Biometrika 43(102): 45-63] constructed an intriguing estimator that predicts U for all t ≤ 1. Subsequently, Efron and Thisted proposed a modification that empirically predicts U even for some t > 1, but without provable guarantees. We derive a class of estimators that provably predict U all the way up to t ∝ log n. We also show that this range is the best possible and that the estimator's mean-square error is near optimal for any t. Our approach yields a provable guarantee for the Efron-Thisted estimator and, in addition, a variant with stronger theoretical and experimental performance than existing methodologies on a variety of synthetic and real datasets. The estimators are simple, linear, computationally efficient, and scalable to massive datasets. Their performance guarantees hold uniformly for all distributions, and apply to all four standard sampling models commonly used across various scientific disciplines: multinomial, Poisson, hypergeometric, and Bernoulli product.
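The abstract does not state the estimators explicitly. As a rough illustration only, the sketch below assumes the standard textbook form of the Good-Toulmin estimator, U ≈ -Σ_i (-t)^i Φ_i, where Φ_i is the number of species seen exactly i times in the n samples. It also shows one possible linear "smoothed" variant that damps the i-th term by the tail probability P(L ≥ i) of a Poisson random variable L with mean r. The function names, the choice of a Poisson tail, and the parameter r are assumptions made for this example and are not taken from the abstract.

```python
from collections import Counter
from math import exp


def prevalences(samples):
    """Return a mapping i -> number of species observed exactly i times."""
    species_counts = Counter(samples)
    return Counter(species_counts.values())


def good_toulmin(samples, t):
    """Textbook Good-Toulmin estimate of new species in t * n further samples.

    Assumed form: U = -sum_i (-t)**i * phi_i. Terms grow like t**i,
    so the estimate becomes unstable once t exceeds 1.
    """
    phi = prevalences(samples)
    return -sum((-t) ** i * count for i, count in phi.items())


def poisson_tail(r, i):
    """P(L >= i) for L ~ Poisson(r), via the complement of the CDF."""
    term = exp(-r)  # P(L = 0)
    cdf = 0.0
    for k in range(i):
        cdf += term
        term *= r / (k + 1)  # P(L = k + 1) from P(L = k)
    return 1.0 - cdf


def smoothed_good_toulmin(samples, t, r):
    """Hypothetical smoothed variant: damp term i by P(L >= i).

    The smoothing parameter r is an assumption of this sketch; a small r
    suppresses the high-order terms that make the plain estimator blow up.
    """
    phi = prevalences(samples)
    return -sum(
        (-t) ** i * poisson_tail(r, i) * count for i, count in phi.items()
    )


if __name__ == "__main__":
    observed = ["a", "a", "b", "c", "c", "c", "d"]
    print(good_toulmin(observed, t=0.5))
    print(smoothed_good_toulmin(observed, t=2.0, r=1.0))
```

Both functions are linear in the prevalences Φ_i and need only one pass over the species counts, which matches the abstract's description of the estimators as simple, linear, and scalable. How the smoothing should be tuned as a function of n and t is not specified in the abstract and is left open here.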
