Comparison and evaluation of the effect of outliers on ordinary least squares and Theil nonparametric regression with the evaluation of standard error estimates for the Theil nonparametric regression method

Abstract

Introduction. Detecting outliers in Ordinary Least Squares (OLS) regression matters to researchers who want to keep spurious values from affecting slope and intercept estimates. Visual inspection, followed by removal of values that 'look' like outliers, may introduce selection bias. Using a simulation study, this dissertation evaluates the accuracy and efficiency of OLS versus the Theil nonparametric regression method in the presence of outliers, across small sample sizes and different correlation levels. In addition, the study tests the Tukey standard error of the median, Kendall's tau, and the bootstrap as candidate standard errors for the Theil procedure.

Methods. Simulated data sets were generated at three correlation levels (rho = 0.50, rho = 0.75, and rho = 0.90) crossed with three sample sizes (n = 5, n = 15, and n = 25). Outliers were added at various positions in the data sets, and OLS and Theil regressions were computed on all data sets. The slope and intercept estimates were compared with the simulation specifications to determine accuracy. The three standard error methods were then tested against the simulation estimates of error for the Theil procedure to determine whether they were accurate enough to be useful. Finally, the simulation standard errors of the Theil and OLS slope and intercept estimates were compared to determine which procedure was relatively more efficient.

Results. Both OLS and Theil regression estimates were accurate when no outliers were present, regardless of correlation level and sample size. When outliers were present, the Theil procedure always provided more accurate estimates than OLS; however, when the outliers were in the tails of the distribution and the samples were small, the Theil slope and intercept estimates were not useful. Differences between the simulation values and the OLS and Theil estimates shrank as correlation and sample size increased. In general, OLS estimates were more efficient when no outliers were present, and Theil estimates were more efficient when outliers were present. Among the standard error estimates for the Theil procedure, the bootstrap and Tukey's method gave similar results, but these were often not useful because they differed greatly from the simulation values. Kendall's tau was not found to be useful.

Conclusions. When no outliers are present, both OLS and the Theil procedure provide useful estimates of slope and intercept. When outliers are present, the Theil procedure should be used, with caution when the outliers are in the tails of the 'y' variable. Bootstrap standard errors are generally more accurate for larger sample sizes but are not accurate when samples are small. In small 'n' situations, the Tukey method is more accurate for both slope and intercept. In general, no universal recommendation for a standard error suitable for the Theil procedure can be made.
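
The contrast between the two estimators can be illustrated with a minimal Python sketch. It is not the dissertation's simulation code: the data, the function names `ols_fit` and `theil_fit`, and the choice of intercept (the median of y - slope * x, one of several conventions in use) are all assumptions made for illustration. The Theil slope is taken as the median of all pairwise slopes.

```python
from itertools import combinations
from statistics import mean, median


def ols_fit(x, y):
    """Ordinary least squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_fit(x, y):
    """Theil slope: median of the slopes over all pairs of points.

    The intercept is taken as the median of y - slope * x, which is an
    assumed convention; other definitions exist.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with tied x values
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
x = [1, 2, 3, 4, 5, 6, 7]
y_clean = [2 * xi + 1 for xi in x]

# Same data with the last response replaced by an outlier.
y_outlier = y_clean[:-1] + [50]

ols_clean = ols_fit(x, y_clean)
theil_clean = theil_fit(x, y_clean)
ols_out = ols_fit(x, y_outlier)
theil_out = theil_fit(x, y_outlier)

print("clean   OLS:", ols_clean, " Theil:", theil_clean)
print("outlier OLS:", ols_out, " Theil:", theil_out)
```

In this toy case the single outlier pulls the OLS slope well away from 2 while the Theil slope is unchanged, because only 6 of the 21 pairwise slopes involve the contaminated point. With n = 5, an outlier affects 4 of the 10 pairwise slopes, which is consistent with the abstract's caution about small samples.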

Bibliographic record

Author: Wasser, Thomas Emerson
Author affiliation: Lehigh University
Degree-granting institution: Lehigh University
Subject: Mathematics; Computer science; Statistics
Degree: Ph.D.
Year: 1998
Pages: 124 p.
Total pages: 124
Original format: PDF
Language: English
