Applying Bootstrap Robust Regression Method on Data with Outliers

  • Ahmed M. Mami Department of Statistics, Faculty of Science, University of Benghazi, Benghazi, +128, Libya
  • Abobaker M. Jaber Department of Statistics, Faculty of Science, University of Benghazi, Benghazi, +128, Libya
  • Osama S. Almabrouk Department of Statistics, Faculty of Science, University of Benghazi, Benghazi, +128, Libya
Keywords: regression analysis, outliers, robust regression, bootstrap

Abstract

Identifying and assessing outliers plays a key role in Ordinary Least Squares (OLS) regression analysis. This paper presents a robust two-stage procedure for identifying outlying observations in regression analysis. The exploratory stage identifies leverage points and vertical outliers through a robust distance estimator based on the Minimum Covariance Determinant (MCD). After these points are deleted, the confirmatory stage carries out an OLS analysis on the remaining subset of the data and investigates the effect of adding the previously deleted observations back in. Cut-off points for the different diagnostics are generated by bootstrapping, and the cases are definitively labeled as good leverage points, bad leverage points, vertical outliers, or typical cases. The procedure is applied to four examples taken from the literature, where it correctly pinpoints outlying observations even in the presence of substantial masking. Through the use of jackknife-after-bootstrap robust cut-off points, it identifies and correctly classifies vertical outliers and good and bad leverage points. Moreover, its two-stage structure makes it interactive, which gives the user a deeper understanding of the dataset's main features than an automatic procedure would.
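The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the diagnostics, so the sketch assumes a single diagnostic (the absolute scaled residual), a plain percentile bootstrap of the retained cases rather than jackknife-after-bootstrap, a crude MCD approximation built from random starts and concentration steps, and a Wilson-Hilferty approximation for the chi-square cut-offs. All function names (`chi2_quantile_wh`, `robust_d2`, `two_stage`) and the simulated data are hypothetical.

```python
import numpy as np

def chi2_quantile_wh(df, z):
    """Wilson-Hilferty approximation to a chi-square quantile (z = normal quantile)."""
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * np.sqrt(c)) ** 3

def robust_d2(Z, h_frac=0.75, n_starts=30, n_steps=30, seed=0):
    """Squared robust distances from a crude MCD fit (assumed simplification)."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    h = int(h_frac * n)

    def dist(idx):
        # Mean and covariance of the subset, then distances of ALL cases to it.
        mu = Z[idx].mean(axis=0)
        cov = np.atleast_2d(np.cov(Z[idx], rowvar=False)) + 1e-9 * np.eye(p)
        diff = Z - mu
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        return d2, np.linalg.det(cov)

    best_det, best_d2 = np.inf, None
    for _start in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _step in range(n_steps):
            d2, _det = dist(idx)
            new_idx = np.argsort(d2)[:h]  # concentration step: keep the h closest
            if np.array_equal(np.sort(new_idx), np.sort(idx)):
                break
            idx = new_idx
        d2, det = dist(idx)
        if det < best_det:
            best_det, best_d2 = det, d2
    # Rescale so the median distance matches the (approximate) chi-square median.
    return best_d2 * chi2_quantile_wh(p, 0.0) / np.median(best_d2)

def two_stage(X, y, B=500, z=1.96, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Stage 1 (exploratory): flag cases by robust distance.
    flagged = robust_d2(np.column_stack([X, y]), seed=seed) > chi2_quantile_wh(p + 1, z)
    high_lev = robust_d2(X, seed=seed) > chi2_quantile_wh(p, z)

    A = np.column_stack([np.ones(n), X])

    def scaled_abs_resid(fit_rows, eval_rows):
        # OLS on fit_rows; absolute residuals of eval_rows scaled by the fit's sigma.
        beta, *_ = np.linalg.lstsq(A[fit_rows], y[fit_rows], rcond=None)
        res = y[fit_rows] - A[fit_rows] @ beta
        sigma = np.sqrt(res @ res / (len(fit_rows) - A.shape[1]))
        return np.abs(y[eval_rows] - A[eval_rows] @ beta) / sigma

    # Stage 2 (confirmatory): OLS on the retained cases, then score every case,
    # including the deleted ones, against that fit.
    clean = np.flatnonzero(~flagged)
    r = scaled_abs_resid(clean, np.arange(n))
    boot = []
    for _ in range(B):
        s = rng.choice(clean, size=clean.size, replace=True)
        boot.append(scaled_abs_resid(s, s))
    cut_r = np.quantile(np.concatenate(boot), 0.975)  # bootstrap cut-off
    big_res = r > cut_r

    labels = np.full(n, "typical", dtype=object)
    labels[high_lev & ~big_res] = "good leverage"
    labels[high_lev & big_res] = "bad leverage"
    labels[~high_lev & big_res] = "vertical outlier"
    return labels, cut_r

# Simulated data (hypothetical): a clean line plus three planted cases.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
x = np.append(x, [6.0, 6.0, 0.0])    # good leverage, bad leverage, vertical outlier
y = np.append(y, [13.0, 0.0, 10.0])
labels, cut_r = two_stage(x[:, None], y)
print(labels[-3:], round(float(cut_r), 2))
```

In practice, a tested MCD implementation would normally replace `robust_d2`, and the cut-off rule would follow the diagnostics defined in the full paper. The sketch only shows how deleting flagged cases, refitting, and bootstrapping a cut-off fit together.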

Published
2020-01-21
Section
Articles