1. Discuss the nature, causes, consequences and remedies of each of the following problems we might encounter in regression analysis.
a) Multicollinearity
b) Heteroscedasticity
c) Autocorrelation
(a) Multicollinearity
Multicollinearity occurs when the independent variables of a model are correlated with one another (Arkes, 2019): a change in one independent variable is accompanied by a change in at least one of the others. Causes of multicollinearity can be data-based or structural (Daoud, 2017). Data-based multicollinearity can arise from insufficient data, the improper use of dummy variables, including a variable that is a combination of two existing variables, or using two identical or nearly identical variables (Daoud, 2017). Consequences include coefficient estimates that are highly sensitive to small changes in the data and reduced precision of the estimated coefficients, which lowers the statistical power of the model. Common remedies are dropping or combining the correlated variables, collecting more data, or using a penalized estimator such as ridge regression; variance inflation factors (VIFs) are a standard diagnostic, as sketched below.
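A minimal sketch of the VIF diagnostic using statsmodels; the variables x1, x2, x3 and the simulated data are hypothetical, with x2 built to be nearly identical to x1 so its VIF is large.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly identical to x1 -> collinear
x3 = rng.normal(size=200)                  # independent of x1 and x2
X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# A VIF above roughly 5-10 is a common rule of thumb for problematic collinearity.
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")
```

Running this prints very large VIFs for x1 and x2 and a VIF near 1 for x3, matching the intuition that only the first two variables carry redundant information.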
(b) Heteroscedasticity
Heteroscedasticity is a situation in which the variance of the residuals is not constant over the range of measured values, producing an unequal scatter of residuals in a regression analysis (Arkes, 2019). One cause is a dataset with a wide range of values and the resulting outliers; for instance, a dataset with values ranging from 1 to 10,000,000 can be skewed by its largest values. Another cause is the omission of relevant variables from the model. The consequence of heteroscedasticity is that the estimators are no longer best linear unbiased estimators (Arkes, 2019). Likewise, hypothesis tests of the estimated coefficients using t-tests and F-tests become invalid. Common remedies include heteroscedasticity-robust standard errors, transforming the dependent variable (for example, taking logarithms), and weighted least squares.
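A minimal sketch, on hypothetical simulated data, of detecting heteroscedasticity with the Breusch-Pagan test in statsmodels and then applying one common remedy (robust standard errors); a small p-value suggests the residual variance is not constant.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=300)
# Error variance grows with x, producing an unequal scatter of residuals.
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, fit.model.exog)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")  # small -> heteroscedasticity

# One common remedy: heteroscedasticity-robust (HC) standard errors.
robust_fit = sm.OLS(y, X).fit(cov_type="HC3")
print(robust_fit.bse)  # robust standard errors for valid t-tests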
(c) Autocorrelation
Autocorrelation is the correlation of a variable with itself across successive time intervals (Arkes, 2019), for instance the correlation between June's sales and May's sales. It can be caused by seasonal shocks that affect a variable differently in different periods; sales, for example, rise during the Christmas holidays. Inertia, the sluggishness of a variable in adjusting, can also produce autocorrelation (Arkes, 2019), as can model mis-specification and data smoothing or manipulation. Autocorrelation leads to coefficient estimates that are no longer best linear unbiased (Arkes, 2019). In addition, it causes the variances of the estimates to be underestimated, which undermines hypothesis testing (Arkes, 2019); similarly, the coefficient of determination is overestimated and the t-statistics are inflated. Common remedies include diagnosing the problem with the Durbin-Watson statistic, using Newey-West (HAC) standard errors, adding lagged terms to the model, or estimating by generalized least squares.
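A minimal sketch with simulated AR(1) errors (hypothetical data) showing the Durbin-Watson check on residuals and the Newey-West remedy; a statistic near 2 suggests no first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
# Build AR(1) errors, e_t = 0.8 * e_{t-1} + u_t, to mimic inertia in the series.
e = np.zeros(n)
u = rng.normal(scale=0.5, size=n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + u[t]
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
print(f"Durbin-Watson: {durbin_watson(fit.resid):.2f}")  # well below 2 here

# One common remedy: Newey-West (HAC) standard errors.
hac_fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 5})
print(hac_fit.bse)  # autocorrelation-robust standard errors
```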
References
Arkes, J. (2019). Regression analysis: A practical introduction. Routledge.
Daoud, J. I. (2017). Multicollinearity and regression analysis. Journal of Physics: Conference Series, 949(1), 012009. IOP Publishing.