
# Testing Regression Assumptions with R

Arndt Regorz, Dipl. Kfm. & M.Sc. Psychology, 08/20/2021

The following annotated code runs a multiple regression in R and checks the regression assumptions.

You will need the following R packages, each of which must be installed once before use, e.g. install.packages("olsrr"):

• olsrr
• jtools
• moments
• lmtest
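As a minimal sketch, the packages that are still missing can be identified in one step with base R. The helper name `missing_packages` is hypothetical (it is not part of any package), and the actual installation call is left commented out so that nothing is installed unintentionally.

```r
# Packages used in the code below
needed <- c("olsrr", "jtools", "moments", "lmtest")

# Hypothetical helper: returns those packages that are not yet installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

If `to_install` is empty, all four packages are already available and can be loaded with `library()`.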

In the following code you only have to adapt the variable names (in the example: DV, IV1, IV2, IV3) and the data frame name (in the example: my_data) to your own data, as well as, if necessary, the number of decimal places and the confidence level.
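To try the code without your own data, a simulated data frame with the example names can be used. This is only a sketch: the sample size, coefficients, and seed below are arbitrary assumptions made for illustration and are not part of the original example.

```r
# Simulated example data (all values are illustrative assumptions)
set.seed(123)
n <- 200

IV1 <- rnorm(n)
IV2 <- rnorm(n)
IV3 <- rnorm(n)

# Criterion as a linear function of the predictors plus normal error
DV <- 2 + 0.5 * IV1 - 0.3 * IV2 + 0.2 * IV3 + rnorm(n)

my_data <- data.frame(DV, IV1, IV2, IV3)
str(my_data)
```

Because these data are generated from a linear model with normal, homoskedastic errors, the diagnostics should look largely unproblematic.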

Note: If you run all the code at once, individual graphs are sometimes omitted. I have not yet been able to determine the reason for this. I therefore recommend executing the code step by step (one graph at a time).

R code

# Multiple regression with test of regression assumptions

library(olsrr)
library(jtools)
library(moments)
library(lmtest)

# Calling the regression

# Regression
reg.fit <- lm(DV ~ IV1 + IV2 + IV3, data = my_data)

# Parameters for the regression output
my_confidence <- 0.95
my_digits <- 3

# Data for the linearity check
# (selecting the columns directly avoids attach()/detach())
daten.plot <- my_data[, c("DV", "IV1", "IV2", "IV3")]

# 1 Regression output (with jtools package)

# 1.1 Unstandardized results
summ(reg.fit, confint = TRUE, ci.width = my_confidence,
     digits = my_digits)

# 1.2 Standardized results
summ(reg.fit, scale=TRUE, transform.response = TRUE, digits=my_digits)

# 2 Regression diagnostics (with olsrr package, unless otherwise specified)

# 2.1 Homoskedasticity

# 2.1.1 Graphical test
# (should be an unstructured point cloud; a funnel shape or a
# recognizably curved pattern is particularly problematic)
ols_plot_resid_fit(reg.fit)

# 2.1.2 Breusch-Pagan test - significance test for heteroskedasticity
# (significant => heteroskedasticity)
ols_test_breusch_pagan(reg.fit)

# 2.2 Normality of the residuals

# 2.2.1 Histogram of residuals
# (The histogram should show a normal distribution,
# especially at the tails of the distribution)
ols_plot_resid_hist(reg.fit)

# 2.2.2 QQ plot
# (The data points should be near the diagonal)
ols_plot_resid_qq(reg.fit)

# 2.2.3 Shapiro-Wilk test for normality
# (significant => residuals not normally distributed)
shapiro.test(reg.fit$residuals)

# 2.2.4 Skewness and kurtosis (with moments package)
# (for normality: skewness near 0 and kurtosis near 3)
skewness(reg.fit$residuals)
kurtosis(reg.fit$residuals)

# 2.2.5 Significance tests for skewness and kurtosis
# (with moments package)
# (significant => residuals not normally distributed)
agostino.test(reg.fit$residuals)
anscombe.test(reg.fit$residuals)

# 2.3 Linearity

# 2.3.1 Pairwise scatterplots
# (Only the scatterplots including the criterion
# variable are relevant)
pairs(daten.plot, pch = 19, lower.panel = NULL)

# 2.3.2 Rainbow test (with lmtest package) for linearity
# (significant => nonlinearity)
raintest(reg.fit)

# 2.4 Absence of strong multicollinearity

# (Problematic: VIF values above 10.0)
ols_vif_tol(reg.fit)

# 2.5 Outlier diagnostics

# 2.5.1 Studentized residuals
# (problematic: absolute values above 3)
ols_plot_resid_stud(reg.fit)

# 2.5.2 Cook's distance
# (different cut-off values are given in the literature;
# the threshold used here, 4/N, is extremely conservative)
ols_plot_cooksd_chart(reg.fit)

# 2.5.3 Outlier & leverage
# (problematic: observations flagged as "outlier & leverage")
ols_plot_resid_lev(reg.fit)

# 2.5.4 DFBETAS
# (Which observations have a strong influence
# on the parameter estimates?)
ols_plot_dfbetas(reg.fit)

# 2.6 Independence/uncorrelatedness of residuals

# This follows from the sampling method
# (cross-sectional and without
# clustered/hierarchical data structures).

# 2.7 Scale properties

# This results from the scales used
# (not from an empirical test).
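
The numeric cut-offs mentioned in the comments above (VIF above 10, absolute studentized residuals above 3, Cook's distance above 4/N) can also be checked with base R alone. The following is a minimal sketch on simulated data; the data, seed, and object names `vif_manual`, `stud_res`, and `cooks_d` are assumptions made for illustration. Note that `rstudent()` returns externally studentized residuals, and it is an assumption that these match what the olsrr plot displays.

```r
# Simulated example data (illustrative assumption; use your own my_data)
set.seed(1)
n <- 100
my_data <- data.frame(IV1 = rnorm(n), IV2 = rnorm(n), IV3 = rnorm(n))
my_data$DV <- 1 + 0.5 * my_data$IV1 - 0.3 * my_data$IV2 + rnorm(n)

reg.fit <- lm(DV ~ IV1 + IV2 + IV3, data = my_data)

# VIF of each predictor: 1 / (1 - R^2), where R^2 comes from
# regressing that predictor on all other predictors
predictors <- c("IV1", "IV2", "IV3")
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux.fit <- lm(reformulate(others, response = p), data = my_data)
  1 / (1 - summary(aux.fit)$r.squared)
})
vif_manual[vif_manual > 10]          # predictors with VIF above 10

# Studentized residuals with absolute value above 3
stud_res <- rstudent(reg.fit)
which(abs(stud_res) > 3)

# Cook's distance above the 4/N threshold
cooks_d <- cooks.distance(reg.fit)
which(cooks_d > 4 / nobs(reg.fit))
```

Each `which()` call returns the row indices of the flagged observations, which can then be inspected in `my_data`.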

Documentation files for the R packages used in the code example above: