Statistical Tests and Assumptions

Homogeneity of Variance Test in R

This chapter describes methods for checking the homogeneity of variances across two or more groups in R.

Some statistical tests, such as the two-independent-samples t-test and ANOVA, assume that variances are equal across groups.

There are different variance tests that can be used to assess the equality of variances. These include:

  • F-test: Compares the variances of two groups. The data must be normally distributed.
  • Bartlett’s test: Compares the variances of two or more groups. The data must be normally distributed.
  • Levene’s test: A robust alternative to Bartlett’s test that is less sensitive to departures from normality.
  • Fligner-Killeen’s test: A non-parametric test that is very robust against departures from normality.

Note that Levene’s test is the most commonly used in the literature.

You will learn how to compare variances in R using each of the tests mentioned above.



Prerequisites

Load the tidyverse package for easy data manipulation:

library(tidyverse)

Demo datasets: ToothGrowth and PlantGrowth, both built into R. Inspect the ToothGrowth data by displaying some random rows.

# Data preparation
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Inspect
set.seed(123)
sample_n(ToothGrowth, 6)
##    len supp dose
## 1 14.5   VC    1
## 2 25.8   OJ    1
## 3 25.5   VC    2
## 4 25.5   OJ    2
## 5 22.4   OJ    2
## 6  7.3   VC  0.5
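
Before running any formal test, it can help to look at the group sizes and sample variances directly. The following is a minimal sketch using dplyr (part of the tidyverse); the object name var_summary is only an illustrative choice.

```r
library(dplyr)

# Group size and sample variance of tooth length for each supplement type
var_summary <- ToothGrowth %>%
  group_by(supp) %>%
  summarise(n = n(), variance = var(len))
var_summary
```

Large differences between the variances in this table are a first hint that the equal-variance assumption may not hold; the tests below quantify that impression.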

F-test: Compare two variances

The F-test is used to assess whether the variances of two populations (A and B) are equal. You need to check whether the data are normally distributed (see the chapter Normality Test in R) before using the F-test.

Applications. Comparing two variances is useful in several cases, including:

  • When you want to perform a two-sample t-test, you need to check the equality of the variances of the two samples.
  • When you want to compare the variability of a new measurement method to an old one. Does the new method reduce the variability of the measure?

The statistical hypotheses are:

  • Null hypothesis (H0): the variances of the two groups are equal.
  • Alternative hypothesis (Ha): the variances are different.

Computation. The F-test statistic can be obtained by computing the ratio of the two variances Var(A)/Var(B). The more this ratio deviates from 1, the stronger the evidence for unequal population variances.
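
As an illustration, the ratio and its two-sided p-value can be computed by hand in base R. This is a sketch of the calculation on the ToothGrowth data, not a copy of any function's internals; the variable names (oj, vc, f_stat and so on) are assumptions made for the example.

```r
# Split tooth length by supplement type (built-in ToothGrowth data)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# F statistic: ratio of the two sample variances
f_stat <- var(oj) / var(vc)

# Degrees of freedom: group size minus one for each group
df_oj <- length(oj) - 1
df_vc <- length(vc) - 1

# Two-sided p-value from the F distribution
p_lower <- pf(f_stat, df_oj, df_vc)
p_value <- 2 * min(p_lower, 1 - p_lower)

c(F = f_stat, num_df = df_oj, denom_df = df_vc, p = p_value)
```

Each group contains 30 observations, so both degrees of freedom equal 29.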

The F-test can be easily computed in R using the function var.test(). In the following R code, we want to test the equality of variances between the two groups OJ and VC (in the column “supp”) for the variable len.

res <- var.test(len ~ supp, data = ToothGrowth)
res
## 
##  F test to compare two variances
## 
## data:  len by supp
## F = 0.6, num df = 29, denom df = 29, p-value = 0.2
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.304 1.342
## sample estimates:
## ratio of variances 
##              0.639

Interpretation. The p-value (p = 0.2) is greater than the significance level of 0.05, so we fail to reject the null hypothesis: there is no significant difference between the two variances.
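
The object returned by var.test() is a list, so the individual results can be extracted and the decision rule applied in code. A minimal sketch, assuming a significance level of 0.05 (the names alpha and decision are illustrative):

```r
res <- var.test(len ~ supp, data = ToothGrowth)

res$statistic  # F statistic
res$estimate   # ratio of the two sample variances
res$conf.int   # confidence interval for the ratio
res$p.value    # p-value of the test

# Apply the decision rule at the chosen significance level
alpha <- 0.05
decision <- if (res$p.value < alpha) {
  "reject H0: the variances differ"
} else {
  "fail to reject H0: no significant difference"
}
decision
```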

Compare multiple variances

This section describes how to compare multiple variances in R using Bartlett, Levene or Fligner-Killeen tests.

Statistical hypotheses. For all the tests that follow, the null hypothesis is that all population variances are equal; the alternative hypothesis is that at least two of them differ. Consequently, a p-value less than 0.05 suggests that the variances are significantly different and that the homogeneity of variance assumption has been violated.

Bartlett’s test

  1. Bartlett’s test with one independent variable:
res <- bartlett.test(weight ~ group, data = PlantGrowth)
res
## 
##  Bartlett test of homogeneity of variances
## 
## data:  weight by group
## Bartlett's K-squared = 3, df = 2, p-value = 0.2

From the output, the p-value (0.237, displayed rounded as 0.2) is not less than the significance level of 0.05. There is therefore no evidence that the variance in plant weight differs significantly across the three treatment groups.

  2. Bartlett’s test with multiple independent variables: the interaction() function must be used to collapse multiple factors into a single variable containing all combinations of the factors.
bartlett.test(len ~ interaction(supp, dose), data = ToothGrowth)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  len by interaction(supp, dose)
## Bartlett's K-squared = 7, df = 5, p-value = 0.2
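
To see what interaction() produces, the combined grouping factor and the variance within each of its cells can be inspected directly. This is a small sketch in base R; the names cells and cell_var are illustrative.

```r
# Combine supp and dose into one factor with a level per combination
cells <- interaction(ToothGrowth$supp, ToothGrowth$dose)
levels(cells)
nlevels(cells)

# Sample variance of tooth length within each supp-by-dose cell
cell_var <- tapply(ToothGrowth$len, cells, var)
cell_var
```

With 2 supplement types and 3 doses there are 6 groups, which is why the test has 6 - 1 = 5 degrees of freedom.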

Levene’s test

The function leveneTest() [in the car package] can be used.

library(car)
# Levene's test with one independent variable
leveneTest(weight ~ group, data = PlantGrowth)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2    1.12   0.34
##       27
# Levene's test with multiple independent variables
leveneTest(len ~ supp*dose, data = ToothGrowth)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  5    1.71   0.15
##       54
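
Levene's test with center = median amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The base R sketch below illustrates that idea on PlantGrowth without the car package; it is an illustration of the principle rather than the package's code, and the names pg, abs_dev and levene_by_hand are assumptions.

```r
# Absolute deviation of each observation from its group median
pg <- PlantGrowth
pg$abs_dev <- abs(pg$weight - ave(pg$weight, pg$group, FUN = median))

# One-way ANOVA on the absolute deviations
levene_by_hand <- anova(lm(abs_dev ~ group, data = pg))
levene_by_hand
```

The degrees of freedom (2 and 27) and the F value should agree with the leveneTest() output for weight ~ group shown above.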

Fligner-Killeen’s test

The Fligner-Killeen test is one of the tests for homogeneity of variances that is most robust against departures from normality.

The R function fligner.test() can be used to compute the test:

fligner.test(weight ~ group, data = PlantGrowth)
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  weight by group
## Fligner-Killeen:med chi-squared = 2, df = 2, p-value = 0.3
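
As with Bartlett's test, several grouping factors can be combined with interaction() before calling fligner.test(). A minimal sketch on the ToothGrowth data (the object name res_fk is illustrative):

```r
# Fligner-Killeen test across all supp-by-dose combinations
res_fk <- fligner.test(len ~ interaction(supp, dose), data = ToothGrowth)
res_fk

# The p-value can be extracted for use in a decision rule
res_fk$p.value
```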

Summary

This chapter presents different tests for assessing the equality of variances between groups, an assumption made by the two-independent-samples t-test and ANOVA.

The most commonly used method is Levene’s test, available in the car package. A pipe-friendly wrapper, levene_test(), is also provided in the rstatix package.
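
A pipe-based call might look like the sketch below. It assumes the rstatix and dplyr packages are installed; the object names df and lev are illustrative.

```r
library(dplyr)    # provides the %>% pipe
library(rstatix)  # assumed to be installed

# Work on a copy so the built-in dataset is left untouched
df <- ToothGrowth
df$dose <- as.factor(df$dose)

# Levene's test across all supp-by-dose combinations
lev <- df %>% levene_test(len ~ supp * dose)
lev
```

The result is returned as a data frame, which makes it easy to combine with other summaries in a pipeline.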





Teacher
Alboukadel Kassambara
Role : Founder of Datanovia