This course describes how to compare multiple means in R using the ANOVA (Analysis of Variance) method and variants, including:
- ANOVA test for comparing independent measures.
- Repeated-measures ANOVA, which is used for analyzing data where the same subjects are measured more than once.
- Mixed ANOVA, which is used to compare the means of groups cross-classified by at least two factors, where one factor is a “within-subjects” factor (repeated measures) and the other factor is a “between-subjects” factor.
- ANCOVA (analysis of covariance), an extension of the one-way ANOVA that incorporates a covariate.
- MANOVA (multivariate analysis of variance), an ANOVA with two or more continuous outcome variables.
We also provide R code to check ANOVA assumptions and perform post-hoc analyses. Additionally, we’ll present:
- Kruskal-Wallis test, which is a non-parametric alternative to the one-way ANOVA test.
- Friedman test, which is a non-parametric alternative to the one-way repeated measures ANOVA test.
R functions and packages
There are different functions/packages in R for computing ANOVA. These include:
aov()[stats]: Computes type-I sums of squares (SS). It should only be used with balanced designs (equal group sizes).
Anova()[car]: Computes type-II and type-III sums of squares. Type-II yields the same ANOVA results as type-I when the data are balanced. When the data are unbalanced, type-III emulates the approach taken by popular commercial statistics packages such as SAS and SPSS, although this approach is not without criticism.
anova_test()[rstatix]: A wrapper around the function Anova()[car] that facilitates the analysis of factorial experiments, including purely ‘within-Ss’ designs (repeated measures), purely ‘between-Ss’ designs, and mixed ‘within-and-between-Ss’ designs.
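As a minimal sketch of the claim that type-I and type-II results coincide for balanced data, the example below fits an additive model with lm() on the built-in ToothGrowth data (10 observations per cell) and compares the sequential table from anova() with car::Anova(). The choice of dataset and the object names (tg, fit, type1, type2, same_ss) are assumptions made for illustration, and the car comparison is skipped if the package is not installed.

```r
# Balanced two-factor data: ToothGrowth has 10 observations per supp x dose cell.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)  # dose is stored as numeric; treat it as a factor

# Type-I (sequential) sums of squares from base R.
fit <- lm(len ~ supp + dose, data = tg)
type1 <- anova(fit)
print(type1)

# car is not part of base R, so only run the comparison if it is available.
if (requireNamespace("car", quietly = TRUE)) {
  # Type-II sums of squares computed on the same lm() fit.
  type2 <- car::Anova(fit, type = 2)
  print(type2)

  # With balanced data the two tables should agree term by term.
  same_ss <- all.equal(type1[c("supp", "dose"), "Sum Sq"],
                       type2[c("supp", "dose"), "Sum Sq"])
  print(same_ss)
}
```

On unbalanced data the two tables would generally differ, which is the situation where the choice of sum-of-squares type matters.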
The advantage of anova_test() [rstatix] is that it supports both model and formula as inputs. Variables can also be specified as character vectors using arguments such as covariate. Read more in the documentation by typing ?anova_test in the R console. It provides a simple and intuitive pipe-friendly framework, coherent with the tidyverse design philosophy. Additionally, it supports grouped data as returned by the function dplyr::group_by(). The results include the ANOVA table, the generalized effect size and some assumption checks.
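The three input styles described above (formula, fitted model, grouped data) might look like the following. This is only a sketch on the built-in PlantGrowth and ToothGrowth datasets, which are assumptions chosen for illustration, as are the object names. The block runs only when rstatix and dplyr are installed.

```r
if (requireNamespace("rstatix", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  library(rstatix)
  library(dplyr)

  # Formula input: one-way between-subjects ANOVA of weight by group.
  res_formula <- PlantGrowth %>% anova_test(weight ~ group)

  # Model input: pass a fitted lm() object instead of a formula.
  res_model <- anova_test(lm(weight ~ group, data = PlantGrowth))

  # Grouped input: one ANOVA of len on dose for each supplement type.
  tg <- ToothGrowth
  tg$dose <- factor(tg$dose)
  res_grouped <- tg %>%
    group_by(supp) %>%
    anova_test(len ~ dose)

  print(res_formula)
  print(res_model)
  print(res_grouped)
}
```

Each result is a table with one row per effect, so it can be filtered or combined like any other data frame.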
In this guide, we’ll use mainly the function
- The outcome variable, also known as the dependent variable (dv), should be numeric.
- The grouping variables, also known as predictors or independent variables, should be factors. If you want to compute ANCOVA models, you can also add numeric predictors.
- Do not use the R base functions aov() and anova() to get ANOVA tables unless you know what you are doing. They compute type-I sums of squares, which are not suitable for unbalanced designs, for example. The results obtained with the default options of these functions differ from those obtained with commercial statistical software, including SPSS and SAS, and with most other statistics packages. These differences matter: unless you understand them, they will be confusing and give you misleading results.
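To see why type-I sums of squares are a problem for unbalanced designs, the base R sketch below fits the same two-factor model with the terms in both orders. The helper ss_for() and the way the data are unbalanced (dropping the first 7 rows of ToothGrowth, all from one cell) are assumptions made for illustration only.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)     # grouping variables should be factors
stopifnot(is.numeric(tg$len))  # the outcome should be numeric

# Hypothetical helper: sum of squares attributed to `term` in an aov() fit.
ss_for <- function(fit, term) {
  tab <- summary(fit)[[1]]
  tab[trimws(rownames(tab)) == term, "Sum Sq"]
}

# Balanced data: the SS for supp does not depend on the order of the terms.
bal_first <- ss_for(aov(len ~ supp + dose, data = tg), "supp")
bal_last  <- ss_for(aov(len ~ dose + supp, data = tg), "supp")

# Remove 7 rows from a single cell to unbalance the design.
tg_unbal <- tg[-(1:7), ]
unbal_first <- ss_for(aov(len ~ supp + dose, data = tg_unbal), "supp")
unbal_last  <- ss_for(aov(len ~ dose + supp, data = tg_unbal), "supp")

cat("balanced:  ", bal_first, bal_last, "\n")
cat("unbalanced:", unbal_first, unbal_last, "\n")
```

In the unbalanced case the sum of squares for supp changes with its position in the formula, because each term is only adjusted for the terms entered before it.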
Follow the recommendations below:
- If you have a factorial design with independent measures, you can define your model using lm() and then use car::Anova() to calculate F tests.
- If you have a perfectly balanced repeated measures design with no missing values, then use
- If you have an unbalanced repeated measures design, or repeated measures with missing data, use linear mixed models instead via the