Comparing Multiple Means in R

Comparing Multiple Means in R
Featured

Comparing Multiple Means in R

Course description

This course describes how to compare multiple means in R using the ANOVA (Analysis of Variance) method and variants, including:

  • ANOVA test for comparing independent measures.
  • Repeated-measures ANOVA, which is used for analyzing data where same subjects are measured more than once.
  • Mixed ANOVA, which is used to compare the means of groups cross-classified by at least two factors, where one factor is a “within-subjects” factor (repeated measures) and the other factor is a “between-subjects” factor.
  • ANCOVA (analyse of covariance), an extension of the one-way ANOVA that incorporate a covariate variable.
  • MANOVA (multivariate analysis of variance), an ANOVA with two or more continuous outcome variables.

We also provide R code to check ANOVA assumptions and perform Post-Hoc analyses. Additionally, we’ll present:

  • Kruskal-Wallis test, which is a non-parametric alternative to the one-way ANOVA test.
  • Friedman test, which is a non-parametric alternative to the one-way repeated measures ANOVA test.

Related Book

Practical Statistics in R II - Comparing Groups: Numerical Variables

R functions and packages

There are different functions/packages in R for computing ANOVA. These include:

  • aov() [stats]: Computes type I sum of squares (SS). Should be only used when you have balanced designs (group sizes are equal).
  • Anova() [car]: Computes type-II and type-III sum of squares. Type-II will yield identical ANOVA results as type-I when the data are balanced. When data are unbalanced, type-III will emulate the approach taken by popular commercial statistics packages like SAS and SPSS, but this approach is not without criticism.
  • ezANOVA() [ez], car_aov() [afex] and anova_test() [rstatix]: Wrappers around the function Anova() [car] for facilitating the analysis of factorial experiments, including purely ‘within-Ss’ designs (repeated measures), purely ‘between-Ss’ designs, and mixed ‘within-and-between-Ss’ designs.

The advantage of anova_test() [rstatix] is that it supports both model and formula as inputs. Variables can be also specified as character vector using the arguments dv, wid, between, within, covariate. Read more in the documentation by typing ?anova_test in R console. It provides a simple and intuitive pipe-friendly framework, coherent with the tidyverse design philosophy. Additionally, it supports grouped data as returned by the function dplyr::group_by(). The results include ANOVA table, generalized effect size and some assumption checks.

In this guide, we’ll use mainly the function anova_test().

Recommendations

  • The outcome variable, also known as dependent variable (dv), should be numeric
  • The grouping variables, also known as predictors or independent variables, should be factors. If you want to compute ANCOVA models, you can also add numeric predictors.
  • Do not use the R base functions aov() and anova() to get ANOVA tables unless you know what you are doing. They compute the type-I sum of squares, which is not, for example, suitable for unbalanced designs. The results, obtained with the default options of theses functions, are different from those obtained with commercial stats softwares, including SPSS and SAS, and most other stats packages. These differences are important and will be confusing and give you misleading results unless you understand them.

Follow the recommendations below:

  • If you have a factorial design with independent measures, you can define your model using lm() and then use rstatix::anova_test() or car::Anova() to calculate F tests.
  • If you have perfect balanced repeated measures design with no missing values, then use rstatix::anova_test().
  • If you have an unbalanced repeated measures design, or you repeated measures with missing data, use linear mixed models instead via the lme4::lmer().



Version: Français

Lessons

  1. The ANOVA test (or Analysis of Variance) is used to compare the mean of multiple groups. This chapter describes the different types of ANOVA for comparing independent groups, including: 1) One-way ANOVA: an extension of the independent samples t-test for comparing the means in a situation where there are more than two groups. 2) two-way ANOVA used to evaluate simultaneously the effect of two different grouping variables on a continuous outcome variable. 3) three-way ANOVA used to evaluate simultaneously the effect of three different grouping variables on a continuous outcome variable.
  2. The repeated-measures ANOVA is used for analyzing data where same subjects are measured more than once. This chapter describes the different types of repeated measures ANOVA, including: 1) One-way repeated measures ANOVA, an extension of the paired-samples t-test for comparing the means of three or more levels of a within-subjects variable. 2) two-way repeated measures ANOVA used to evaluate simultaneously the effect of two within-subject factors on a continuous outcome variable. 3) three-way repeated measures ANOVA used to evaluate simultaneously the effect of three within-subject factors on a continuous outcome variable.
  3. The Mixed ANOVA is used to compare the means of groups cross-classified by two different types of factor variables, including: i) between-subjects factors, which have independent categories (e.g., gender: male/female). ii) within-subjects factors, which have related categories also known as repeated measures (e.g., time: before/after treatment). This chapter describes how to compute and interpret the different mixed ANOVA tests in R.
  4. The Analysis of Covariance (ANCOVA) is used to compare means of an outcome variable between two or more groups taking into account (or to correct for) variability of other variables, called covariates. In this chapter, you will learn how to compute and interpret the one-way and the two-way ANCOVA in R.
  5. The Multivariate Analysis Of Variance (MANOVA) is an ANOVA with two or more continuous outcome (or response) variables. The one-way MANOVA tests simultaneously statistical differences for multiple response variables by one grouping variables. This chapter describes how to compute one-way MANOVA in R.
  6. The Kruskal-Wallis test is a non-parametric alternative to the one-way ANOVA test. It's recommended when the assumptions of one-way ANOVA test are not met. This chapter describes how to compute the Kruskal-Wallis test using the R software.
  7. The Friedman test is a non-parametric alternative to the one-way repeated measures ANOVA test. It extends the Sign test in the situation where there are more than two groups to compare. It's recommended when the normality assumptions of the one-way repeated measures ANOVA test is not met or when the dependent variable is measured on an ordinal scale. In this chapter, you will learn how to compute Friedman test in R and to perform pairwise-comparison between groups.

Comment ( 1 )

  • Matt Berry

    Having issues with trying to use ggplot to display emmeans pairwise results. I am wanting to see the contrast between all years not just within the same year.
    r script:
    library(emmeans) ### estimated marginal means and p values
    library(sjstats) ### partial eta squared and cohens f effect size
    library(lme4) ### estimated the multi level model (random intercept for participants)
    library(lmerTest) ### gives more comprehsive anova output with p values
    library(MuMIn) ### R2 for the model
    library(tidyverse)
    library(ggpubr)
    library(rstatix)
    library(dplyr)
    library(readxl)
    library(scales)
    library(ggsignif)
    library(sjPlot)

    #load data for One way linear mixed models, repeated measures ##########################
    glm_test <- read_excel("C:/Users/maber/Documents/Clover Valley/CLOVER VALLEY/veg data/LPI/UPL_cover.xlsx")
    view(glm_test)
    head(glm_test, 3)
    #factor data#############tell it you have factors and what they are called##############
    glm_test$year <- factor(glm_test$year,
    levels = c(1, 2, 3),
    labels = c("2018", "2019", "2020"))

    glm_test$transect <- factor(glm_test$transect,
    levels = c(1, 2,3),
    labels = c("Lower Meadow", "Middle Meadow", "Upper Meadow"))

    ###R knows your factors, the below command brakes down your variables
    summary(glm_test)

    #run linear model with cover as dependent varible, year as #############################
    #independent var. for the plot by stating + (1| plot), This makes it a repeated measures
    #we call our linear model lmer1
    #set up the model use the following code

    lmer2 <- lmer(cover ~ as.factor(year) * as.factor(transect) + (1|plot),
    data = glm_test)

    #linear model dep variable predicted by the indep variable (year)#####################
    #run the model with the following code
    anova(lmer2)
    summary(aov(cover ~ as.factor(year) * as.factor(transect) + Error (plot),
    data = glm_test))

    #post Hoc comparisons###################################################
    #emmeans(lmer2, list(pairwise ~ year), adjust = "bonferroni")

    emmeans(lmer2, list(pairwise ~ year), adjust = "tukey")

    # if there was an interaction your code would be this###################

    emmeans(lmer2, pairwise ~ year | transect)

    tab_model(lmer2)

    #Visulize data #########################################################

    ggboxplot(glm_test, x = "transect", y = "cover", title = "Total % Cover of Upland Species",
    ylab = " % Cover UPL Species",
    color = "year", ggtheme = theme_gray(base_size = 14))+
    scale_x_discrete(labels = label_wrap_gen(7)) +
    stat_compare_means(comparisons = my_comparisons, label.y = c(65, 75, 80))+
    stat_compare_means(label.y = 82)

    The results I want to plot are:
    transect = Upper Meadow:
    contrast estimate SE df t.ratio p.value
    2018 – 2019 27.500 7.15 38 3.848 0.0013
    2018 – 2020 20.000 7.15 38 2.798 0.0214
    2019 – 2020 -7.500 7.15 38 -1.049 0.5509

    Link to HTML doc. file:///C:/Users/maber/Documents/Clover%20Valley/CLOVER%20VALLEY/veg%20data/one%20way%20linear%20mixed%20model,%20repeated%20Meas/two-way_liner_mixed_model_repeated_measures.html
    data:
    plot transect year cover
    D11_NORTH 1 1 24
    D11_SOUTH 1 1 16
    R27_NORTH 1 1 18
    R27_SOUTH 1 1 8
    R28_NORTH 1 1 64
    R28_SOUTH 1 1 54
    D11_NORTH 1 2 46
    D11_SOUTH 1 2 38
    R27_NORTH 1 2 2
    R27_SOUTH 1 2 0
    R28_NORTH 1 2 20
    R28_SOUTH 1 2 34
    D11_NORTH 1 3 44
    D11_SOUTH 1 3 36
    R27_NORTH 1 3 0
    R27_SOUTH 1 3 0
    R28_NORTH 1 3 26
    R28_SOUTH 1 3 38
    D13_NORTH 2 1 44
    D13_SOUTH 2 1 30
    D14_NORTH 2 1 0
    D14_SOUTH 2 1 6
    R25_NORTH 2 1 14
    R25_SOUTH 2 1 2
    R26_NORTH 2 1 20
    R26_SOUTH 2 1 42
    D13_NORTH 2 2 6
    D13_SOUTH 2 2 0
    D14_NORTH 2 2 2
    D14_SOUTH 2 2 0
    R25_NORTH 2 2 0
    R25_SOUTH 2 2 14
    R26_NORTH 2 2 0
    R26_SOUTH 2 2 0
    D13_NORTH 2 3 2
    D13_SOUTH 2 3 8
    D14_NORTH 2 3 0
    D14_SOUTH 2 3 10
    R25_NORTH 2 3 16
    R25_SOUTH 2 3 0
    R26_NORTH 2 3 18
    R26_SOUTH 2 3 0
    R21_NORTH 3 1 32
    R21_SOUTH 3 1 82
    R22_NORTH 3 1 52
    R22_SOUTH 3 1 46
    R23_NORTH 3 1 64
    R23_SOUTH 3 1 68
    R24_NORTH 3 1 8
    R24_SOUTH 3 1 8
    R21_NORTH 3 2 10
    R21_SOUTH 3 2 16
    R22_NORTH 3 2 10
    R22_SOUTH 3 2 22
    R23_NORTH 3 2 48
    R23_SOUTH 3 2 28
    R24_NORTH 3 2 4
    R24_SOUTH 3 2 2
    R21_NORTH 3 3 6
    R21_SOUTH 3 3 10
    R22_NORTH 3 3 16
    R22_SOUTH 3 3 14
    R23_NORTH 3 3 54
    R23_SOUTH 3 3 82
    R24_NORTH 3 3 16
    R24_SOUTH 3 3 2

Give a comment

Want to post an issue with R? If yes, please make sure you have read this: How to Include Reproducible R Script Examples in Datanovia Comments

Teachers