## Course description

This course describes how to compare multiple means in R using the **ANOVA** (Analysis of Variance) method and variants, including:

**ANOVA test**for comparing independent measures.**Repeated-measures ANOVA**, which is used for analyzing data where same subjects are measured more than once.**Mixed ANOVA**, which is used to compare the means of groups cross-classified by at least two factors, where one factor is a “*within-subjects*” factor (repeated measures) and the other factor is a “*between-subjects*” factor.**ANCOVA**(analyse of covariance), an extension of the one-way ANOVA that incorporate a covariate variable.**MANOVA**(multivariate analysis of variance), an ANOVA with two or more continuous outcome variables.

We also provide R code to check ANOVA assumptions and perform Post-Hoc analyses. Additionally, we’ll present:

**Kruskal-Wallis test**, which is a non-parametric alternative to the one-way ANOVA test.**Friedman test**, which is a non-parametric alternative to the one-way repeated measures ANOVA test.

#### Related Book

Practical Statistics in R II - Comparing Groups: Numerical Variables## R functions and packages

There are different functions/packages in R for computing ANOVA. These include:

`aov()`

[stats]: Computes type I sum of squares (SS). Should be only used when you have balanced designs (group sizes are equal).`Anova()`

[car]: Computes type-II and type-III sum of squares. Type-II will yield identical ANOVA results as type-I when the data are balanced. When data are unbalanced, type-III will emulate the approach taken by popular commercial statistics packages like SAS and SPSS, but this approach is not without criticism.`ezANOVA()`

[ez],`car_aov()`

[afex] and`anova_test()`

[rstatix]: Wrappers around the function`Anova()`

[car] for facilitating the analysis of factorial experiments, including purely ‘within-Ss’ designs (repeated measures), purely ‘between-Ss’ designs, and mixed ‘within-and-between-Ss’ designs.

The advantage of `anova_test()`

[rstatix] is that it supports both model and formula as inputs. Variables can be also specified as character vector using the arguments `dv`

, `wid`

, `between`

, `within`

, `covariate`

. Read more in the documentation by typing `?anova_test`

in R console. It provides a simple and intuitive pipe-friendly framework, coherent with the `tidyverse`

design philosophy. Additionally, it supports grouped data as returned by the function `dplyr::group_by()`

. The results include ANOVA table, generalized effect size and some assumption checks.

In this guide, we’ll use mainly the function `anova_test()`

.

## Recommendations

- The
**outcome**variable, also known as dependent variable (dv), should be numeric - The
**grouping variables**, also known as predictors or independent variables, should be factors. If you want to compute ANCOVA models, you can also add numeric predictors. **Do not use the R base functions aov() and anova()**to get ANOVA tables unless you know what you are doing. They compute the type-I sum of squares, which is not, for example, suitable for unbalanced designs. The results, obtained with the default options of theses functions, are different from those obtained with commercial stats softwares, including SPSS and SAS, and most other stats packages. These differences are important and will be confusing and give you misleading results unless you understand them.

Follow the recommendations below:

- If you have a factorial design with
**independent measures**, you can define your model using`lm()`

and then use`rstatix::anova_test()`

or`car::Anova()`

to calculate F tests. - If you have perfect
**balanced repeated measures design**with no missing values, then use`rstatix::anova_test()`

. - If you have an
**unbalanced repeated measures design**, or you repeated measures with missing data, use linear mixed models instead via the`lme4::lmer()`

.

Version: Français

Having issues with trying to use ggplot to display emmeans pairwise results. I am wanting to see the contrast between all years not just within the same year.

r script:

library(emmeans) ### estimated marginal means and p values

library(sjstats) ### partial eta squared and cohens f effect size

library(lme4) ### estimated the multi level model (random intercept for participants)

library(lmerTest) ### gives more comprehsive anova output with p values

library(MuMIn) ### R2 for the model

library(tidyverse)

library(ggpubr)

library(rstatix)

library(dplyr)

library(readxl)

library(scales)

library(ggsignif)

library(sjPlot)

#load data for One way linear mixed models, repeated measures ##########################

glm_test <- read_excel("C:/Users/maber/Documents/Clover Valley/CLOVER VALLEY/veg data/LPI/UPL_cover.xlsx")

view(glm_test)

head(glm_test, 3)

#factor data#############tell it you have factors and what they are called##############

glm_test$year <- factor(glm_test$year,

levels = c(1, 2, 3),

labels = c("2018", "2019", "2020"))

glm_test$transect <- factor(glm_test$transect,

levels = c(1, 2,3),

labels = c("Lower Meadow", "Middle Meadow", "Upper Meadow"))

###R knows your factors, the below command brakes down your variables

summary(glm_test)

#run linear model with cover as dependent varible, year as #############################

#independent var. for the plot by stating + (1| plot), This makes it a repeated measures

#we call our linear model lmer1

#set up the model use the following code

lmer2 <- lmer(cover ~ as.factor(year) * as.factor(transect) + (1|plot),

data = glm_test)

#linear model dep variable predicted by the indep variable (year)#####################

#run the model with the following code

anova(lmer2)

summary(aov(cover ~ as.factor(year) * as.factor(transect) + Error (plot),

data = glm_test))

#post Hoc comparisons###################################################

#emmeans(lmer2, list(pairwise ~ year), adjust = "bonferroni")

emmeans(lmer2, list(pairwise ~ year), adjust = "tukey")

# if there was an interaction your code would be this###################

emmeans(lmer2, pairwise ~ year | transect)

tab_model(lmer2)

#Visulize data #########################################################

ggboxplot(glm_test, x = "transect", y = "cover", title = "Total % Cover of Upland Species",

ylab = " % Cover UPL Species",

color = "year", ggtheme = theme_gray(base_size = 14))+

scale_x_discrete(labels = label_wrap_gen(7)) +

stat_compare_means(comparisons = my_comparisons, label.y = c(65, 75, 80))+

stat_compare_means(label.y = 82)

The results I want to plot are:

transect = Upper Meadow:

contrast estimate SE df t.ratio p.value

2018 – 2019 27.500 7.15 38 3.848 0.0013

2018 – 2020 20.000 7.15 38 2.798 0.0214

2019 – 2020 -7.500 7.15 38 -1.049 0.5509

Link to HTML doc. file:///C:/Users/maber/Documents/Clover%20Valley/CLOVER%20VALLEY/veg%20data/one%20way%20linear%20mixed%20model,%20repeated%20Meas/two-way_liner_mixed_model_repeated_measures.html

data:

plot transect year cover

D11_NORTH 1 1 24

D11_SOUTH 1 1 16

R27_NORTH 1 1 18

R27_SOUTH 1 1 8

R28_NORTH 1 1 64

R28_SOUTH 1 1 54

D11_NORTH 1 2 46

D11_SOUTH 1 2 38

R27_NORTH 1 2 2

R27_SOUTH 1 2 0

R28_NORTH 1 2 20

R28_SOUTH 1 2 34

D11_NORTH 1 3 44

D11_SOUTH 1 3 36

R27_NORTH 1 3 0

R27_SOUTH 1 3 0

R28_NORTH 1 3 26

R28_SOUTH 1 3 38

D13_NORTH 2 1 44

D13_SOUTH 2 1 30

D14_NORTH 2 1 0

D14_SOUTH 2 1 6

R25_NORTH 2 1 14

R25_SOUTH 2 1 2

R26_NORTH 2 1 20

R26_SOUTH 2 1 42

D13_NORTH 2 2 6

D13_SOUTH 2 2 0

D14_NORTH 2 2 2

D14_SOUTH 2 2 0

R25_NORTH 2 2 0

R25_SOUTH 2 2 14

R26_NORTH 2 2 0

R26_SOUTH 2 2 0

D13_NORTH 2 3 2

D13_SOUTH 2 3 8

D14_NORTH 2 3 0

D14_SOUTH 2 3 10

R25_NORTH 2 3 16

R25_SOUTH 2 3 0

R26_NORTH 2 3 18

R26_SOUTH 2 3 0

R21_NORTH 3 1 32

R21_SOUTH 3 1 82

R22_NORTH 3 1 52

R22_SOUTH 3 1 46

R23_NORTH 3 1 64

R23_SOUTH 3 1 68

R24_NORTH 3 1 8

R24_SOUTH 3 1 8

R21_NORTH 3 2 10

R21_SOUTH 3 2 16

R22_NORTH 3 2 10

R22_SOUTH 3 2 22

R23_NORTH 3 2 48

R23_SOUTH 3 2 28

R24_NORTH 3 2 4

R24_SOUTH 3 2 2

R21_NORTH 3 3 6

R21_SOUTH 3 3 10

R22_NORTH 3 3 16

R22_SOUTH 3 3 14

R23_NORTH 3 3 54

R23_SOUTH 3 3 82

R24_NORTH 3 3 16

R24_SOUTH 3 3 2