# Comparing Means of Two Groups in R

## Wilcoxon Test in R

The Wilcoxon test is a non-parametric alternative to the t-test for comparing two means. It is particularly recommended when the data are not normally distributed.

Like the t-test, the Wilcoxon test comes in two forms, one-sample and two-sample. They are used in essentially the same situations as the corresponding t-tests.

Note that the sample size should be at least 6. Otherwise, the Wilcoxon test cannot reach statistical significance at the usual 5% level.
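The limit of 6 can be checked directly for the signed-rank case. The sketch below uses base R's wilcox.test() rather than the rstatix wrapper, on hypothetical vectors in which every value lies above the reference value of 0. This is the most extreme outcome possible, so the exact two-sided p-value is 2/2^n, the smallest the test can produce for n observations.

```r
# Most extreme case: all n values lie on the same side of mu = 0.
# Hypothetical data without ties, so the exact test is used.
p_n5 <- wilcox.test(1:5, mu = 0)$p.value
p_n6 <- wilcox.test(1:6, mu = 0)$p.value

p_n5  # 2 / 2^5 = 0.0625, still above 0.05
p_n6  # 2 / 2^6 = 0.03125, below 0.05
```

With 5 observations, even a perfectly one-sided sample cannot reach p < 0.05 in a two-sided signed-rank test.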

In this chapter, you will learn how to:

• Compute the one-sample Wilcoxon signed rank test
• Compute the Wilcoxon rank sum test
• Compute the Wilcoxon signed rank test on paired samples
• Check the Wilcoxon test assumptions
• Calculate and report the Wilcoxon test effect size (r value)

The effect size r is calculated as the Z statistic divided by the square root of the sample size N (r = Z/sqrt(N)). The Z value is extracted from either coin::wilcoxsign_test() (one-sample or paired-samples test) or coin::wilcox_test() (independent two-sample test).

Note that N corresponds to the total sample size for the independent-samples test and to the total number of pairs for the paired-samples test. The r value varies from 0 to close to 1. The interpretation values for r commonly used in the published literature are: 0.10 to < 0.3 (small effect), 0.30 to < 0.5 (moderate effect) and >= 0.5 (large effect).
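As a rough illustration of the formula and thresholds above, here is a minimal base R sketch. The helper name wilcox_r() is hypothetical and is not part of rstatix or coin. It takes the absolute value of Z (an assumption consistent with r ranging from 0 to close to 1) and returns NA as the magnitude for r below 0.10, since no label is given for that range.

```r
# Effect size r = |Z| / sqrt(N), with the interpretation thresholds quoted above.
# wilcox_r() is a hypothetical helper, not an rstatix or coin function.
wilcox_r <- function(z, n) {
  r <- abs(z) / sqrt(n)
  magnitude <- cut(
    r,
    breaks = c(0.1, 0.3, 0.5, Inf),
    labels = c("small", "moderate", "large"),
    right = FALSE  # intervals are [0.1, 0.3), [0.3, 0.5), [0.5, Inf)
  )
  data.frame(r = r, magnitude = magnitude)
}

# Illustrative Z values for a total sample size of 40
wilcox_r(z = c(-1.2, 2.5, -4.0), n = 40)
```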

We’ll use the pipe-friendly function wilcox_test() [rstatix package].

## Prerequisites

Make sure that you have installed the following R packages:

• tidyverse: for data manipulation and visualization
• ggpubr: for easily creating publication-ready plots
• rstatix: provides pipe-friendly R functions for easy statistical analyses
• datarium: contains the required datasets for this chapter
• coin: required by wilcox_effsize() [rstatix] for computing the Z statistic

Load the required packages:

library(tidyverse)
library(rstatix)
library(ggpubr)

## One-sample Wilcoxon signed rank test

The one-sample Wilcoxon signed rank test is used to assess whether the median of the population from which the sample was drawn is equal to a known standard or theoretical value. It is a non-parametric equivalent of the one-sample t-test.

### Demo data

Demo dataset: mice [in datarium package]. Contains the weight of 10 mice:

# Load and inspect the data
data(mice, package = "datarium")
head(mice, 3)
## # A tibble: 3 x 2
##   name  weight
##   <chr>  <dbl>
## 1 M_1     18.9
## 2 M_2     19.5
## 3 M_3     23.1

### Summary statistics

Compute the median and the interquartile range (IQR):

mice %>% get_summary_stats(weight, type = "median_iqr")
## # A tibble: 1 x 4
##   variable     n median   iqr
##   <chr>    <dbl>  <dbl> <dbl>
## 1 weight      10   19.8   1.8

### Visualization

Create a box plot to visualize the distribution of mice weights. Also add jittered points to show individual observations. The big dot represents the mean.

bxp <- ggboxplot(
  mice$weight, width = 0.5, add = c("mean", "jitter"),
  ylab = "Weight (g)", xlab = FALSE
)
bxp

### Assumptions and preliminary tests

The Wilcoxon signed-rank test assumes that the data are distributed symmetrically around the median. In other words, the values above the median should roughly mirror the values below it, so that the distribution has a similar shape on both sides. This can be checked by visual inspection using a histogram and a density plot.

Create a histogram: As we have only 10 individuals in our data, we specify the option bins = 4 instead of 30 (default).

gghistogram(mice, x = "weight", y = "..density..",
            fill = "steelblue", bins = 4, add_density = TRUE)

From the plot above, it can be seen that the weight data are approximately symmetrical (you should not expect them to be perfect, particularly when you have smaller numbers of samples in your study). Therefore, we can use the Wilcoxon signed-rank test to analyse our data.

Note that if your data are not symmetrically distributed, you could consider performing a sign test instead of the Wilcoxon signed-rank test.

The sign test does not make the assumption of a symmetrically-shaped distribution. However, it will most likely be less powerful compared to the Wilcoxon test.
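For readers who want to see what a sign test involves, here is a minimal base R sketch built on binom.test(). The data vector is hypothetical, and this is only one simple way to carry out the test: it counts how many values fall above and below the reference value and asks whether that split is compatible with a 50/50 chance.

```r
# Sign test via the binomial test (base R); the data are hypothetical
x  <- c(21, 19, 23, 18, 20, 26, 22, 17, 24, 20.5)
mu <- 25

n_above <- sum(x > mu)  # values above the reference
n_below <- sum(x < mu)  # values below the reference (values equal to mu are dropped)

# Under H0 (median = mu), each value falls above mu with probability 0.5
sign_res <- binom.test(n_above, n_above + n_below, p = 0.5)
sign_res$p.value
```

Only the signs of the differences are used, not their sizes, which is why no symmetry assumption is needed and why some power is lost.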

### Computation

We want to know whether the median weight of the mice differs from 25 g (two-tailed test).

stat.test <- mice %>% wilcox_test(weight ~ 1, mu = 25)
stat.test
## # A tibble: 1 x 6
##   .y.    group1 group2         n statistic       p
## * <chr>  <chr>  <chr>      <int>     <dbl>   <dbl>
## 1 weight 1      null model    10         0 0.00195

Note that, to compute a one-sided Wilcoxon test, you can specify the option alternative, whose possible values are “greater”, “less” or “two.sided”.
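To make the effect of the alternative option concrete, the sketch below compares the three settings using base R's wilcox.test() on a hypothetical vector of weights that all lie below 25. In rstatix, the same option would be passed as, for example, wilcox_test(weight ~ 1, mu = 25, alternative = "less").

```r
# Hypothetical weights, all below the reference value of 25
# (no ties and no zero differences, so exact p-values are computed)
x <- c(18, 19.5, 20, 21, 17.5, 22, 19, 20.5, 23, 18.5)

p_two  <- wilcox.test(x, mu = 25, alternative = "two.sided")$p.value
p_less <- wilcox.test(x, mu = 25, alternative = "less")$p.value
p_more <- wilcox.test(x, mu = 25, alternative = "greater")$p.value

c(two.sided = p_two, less = p_less, greater = p_more)
```

Because every value is below 25, the "less" p-value is half the two-sided one, while the "greater" alternative finds no evidence at all.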

### Effect size

We’ll use the R function wilcox_effsize() [rstatix]. It requires the coin package for computing the Z statistic.

mice %>%  wilcox_effsize(weight ~ 1, mu = 25)
## # A tibble: 1 x 6
##   .y.    group1 group2     effsize     n magnitude
## * <chr>  <chr>  <chr>        <dbl> <int> <ord>
## 1 weight 1      null model   0.886    10 large

A large effect size is detected, r = 0.89.
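The value of r can be approximated by hand. The sketch below is an assumption about how the number arises, not a copy of the coin implementation: it uses the textbook normal approximation of the signed-rank statistic without continuity correction, on a hypothetical vector of 10 weights that are all below 25, with no ties and no zero differences.

```r
# Hypothetical weights, all below mu = 25 (no ties, no zero differences)
x  <- c(18, 19.5, 20, 21, 17.5, 22, 19, 20.5, 23, 18.5)
mu <- 25

d <- x - mu
n <- length(d)

# Signed-rank statistic: sum of the ranks of |d| over the positive differences
V <- sum(rank(abs(d))[d > 0])

# Normal approximation of V under H0 (no continuity correction)
z <- (V - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)

# Effect size r = |Z| / sqrt(N)
r <- abs(z) / sqrt(n)
round(r, 3)
```

With V = 0 and n = 10, as in the test output above, this gives r = 0.886.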

### Report

We could report the result as follows:

A Wilcoxon signed-rank test was computed to assess whether the median weight of the recruited mice differed from the normal population median weight (25 g).

The mice weight values were approximately symmetrically distributed, as assessed by a histogram with a superimposed density curve.

The measured median weight of the mice (19.8 g) was statistically significantly lower than the population median weight of 25 g (p = 0.002, effect size r = 0.89).

Create a box plot with p-value:

bxp +
labs(subtitle = get_test_label(stat.test, detailed = TRUE))

Create a density plot with p-value:

• Red line corresponds to the observed median
• Blue line corresponds to the theoretical median

ggdensity(mice, x = "weight", rug = TRUE, fill = "lightgray") +
  scale_x_continuous(limits = c(15, 27)) +
  stat_central_tendency(type = "median", color = "red", linetype = "dashed") +
  geom_vline(xintercept = 25, color = "blue", linetype = "dashed") +
  labs(subtitle = get_test_label(stat.test, detailed = TRUE))

## Wilcoxon rank sum test

The Wilcoxon rank sum test is a non-parametric alternative to the independent two-sample t-test for comparing two independent groups of samples when the data are not normally distributed.

Synonyms: Mann-Whitney test, Mann-Whitney U test, Wilcoxon-Mann-Whitney test and two-sample Wilcoxon test.

### Demo data

Demo dataset: genderweight [in datarium package] containing the weight of 40 individuals (20 women and 20 men).

Load the data and show some random rows by groups:

# Load the data
data("genderweight", package = "datarium")
# Show a sample of the data by group
set.seed(123)
genderweight %>% sample_n_by(group, size = 2)
## # A tibble: 4 x 3
##   id    group weight
##   <fct> <fct>  <dbl>
## 1 6     F       65.0
## 2 15    F       65.9
## 3 29    M       88.9
## 4 37    M       77.0

### Summary statistics

Compute some summary statistics by groups: median and interquartile range.

genderweight %>%
group_by(group) %>%
get_summary_stats(weight, type = "median_iqr")
## # A tibble: 2 x 5
##   group variable     n median   iqr
##   <fct> <chr>    <dbl>  <dbl> <dbl>
## 1 F     weight      20   62.9  2.33
## 2 M     weight      20   86.3  4.59

### Visualization

Visualize the data using box plots. Plot weight by groups.

bxp <- ggboxplot(
genderweight, x = "group", y = "weight",
ylab = "Weight", xlab = "Groups", add = "jitter"
)
bxp

### Computation

Question: Is there a significant difference between the median weights of women and men?

stat.test <- genderweight %>%
  wilcox_test(weight ~ group) %>%
  add_significance()
stat.test
## # A tibble: 1 x 8
##   .y.    group1 group2    n1    n2 statistic        p p.signif
##   <chr>  <chr>  <chr>  <int> <int>     <dbl>    <dbl> <chr>
## 1 weight F      M         20    20         0 1.45e-11 ****
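A statistic of 0 means the two groups are completely separated: every value in the first group is smaller than every value in the second. The sketch below reproduces this situation in base R with hypothetical weights (20 per group, no ties) and shows where a p-value of about 1.45e-11 comes from. It is an illustration only and does not use the genderweight data.

```r
# Hypothetical weights with complete separation between the two groups
f <- seq(60, 69.5, by = 0.5)  # 20 values
m <- seq(80, 89.5, by = 0.5)  # 20 values
n1 <- length(f)
n2 <- length(m)

# W = sum of the ranks of the first group minus its smallest possible value
ranks <- rank(c(f, m))
W <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Exact two-sided p-value for the most extreme arrangement:
# only 1 of the choose(40, 20) possible rank assignments is this extreme per tail
p_exact <- 2 / choose(n1 + n2, n1)

res <- wilcox.test(f, m)
c(W_by_hand = W, W_test = unname(res$statistic))
c(p_by_hand = p_exact, p_test = res$p.value)
```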

### Effect size

genderweight %>% wilcox_effsize(weight ~ group)
## # A tibble: 1 x 7
##   .y.    group1 group2 effsize    n1    n2 magnitude
## * <chr>  <chr>  <chr>    <dbl> <int> <int> <ord>
## 1 weight F      M        0.855    20    20 large

A large effect size is detected, r = 0.86.
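As a rough check, r can be approximated from the reported statistic and the group sizes. The sketch below assumes the textbook normal approximation of the rank sum statistic, with no ties and no continuity correction. The coin package may standardize the statistic differently in other situations, so treat this as an illustration rather than the package's method.

```r
n1 <- 20
n2 <- 20
W  <- 0  # statistic reported by the rank sum test above

# Normal approximation of W under H0 (no ties, no continuity correction)
z <- (W - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

# N is the total sample size for independent samples
r <- abs(z) / sqrt(n1 + n2)
round(r, 3)
```

This gives r = 0.855, in line with the value reported by wilcox_effsize().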

### Report

We could report the result as follows:

The median weight in the female group was 62.9 (IQR = 2.33), whereas the median weight in the male group was 86.3 (IQR = 4.59). The Wilcoxon test showed that the difference was significant (p < 0.0001, effect size r = 0.86).

stat.test <- stat.test %>% add_xy_position(x = "group")
bxp +
stat_pvalue_manual(stat.test, tip.length = 0) +
labs(subtitle = get_test_label(stat.test, detailed = TRUE))

## Wilcoxon signed rank test on paired samples

The Wilcoxon signed rank test on paired samples is a non-parametric alternative to the paired-samples t-test for comparing paired data. It is used when the data are not normally distributed.

### Demo dataset

Here, we’ll use a demo dataset mice2 [datarium package], which contains the weight of 10 mice before and after the treatment.

# Wide format
data("mice2", package = "datarium")
head(mice2, 3)
##   id before after
## 1  1    187   430
## 2  2    194   404
## 3  3    232   406
# Transform into long data:
# gather the before and after values in the same column
mice2.long <- mice2 %>%
gather(key = "group", value = "weight", before, after)
head(mice2.long, 3)
##   id  group weight
## 1  1 before    187
## 2  2 before    194
## 3  3 before    232

### Summary statistics

Compute some summary statistics by groups: median and interquartile range (IQR).

mice2.long %>%
group_by(group) %>%
get_summary_stats(weight, type = "median_iqr")
## # A tibble: 2 x 5
##   group  variable     n median   iqr
##   <chr>  <chr>    <dbl>  <dbl> <dbl>
## 1 after  weight      10   405   28.3
## 2 before weight      10   197.  19.2

### Visualization

bxp <- ggpaired(mice2.long, x = "group", y = "weight",
order = c("before", "after"),
ylab = "Weight", xlab = "Groups")
bxp

### Assumptions and preliminary tests

The test assumes that the differences between paired samples are distributed symmetrically around the median.

Compute the differences between pairs and create histograms:

mice2 <- mice2 %>% mutate(differences = after - before)
gghistogram(mice2, x = "differences", y = "..density..",
            fill = "steelblue", bins = 5, add_density = TRUE)

From the plot above, it can be seen that the differences data are approximately symmetrical (you should not expect them to be perfect, particularly when you have smaller numbers of samples in your study). Therefore, we can use the Wilcoxon signed-rank test to analyse our data.

Note that if your data are not symmetrically distributed, you could consider performing a sign test instead of the Wilcoxon signed-rank test.

The sign test does not make the assumption of a symmetrically-shaped distribution. However, it will most likely be less powerful compared to the Wilcoxon test.

### Computation

Question: Is there a significant change in the weights of mice after treatment?

stat.test <- mice2.long %>%
  wilcox_test(weight ~ group, paired = TRUE) %>%
  add_significance()
stat.test
## # A tibble: 1 x 8
##   .y.    group1 group2    n1    n2 statistic       p p.signif
##   <chr>  <chr>  <chr>  <int> <int>     <dbl>   <dbl> <chr>
## 1 weight after  before    10    10        55 0.00195 **
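The paired test is a one-sample signed-rank test applied to the within-pair differences. The base R sketch below illustrates this with hypothetical before and after weights, not the mice2 data. Every animal gains weight, so the statistic reaches its maximum of n(n + 1)/2 = 55 for n = 10 pairs.

```r
# Hypothetical paired weights: every "after" value exceeds its "before" value,
# and all differences are distinct and non-zero
before <- c(190, 195, 230, 200, 198, 185, 210, 205, 192, 188)
after  <- c(430, 404, 406, 420, 398, 400, 415, 390, 425, 410)

# Paired test and one-sample test on the differences
paired_res <- wilcox.test(after, before, paired = TRUE)
diff_res   <- wilcox.test(after - before, mu = 0)

n <- length(before)
c(V = unname(paired_res$statistic), V_max = n * (n + 1) / 2)
c(paired = paired_res$p.value, one_sample = diff_res$p.value)
```

Both calls return the same statistic and the same exact p-value, 2/2^10, which is about 0.00195.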

### Effect size

mice2.long  %>%
wilcox_effsize(weight ~ group, paired = TRUE)
## # A tibble: 1 x 7
##   .y.    group1 group2 effsize    n1    n2 magnitude
## * <chr>  <chr>  <chr>    <dbl> <int> <int> <ord>
## 1 weight after  before   0.886    10    10 large

A large effect size is detected, r = 0.89.

### Report

From the output above, it can be concluded that the median weight of the mice before treatment is significantly different from the median weight after treatment (p = 0.002, effect size r = 0.89).

stat.test <- stat.test %>% add_xy_position(x = "group")
bxp +
  stat_pvalue_manual(stat.test, tip.length = 0) +
  labs(subtitle = get_test_label(stat.test, detailed = TRUE))

## Summary

This chapter describes how to compare two means in R using the Wilcoxon test, which is a non-parametric alternative to the t-test.

Quick-start R code to compute the different Wilcoxon tests:

# One-sample Wilcoxon signed rank test
mice %>% wilcox_test(weight ~ 1, mu = 25)
# Wilcoxon rank sum test: independent samples
genderweight %>% wilcox_test(weight ~ group)
# Wilcoxon signed rank test on paired samples
mice2.long %>% wilcox_test(weight ~ group, paired = TRUE)

Note that, to compute one-sided Wilcoxon tests, you can specify the option alternative, whose possible values are “greater”, “less” or “two.sided”.