The **Wilcoxon test** is a non-parametric alternative to the t-test for comparing two means. It’s particularly recommended in a situation where the data are not normally distributed.

Like the t-test, the Wilcoxon test comes in two forms, one-sample and two-samples. They are used in more or less the exact same situations as the corresponding t-tests.

Note that, the sample size should be at least 6. Otherwise, the Wilcoxon test cannot become significant.

In this chapter, you will learn how to compute the different types of Wilcoxon tests in R, including:

**One-sample Wilcoxon signed rank test****Wilcoxon rank sum test**and**Wilcoxon signed rank test on paired samples****Check Wilcoxon test assumptions****Calculate and report Wilcoxon test effect size (r value)**.

The effect size `r`

is calculated as `Z`

statistic divided by the square root of the sample size (N) (`Z/sqrt(N)`

). The `Z`

value is extracted from either `coin::wilcoxsign_test()`

(case of one- or paired-samples test) or `coin::wilcox_test()`

(case of independent two-samples test).

Note that `N`

corresponds to the total sample size for independent-samples test and to the total number of pairs for paired samples test. The `r`

value varies from 0 to close to 1. The interpretation values for `r`

commonly in published literature are: 0.10 - < 0.3 (small effect), 0.30 - < 0.5 (moderate effect) and >= 0.5 (large effect).

We’ll use the pipe-friendly function `wilcox_test()`

[rstatix package].

Contents:

#### Related Book

Practical Statistics in R II - Comparing Groups: Numerical Variables## Prerequisites

Make sure that you have installed the following R packages:

`tidyverse`

for data manipulation and visualization`ggpubr`

for creating easily publication ready plots`rstatix`

provides pipe-friendly R functions for easy statistical analyses`datarium`

: contains required datasets for this chapter

Start by loading the following required packages:

```
library(tidyverse)
library(rstatix)
library(ggpubr)
```

## One-sample Wilcoxon signed rank test

The one-sample Wilcoxon signed rank test is used to assess whether the median of the sample is equal to a known standard or theoretical value. This is a non-parametric equivalent of one-sample t-test.

### Demo data

Demo dataset: `mice`

[in datarium package]. Contains the weight of 10 mice:

```
# Load and inspect the data
data(mice, package = "datarium")
head(mice, 3)
```

```
## # A tibble: 3 x 2
## name weight
## <chr> <dbl>
## 1 M_1 18.9
## 2 M_2 19.5
## 3 M_3 23.1
```

### Summary statistics

Compute the median and the interquartile range (IQR):

`mice %>% get_summary_stats(weight, type = "median_iqr")`

```
## # A tibble: 1 x 4
## variable n median iqr
## <chr> <dbl> <dbl> <dbl>
## 1 weight 10 19.8 1.8
```

### Visualization

Create a box plot to visualize the distribution of mice weights. Add also jittered points to show individual observations. The big dot represents the mean point.

```
bxp <- ggboxplot(
mice$weight, width = 0.5, add = c("mean", "jitter"),
ylab = "Weight (g)", xlab = FALSE
)
bxp
```

### Assumptions and preliminary tests

The Wilcoxon signed-rank test assumes that the data are distributed symmetrically around the median. This can be checked by visual inspection using histogram and density distribution.

Create a histogram: As we have only 10 individuals in our data, we specify the option `bins = 4`

instead of 30 (default).

```
gghistogram(mice, x = "weight", y = "..density..",
fill = "steelblue",bins = 4, add_density = TRUE)
```

From the plot above, it can be seen that the `weight`

data are approximately symmetrical (you should not expect them to be perfect, particularly when you have smaller numbers of samples in your study). Therefore, we can use the Wilcoxon signed-rank test to analyse our data.

Note that, in the situation where your data is not symmetrically distributed, you could consider performing a **sign test**, instead of running the Wilcoxon signed-rank test.

The sign test does not make the assumption of a symmetrically-shaped distribution. However, it will most likely be less powerful compared to the Wilcoxon test.

### Computation

We want to know, whether the median weight of the mice differs from 25g (two-tailed test)?

```
stat.test <- mice %>% wilcox_test(weight ~ 1, mu = 25)
stat.test
```

```
## # A tibble: 1 x 6
## .y. group1 group2 n statistic p
## * <chr> <chr> <chr> <int> <dbl> <dbl>
## 1 weight 1 null model 10 0 0.00195
```

Note that, to compute one-sided wilcoxon test, you can specify the option `alternative`

, which possible values can be “greater”, “less” or “two.sided”.

### Effect size

We’ll use the R function `wilcox_effsize()`

[rstatix]. It requires the `coin`

package for computing the Z statistic.

`mice %>% wilcox_effsize(weight ~ 1, mu = 25)`

```
## # A tibble: 1 x 6
## .y. group1 group2 effsize n magnitude
## * <chr> <chr> <chr> <dbl> <int> <ord>
## 1 weight 1 null model 0.886 10 large
```

A large effect size is detected, r = 0.89.

### Report

We could report the result as follow:

A Wilcoxon signed-rank test was computed to assess whether the recruited mice median weight was different to the population normal median weight (25g).

The mice weight value were approximately symmetrically distributed, as assessed by a histogram with superimposed density curve.

The measured mice median weight (19.8) was statistically significantly lower than the population median weight 25g (p = 0.002, effect size r = 0.89).

Create a box plot with p-value:

```
bxp +
labs(subtitle = get_test_label(stat.test, detailed = TRUE))
```

Create a density plot with p-value:

- Red line corresponds to the observed median
- Blue line corresponds to the theoretical median

```
ggdensity(mice, x = "weight", rug = TRUE, fill = "lightgray") +
scale_x_continuous(limits = c(15, 27)) +
stat_central_tendency(type = "median", color = "red", linetype = "dashed") +
geom_vline(xintercept = 25, color = "blue", linetype = "dashed") +
labs(subtitle = get_test_label(stat.test, detailed = TRUE))
```

## Wilcoxon rank sum test

The **Wilcoxon rank sum test** is a non-parametric alternative to the independent two samples t-test for comparing two independent groups of samples, in the situation where the data are not normally distributed.

Synonymous: *Mann-Whitney test*, *Mann-Whitney U test*, *Wilcoxon-Mann-Whitney test* and *two-sample Wilcoxon test*.

### Demo data

Demo dataset: `genderweight`

[in datarium package] containing the weight of 40 individuals (20 women and 20 men).

Load the data and show some random rows by groups:

```
# Load the data
data("genderweight", package = "datarium")
# Show a sample of the data by group
set.seed(123)
genderweight %>% sample_n_by(group, size = 2)
```

```
## # A tibble: 4 x 3
## id group weight
## <fct> <fct> <dbl>
## 1 6 F 65.0
## 2 15 F 65.9
## 3 29 M 88.9
## 4 37 M 77.0
```

### Summary statistics

Compute some summary statistics by groups: median and interquartile range.

```
genderweight %>%
group_by(group) %>%
get_summary_stats(weight, type = "median_iqr")
```

```
## # A tibble: 2 x 5
## group variable n median iqr
## <fct> <chr> <dbl> <dbl> <dbl>
## 1 F weight 20 62.9 2.33
## 2 M weight 20 86.3 4.59
```

### Visualization

Visualize the data using box plots. Plot weight by groups.

```
bxp <- ggboxplot(
genderweight, x = "group", y = "weight",
ylab = "Weight", xlab = "Groups", add = "jitter"
)
bxp
```

### Computation

Question : Is there any significant difference between women and men median weights?

```
stat.test <- genderweight %>%
wilcox_test(weight ~ group) %>%
add_significance()
stat.test
```

```
## # A tibble: 1 x 8
## .y. group1 group2 n1 n2 statistic p p.signif
## <chr> <chr> <chr> <int> <int> <dbl> <dbl> <chr>
## 1 weight F M 20 20 0 1.45e-11 ****
```

### Effect size

`genderweight %>% wilcox_effsize(weight ~ group)`

```
## # A tibble: 1 x 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 weight F M 0.855 20 20 large
```

A large effect size is detected, r = 0.86.

### Report

We could report the result as follow:

The median weight in female group was 62.9 (IQR = 2.33), whereas the median in male group was 86.3 (IQR = 4.59). The Wilcoxon test showed that the difference was significant (p < 0.0001, effect size r = 0.86).

```
stat.test <- stat.test %>% add_xy_position(x = "group")
bxp +
stat_pvalue_manual(stat.test, tip.length = 0) +
labs(subtitle = get_test_label(stat.test, detailed = TRUE))
```

## Wilcoxon signed rank test on paired samples

The **Wilcoxon signed rank test on paired sample** is a non-parametric alternative to the paired samples t-test for comparing paired data. It’s used when the data are not normally distributed.

### Demo dataset

Here, we’ll use a demo dataset `mice2`

[datarium package], which contains the weight of 10 mice before and after the treatment.

```
# Wide format
data("mice2", package = "datarium")
head(mice2, 3)
```

```
## id before after
## 1 1 187 430
## 2 2 194 404
## 3 3 232 406
```

```
# Transform into long data:
# gather the before and after values in the same column
mice2.long <- mice2 %>%
gather(key = "group", value = "weight", before, after)
head(mice2.long, 3)
```

```
## id group weight
## 1 1 before 187
## 2 2 before 194
## 3 3 before 232
```

### Summary statistics

Compute some summary statistics by groups: median and interquartile range (IQR).

```
mice2.long %>%
group_by(group) %>%
get_summary_stats(weight, type = "median_iqr")
```

```
## # A tibble: 2 x 5
## group variable n median iqr
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 after weight 10 405 28.3
## 2 before weight 10 197. 19.2
```

### Visualization

```
bxp <- ggpaired(mice2.long, x = "group", y = "weight",
order = c("before", "after"),
ylab = "Weight", xlab = "Groups")
bxp
```

### Assumptions and preliminary tests

The test assumes that differences between paired samples should be distributed symmetrically around the median.

Compute the differences between pairs and create histograms:

```
mice2 <- mice2 %>% mutate(differences = after - before)
gghistogram(mice2, x = "differences", y = "..density..",
fill = "steelblue",bins = 5, add_density = TRUE)
```

From the plot above, it can be seen that the `differences`

data are approximately symmetrical (you should not expect them to be perfect, particularly when you have smaller numbers of samples in your study). Therefore, we can use the Wilcoxon signed-rank test to analyse our data.

Note that, in the situation where your data is not symmetrically distributed, you could consider performing a **sign test**, instead of running the Wilcoxon signed-rank test.

The sign test does not make the assumption of a symmetrically-shaped distribution. However, it will most likely be less powerful compared to the Wilcoxon test.

### Computation

Question : Is there any significant changes in the weights of mice after treatment?

```
stat.test <- mice2.long %>%
wilcox_test(weight ~ group, paired = TRUE) %>%
add_significance()
stat.test
```

```
## # A tibble: 1 x 8
## .y. group1 group2 n1 n2 statistic p p.signif
## <chr> <chr> <chr> <int> <int> <dbl> <dbl> <chr>
## 1 weight after before 10 10 55 0.00195 **
```

### Effect size

```
mice2.long %>%
wilcox_effsize(weight ~ group, paired = TRUE)
```

```
## # A tibble: 1 x 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 weight after before 0.886 10 10 large
```

A large effect size is detected, r = 0.89.

### Report

From the output above, it can be concluded that the median weight of the mice before treatment is significantly different from the median weight after treatment with a p-value = 0.002, effect size r = 0.89.

```
stat.test <- stat.test %>% add_xy_position(x = "group")
bxp +
stat_pvalue_manual(stat.test, tip.length = 0) +
labs(subtitle = get_test_label(stat.test, detailed= TRUE))
```

## Summary

This chapter describes how to compare two means in R using the Wilcoxon test, which is a non-parametric alternative of the t-test.

Quick start R codes, to compute the different Wilcoxon tests, are:

```
# One-sample Wilcoxon signed rank test
mice %>% wilcox_test(weight ~ 1, mu = 25)
# Wilcoxon rank sum test: independent samples
genderweight %>% wilcox_test(weight ~ group)
# Wilcoxon signed rank test on paired samples
mice2.long %>% wilcox_test(weight ~ group, paired = TRUE)
```

Note that, to compute one-sided Wilcoxon tests, you can specify the option `alternative`

, which possible values can be “greater”, “less” or “two.sided”.

Version: Français

This is amazingly helpful tutorial! I really appreciate it.

Would you mind if I ask one simple question?

It says …

> The Wilcoxon signed-rank test assumes that the data are distributed symmetrically around the median.

I appreciate the way that you introduced a histogram visualization method to test this assumption; however, would there be any way for non-visual learners (i.e., blind people) to test this assumption using R code instead of the visualization technique? I am asking this because I am blind so it would be more than helpful for me to analyze it using R code.

Any advice would be so much appreciated.

Thank you for your positive feedback, highly appreciated.

The function `symmetry.test()` [lawstat packaga] can be used to check whether the data is symmetrically distributed around the median. This function replaces the mean and standard deviation in the classic measure of asymmetry by corresponding robust estimators; the median and mean deviation from the median.

Statistical hypothesis:– Null hypothesis: the data is symmetric

– Alternative hypothesis: the data is not symmetric

Example of usage:In the example above, the p-value is not significant (p = 0.4), so we accept the null hypothesis: Our data is symmetrically distributed around the median.

Hi,

Wonderful tutorial, I’ve been following these also from STHDA.

I have troubles using the function: stat_pvalue_manual

Also with the toy data, I get the following error:

Warning: Ignoring unknown aesthetics: xmin, xmax, annotations, y_position

Error in f(…) :

Can only handle data with groups that are plotted on the x-axis

Boxplot is fine all the way to that function, any tips?

Thanks!

Hi,

Please make sure you have the latest version of ggpubr and rstatix

Hi!

Thank you for this tutorial and tutorial about Friedman test; they have been very helpful. Just making a couple of notes:

1. In addition to significance and magnitude, I find it important to know the direction of the difference. In the example at hand, one can see it very clearly even from the boxplots, but generally this is not working. After all, we are dealing with test based on ranks, not the original continuous data. This is why I have used the original wilcoxon.test function to calculate the pseudomedian.

2. Citation from this tutorial: “The Wilcoxon signed-rank test assumes that the data are distributed symmetrically around the median. In other words, there should be roughly the same number of values above and below the median.”

These two phrases are not equivalent. The latter one is always true because of the definition of the median. The first one is what this test truely assumes; if we choose an arbitrary d, the histogram should be about at the same hight in Md – d and Md + d (Md is the median).

Thank you for such a great comments. Following your comments, the improvements below have been made:

Updating the rstatix::wilcox_test() to report an estimator for the pseudomedian (one-sample case) or for the difference of the location parameters x-y. The column estimate and the confidence intervals are now displayed in the test result when the option detailed = TRUE is specified in the

`wilcox_test()`

and`pairwise_wilcox_test()`

functions. You can install the dev version using`devtools::install_github("kassambara/rstatix")`

Blog post updated, removing the following sentence “In other words, there should be roughly the same number of values above and below the median”.

Hi!

Thank you for this tutorial. I found it very useful. I have a question about the effect size. All of the examples above have a large effect size, but I still don’t understand what this means. Can you please get in details what is the impact of the effect size?