How to Add P-Values onto a Grouped GGPLOT using the GGPUBR R Package

058-add-p-values-to-a-grouped-ggplot-logo-1.png


How to Add P-Values onto a Grouped GGPLOT using the GGPUBR R Package

This article describes how to compute and automatically add p-values onto grouped ggplots using the ggpubr and the rstatix R packages.

You will learn how to:

  • Add p-values onto grouped box plots, bar plots and line plots. Examples, containing two and three groups by x position, are shown.
  • Show the p-values combined with the significance levels onto the grouped plots

We will follow the steps below for adding significance levels onto a ggplot:

  1. Compute easily statistical tests (t_test() or wilcox_test()) using the rstatix package
  2. Auto-compute p-value label positions using the function add_xy_position() [in rstatix package].
  3. Add the p-values to the plot using the function stat_pvalue_manual() [in ggpubr package]. The following key options are illustrated in some of the examples:
    • The option bracket.nudge.y is used to move up or to move down the brackets.
    • The option step.increase is used to add more space between brackets.
    • The option vjust is used to vertically adjust the position of the p-values labels

Note that, in some situations, the p-value labels are partially hidden by the plot top border. In these cases, the ggplot2 function scale_y_continuous(expand = expansion(mult = c(0, 0.1))) can be used to add more spaces between labels and the plot top border. The option mult = c(0, 0.1) indicates that 0% and 10% spaces are respectively added at the bottom and the top of the plot.



Contents:

Prerequisites

Make sure you have installed the following R packages:

  • tidyverse for data manipulation and visualization
  • ggpubr for creating easily publication ready plots
  • rstatix provides pipe-friendly R functions for easy statistical analyses.

Start by loading the following required packages:

library(ggpubr)
library(rstatix)

Key R functions

Statistical test functions for pairwise comparisons: t_test() and wilcox_test() [rstatix package]

Pipe-friendly framework to compare the mean of two groups. If grouping variable contains more than two levels, then a pairwise comparison between levels is automatically computed and the p-value is adjusted using the Holm p-value adjustement methods (by default). You can change this using for example p.adjust.method = "bonferroni", which is more stringent but frequently used.

Auto-Compute p-value x and y positions for plotting p-values and significance levels: add_xy_position() [rstatix package]

one of the key argument is fun, which indicates summary statistics functions used to compute automatically suitable y positions of p-value labels and brackets. The default value is fun = "max", which is suitable to compute p-value positions for box plots.

If you are creating a bar plot or a line plot with error bars (mean +/- SD or mean +/_ se), then you need to specify the corresponding summary statistics functions when computing the p-value labels positions. For example, fun = "mean_sd". Possible values include: "max", "mean", "mean_sd", "mean_se", "mean_ci", "median", "median_iqr", "median_mad".

Add manually p-values to a ggplot: stat_pvalue_manual() [in ggpubr package]

This function can be used to add manually p-values to a ggplot, such as box blots, dot plots, stripcharts, line plots and bar plots. Frequently asked questions are available on Datanovia ggpubr FAQ page.

Key arguments include:

  • data: a data frame containing statistical test results. The expected default format should contain the following columns: group1 | group2 | p | y.position | etc. group1 and group2 are the groups that have been compared. p is the resulting p-value. y.position is the y coordinates of the p-values in the plot.
  • label: the column containing the label (e.g.: label = "p" or label = "p.adj"), where p is the p-value. Can be also an expression that can be formatted by the glue package. For example, when specifying label = "t-test, p = {p}", the expression {p} will be replaced by its value.
  • bracket.nudge.y: Vertical adjustment to nudge brackets by. Useful to move up or move down the bracket. If positive value, brackets will be moved up; if negative value, brackets are moved down.
  • remove.bracket: logical value, if TRUE, brackets are removed from the plot.
  • step.increase: fraction of additional space to add between brackets for minimizing overlap.
  • hide.ns: logical value. If TRUE, hide ns symbol when displaying significance levels.

Data preparation

# Transform `dose` into factor variable
df <- ToothGrowth
df$dose <- as.factor(df$dose)
head(df, 3)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5

Two groups by x position

Statistical tests

Group the data by the dose variable and then compare the levels of the supp variable (OJ vs VC):

stat.test <- df %>%
  group_by(dose) %>%
  t_test(len ~ supp) %>%
  adjust_pvalue(method = "bonferroni") %>%
  add_significance("p.adj")
stat.test
## # A tibble: 3 x 11
##   dose  .y.   group1 group2    n1    n2 statistic    df       p   p.adj p.adj.signif
## * <fct> <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>   <dbl>   <dbl> <chr>       
## 1 0.5   len   OJ     VC        10    10    3.17    15.0 0.00636 0.0191  *           
## 2 1     len   OJ     VC        10    10    4.03    15.4 0.00104 0.00312 **          
## 3 2     len   OJ     VC        10    10   -0.0461  14.0 0.964   1       ns

Grouped box plots

# Create a box plot
bxp <- ggboxplot(
  df, x = "dose", y = "len", 
  color = "supp", palette = c("#00AFBB", "#E7B800")
  )

# Add p-values onto the box plots
stat.test <- stat.test %>%
  add_xy_position(x = "dose", dodge = 0.8)
bxp + stat_pvalue_manual(
  stat.test,  label = "p", tip.length = 0
  )

# Add 10% spaces between the p-value labels and the plot border
bxp + stat_pvalue_manual(
  stat.test,  label = "p", tip.length = 0
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

# Use adjusted p-values as labels
# Remove brackets
bxp + stat_pvalue_manual(
  stat.test,  label = "p.adj", tip.length = 0,
  remove.bracket = TRUE
  )

# Show adjusted p-values and significance levels
# Hide ns (non-significant)
bxp + stat_pvalue_manual(
  stat.test,  label = "{p.adj}{p.adj.signif}", 
  tip.length = 0, hide.ns = TRUE
  )

The following R script create a box plot containing two statistical tests results:

  1. the above stat.test comparing the levels of the supp (OJ vs VC) variable
  2. an additional statistical test (stat.test2) performing pairwise comparisons between the dose levels. P-values are automatically adjusted because the dose variable contains 3 levels.
# Additional statistical test
stat.test2 <- df %>%
  t_test(len ~ dose, p.adjust.method = "bonferroni")
stat.test2
## # A tibble: 3 x 10
##   .y.   group1 group2    n1    n2 statistic    df        p    p.adj p.adj.signif
## * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>    <dbl>    <dbl> <chr>       
## 1 len   0.5    1         20    20     -6.48  38.0 1.27e- 7 3.81e- 7 ****        
## 2 len   0.5    2         20    20    -11.8   36.9 4.40e-14 1.32e-13 ****        
## 3 len   1      2         20    20     -4.90  37.1 1.91e- 5 5.73e- 5 ****
# Add p-values of `stat.test` and `stat.test2`
# 1. Add stat.test
stat.test <- stat.test %>%
  add_xy_position(x = "dose", dodge = 0.8)
bxp.complex <- bxp + stat_pvalue_manual(
  stat.test,  label = "p", tip.length = 0
  )
# 2. Add stat.test2
# Add more space between brackets using `step.increase` 
stat.test2 <- stat.test2 %>% add_xy_position(x = "dose")
bxp.complex <- bxp.complex + 
  stat_pvalue_manual(
    stat.test2,  label = "p", tip.length = 0.02,
    step.increase = 0.05
  ) +
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))

# 3. Display the plot
bxp.complex 

Grouped bar plots

The option add = "mean_sd" is specified in the bar plot function for creating bar plot with error bars (mean +/- SD). You need to specify the same summary statistics function to auto-compute p-value labels positions in add_xy_position() using the option fun.

Dodged bar plots

# Create a bar plot with error bars (mean +/- sd)
bp <- ggbarplot(
  df, x = "dose", y = "len", add = "mean_sd", 
  color= "supp", palette = c("#00AFBB", "#E7B800"),
  position = position_dodge(0.8)
  )
# Add p-values onto the bar plots
stat.test <- stat.test %>%
  add_xy_position(fun = "mean_sd", x = "dose", dodge = 0.8) 
bp + stat_pvalue_manual(
  stat.test,  label = "p.adj.signif", tip.length = 0.01
  )

# Move down the brackets using `bracket.nudge.y`
bp + stat_pvalue_manual(
  stat.test,  label = "p.adj.signif", tip.length = 0,
  bracket.nudge.y = -2
  )

Stacked bar plots

# Create a bar plot with error bars (mean +/- sd)
bp2 <- ggbarplot(
  df, x = "dose", y = "len", add = "mean_sd", 
  color = "supp", palette = c("#00AFBB", "#E7B800"),
  position = position_stack()
  )

# Add p-values onto the bar plots
# Specify the p-value y position manually
bp2 + stat_pvalue_manual(
  stat.test,  label = "p.adj.signif", tip.length = 0.01,
  x = "dose", y.position = c(30, 45, 60)
  )

# Auto-compute the p-value y position
# Adjust vertically label positions using vjust
stat.test <- stat.test %>%
  add_xy_position(fun = "mean_sd", x = "dose", stack = TRUE) 
bp2 + stat_pvalue_manual(
  stat.test,  label = "p.adj.signif", 
  remove.bracket = TRUE, vjust = -0.2
  )

Grouped line plots

# Create a line plot with error bars (mean +/- sd)
lp <- ggline(
  df, x = "dose", y = "len", add = "mean_sd", 
  color = "supp", palette = c("#00AFBB", "#E7B800")
  )

# Add p-values onto the line plots
# Remove brackets using linetype = "blank"
stat.test <- stat.test %>%
  add_xy_position(fun = "mean_sd", x = "dose") 
lp + stat_pvalue_manual(
  stat.test,  label = "p.adj.signif", 
  tip.length = 0, linetype  = "blank"
  )

# Move down the significance levels using vjust
lp + stat_pvalue_manual(
  stat.test,  label = "p.adj.signif", 
  linetype  = "blank", vjust = 2
  )

Pairwise comparisons

Statistical tests

Group the data by the supp variable and then perform multiple pairwise comparisons between the levels of the dose variable (0.5, 1 and 2). P-values are adjusted for each group level independently.

pwc <- df %>%
  group_by(supp) %>%
  t_test(len ~ dose, p.adjust.method = "bonferroni")
pwc
## # A tibble: 6 x 11
##   supp  .y.   group1 group2    n1    n2 statistic    df            p      p.adj p.adj.signif
## * <fct> <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>        <dbl>      <dbl> <chr>       
## 1 OJ    len   0.5    1         10    10     -5.05  17.7 0.0000878    0.000263   ***         
## 2 OJ    len   0.5    2         10    10     -7.82  14.7 0.00000132   0.00000396 ****        
## 3 OJ    len   1      2         10    10     -2.25  15.8 0.039        0.118      ns          
## 4 VC    len   0.5    1         10    10     -7.46  17.9 0.000000681  0.00000204 ****        
## 5 VC    len   0.5    2         10    10    -10.4   14.3 0.0000000468 0.00000014 ****        
## 6 VC    len   1      2         10    10     -5.47  13.6 0.0000916    0.000275   ***

Create plots with the pairwise-comparison p-values

The argument step.group.by is used to group the brackets by a variable.

# Box plot
pwc <- pwc %>% add_xy_position(x = "dose")
bxp +
  stat_pvalue_manual(
    pwc, color = "supp", step.group.by = "supp",
    tip.length = 0, step.increase = 0.1
    )

# Bar plots
pwc <- pwc %>% add_xy_position(x = "dose", fun = "mean_sd", dodge = 0.8)
bp + stat_pvalue_manual(
  pwc, color = "supp", step.group.by = "supp",
  tip.length = 0, step.increase = 0.1
  )

# Line plots
pwc <- pwc %>% add_xy_position(x = "dose", fun = "mean_sd")
lp + stat_pvalue_manual(
  pwc, color = "supp", step.group.by = "supp",
  tip.length = 0, step.increase = 0.1
  )

Show the p-values for a subset of comparisons

In the example below, p-values are show for the pairwise-comparisons in the VC group:

# Bar plots (dodged)
# Take a subset of the pairwise comparisons
pwc.filtered <- pwc %>% 
  add_xy_position(x = "dose", fun = "mean_sd", dodge = 0.8) %>%
  filter(supp == "VC")
bp +
  stat_pvalue_manual(
    pwc.filtered, color = "supp", step.group.by = "supp",
    tip.length = 0, step.increase = 0
    )

# Bar plots (stacked)
pwc.filtered <- pwc %>% 
  add_xy_position(x = "dose", fun = "mean_sd", stack = TRUE) %>%
  filter(supp == "VC")
bp2 +
  stat_pvalue_manual(
    pwc.filtered, color = "supp", step.group.by = "supp",
    tip.length = 0, step.increase = 0.1
    )

Three groups by x position

Simple plots

# Box plots
bxp <- ggboxplot(
  df, x = "supp", y = "len", fill = "dose",
  palette = "npg"
  )
bxp

# Bar plots
bp <- ggbarplot(
  df, x = "supp", y = "len", fill = "dose",
  palette = "npg", add = "mean_sd",
  position = position_dodge(0.8)
  )
bp

Perform all pairwise comparisons

Group by the supp variable and then perform pairwise comparisons between the levels of dose variable.

Statistical test:

stat.test <- df %>%
  group_by(supp) %>%
  t_test(len ~ dose)
stat.test 
## # A tibble: 6 x 11
##   supp  .y.   group1 group2    n1    n2 statistic    df            p      p.adj p.adj.signif
## * <fct> <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>        <dbl>      <dbl> <chr>       
## 1 OJ    len   0.5    1         10    10     -5.05  17.7 0.0000878    0.000176   ***         
## 2 OJ    len   0.5    2         10    10     -7.82  14.7 0.00000132   0.00000396 ****        
## 3 OJ    len   1      2         10    10     -2.25  15.8 0.039        0.039      *           
## 4 VC    len   0.5    1         10    10     -7.46  17.9 0.000000681  0.00000136 ****        
## 5 VC    len   0.5    2         10    10    -10.4   14.3 0.0000000468 0.00000014 ****        
## 6 VC    len   1      2         10    10     -5.47  13.6 0.0000916    0.0000916  ****

Add the p-values onto the plots:

# Box plots with p-values
stat.test <- stat.test %>%
  add_xy_position(x = "supp", dodge = 0.8)
bxp + 
  stat_pvalue_manual(
    stat.test, label = "p.adj", tip.length = 0.01,
    bracket.nudge.y = -2
    ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

# Bar plots with p-values
stat.test <- stat.test %>%
  add_xy_position(x = "supp", fun = "mean_sd", dodge = 0.8)
bp + 
  stat_pvalue_manual(
    stat.test, label = "p.adj", tip.length = 0.01,
    bracket.nudge.y = -2
    ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

Pairwise comparisons against a reference group

Statistical tests:

stat.test <- df %>%
  group_by(supp) %>%
  t_test(len ~ dose, ref.group = "0.5") 
stat.test
## # A tibble: 4 x 11
##   supp  .y.   group1 group2    n1    n2 statistic    df            p        p.adj p.adj.signif
## * <fct> <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>        <dbl>        <dbl> <chr>       
## 1 OJ    len   0.5    1         10    10     -5.05  17.7 0.0000878    0.0000878    ****        
## 2 OJ    len   0.5    2         10    10     -7.82  14.7 0.00000132   0.00000264   ****        
## 3 VC    len   0.5    1         10    10     -7.46  17.9 0.000000681  0.000000681  ****        
## 4 VC    len   0.5    2         10    10    -10.4   14.3 0.0000000468 0.0000000936 ****
# Box plots with p-values
stat.test <- stat.test %>%
  add_xy_position(x = "supp", dodge = 0.8)
bxp + 
  stat_pvalue_manual(
    stat.test, label = "p.adj", tip.length = 0.01,
    bracket.nudge.y = -2
    ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

# Show only significance levels
# Move down significance symbols using vjust
bxp + stat_pvalue_manual(
  stat.test, x = "supp",  label = "p.adj.signif", 
  tip.length = 0.01, vjust = 2
  )

# Bar plots with p-values
stat.test <- stat.test %>%
  add_xy_position(x = "supp", fun = "mean_sd", dodge = 0.8)
bp + 
  stat_pvalue_manual(
    stat.test, label = "p.adj", tip.length = 0.01,
    bracket.nudge.y = -2
    ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

Conclusion

This article describes how to add p-values onto grouped ggplots, such as box plots, bar plots and line plots. See other related frequently questions: ggpubr FAQ.



Version: Français





Comments ( 7 )

  • Brent O'Carrigan

    Outstanding blog post, thank you.

    Just a comment that I think the convenience function `expansion()` inside ggplot2 may have been deprecated; replaced with the equivalent function expand_scale()

    • Kassambara

      Thank you for the positive feedback. In the latest CRAN version of ggplot2 (v3.3.0) it’s the opposite that holds TRUE. `expand_scale()` is deprecated but not `expansion()`.

  • Berkant erman

    First, thanks a lot for great jobs, I’m new at this job and I can do some analysis because of you and there is amazing report graphs.
    But, unfortunately I have problem for one group repeated actions. I tried a lot (with “https://www.datanovia.com/en/blog/how-to-add-p-values-onto-a-grouped-ggplot-using-the-ggpubr-r-package/” to show p values on boxplot like in this page, ..whenever I tried to fix to “one group” for my study..I couldnt achieved… Could you share a link for this?

    I think, It would be great if there are different more representations (ggplots , ggviolin etc.) of repetitive measurements in this “”https://www.datanovia.com/en/lessons/repeated-measures-anova-in-r/ “” link as well.

    thanks again.

    • Kassambara

      Hi,
      If I understood correctly, the issue is about displaying the p-values of one-way repeated measures ANOVA, isn’t it? Please provide a demo data, demo graph or a reproducible script to help me understanding the issue. You can publish here a reproducible script here using the pubr package or on github (https://github.com/kassambara/ggpubr)

      • Berkant erman

        Yes it is..just shared..skipped some steps (normality check , qqplot etc.) tried show 2 way..Issue 1 and 2..

        “””https://github.com/kassambara/ggpubr/compare/master…berman954:patch-1″”

        • Kassambara

          In your example, the following R code should work for adding p-values on the box-plot:

          # 1/ Create a box plot
          bxp <- ggboxplot(ANOVA_Rstudio, x = "Section", y = "GRF",
            color = "Section", palette = "jco", add="point")
          
          # 2/ Stat test:  
          # - don't use group_by (as in your example)
          # - Replace ANOVA_Rstudio>%  by ANOVA_Rstudio %>%
          stat.test <- ANOVA_Rstudio %>%
            pairwise_t_test(
              GRF ~ Section, paired = TRUE, 
              p.adjust.method = "bonferroni"
            ) %>%
            select(-df, -statistic, -p) 
          
          # 3/ Add p-value to boxplot
          # don't use x = "GRF", but x = "section" as in step 1/
          stat.test <- stat.test %>% add_xy_position(x = "GRF")
          bxp + stat_pvalue_manual(
            stat.test, label = "p.adj.signif", 
            step.increase = 0.08
          )
          • Berkant erman

            I need to work more about it. Thanks a lot for your help!

            Note: If someone like me is trying to do this new job, I suggest you follow the comments above. In my opinion, the simplest and effective “one/two way repeated measures ANOVA” analysis among others.
            “https://www.datanovia.com/en/lessons/repeated-measures-anova-in-r/”

Give a comment

Want to post an issue with R? If yes, please make sure you have read this: How to Include Reproducible R Script Examples in Datanovia Comments