This article describes the basics of how to compute and add pvalues to basic ggplots using the rstatix and the ggpubr R packages.
You will learn how to:
 Perform pairwise mean comparisons and add the pvalues onto basic box plots and bar plots.
 Display adjusted pvalues and the significance levels onto the plots
 Format the pvalue labels
 Specify manually the y position of pvalue labels and shorten the width of the brackets
We will follow the steps below for adding significance levels onto a ggplot:

Compute easily statistical tests (
t_test()
orwilcox_test()
) using therstatix
package 
Autocompute pvalue label positions using the function
add_xy_position()
[in rstatix package]. 
Add the pvalues to the plot using the function
stat_pvalue_manual()
[in ggpubr package]. The following key options are illustrated in some of the examples:
The option
bracket.nudge.y
is used to move up or to move down the brackets. 
The option
step.increase
is used to add more space between brackets. 
The option
vjust
is used to vertically adjust the position of the pvalues labels

The option
Note that, in some situations, the pvalue labels are partially hidden by the plot top border. In these cases, the ggplot2 function scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
can be used to add more spaces between labels and the plot top border. The option mult = c(0, 0.1)
indicates that 0% and 10% spaces are respectively added at the bottom and the top of the plot.
Contents:
Prerequisites
Make sure you have installed the following R packages:
ggpubr
for creating easily publication ready plotsrstatix
provides pipefriendly R functions for easy statistical analyses.
Start by loading the following required packages:
library(ggpubr)
library(rstatix)
Data preparation
# Transform `dose` into factor variable
df < ToothGrowth
df$dose < as.factor(df$dose)
head(df, 3)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
Comparing two means
To compare the means of two groups, you can use either the function t_test()
(parametric) or wilcox_test()
(nonparametric). In the following example the ttest will be illustrated.
Compare two independent groups
Box plots with pvalues
# Statistical test
stat.test < df %>%
t_test(len ~ supp) %>%
add_significance()
stat.test
## # A tibble: 1 x 9
## .y. group1 group2 n1 n2 statistic df p p.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 len OJ VC 30 30 1.92 55.3 0.0606 ns
# Box plots with pvalues
bxp < ggboxplot(df, x = "supp", y = "len", fill = "#00AFBB")
stat.test < stat.test %>% add_xy_position(x = "supp")
bxp +
stat_pvalue_manual(stat.test, label = "p") +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))
# Customize pvalue labels using glue expression
# https://github.com/tidyverse/glue
bxp + stat_pvalue_manual(
stat.test, label = "Ttest, p = {p}",
vjust = 1, bracket.nudge.y = 1
) +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.15)))
Grouped data
Group the data by the dose
variable and then compare the levels of the supp
variable.
# Statistical test
stat.test < df %>%
group_by(dose) %>%
t_test(len ~ supp) %>%
adjust_pvalue() %>%
add_significance("p.adj")
stat.test
## # A tibble: 3 x 11
## dose .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
## * <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 0.5 len OJ VC 10 10 3.17 15.0 0.00636 0.0127 *
## 2 1 len OJ VC 10 10 4.03 15.4 0.00104 0.00312 **
## 3 2 len OJ VC 10 10 0.0461 14.0 0.964 0.964 ns
# Box plots with pvalues
stat.test < stat.test %>% add_xy_position(x = "supp")
bxp < ggboxplot(df, x = "supp", y = "len", fill = "#00AFBB",
facet.by = "dose")
bxp +
stat_pvalue_manual(stat.test, label = "p.adj") +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.10)))
Show pvalues if significant otherwise show ns
This section describes how to display pvalues when they are significant and show “ns” when the pvalues are not significant.
# Add a custom label column
# showing adjusted pvalues if significant otherwise "ns"
stat.test < stat.test %>% add_xy_position(x = "supp")
stat.test$custom.label < ifelse(stat.test$p.adj <= 0.05, stat.test$p.adj, "ns")
# Visualization
bxp +
stat_pvalue_manual(stat.test, label = "custom.label") +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.10)))
Format pvalues using the accuracy option
The pvalues will be formatted using “<” and “>”.
stat.test < stat.test %>% add_xy_position(x = "supp")
stat.test$p.format < p_format(
stat.test$p.adj, accuracy = 0.01,
leading.zero = FALSE
)
# Visualization
bxp +
stat_pvalue_manual(stat.test, label = "p.format") +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.10)))
Format pvalues into scientific notation format
# Format pvalues into scientific format
stat.test < stat.test %>% add_xy_position(x = "supp")
stat.test$p.scient < format(stat.test$p.adj, scientific = TRUE)
bxp +
stat_pvalue_manual(stat.test, label = "p.scient") +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.15)))
Compare paired samples
# Statistical test
stat.test < df %>%
t_test(len ~ supp, paired = TRUE) %>%
add_significance()
stat.test
## # A tibble: 1 x 9
## .y. group1 group2 n1 n2 statistic df p p.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 len OJ VC 30 30 3.30 29 0.00255 **
# Box plots with pvalues
bxp < ggpaired(df, x = "supp", y = "len", fill = "#E7B800",
line.color = "gray", line.size = 0.4)
stat.test < stat.test %>% add_xy_position(x = "supp")
bxp + stat_pvalue_manual(stat.test, label = "p.signif")
# Show the pvalue combined with the significance level
bxp +
stat_pvalue_manual(stat.test, label = "{p}{p.signif}") +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.10)))
Pairwise comparisons
Create simple plots
# Box plots
bxp < ggboxplot(df, x = "dose", y = "len", fill = "dose",
palette = c("#00AFBB", "#E7B800", "#FC4E07"))
bxp
# Bar plots showing mean +/ SD
bp < ggbarplot(df, x = "dose", y = "len", add = "mean_sd", fill = "dose",
palette = c("#00AFBB", "#E7B800", "#FC4E07"))
bp
Statistical test
In the following example, we’ll perform Ttest using the function t_test()
[rstatix package]. It’s also possible to use the function wilcox_test()
.
stat.test < df %>% t_test(len ~ dose)
stat.test
## # A tibble: 3 x 10
## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 len 0.5 1 20 20 6.48 38.0 1.27e 7 2.54e 7 ****
## 2 len 0.5 2 20 20 11.8 36.9 4.40e14 1.32e13 ****
## 3 len 1 2 20 20 4.90 37.1 1.91e 5 1.91e 5 ****
Create plots with significance levels
# Box plot
stat.test < stat.test %>% add_xy_position(x = "dose")
bxp + stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01)
# Bar plot
stat.test < stat.test %>% add_xy_position(fun = "mean_sd", x = "dose")
bp + stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01)
Specify manually the y position of pvalue labels and shorten the width of the brackets:
bxp +
stat_pvalue_manual(
stat.test, label = "p.adj.signif", tip.length = 0.01,
y.position = c(35, 40, 35), bracket.shorten = 0.05
)
Comparisons against reference groups
# Statistical tests
stat.test < df %>% t_test(len ~ dose, ref.group = "0.5")
stat.test
## # A tibble: 2 x 10
## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 len 0.5 1 20 20 6.48 38.0 1.27e 7 1.27e 7 ****
## 2 len 0.5 2 20 20 11.8 36.9 4.40e14 8.80e14 ****
# Box plot
stat.test < stat.test %>% add_xy_position(x = "dose")
bxp + stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01)
# Bar plot
stat.test < stat.test %>% add_xy_position(fun = "mean_sd", x = "dose")
bp + stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01)
Comparisons against all (basemean)
Each group is compared to all groups combined.
# Statistical tests
stat.test < df %>% t_test(len ~ dose, ref.group = "all")
stat.test
## # A tibble: 3 x 10
## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 len all 0.5 60 20 5.82 56.4 0.000000290 0.00000087 ****
## 2 len all 1 60 20 0.660 57.5 0.512 0.512 ns
## 3 len all 2 60 20 5.61 66.5 0.000000425 0.00000087 ****
# Box plot
stat.test < stat.test %>% add_xy_position(x = "dose")
bxp + stat_pvalue_manual(stat.test, label = "p.adj.signif")
# Manually specify the y position
bxp + stat_pvalue_manual(stat.test, label = "p.adj.signif", y.position = 35)
# Bar plot
stat.test < stat.test %>% add_xy_position(fun = "mean_sd", x = "dose")
bp + stat_pvalue_manual(stat.test, label = "p.adj.signif")
Comparisons against null: onesample test
The onesample test is used to compare the mean of one sample to a known standard (or theoretical / hypothetical) mean (mu
). The default value of mu
is zero.
# Statistical tests
stat.test < df %>%
group_by(dose) %>%
t_test(len ~ 1) %>%
adjust_pvalue() %>%
add_significance("p.adj")
stat.test
## # A tibble: 3 x 10
## dose .y. group1 group2 n statistic df p p.adj p.adj.signif
## * <fct> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 0.5 len 1 null model 20 10.5 19 2.24e 9 2.24e 9 ****
## 2 1 len 1 null model 20 20.0 19 3.22e14 6.44e14 ****
## 3 2 len 1 null model 20 30.9 19 1.03e17 3.09e17 ****
# Box plot
stat.test < stat.test %>% add_xy_position(x = "dose")
bxp + stat_pvalue_manual(stat.test, x = "dose", label = "p.adj.signif")
# bar plot
stat.test < stat.test %>% add_xy_position(fun = "mean_sd", x = "dose")
bp + stat_pvalue_manual(stat.test, x = "dose", label = "p.adj.signif")
Conclusion
This article introduces how to easily compute and add pvalues onto ggplot, such as box plots and bar plots. See other related frequently questions: ggpubr FAQ.
Recommended for you
This section contains best data science and selfdevelopment resources to help you on your path.
Coursera  Online Courses and Specialization
Data science
 Course: Machine Learning: Master the Fundamentals by Stanford
 Specialization: Data Science by Johns Hopkins University
 Specialization: Python for Everybody by University of Michigan
 Courses: Build Skills for a Top Job in any Industry by Coursera
 Specialization: Master Machine Learning Fundamentals by University of Washington
 Specialization: Statistics with R by Duke University
 Specialization: Software Development in R by Johns Hopkins University
 Specialization: Genomic Data Science by Johns Hopkins University
Popular Courses Launched in 2020
 Google IT Automation with Python by Google
 AI for Medicine by deeplearning.ai
 Epidemiology in Public Health Practice by Johns Hopkins University
 AWS Fundamentals by Amazon Web Services
Trending Courses
 The Science of WellBeing by Yale University
 Google IT Support Professional by Google
 Python for Everybody by University of Michigan
 IBM Data Science Professional Certificate by IBM
 Business Foundations by University of Pennsylvania
 Introduction to Psychology by Yale University
 Excel Skills for Business by Macquarie University
 Psychological First Aid by Johns Hopkins University
 Graphic Design by Cal Arts
Amazon FBA
Amazing Selling Machine
Books  Data Science
Our Books
 Practical Guide to Cluster Analysis in R by A. Kassambara (Datanovia)
 Practical Guide To Principal Component Methods in R by A. Kassambara (Datanovia)
 Machine Learning Essentials: Practical Guide in R by A. Kassambara (Datanovia)
 R Graphics Essentials for Great Data Visualization by A. Kassambara (Datanovia)
 GGPlot2 Essentials for Great Data Visualization in R by A. Kassambara (Datanovia)
 Network Analysis and Visualization in R by A. Kassambara (Datanovia)
 Practical Statistics in R for Comparing Groups: Numerical Variables by A. Kassambara (Datanovia)
 InterRater Reliability Essentials: Practical Guide in R by A. Kassambara (Datanovia)
Others
 R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Hadley Wickham & Garrett Grolemund
 HandsOn Machine Learning with ScikitLearn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Géron
 Practical Statistics for Data Scientists: 50 Essential Concepts by Peter Bruce & Andrew Bruce
 HandsOn Programming with R: Write Your Own Functions And Simulations by Garrett Grolemund & Hadley Wickham
 An Introduction to Statistical Learning: with Applications in R by Gareth James et al.
 Deep Learning with R by François Chollet & J.J. Allaire
 Deep Learning with Python by François Chollet
Version: Français
Hi Dr. Kassambara,
Inside the section Pairwise comparisons, subsection statistical tests, if you check the corrected pvalues one them it’s the same one as the uncorrected one (pvalue: 1.91e 5 and corrected pvalue: 1.91e 5 in the group 1 > 1 vs group 2 > 2 comparison). How were this corrections calculated? Initially I thought it might be Bonferroni correction but the only one in the same example that seems to follow the correction is the group 1 > 0.5 vs group 2 > 2 comparison (pvalue: 4.40e14 ajdusted pvalue: (4.40e14 multiplied by 3 =)1.32e13. But this is just one example. I’ve seen the same issue in other examples in this same page that pvalue and adjusted pvalue are the same. Am I missing something? How is the adjusted pvalue calculated in all the above examples?
Thanks a lot for you hard work. I am a big follower of your webpage. Please keep on doing this great work!
Best,
Josue
Thank you for your comment and the positive feedback. The default method used to adjust the pvalues is the
Holm
method. Pvalues are adjusted using the standard R base functionp.adjust()
; for example:You can specify the pvalue adjustment method in the function
t_test()
[rstatix package] as follow:Hi Dr. Kassambara,
I am a follower of your page. Kudos to you for all the great work.
I am working with kruskal wallis test using ggpubr package for violin plots. I have a scientific p value of 6.4e09, but i would like to show case a simplified p value in this case. Any idea how can i simplify the p value (ex: P<0.0001)?
Thank you
Hi Dr. Kassambara,
First, thanks for everything. This site is amazing for me.Very helpful and I recommend to all my friends.
I have a problem with ANOVA values for ggplot.
I want to use ” Within Subject” values for ggplot.. But I couldnt add this value.. Is there any way to add “Within subject value” for this plot?
For adding this, I need to write 3rd row’s value (Region) for my plot..Also I tried to find another way to add this value, but “paired t test and others” gives me different values..(p =0.021)
For example; when I extract the “between= (Group)” from anova_test()….it gives me totally different results ( DFn=1, DFd= 20, F= 6.306, p=0.021, ges=0.038)
when I try to use: pairwise_t_test() which gives me (p=0,021) again; but in SPSS “pairwise comp” section is also shows “0,24” which is same in ANOVA table. Isnt it interesting for R ? (shows different value)
Do you know why is this happening ? What can I do for this problem?
I just thought that ANOVA is using Ftests, maybe it is happening because of Ftest , I do not know.
Could you help me?
It would be great for me if you can help.
#######################################
ANOVA Table (type III tests)
Effect DFn DFd SSn SSd F p p<.05 ges
1 (Intercept) 1 19 12.157 0.726 318.292 0.000000000000251 * 0.936
2 Group 1 19 0.028 0.726 0.735 0.402000000000000 0.033
3 Region 1 19 0.033 0.104 5.975 0.024000000000000 * 0.038
4 Group:Region 1 19 0.003 0.104 0.545 0.470000000000000 0.004
###################################################
Hi, thank you for the positive feedback, highly appreciated.
Can you please clarify your ANOVA experimental design?
Is it a oneway repeated measure ANOVA (one withinsubjects variable)? If yes, then make sure you have read this: Repeated Measures ANOVA in R Is it a twoway mixed ANOVA (one withinsubjects + onbetween subject variables)? If yes, then make sure you have read this: Mixed ANOVA in R
If you still have the issue, please send to me a reproducible R code with a demo data, so that I can help efficiently.
I tired your way “filter” but in my PWC test also shows p=0,021), it doesnt worked for me, but I want to show “Condition” section in ANOVA table as a “SUBTITLE” in visiual report((((F(1,19)= 5.975, p= 0.024 eta= 0.038.))In short I just want to show as a subtitle in report “Conditon ANova Values” but couldnt filter, I tried it too.
Thank you very much in advance for your time
The following R code should work:
labs(subtitle = get_test_label(res.aov, row = 2, detailed = TRUE))……so, row=2 huh :)))
of course it works!
Dashed is good idea by the way
Thank you very much..
labs(subtitle = get_test_label(res.aov, row = 2, detailed = TRUE))……so, row=2 huh :)))
of course it works!
Dashed is good idea by the way
Thank you very much..
Dear Kassambara,
Thanks for preparing such a great ggplot extension.
I have a little problem with setting the pvalue to display in a ggline plot.
I would like to have control over the number of digits after the decimal point, but I have the impression that these values are displayed randomly, i.e. sometimes 2 digits appear, sometimes 3, sometimes 4 digits after the decimal point. Is there any way to control this?
Below is my code:
stat.test %
group_by (SET)%>%
tukey_hsd (speed_avg ~ Protocol)%>%
adjust_pvalue (method = “bonferroni”)%>%
add_significance (“p.adj”)
stat.test % add_xy_position (x = “SET”, fun = “mean_ci”, dodge = 0.5)
stat.test $ p.format < p_format (stat.test $ p.adj, accuracy = 0.001)
Thank you in advance for any help.