GGPlot Examples Best Reference



GGPlot Examples Best Reference

This article provides a gallery of ggplot examples, including: scatter plot, density plots and histograms, bar and line plots, error bars, box plots, violin plots and more.



Contents:

Related Book

GGPlot2 Essentials for Great Data Visualization in R

Prerequisites

Load required packages and set the theme function theme_bw() as the default theme:

library(tidyverse)
library(ggpubr) 
theme_set(
  theme_bw() + 
  theme(legend.position = "top")
  )

Scatter plot

  • Basic scatter plot with correlation coefficient. The function stat_cor() [ggpubr R package] is used to add the correlation coefficient.
library("ggpubr")
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  geom_smooth(method = lm) +
  stat_cor(method = "pearson", label.x = 20)
p

  • Contextual zoom. Key R function facet_zoom() [ggforce]
library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
    geom_point() +
    facet_zoom(x = Species == "versicolor")

  • Encircle some points. The function geom_encircle() [ggalt R package] can be used to encircle a certain group of points
# Encircle setosa group
library("ggalt")
circle.df <- iris %>% filter(Species == "setosa")
ggplot(iris, aes(Petal.Length, Petal.Width)) +
    geom_point(aes(colour = Species)) + 
  geom_encircle(data = circle.df, linetype = 2)

  • Create jittered points to avoid overlap. The overlapping points are randomly jittered around their original position based on a threshold controlled by the width argument in the function `geom_jitter()
# Basic scatter plot
ggplot(mpg, aes(cty, hwy)) +
  geom_point(size = 0.5)

# Jittered points
ggplot(mpg, aes(cty, hwy)) +
  geom_jitter(size = 0.5, width = 0.5)

  • Create count charts to avoid overlap. Wherever there is more points overlap, the size of the circle gets bigger.
ggplot(mpg, aes(cty, hwy)) +
  geom_count()

  • Bubble chart. In a bubble chart, points size is controlled by a continuous variable, here qsec.
ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(size = qsec), alpha = 0.5) +
  scale_size(range = c(0.5, 12))  # Adjust the range of points size

  • Marginal density plots
library(ggpubr)
# Grouped Scatter plot with marginal density plots
ggscatterhist(
  iris, x = "Sepal.Length", y = "Sepal.Width",
  color = "Species", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  margin.params = list(fill = "Species", color = "black", size = 0.2)
  )

# Use box plot as marginal plots
ggscatterhist(
  iris, x = "Sepal.Length", y = "Sepal.Width",
  color = "Species", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  margin.plot = "boxplot",
  ggtheme = theme_bw()
  )

Distribution

Density plot

  • Basic density plot:
# Basic density plot
ggplot(iris, aes(Sepal.Length)) +
  geom_density()

# Add mean line
ggplot(iris, aes(Sepal.Length)) +
  geom_density(fill = "lightgray") +
  geom_vline(aes(xintercept = mean(Sepal.Length)), linetype = 2)

  • Change color by groups
# Change line color by groups
ggplot(iris, aes(Sepal.Length, color = Species)) +
  geom_density() +
  scale_color_viridis_d()

# Add mean line by groups
mu <- iris %>%
  group_by(Species) %>%
  summarise(grp.mean = mean(Sepal.Length))

ggplot(iris, aes(Sepal.Length, color = Species)) +
  geom_density() +
  geom_vline(aes(xintercept = grp.mean, color = Species),
             data = mu, linetype = 2) +
  scale_color_viridis_d()

Histogram

  • Basic histograms
# Basic histogram with mean line
ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(bins = 20, fill = "white", color = "black")  +
  geom_vline(aes(xintercept = mean(Sepal.Length)), linetype = 2)

# Add density curves
ggplot(iris, aes(Sepal.Length, stat(density))) +
  geom_histogram(bins = 20, fill = "white", color = "black")  +
  geom_density() +
  geom_vline(aes(xintercept = mean(Sepal.Length)), linetype = 2)

  • Change color by groups
ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(aes(fill = Species, color = Species), bins = 20, 
                 position = "identity", alpha = 0.5) +
  scale_fill_viridis_d() +
  scale_color_viridis_d()

QQ Plot

library(ggpubr)
ggqqplot(iris, x = "Sepal.Length",
   ggtheme = theme_bw())

Empirical cumulative distribution (ECDF)

ggplot(iris, aes(Sepal.Length)) +
  stat_ecdf(aes(color = Species)) +
  scale_color_viridis_d()

Density ridgeline plots

The density ridgeline plot is an alternative to the standard geom_density() function that can be useful for visualizing changes in distributions, of a continuous variable, over time or space. Ridgeline plots are partially overlapping line plots that create the impression of a mountain range.

library(ggridges)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(aes(fill = Species)) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

Bar charts and alternatives

  • Data
df <- mtcars %>%
  rownames_to_column() %>%
  as_data_frame() %>%
  mutate(cyl = as.factor(cyl)) %>%
  select(rowname, wt, mpg, cyl)
df
## # A tibble: 32 x 4
##   rowname              wt   mpg cyl  
##   <chr>             <dbl> <dbl> <fct>
## 1 Mazda RX4          2.62  21   6    
## 2 Mazda RX4 Wag      2.88  21   6    
## 3 Datsun 710         2.32  22.8 4    
## 4 Hornet 4 Drive     3.22  21.4 6    
## 5 Hornet Sportabout  3.44  18.7 8    
## 6 Valiant            3.46  18.1 6    
## # ... with 26 more rows
  • Basic bar plots
# Basic bar plots
ggplot(df, aes(x = rowname, y = mpg)) +
  geom_col() +
  rotate_x_text(angle = 45)

# Reorder row names by mpg values
ggplot(df, aes(x = reorder(rowname, mpg), y = mpg)) +
  geom_col()  +
  rotate_x_text(angle = 45)

  • Horizontal bar plots
# Horizontal bar plots, 
# change fill color by groups and add text labels
ggplot(df, aes(x = reorder(rowname, mpg), y = mpg)) +
  geom_col( aes(fill = cyl)) + 
  geom_text(aes(label = mpg), nudge_y = 2) + 
  coord_flip() +
  scale_fill_viridis_d()

  • Order bars by groups and by mpg values
df2 <- df %>% 
  arrange(cyl, mpg) %>%
  mutate(rowname = factor(rowname, levels = rowname))

ggplot(df2, aes(x = rowname, y = mpg)) +
  geom_col( aes(fill = cyl)) + 
  scale_fill_viridis_d() +
  rotate_x_text(45)

  • Lollipop chart: Lollipop is an alternative to bar charts when you have large data sets.
ggplot(df2, aes(x = rowname, y = mpg)) +
  geom_segment(
    aes(x = rowname, xend = rowname, y = 0, yend = mpg), 
    color = "lightgray"
    ) + 
  geom_point(aes(color = cyl), size = 3) +
  scale_color_viridis_d() +
  theme_pubclean() +
  rotate_x_text(45)

  • Bar plot with multiple groups
# Data
df3 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
                dose=rep(c("D0.5", "D1", "D2"),2),
                len=c(6.8, 15, 33, 4.2, 10, 29.5))


# Stacked bar plots of y = counts by x = cut,
# colored by the variable color
ggplot(df3, aes(x = dose, y = len)) +
  geom_col(aes(color = supp, fill = supp), position = position_stack()) +
  scale_color_manual(values = c("#0073C2FF", "#EFC000FF"))+
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF"))

# Use position = position_dodge() 
ggplot(df3, aes(x = dose, y = len)) +
  geom_col(aes(color = supp, fill = supp), position = position_dodge(0.8), width = 0.7) +
  scale_color_manual(values = c("#0073C2FF", "#EFC000FF"))+
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF"))

Line plot

# Data
df3 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
                dose=rep(c("D0.5", "D1", "D2"),2),
                len=c(6.8, 15, 33, 4.2, 10, 29.5))

# Line plot
ggplot(df3, aes(x = dose, y = len, group = supp)) +
  geom_line(aes(linetype = supp)) +
  geom_point(aes(shape = supp))

Error bars

  • Data
# Raw data
df <- ToothGrowth %>% mutate(dose = as.factor(dose))
head(df, 3)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
# Summary statistics
df.summary <- df %>%
  group_by(dose) %>%
  summarise(sd = sd(len, na.rm = TRUE), len = mean(len))
df.summary
## # A tibble: 3 x 3
##   dose     sd   len
##   <fct> <dbl> <dbl>
## 1 0.5    4.50  10.6
## 2 1      4.42  19.7
## 3 2      3.77  26.1
  • Basic line and bar plots with error bars
# (1) Line plot
ggplot(df.summary, aes(dose, len)) +
  geom_line(aes(group = 1)) +
  geom_errorbar( aes(ymin = len-sd, ymax = len+sd),width = 0.2) +
  geom_point(size = 2)

# (2) Bar plot
ggplot(df.summary, aes(dose, len)) +
  geom_bar(stat = "identity", fill = "lightgray", color = "black") +
  geom_errorbar(aes(ymin = len, ymax = len+sd), width = 0.2) 

  • Grouped line/bar plots
# Data preparation
df.summary2 <- df %>%
  group_by(dose, supp) %>%
  summarise( sd = sd(len), len = mean(len))
df.summary2
## # A tibble: 6 x 4
## # Groups:   dose [?]
##   dose  supp     sd   len
##   <fct> <fct> <dbl> <dbl>
## 1 0.5   OJ     4.46 13.2 
## 2 0.5   VC     2.75  7.98
## 3 1     OJ     3.91 22.7 
## 4 1     VC     2.52 16.8 
## 5 2     OJ     2.66 26.1 
## 6 2     VC     4.80 26.1
# (1) Line plot + error bars
ggplot(df.summary2, aes(dose, len)) +
  geom_line(aes(linetype = supp, group = supp))+
  geom_point()+
  geom_errorbar(
    aes(ymin = len-sd, ymax = len+sd, group = supp),
     width = 0.2
    )

# (2) Bar plots + upper error bars.
ggplot(df.summary2, aes(dose, len)) +
  geom_bar(aes(fill = supp), stat = "identity",
           position = position_dodge(0.8), width = 0.7)+
  geom_errorbar(
    aes(ymin = len, ymax = len+sd, group = supp),
    width = 0.2, position = position_dodge(0.8)
    )+
  scale_fill_manual(values = c("grey80", "grey30"))

Box plots and alternatives

  • Data
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
  • Basic box plots
# Basic
ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot()

# Box plot + violin plot
ggplot(ToothGrowth, aes(dose, len)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.2)

  • Add jittered points and dot plot
# Add jittered points
ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot() +
  geom_jitter(width = 0.2)

# Dot plot + box plot
ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot() +
  geom_dotplot(binaxis = "y", stackdir = "center")

  • Grouped plots
# Box plots
ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot(aes(color = supp)) +
  scale_color_viridis_d()

# Add jittered points
ggplot(ToothGrowth, aes(dose, len, color = supp)) +
  geom_boxplot() +
  geom_jitter(position = position_jitterdodge(jitter.width = 0.2)) +
  scale_color_viridis_d()

Time series data visualization

# Data preparation
df <- economics %>%
  select(date, psavert, uempmed) %>%
  gather(key = "variable", value = "value", -date)
head(df, 3)
## # A tibble: 3 x 3
##   date       variable value
##   <date>     <chr>    <dbl>
## 1 1967-07-01 psavert   12.5
## 2 1967-08-01 psavert   12.5
## 3 1967-09-01 psavert   11.7
# Multiple line plot
ggplot(df, aes(x = date, y = value)) + 
  geom_line(aes(color = variable), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  theme_minimal()

scatter plot matrix

library(GGally)
ggpairs(iris[,-5])+ theme_bw()

Correlation analysis

library("ggcorrplot")
# Compute a correlation matrix
my_data <- mtcars[, c(1,3,4,5,6,7)]
corr <- round(cor(my_data), 1)
# Visualize
ggcorrplot(corr, p.mat = cor_pmat(my_data),
           hc.order = TRUE, type = "lower",
           color = c("#FC4E07", "white", "#00AFBB"),
           outline.col = "white", lab = TRUE)

Cluster analysis

library(factoextra)
USArrests %>%
  scale() %>%                           # Scale the data
  dist() %>%                            # Compute distance matrix
  hclust(method = "ward.D2") %>%        # Hierarchical clustering
  fviz_dend(cex = 0.5, k = 4, palette = "jco") # Visualize and cut 
                                              # into 4 groups

Balloon plot

Balloon plot is an alternative to bar plot for visualizing a large categorical data.

library(ggpubr)
# Data preparation
housetasks <- read.delim(
  system.file("demo-data/housetasks.txt", package = "ggpubr"),
  row.names = 1
  )
head(housetasks, 4)
##            Wife Alternating Husband Jointly
## Laundry     156          14       2       4
## Main_meal   124          20       5       4
## Dinner       77          11       7      13
## Breakfeast   82          36      15       7
# Visualization
ggballoonplot(housetasks, fill = "value")+
  scale_fill_viridis_c(option = "C")



Version: Français





Comments ( 3 )

  • SFer

    Kassambara
    – thanks for this great reference!.

    Q:
    In the 1st example,
    what would the code be
    to print (as top legend):
    R and R2 and p ?
    (right now, the ex. only shows R and P…).

    Thanks,
    SFer

  • Sfer

    Kassambara
    – thanks for this great reference!.

    Q:
    In the 1st example,
    what would the code be
    to print (as top legend):
    R and R2 and P ?
    (right now, the ex. only shows: R and P…).

    Thanks,
    SFer
    San Francisco

    • Kassambara

      Thank Sfer for your feedback!

      Please try the following R code:

      library("ggpubr")
      ggplot(mtcars, aes(mpg, wt)) +
        geom_point() +
        geom_smooth(method = lm) +
        stat_cor(
          aes(label = paste(..r.label.., ..rr.label.., ..p.label.., sep = "~`,`~")),
          method = "pearson", label.x = 20
          )
      

      The output should look like this:

      Scatter Plot with R, R2 and p-value

Give a comment

Want to post an issue with R? If yes, please make sure you have read this: How to Include Reproducible R Script Examples in Datanovia Comments