Data Visualization using GGPlot2

GGPlot Stripchart

Stripcharts are also known as one-dimensional scatter plots. They are a good alternative to box plots when sample sizes are small, because every individual observation stays visible.

This article describes how to create and customize Stripcharts using the ggplot2 R package.

Key R functions

  • Key function: geom_jitter()
  • Key arguments: color, fill, size, shape. These change the color, fill, size and shape of the points.

Data preparation

  • Demo dataset: ToothGrowth
    • Continuous variable: len (tooth length). Used on y-axis
    • Grouping variable: dose (dose levels of vitamin C: 0.5, 1, and 2 mg/day). Used on x-axis.

First, convert the variable dose from a numeric to a discrete factor variable:

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth, 3)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
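
The conversion matters because ggplot2 treats a numeric x variable as continuous, and the shape aesthetic used later cannot be mapped to a continuous variable. A quick check along the following lines, using only base R, confirms that dose is now a factor with one level per dose:

```r
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# dose should now be a factor with three levels
str(ToothGrowth$dose)
levels(ToothGrowth$dose)

# Number of observations in each dose group
table(ToothGrowth$dose)
```

Each dose group holds 20 observations, small enough that plotting every point is informative.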

Loading required R package

Load the ggplot2 package and set the default theme to theme_classic() with the legend at the top of the plot:

library(ggplot2)
theme_set(
  theme_classic() +
    theme(legend.position = "top")
  )
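
Note that theme_set() changes the default theme for every plot drawn afterwards in the session. It invisibly returns the theme that was active before, so a sketch like the one below can restore it later. The names old_theme and active_position are illustrative only:

```r
library(ggplot2)

# theme_set() invisibly returns the previously active theme
old_theme <- theme_set(theme_classic() + theme(legend.position = "top"))

# Inspect the legend position of the theme now in effect
active_position <- theme_get()$legend.position
print(active_position)

# Restore the previous default once the custom theme is no longer needed
theme_set(old_theme)
```

In this tutorial the custom theme should stay active, so the final theme_set(old_theme) call would only be run at the end of the session.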

Basic stripcharts

We start by initializing a plot object named e, then add layers to it. The R code in this section and the next creates stripcharts combined with summary statistics (mean +/- SD), box plots and violin plots.

  • Change point shape and color by group
  • Adjust the degree of horizontal jittering: position_jitter(0.2)
  • Add summary statistics (mean +/- SD):

# Initiate a ggplot
e <- ggplot(ToothGrowth, aes(x = dose, y = len))

# Stripcharts with summary statistics
# Change color by dose groups
e + geom_jitter(aes(shape = dose, color = dose), 
                position = position_jitter(0.2), size = 1.2) +
  stat_summary(aes(color = dose), size = 0.4,
               fun.data="mean_sdl",  fun.args = list(mult=1))+
  scale_color_manual(values =  c("#00AFBB", "#E7B800", "#FC4E07"))
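
The call position_jitter(0.2) sets only the jitter width. If height is omitted, position_jitter() also applies a small amount of vertical jitter, and the random offsets change on every redraw. The following variant is a minimal sketch that sets height = 0 so the tooth lengths are plotted at their exact values, and fixes the offsets with seed. The seed value 123 and the name p_strip are arbitrary choices:

```r
library(ggplot2)

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# width:      horizontal spread of the points around each dose level
# height = 0: no vertical jitter, so len values are plotted exactly
# seed:       fixes the random offsets so the plot is reproducible
p_strip <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_jitter(
    aes(shape = dose, color = dose), size = 1.2,
    position = position_jitter(width = 0.2, height = 0, seed = 123)
  )
print(p_strip)
```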

The function mean_sdl() adds the mean and standard deviation. It computes the mean plus or minus a constant times the standard deviation. In the R code above, the constant is set with the argument mult (mult = 1); the default is mult = 2. Note that mean_sdl() is a wrapper around a function from the Hmisc package, which must be installed. By default, stat_summary() draws the result as a pointrange; specify geom = "crossbar" to draw a crossbar instead.
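
To make the computation concrete, the values drawn for each dose group can be reproduced in base R. This is a sketch of the same arithmetic, not the package's own implementation; mean_sd_manual is a hypothetical helper that returns the same three quantities (y, ymin, ymax):

```r
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Hypothetical helper: mean, and mean -/+ mult times the standard deviation
mean_sd_manual <- function(x, mult = 1) {
  m <- mean(x)
  s <- sd(x)
  c(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

# Apply the helper to the tooth lengths of each dose group
groups <- split(ToothGrowth$len, ToothGrowth$dose)
stats <- t(sapply(groups, mean_sd_manual, mult = 1))
stats <- data.frame(dose = rownames(stats), stats, row.names = NULL)
print(stats)
```

With mult = 1 the interval spans one standard deviation on either side of the mean; with the default mult = 2 it would be twice as wide.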

Combine with box plots and violin plots

# Combine with box plot
e + geom_boxplot() + 
  geom_jitter(position = position_jitter(0.2))

  
# Strip chart + violin plot + stat summary
e + geom_violin(trim = FALSE) +
  geom_jitter(position = position_jitter(0.2)) +
  stat_summary(fun.data="mean_sdl",  fun.args = list(mult=1),
               color = "red")
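
One thing to watch when overlaying jittered points on a box plot: geom_boxplot() draws outliers as points of its own, so an outlying observation can appear twice, once from each layer. A possible variant, sketched below, hides the box plot's outlier points with outlier.shape = NA and leaves all observations to the jitter layer. The height and seed settings and the name p_box are illustrative choices:

```r
library(ggplot2)

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

p_box <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  # outlier.shape = NA hides the box plot's own outlier points
  geom_boxplot(outlier.shape = NA) +
  # every observation, outliers included, is drawn by the jitter layer
  geom_jitter(position = position_jitter(width = 0.2, height = 0, seed = 123))
print(p_box)
```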

Create a stripchart with multiple groups

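The R code is similar to what we have seen in the dot plot lesson. However, to create dodged jitter points, use the function position_jitterdodge() instead of position_dodge(). The summary statistics layer still uses position_dodge(), with the same dodge width (0.8) so that it lines up with the points.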

e + geom_jitter(
  aes(shape = supp, color = supp), size = 1.2,
  position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0.8)
  ) +
  stat_summary(
    aes(color = supp), fun.data="mean_sdl", fun.args = list(mult=1), 
    size = 0.4, position = position_dodge(0.8)
    )+
  scale_color_manual(values =  c("#00AFBB", "#E7B800"))

Conclusion

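This article describes how to create a stripchart using the ggplot2 package, how to add summary statistics (mean +/- SD), how to combine it with box plots and violin plots, and how to handle multiple groups.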

Teacher
Alboukadel Kassambara
Role : Founder of Datanovia