A density plot is an alternative to a histogram for visualizing the distribution of a continuous variable.
The peaks of a density plot help to identify where values are concentrated over the interval of the continuous variable.
Compared to histograms, density plots are better at revealing the shape of a distribution because they are not affected by the number of bins (the bars of a typical histogram).
For example, a histogram with only 4 bins would not show the shape of the distribution as clearly as a 30-bin histogram would. With density plots, this isn't an issue.
This article describes how to create density plots using the ggplot2 R package.
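As a rough illustration of the bin-count point above, the following sketch draws one sample three ways: as a 4-bin histogram, as a 30-bin histogram and as a density plot. The data frame `demo` and the plot names `h4`, `h30` and `d` are made up for this example and are not part of the article's data.

```r
library(ggplot2)

# Simulated sample (an assumption, for illustration only)
set.seed(1234)
demo <- data.frame(x = rnorm(200, mean = 55))

p <- ggplot(demo, aes(x = x))

# Same data, different bin counts: the apparent shape changes
h4  <- p + geom_histogram(bins = 4,  fill = "lightgray", color = "black")
h30 <- p + geom_histogram(bins = 30, fill = "lightgray", color = "black")

# Density plot: no bin count to choose
d <- p + geom_density()

h4
h30
d
```

The density curve is not entirely free of tuning: its smoothness depends on a bandwidth that geom_density() picks by default and that can be scaled with its `adjust` argument.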
Key R functions
- Key function: geom_density() (for density plots).
- Key arguments to customize the plots:
  - color, size, linetype: change the line color, size and type, respectively
  - fill: change the area fill color (for bar plots, histograms and density plots)
  - alpha: create a semi-transparent color.
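A minimal sketch of how these arguments might be combined in a single call is shown here. The data frame `demo` is simulated for the example, and the colors are arbitrary choices rather than values recommended by the article.

```r
library(ggplot2)

# Simulated sample (an assumption, for illustration only)
set.seed(1234)
demo <- data.frame(x = rnorm(200, mean = 55))

# color and linetype style the outline; fill and alpha style the area
p <- ggplot(demo, aes(x = x)) +
  geom_density(
    color = "black",
    linetype = "dashed",
    fill = "lightgray",
    alpha = 0.4
  )
p
```

Line width is left at its default in this sketch. In recent ggplot2 releases (3.4.0 and later), the width of lines is set with `linewidth`, and using `size` for that purpose triggers a deprecation message.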
Create some data (wdata) containing the weights by sex (M for male; F for female):
set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)
head(wdata, 4)
##   sex weight
## 1   F   53.8
## 2   F   55.3
## 3   F   56.1
## 4   F   52.7
Compute the mean weight by sex using the dplyr package. First, the data is grouped by sex and then summarized by computing the mean weight of each group. The pipe operator %>% is used to chain multiple operations:
library("dplyr")
mu <- wdata %>%
  group_by(sex) %>%
  summarise(grp.mean = mean(weight))
mu
## # A tibble: 2 x 2
##   sex   grp.mean
##   <fct>    <dbl>
## 1 F         54.9
## 2 M         58.1
Loading required R package
Load the ggplot2 package and set the default theme to theme_classic() with the legend at the top of the plot:
library(ggplot2)
theme_set(
  theme_classic() +
    theme(legend.position = "top")
)
Basic density plots
We start by creating a plot, named a, that we'll finish below by adding a layer using the function geom_density():
a <- ggplot(wdata, aes(x = weight))
The following R code creates some basic density plots with a vertical line corresponding to the mean value of the weight variable (geom_vline()):
# Basic density plots
# y axis scale = after_stat(density) (default behaviour)
a + geom_density() +
  geom_vline(aes(xintercept = mean(weight)),
             linetype = "dashed")

# Change y axis to count instead of density
# (after_stat(), available in ggplot2 >= 3.3.0, supersedes the older stat())
a + geom_density(aes(y = after_stat(count)), fill = "lightgray") +
  geom_vline(aes(xintercept = mean(weight)),
             linetype = "dashed")
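To see how the density and count scales relate, the computed values can be inspected directly. The sketch below assumes a reasonably recent ggplot2 in which layer_data() exposes the `density` and `count` columns of the density stat. The data frame `demo` and the flag `scaled_ok` are names invented for this example.

```r
library(ggplot2)

# Simulated sample (an assumption, for illustration only)
set.seed(1234)
demo <- data.frame(x = rnorm(200, mean = 55))

p <- ggplot(demo, aes(x = x)) + geom_density()

# layer_data() returns the values computed for the first layer
ld <- layer_data(p)
head(ld[, c("x", "density", "count")], 3)

# For unweighted data, count is the density multiplied by the
# number of observations, so this check is expected to be TRUE
scaled_ok <- isTRUE(all.equal(ld$count, ld$density * nrow(demo)))
scaled_ok
```

Switching the y axis to count therefore rescales the curve without changing its shape.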
Change color by groups
The following R code changes the density plot line and fill colors by group. The functions scale_color_manual() and scale_fill_manual() are used to specify custom colors for each group.
We'll proceed as follows:
- Change the area fill and add line color by groups (sex).
- Add vertical mean lines using mu, which contains the mean weights by sex (computed in the previous section).
- Change color manually:
  - scale_color_manual() for changing line color
  - scale_fill_manual() for changing area fill colors.
# Change line color by sex
a + geom_density(aes(color = sex)) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

# Change fill color by sex and add mean line
# Use semi-transparent fill: alpha = 0.4
a + geom_density(aes(fill = sex), alpha = 0.4) +
  geom_vline(aes(xintercept = grp.mean, color = sex),
             data = mu, linetype = "dashed") +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  scale_fill_manual(values = c("#868686FF", "#EFC000FF"))