Display Beautiful Summary Statistics in R Using the skimr Package



This article describes how to quickly display summary statistics using the R package skimr.

skimr handles different data types and returns a skim_df object which can be included in a tidyverse pipeline or displayed nicely for the human reader.

Key features of skimr:

  • Provides a larger set of statistics than the base R function summary(), including missing, complete, n, and sd.
  • Reports each data type separately.
  • Handles dates, logicals, and a variety of other types.
  • Supports inline spark-bar and spark-line charts.
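
To make the first point concrete, here is a minimal base R sketch (no skimr required) that computes by hand the extra columns listed above for one numeric variable. The variable name extra is only illustrative, and this is not how skimr computes these values internally.

```r
# Base R only; iris ships with R.
x <- iris$Sepal.Length

# summary() reports min, quartiles, mean, and max, but not sd or n
# (it adds an NA count only when missing values are present).
summary(x)

# The additional statistics named in the feature list, computed by hand.
extra <- c(
  missing  = sum(is.na(x)),        # number of NA values
  complete = sum(!is.na(x)),       # number of non-NA values
  n        = length(x),            # total number of values
  sd       = sd(x, na.rm = TRUE)   # standard deviation
)
round(extra, 2)
```

The rounded values agree with the Sepal.Length row of the skim(iris) output shown later in this article.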


Prerequisite

Install the stable version from CRAN:

install.packages("skimr")

Load the package:

library(skimr)

Summarize a whole dataset

skim(iris)
## Skim summary statistics
##  n obs: 150 
##  n variables: 5 
## 
## Variable type: factor 
##   variable missing complete   n n_unique                       top_counts
## 1  Species       0      150 150        3 set: 50, ver: 50, vir: 50, NA: 0
##   ordered
## 1   FALSE
## 
## Variable type: numeric 
##       variable missing complete   n mean   sd min p25 median p75 max
## 1 Petal.Length       0      150 150 3.76 1.77 1   1.6   4.35 5.1 6.9
## 2  Petal.Width       0      150 150 1.2  0.76 0.1 0.3   1.3  1.8 2.5
## 3 Sepal.Length       0      150 150 5.84 0.83 4.3 5.1   5.8  6.4 7.9
## 4  Sepal.Width       0      150 150 3.06 0.44 2   2.8   3    3.3 4.4
##       hist
## 1 ▇▁▁▂▅▅▃▁
## 2 ▇▁▁▅▃▃▂▂
## 3 ▂▇▅▇▆▅▂▂
## 4 ▁▂▅▇▃▂▁▁
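
As mentioned in the introduction, the value returned by skim() is a skim_df object, which is an ordinary data frame underneath. The sketch below only checks that and lists the column names; the exact columns differ between skimr releases, so it is safer to inspect them than to hard-code them. The name res is just illustrative.

```r
library(skimr)

res <- skim(iris)

# The result carries the "skim_df" class on top of a regular data frame,
# so it can be passed on to other data-frame tools.
inherits(res, "skim_df")
is.data.frame(res)

# Column layout depends on the installed skimr version; inspect it first.
names(res)
```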

Select specific columns to summarize

skim(iris, Sepal.Length, Petal.Length)
## Skim summary statistics
##  n obs: 150 
##  n variables: 5 
## 
## Variable type: numeric 
##       variable missing complete   n mean   sd min p25 median p75 max
## 1 Petal.Length       0      150 150 3.76 1.77 1   1.6   4.35 5.1 6.9
## 2 Sepal.Length       0      150 150 5.84 0.83 4.3 5.1   5.8  6.4 7.9
##       hist
## 1 ▇▁▁▂▅▅▃▁
## 2 ▂▇▅▇▆▅▂▂

Handle grouped data

skim() can handle data that has been grouped using dplyr::group_by.

# dplyr supplies group_by() and the %>% pipe
library(dplyr)

iris %>%
  dplyr::group_by(Species) %>%
  skim()
## Skim summary statistics
##  n obs: 150 
##  n variables: 5 
##  group variables: Species 
## 
## Variable type: numeric 
##       Species     variable missing complete  n mean   sd min  p25 median
## 1      setosa Petal.Length       0       50 50 1.46 0.17 1   1.4    1.5 
## 2      setosa  Petal.Width       0       50 50 0.25 0.11 0.1 0.2    0.2 
## 3      setosa Sepal.Length       0       50 50 5.01 0.35 4.3 4.8    5   
## 4      setosa  Sepal.Width       0       50 50 3.43 0.38 2.3 3.2    3.4 
## 5  versicolor Petal.Length       0       50 50 4.26 0.47 3   4      4.35
## 6  versicolor  Petal.Width       0       50 50 1.33 0.2  1   1.2    1.3 
## 7  versicolor Sepal.Length       0       50 50 5.94 0.52 4.9 5.6    5.9 
## 8  versicolor  Sepal.Width       0       50 50 2.77 0.31 2   2.52   2.8 
## 9   virginica Petal.Length       0       50 50 5.55 0.55 4.5 5.1    5.55
## 10  virginica  Petal.Width       0       50 50 2.03 0.27 1.4 1.8    2   
## 11  virginica Sepal.Length       0       50 50 6.59 0.64 4.9 6.23   6.5 
## 12  virginica  Sepal.Width       0       50 50 2.97 0.32 2.2 2.8    3   
##     p75 max     hist
## 1  1.58 1.9 ▁▁▅▇▇▅▂▁
## 2  0.3  0.6 ▂▇▁▂▂▁▁▁
## 3  5.2  5.8 ▂▃▅▇▇▃▁▂
## 4  3.68 4.4 ▁▁▃▅▇▃▂▁
## 5  4.6  5.1 ▁▃▂▆▆▇▇▃
## 6  1.5  1.8 ▆▃▇▅▆▂▁▁
## 7  6.3  7   ▃▂▇▇▇▃▅▂
## 8  3    3.4 ▁▂▃▅▃▇▃▁
## 9  5.88 6.9 ▂▇▃▇▅▂▁▂
## 10 2.3  2.5 ▂▁▇▃▃▆▅▃
## 11 6.9  7.9 ▁▁▃▇▅▃▂▃
## 12 3.18 3.8 ▁▃▇▇▅▃▁▂
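
To see what the grouped summary is doing, one of its columns can be cross-checked in base R. This sketch recomputes the per-species mean of Petal.Length with tapply(); it is only a sanity check and not part of skimr. The name grp_means is illustrative.

```r
# Mean of Petal.Length within each Species, using base R only.
grp_means <- tapply(iris$Petal.Length, iris$Species, mean)
round(grp_means, 2)
```

The three rounded values match the mean column of the Petal.Length rows (1, 5, and 9) in the grouped output above.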

Specify your own statistics and classes

Users can specify their own statistics using a list combined with the skim_with() function. This can support any named class found in your data.

funs <- list(
  iqr = IQR,
  mad = mad
)

skim_with(numeric = funs, append = FALSE)
skim(iris, Sepal.Length)
## Skim summary statistics
##  n obs: 150 
##  n variables: 5 
## 
## Variable type: numeric 
##       variable iqr  mad
## 1 Sepal.Length 1.3 1.04
# Restore defaults
skim_with_defaults()
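
The code above follows the skimr 1.x interface, where skim_with() changes global settings and skim_with_defaults() resets them. In skimr 2.0 and later, skim_with() instead returns a new skimming function, custom statistics are wrapped in sfl(), and skim_with_defaults() is no longer available. A sketch of the same example, assuming skimr >= 2.0 is installed (my_skim is just an illustrative name):

```r
library(skimr)

# Assumes skimr >= 2.0: skim_with() builds a new skim function
# instead of modifying global defaults.
my_skim <- skim_with(
  numeric = sfl(iqr = IQR, mad = mad),  # custom statistics for numeric columns
  append = FALSE                        # replace, rather than extend, the defaults
)

res <- my_skim(iris, Sepal.Length)
res
```

Because the customization lives in my_skim, the default skim() is left untouched and nothing needs to be restored afterwards.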

