{"id":8325,"date":"2018-12-31T09:12:03","date_gmt":"2018-12-31T07:12:03","guid":{"rendered":"https:\/\/www.datanovia.com\/en\/?post_type=dt_lessons&#038;p=8325"},"modified":"2019-11-18T00:11:56","modified_gmt":"2019-11-17T22:11:56","slug":"ggplot-histogram","status":"publish","type":"dt_lessons","link":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/","title":{"rendered":"GGPlot Histogram"},"content":{"rendered":"<div id=\"rdoc\">\n<p>A <strong>histogram plot<\/strong> is an alternative to a density plot for visualizing the distribution of a continuous variable. This chart represents the distribution of a continuous variable by dividing its range into bins and counting the number of observations in each bin.<\/p>\n<p>This article describes how to create histogram plots using the <strong>ggplot2<\/strong> R package.<\/p>\n<p>Contents:<\/p>\n<div id=\"TOC\">\n<ul>\n<li><a href=\"#key-r-functions\">Key R functions<\/a><\/li>\n<li><a href=\"#data-preparation\">Data preparation<\/a><\/li>\n<li><a href=\"#loading-required-r-package\">Loading required R package<\/a><\/li>\n<li><a href=\"#basic-histogram-plots\">Basic histogram plots<\/a><\/li>\n<li><a href=\"#change-color-by-groups\">Change color by groups<\/a><\/li>\n<li><a href=\"#combine-histogram-and-density-plots\">Combine histogram and density plots<\/a><\/li>\n<li><a href=\"#conclusion\">Conclusion<\/a><\/li>\n<\/ul>\n<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div class='dt-sc-ico-content type1'><div class='custom-icon' ><a href='https:\/\/www.datanovia.com\/en\/product\/ggplot2-essentials-for-great-data-visualization-in-r\/' target='_blank'><span class='fa fa-book'><\/span><\/a><\/div><h4><a href='https:\/\/www.datanovia.com\/en\/product\/ggplot2-essentials-for-great-data-visualization-in-r\/' target='_blank'> Related Book <\/a><\/h4>GGPlot2 Essentials for Great Data Visualization in R<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div id=\"key-r-functions\" class=\"section 
level2\">\n<h2>Key R functions<\/h2>\n<ul>\n<li>Key function: <code>geom_histogram()<\/code> (for histogram plots).<\/li>\n<li>Key arguments to customize the plots:\n<ul>\n<li><code>color, size, linetype<\/code>: change the line color, size and type, respectively (in ggplot2 3.4.0 and later, line width is set with <code>linewidth<\/code> rather than <code>size<\/code>)<\/li>\n<li><code>fill<\/code>: change the area fill color (for bar plots, histograms and density plots)<\/li>\n<li><code>alpha<\/code>: create a semi-transparent color.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/div>\n<div id=\"data-preparation\" class=\"section level2\">\n<h2>Data preparation<\/h2>\n<p>Create some data (<code>wdata<\/code>) containing the weights by sex (M for male; F for female):<\/p>\n<pre class=\"r\"><code># Simulate 200 weights per sex: mean 55 for F, mean 58 for M (sd defaults to 1)\r\nset.seed(1234)\r\nwdata &lt;- data.frame(\r\n        sex = factor(rep(c(\"F\", \"M\"), each = 200)),\r\n        weight = c(rnorm(200, 55), rnorm(200, 58))\r\n        )\r\n\r\nhead(wdata, 4)<\/code><\/pre>\n<pre><code>##   sex weight\r\n## 1   F   53.8\r\n## 2   F   55.3\r\n## 3   F   56.1\r\n## 4   F   52.7<\/code><\/pre>\n<p>Compute the mean weight by sex using the <code>dplyr<\/code> package. First, the data is grouped by sex and then summarized by computing the mean weight of each group. 
The operator <code>%&gt;%<\/code> is used to chain multiple operations:<\/p>\n<pre class=\"r\"><code>library(\"dplyr\")\r\n# Group by sex, then compute the mean weight of each group\r\nmu &lt;- wdata %&gt;%\r\n  group_by(sex) %&gt;%\r\n  summarise(grp.mean = mean(weight))\r\nmu<\/code><\/pre>\n<pre><code>## # A tibble: 2 x 2\r\n##   sex   grp.mean\r\n##   &lt;fct&gt;    &lt;dbl&gt;\r\n## 1 F         54.9\r\n## 2 M         58.1<\/code><\/pre>\n<\/div>\n<div id=\"loading-required-r-package\" class=\"section level2\">\n<h2>Loading required R package<\/h2>\n<p>Load the ggplot2 package and set the default theme to <code>theme_classic()<\/code> with the legend at the top of the plot:<\/p>\n<pre class=\"r\"><code>library(ggplot2)\r\ntheme_set(\r\n  theme_classic() +\r\n    theme(legend.position = \"top\")\r\n  )<\/code><\/pre>\n<\/div>\n<div id=\"basic-histogram-plots\" class=\"section level2\">\n<h2>Basic histogram plots<\/h2>\n<p>We start by creating a base plot, named <code>a<\/code>, which we\u2019ll complete below by adding a layer with the function <code>geom_histogram()<\/code>.<\/p>\n<pre class=\"r\"><code>a &lt;- ggplot(wdata, aes(x = weight))<\/code><\/pre>\n<p>The following R code creates a basic histogram with a vertical line corresponding to the mean value of the weight variable (<code>geom_vline()<\/code>):<\/p>\n<pre class=\"r\"><code># Basic histogram with a dashed vertical line at the overall mean weight\r\n# (with ggplot2 &gt;= 3.4.0, prefer linewidth = 0.6 over size = 0.6 for lines)\r\na + geom_histogram(bins = 30, color = \"black\", fill = \"gray\") +\r\n  geom_vline(aes(xintercept = mean(weight)),\r\n             linetype = \"dashed\", size = 0.6)<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/ggplot2\/figures\/013-ggplot-histogram-geom_histogram-basic-plot-1.png\" width=\"384\" \/><\/p>\n<div class=\"notice\">\n<p>Note that:<\/p>\n<ul>\n<li>By default, <code>geom_histogram()<\/code> uses 30 bins, which is often not a good choice for a given data set. 
You can change the number of bins (e.g., <code>bins = 50<\/code>) or the bin width (e.g., <code>binwidth = 0.5<\/code>).<\/li>\n<li>The y axis corresponds to the count of weight values. To show the density on the y axis instead, map <code>y = after_stat(density)<\/code> inside <code>aes()<\/code> (ggplot2 3.3.0 or later; older versions use <code>y = ..density..<\/code>).<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div id=\"change-color-by-groups\" class=\"section level2\">\n<h2>Change color by groups<\/h2>\n<p>The following R code changes the histogram line and fill colors by groups. The functions <code>scale_color_manual()<\/code> and <code>scale_fill_manual()<\/code> are used to specify custom colors for each group.<\/p>\n<p>We\u2019ll proceed as follows:<\/p>\n<ul>\n<li>Change the area fill and line color by groups (sex)<\/li>\n<li>Optionally, add vertical mean lines with <code>geom_vline(data = mu, aes(xintercept = grp.mean, color = sex))<\/code>, where <code>mu<\/code> contains the mean weight by sex (computed in the Data preparation section).<\/li>\n<li>Change color manually:\n<ul>\n<li>use <code>scale_color_manual()<\/code> or <code>scale_colour_manual()<\/code> for changing line color<\/li>\n<li>use <code>scale_fill_manual()<\/code> for changing area fill colors.<\/li>\n<\/ul>\n<\/li>\n<li>Adjust the position of histogram bars by using the argument <code>position<\/code>. Allowed values: \u201cidentity\u201d, \u201cstack\u201d, \u201cdodge\u201d. 
The default value is \u201cstack\u201d; the code below uses \u201cidentity\u201d so that the two groups overlap instead of being stacked.<\/li>\n<\/ul>\n<pre class=\"r\"><code># Change line color by sex\r\na + geom_histogram(aes(color = sex), fill = \"white\",\r\n                   position = \"identity\") +\r\n  scale_color_manual(values = c(\"#00AFBB\", \"#E7B800\"))\r\n\r\n# Change fill and outline color manually\r\na + geom_histogram(aes(color = sex, fill = sex),\r\n                   alpha = 0.4, position = \"identity\") +\r\n  scale_fill_manual(values = c(\"#00AFBB\", \"#E7B800\")) +\r\n  scale_color_manual(values = c(\"#00AFBB\", \"#E7B800\"))<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/ggplot2\/figures\/013-ggplot-histogram-geom_histogram-change-color-by-groups-1.png\" width=\"326.4\" \/><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/ggplot2\/figures\/013-ggplot-histogram-geom_histogram-change-color-by-groups-2.png\" width=\"326.4\" \/><\/p>\n<\/div>\n<div id=\"combine-histogram-and-density-plots\" class=\"section level2\">\n<h2>Combine histogram and density plots<\/h2>\n<ul>\n<li>Plot the histogram with density values on the y axis (instead of count values).<\/li>\n<li>Overlay a density curve: a semi-transparent filled curve in the first example, one colored line per group in the second.<\/li>\n<\/ul>\n<pre class=\"r\"><code># Histogram with density plot\r\n# after_stat() needs ggplot2 &gt;= 3.3.0; on older versions use stat(density)\r\na + geom_histogram(aes(y = after_stat(density)),\r\n                   color = \"black\", fill = \"white\") +\r\n  geom_density(alpha = 0.2, fill = \"#FF6666\")\r\n\r\n# Color by groups\r\n# (with ggplot2 &gt;= 3.4.0, prefer linewidth = 1 over size = 1 for lines)\r\na + geom_histogram(aes(y = after_stat(density), color = sex),\r\n                   fill = \"white\", position = \"identity\") +\r\n  geom_density(aes(color = sex), size = 1) +\r\n  scale_color_manual(values = c(\"#868686FF\", \"#EFC000FF\"))<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/ggplot2\/figures\/013-ggplot-histogram-geom_histogram-combine-density-and-histogram-1.png\" width=\"316.8\" \/><img 
decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/ggplot2\/figures\/013-ggplot-histogram-geom_histogram-combine-density-and-histogram-2.png\" width=\"316.8\" \/><\/p>\n<\/div>\n<div id=\"conclusion\" class=\"section level2\">\n<h2>Conclusion<\/h2>\n<p>This article describes how to create histogram plots using the ggplot2 package.<\/p>\n<\/div>\n<\/div>\n<p><!--end rdoc--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A histogram plot is an alternative to Density plot for visualizing the distribution of a continuous variable. This chart represents the distribution of a continuous variable by dividing into bins and counting the number of observations in each bin. This article describes how to create Histogram plots using the ggplot2 R package.<\/p>\n","protected":false},"author":1,"featured_media":7710,"parent":0,"menu_order":20,"comment_status":"open","ping_status":"closed","template":"","class_list":["post-8325","dt_lessons","type-dt_lessons","status-publish","has-post-thumbnail","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>GGPlot Histogram Best Reference - Datanovia<\/title>\n<meta name=\"description\" content=\"This article describes how to create Histogram plots using the ggplot2 R package. A histogram plot shows the distribution of a variable by dividing into bins\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"GGPlot Histogram Best Reference - Datanovia\" \/>\n<meta property=\"og:description\" content=\"This article describes how to create Histogram plots using the ggplot2 R package. 
A histogram plot shows the distribution of a variable by dividing into bins\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/\" \/>\n<meta property=\"og:site_name\" content=\"Datanovia\" \/>\n<meta property=\"article:modified_time\" content=\"2019-11-17T22:11:56+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_4919.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/\",\"name\":\"GGPlot Histogram Best Reference - Datanovia\",\"isPartOf\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_4919.jpg\",\"datePublished\":\"2018-12-31T07:12:03+00:00\",\"dateModified\":\"2019-11-17T22:11:56+00:00\",\"description\":\"This article describes how to create Histogram plots using the ggplot2 R package. 
A histogram plot shows the distribution of a variable by dividing into bins\",\"breadcrumb\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#primaryimage\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_4919.jpg\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_4919.jpg\",\"width\":1024,\"height\":512},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.datanovia.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Lessons\",\"item\":\"https:\/\/www.datanovia.com\/en\/lessons\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"GGPlot Histogram\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"name\":\"Datanovia\",\"description\":\"Data Mining and Statistics for Decision 
Support\",\"publisher\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.datanovia.com\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\",\"name\":\"Datanovia\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"width\":98,\"height\":99,\"caption\":\"Datanovia\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"GGPlot Histogram Best Reference - Datanovia","description":"This article describes how to create Histogram plots using the ggplot2 R package. A histogram plot shows the distribution of a variable by dividing into bins","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/","og_locale":"en_US","og_type":"article","og_title":"GGPlot Histogram Best Reference - Datanovia","og_description":"This article describes how to create Histogram plots using the ggplot2 R package. 
A histogram plot shows the distribution of a variable by dividing into bins","og_url":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/","og_site_name":"Datanovia","article_modified_time":"2019-11-17T22:11:56+00:00","og_image":[{"width":1024,"height":512,"url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_4919.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/","url":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/","name":"GGPlot Histogram Best Reference - Datanovia","isPartOf":{"@id":"https:\/\/www.datanovia.com\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#primaryimage"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#primaryimage"},"thumbnailUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_4919.jpg","datePublished":"2018-12-31T07:12:03+00:00","dateModified":"2019-11-17T22:11:56+00:00","description":"This article describes how to create Histogram plots using the ggplot2 R package. 
A histogram plot shows the distribution of a variable by dividing into bins","breadcrumb":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#primaryimage","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_4919.jpg","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_4919.jpg","width":1024,"height":512},{"@type":"BreadcrumbList","@id":"https:\/\/www.datanovia.com\/en\/lessons\/ggplot-histogram\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.datanovia.com\/en\/"},{"@type":"ListItem","position":2,"name":"Lessons","item":"https:\/\/www.datanovia.com\/en\/lessons\/"},{"@type":"ListItem","position":3,"name":"GGPlot Histogram"}]},{"@type":"WebSite","@id":"https:\/\/www.datanovia.com\/en\/#website","url":"https:\/\/www.datanovia.com\/en\/","name":"Datanovia","description":"Data Mining and Statistics for Decision 
Support","publisher":{"@id":"https:\/\/www.datanovia.com\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.datanovia.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.datanovia.com\/en\/#organization","name":"Datanovia","url":"https:\/\/www.datanovia.com\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","width":98,"height":99,"caption":"Datanovia"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/"}}]}},"multi-rating":{"mr_rating_results":[]},"_links":{"self":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/8325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons"}],"about":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/types\/dt_lessons"}],"author":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/comments?post=8325"}],"version-history":[{"count":0,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/8325\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media\/7710"}],"wp:attachment":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media?parent=8325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}