{"id":7646,"date":"2018-10-14T15:10:06","date_gmt":"2018-10-14T15:10:06","guid":{"rendered":"https:\/\/www.datanovia.com\/en\/?post_type=dt_lessons&#038;p=7646"},"modified":"2018-10-20T14:42:10","modified_gmt":"2018-10-20T12:42:10","slug":"cluster-analysis-example-quick-start-r-code","status":"publish","type":"dt_lessons","link":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/","title":{"rendered":"Cluster Analysis Example: Quick Start R Code"},"content":{"rendered":"<div id=\"rdoc\">\n<p>This chapter describes a <strong>cluster analysis example<\/strong> using R software. We provide quick-start R code to compute and visualize k-means and hierarchical clustering.<\/p>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div class='dt-sc-ico-content type1'><div class='custom-icon' ><a href='https:\/\/www.datanovia.com\/en\/product\/practical-guide-to-cluster-analysis-in-r\/' target='_blank'><span class='fa fa-book'><\/span><\/a><\/div><h4><a href='https:\/\/www.datanovia.com\/en\/product\/practical-guide-to-cluster-analysis-in-r\/' target='_blank'> Related Book <\/a><\/h4>Practical Guide to Cluster Analysis in R<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div id=\"loading-required-r-packages\" class=\"section level2\">\n<h2>Loading required R packages<\/h2>\n<ul>\n<li><code>cluster<\/code> for cluster analysis<\/li>\n<li><code>factoextra<\/code> for cluster visualization<\/li>\n<\/ul>\n<pre class=\"r\"><code>library(cluster)\r\nlibrary(factoextra)<\/code><\/pre>\n<\/div>\n<div id=\"data-preparation\" class=\"section level2\">\n<h2>Data preparation<\/h2>\n<p>We\u2019ll use the built-in demo data set <code>USArrests<\/code>, which gives, for each of the 50 US states, the number of arrests per 100,000 residents for murder, assault and rape, as well as the percentage of the population living in urban areas. 
We start by standardizing the variables, because they are measured on different scales; otherwise, variables with large values, such as <code>Assault<\/code>, would dominate the distance computations:<\/p>\n<pre class=\"r\"><code># Center each column to mean 0 and scale it to standard deviation 1\r\nmydata &lt;- scale(USArrests)<\/code><\/pre>\n<\/div>\n<div id=\"k-means-clustering\" class=\"section level2\">\n<h2>K-means clustering<\/h2>\n<p>K-means is a partitioning clustering technique that divides a data set into k groups, where k is the number of groups specified in advance by the analyst.<\/p>\n<p>The following R code shows how to determine the optimal number of clusters and how to compute and visualize k-means clustering.<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li><strong>Determining the optimal number of clusters<\/strong>: use <code>factoextra::fviz_nbclust()<\/code><\/li>\n<\/ol>\n<pre class=\"r\"><code># The gap statistic relies on bootstrap sampling, so results may vary between runs\r\nfviz_nbclust(mydata, kmeans, method = \"gap_stat\")<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/001-data-clustering-basics\/figures\/007-cluster-analysis-example-optimal-number-of-clusters-1.png\" width=\"384\" \/><\/p>\n<p>Suggested number of clusters: 3<\/p>\n<ol style=\"list-style-type: decimal;\" start=\"2\">\n<li><strong>Compute and visualize k-means clustering<\/strong>:<\/li>\n<\/ol>\n<pre class=\"r\"><code>set.seed(123) # for reproducibility\r\n# k = 3 clusters; nstart = 25 random starts, keeping the best solution\r\nkm.res &lt;- kmeans(mydata, 3, nstart = 25)\r\n# Visualize\r\nfviz_cluster(km.res, data = mydata, palette = \"jco\",\r\n             ggtheme = theme_minimal())<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/001-data-clustering-basics\/figures\/007-cluster-analysis-example-k-means-plot-ggplot2-factoextra-1.png\" width=\"480\" \/><\/p>\n<\/div>\n<div id=\"hierarchical-clustering\" class=\"section level2\">\n<h2>Hierarchical clustering<\/h2>\n<p>Hierarchical clustering is an alternative approach to partitioning clustering for identifying groups in the data set. 
Unlike k-means, it does not require the number of clusters to be specified in advance.<\/p>\n<p>The result of hierarchical clustering is a tree-based representation of the objects, known as a dendrogram. Observations can be divided into groups by cutting the dendrogram at a desired height (similarity level).<\/p>\n<ul>\n<li>Computation: R function <code>hclust()<\/code>. It takes as input a dissimilarity matrix, which can be computed with the function <code>dist()<\/code>.<\/li>\n<li>Visualization: <code>fviz_dend()<\/code> [in factoextra]<\/li>\n<\/ul>\n<p>R code to compute and visualize hierarchical clustering:<\/p>\n<pre class=\"r\"><code># Euclidean distances (the dist() default) with Ward's minimum variance linkage\r\nres.hc &lt;- hclust(dist(mydata), method = \"ward.D2\")\r\n# Cut the tree into k = 4 groups and color them; cex sets the label size\r\nfviz_dend(res.hc, cex = 0.5, k = 4, palette = \"jco\")<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/001-data-clustering-basics\/figures\/007-cluster-analysis-example-cluster-analysis-1.png\" width=\"480\" \/><\/p>\n<p>A heatmap is another way to visualize hierarchical clustering. It is also called a false-color image, in which data values are mapped to a color scale. Heatmaps allow us to visualize groups of samples and features simultaneously. You can easily create a pretty heatmap using the R package <code>pheatmap<\/code>.<\/p>\n<p>In a heatmap, columns are generally samples and rows are variables. 
Since <code>mydata<\/code> stores the states (samples) in rows, we transpose it with <code>t()<\/code> before creating the heatmap.<\/p>\n<pre class=\"r\"><code>library(pheatmap)\r\n# pheatmap clusters rows and columns itself (default: Euclidean distance,\r\n# complete linkage); cutree_cols = 4 splits the state dendrogram into 4 groups\r\npheatmap(t(mydata), cutree_cols = 4)<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/001-data-clustering-basics\/figures\/007-cluster-analysis-example-heatmap-1.png\" width=\"672\" \/><\/p>\n<\/div>\n<div id=\"summary\" class=\"section level2\">\n<h2>Summary<\/h2>\n<p>This chapter presents examples of R code to compute and visualize k-means and hierarchical clustering.<\/p>\n<\/div>\n<\/div>\n<p><!--end rdoc--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This chapter describes a cluster analysis example using R software. We provide quick-start R code to compute and visualize k-means and hierarchical clustering.<\/p>\n","protected":false},"author":1,"featured_media":8018,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","class_list":["post-7646","dt_lessons","type-dt_lessons","status-publish","has-post-thumbnail","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Cluster Analysis Example: Quick Start R Code - Datanovia<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Cluster Analysis Example: Quick Start R Code - Datanovia\" \/>\n<meta property=\"og:description\" content=\"This chapter describes a cluster analysis example using R software. 
We provide a quick start R code to compute and visualize K-means and hierarchical clustering.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/\" \/>\n<meta property=\"og:site_name\" content=\"Datanovia\" \/>\n<meta property=\"article:modified_time\" content=\"2018-10-20T12:42:10+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030210.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/\",\"name\":\"Cluster Analysis Example: Quick Start R Code - 
Datanovia\",\"isPartOf\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030210.jpg\",\"datePublished\":\"2018-10-14T15:10:06+00:00\",\"dateModified\":\"2018-10-20T12:42:10+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#primaryimage\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030210.jpg\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030210.jpg\",\"width\":1024,\"height\":512},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.datanovia.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Lessons\",\"item\":\"https:\/\/www.datanovia.com\/en\/lessons\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Cluster Analysis Example: Quick Start R Code\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"name\":\"Datanovia\",\"description\":\"Data Mining and Statistics for Decision 
Support\",\"publisher\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.datanovia.com\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\",\"name\":\"Datanovia\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"width\":98,\"height\":99,\"caption\":\"Datanovia\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Cluster Analysis Example: Quick Start R Code - Datanovia","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/","og_locale":"en_US","og_type":"article","og_title":"Cluster Analysis Example: Quick Start R Code - Datanovia","og_description":"This chapter describes a cluster analysis example using R software. 
We provide a quick start R code to compute and visualize K-means and hierarchical clustering.","og_url":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/","og_site_name":"Datanovia","article_modified_time":"2018-10-20T12:42:10+00:00","og_image":[{"width":1024,"height":512,"url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030210.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/","url":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/","name":"Cluster Analysis Example: Quick Start R Code - Datanovia","isPartOf":{"@id":"https:\/\/www.datanovia.com\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#primaryimage"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#primaryimage"},"thumbnailUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030210.jpg","datePublished":"2018-10-14T15:10:06+00:00","dateModified":"2018-10-20T12:42:10+00:00","breadcrumb":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#primaryimage","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030210.jpg","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030210.jpg","width":1024,"height":512},{"@type":"BreadcrumbLis
t","@id":"https:\/\/www.datanovia.com\/en\/lessons\/cluster-analysis-example-quick-start-r-code\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.datanovia.com\/en\/"},{"@type":"ListItem","position":2,"name":"Lessons","item":"https:\/\/www.datanovia.com\/en\/lessons\/"},{"@type":"ListItem","position":3,"name":"Cluster Analysis Example: Quick Start R Code"}]},{"@type":"WebSite","@id":"https:\/\/www.datanovia.com\/en\/#website","url":"https:\/\/www.datanovia.com\/en\/","name":"Datanovia","description":"Data Mining and Statistics for Decision Support","publisher":{"@id":"https:\/\/www.datanovia.com\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.datanovia.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.datanovia.com\/en\/#organization","name":"Datanovia","url":"https:\/\/www.datanovia.com\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","width":98,"height":99,"caption":"Datanovia"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/"}}]}},"multi-rating":{"mr_rating_results":[]},"_links":{"self":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/7646","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons"}],"about":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/types\/dt_lessons"}],"author":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true
,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/comments?post=7646"}],"version-history":[{"count":0,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/7646\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media\/8018"}],"wp:attachment":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media?parent=7646"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}