{"id":7687,"date":"2018-10-18T22:40:39","date_gmt":"2018-10-18T20:40:39","guid":{"rendered":"https:\/\/www.datanovia.com\/en\/?post_type=dt_lessons&#038;p=7687"},"modified":"2018-10-20T17:59:03","modified_gmt":"2018-10-20T15:59:03","slug":"agglomerative-hierarchical-clustering","status":"publish","type":"dt_lessons","link":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/","title":{"rendered":"Agglomerative Hierarchical Clustering"},"content":{"rendered":"<div id=\"rdoc\">\n<p><strong>Agglomerative clustering<\/strong> is the most common type of hierarchical clustering used to group objects into clusters based on their similarity. It\u2019s also known as <em>AGNES<\/em> (<em>Agglomerative Nesting<\/em>). The algorithm starts by treating each object as a singleton cluster. Next, pairs of clusters are successively merged until all clusters have been merged into one big cluster containing all objects. The result is a tree-based representation of the objects, called a <em>dendrogram<\/em>.<\/p>\n<div class=\"block\">\n<p>In this article, we start by describing the agglomerative clustering algorithm. Next, we provide R lab sections with many examples for computing and visualizing hierarchical clustering. We continue by explaining how to interpret dendrograms. 
Finally, we provide R codes for cutting dendrograms into groups.<\/p>\n<\/div>\n<p>Contents:<\/p>\n<div id=\"TOC\">\n<ul>\n<li><a href=\"#algorithm\">Algorithm<\/a><\/li>\n<li><a href=\"#steps-to-agglomerative-hierarchical-clustering\">Steps to agglomerative hierarchical clustering<\/a>\n<ul>\n<li><a href=\"#data-structure-and-preparation\">Data structure and preparation<\/a><\/li>\n<li><a href=\"#similarity-measures\">Similarity measures<\/a><\/li>\n<li><a href=\"#linkage\">Linkage<\/a><\/li>\n<li><a href=\"#dendrogram\">Dendrogram<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"#verify-the-cluster-tree\">Verify the cluster tree<\/a><\/li>\n<li><a href=\"#cut-the-dendrogram-into-different-groups\">Cut the dendrogram into different groups<\/a><\/li>\n<li><a href=\"#cluster-r-package\">Cluster R package<\/a><\/li>\n<li><a href=\"#application-of-hierarchical-clustering-to-gene-expression-data-analysis\">Application of hierarchical clustering to gene expression data analysis<\/a><\/li>\n<li><a href=\"#summary\">Summary<\/a><\/li>\n<\/ul>\n<\/div>\n<div id=\"algorithm\" class=\"section level2\">\n<h2>Algorithm<\/h2>\n<p>Agglomerative clustering works in a \u201cbottom-up\u201d manner. That is, each object is initially considered a single-element cluster (leaf). At each step of the algorithm, the two clusters that are the most similar are combined into a new, bigger cluster (node). 
This procedure is iterated until all points are members of one single big cluster (root) (see figure below).<\/p>\n<p>The inverse of agglomerative clustering is <em>divisive clustering<\/em>, which is also known as DIANA (<em>DIvisive ANAlysis<\/em>) and works in a \u201ctop-down\u201d manner. It begins with the root, in which all objects are included in a single cluster. At each iteration, the most heterogeneous cluster is divided into two. The process is iterated until each object is in its own cluster (see figure below).<\/p>\n<p><span class=\"warning\">Note that agglomerative clustering is good at identifying small clusters, whereas divisive clustering is good at identifying large clusters. In this article, we\u2019ll focus mainly on agglomerative hierarchical clustering.<\/span><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/003-hierarchical-clustering-in-r\/images\/hierarchical-clustering-agnes-diana.png\" alt=\"Hierarchical clustering methods\" \/><\/p>\n<\/div>\n<div id=\"steps-to-agglomerative-hierarchical-clustering\" class=\"section level2\">\n<h2>Steps to agglomerative hierarchical clustering<\/h2>\n<p>We\u2019ll follow the steps below to perform agglomerative hierarchical clustering using R software:<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li>Preparing the data.<\/li>\n<li>Computing (dis)similarity information between every pair of objects in the data set.<\/li>\n<li>Using a linkage function to group objects into a hierarchical cluster tree, based on the distance information generated at step 2. Objects\/clusters that are in close proximity are linked together using the linkage function.<\/li>\n<li>Determining where to cut the hierarchical tree into clusters. 
This creates a partition of the data.<\/li>\n<\/ol>\n<p>We\u2019ll describe each of these steps in the next sections.<\/p>\n<div id=\"data-structure-and-preparation\" class=\"section level3\">\n<h3>Data structure and preparation<\/h3>\n<p>The data should be a numeric matrix with:<\/p>\n<ul>\n<li>rows representing observations (individuals);<\/li>\n<li>and columns representing variables.<\/li>\n<\/ul>\n<p>Here, we\u2019ll use the R base USArrests data set.<\/p>\n<div class=\"warning\">\n<p>Note that it\u2019s generally recommended to standardize variables in the data set before performing subsequent analyses. Standardization makes variables comparable when they are measured on different scales. For example, one variable can measure height in meters and another variable can measure weight in kilograms. The R function <em>scale<\/em>() can be used for standardization; see the ?scale documentation.<\/p>\n<\/div>\n<pre class=\"r\"><code># Load the data\r\ndata(\"USArrests\")\r\n\r\n# Standardize the data\r\ndf &lt;- scale(USArrests)\r\n\r\n# Show the first 6 rows\r\nhead(df, n = 6)<\/code><\/pre>\n<pre><code>##            Murder Assault UrbanPop     Rape\r\n## Alabama    1.2426   0.783   -0.521 -0.00342\r\n## Alaska     0.5079   1.107   -1.212  2.48420\r\n## Arizona    0.0716   1.479    0.999  1.04288\r\n## Arkansas   0.2323   0.231   -1.074 -0.18492\r\n## California 0.2783   1.263    1.759  2.06782\r\n## Colorado   0.0257   0.399    0.861  1.86497<\/code><\/pre>\n<\/div>\n<div id=\"similarity-measures\" class=\"section level3\">\n<h3>Similarity measures<\/h3>\n<p>In order to decide which objects\/clusters should be combined or divided, we need methods for measuring the similarity between objects.<\/p>\n<p>There are many methods to calculate the (dis)similarity information, including <a href=\"https:\/\/www.datanovia.com\/en\/lessons\/clustering-distance-measures\/\">Euclidean and Manhattan distances<\/a>. 
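<\/p>
<p>As a quick, self-contained illustration of these two metrics (a toy example, separate from the USArrests analysis):<\/p>

```r
# Two points: (0, 0) and (3, 4)
x <- rbind(a = c(0, 0), b = c(3, 4))

# Euclidean: straight-line distance, sqrt(3^2 + 4^2) = 5
d.euc <- dist(x, method = "euclidean")

# Manhattan: sum of absolute coordinate differences, |3| + |4| = 7
d.man <- dist(x, method = "manhattan")
```

<p>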
In R software, you can use the function <em>dist<\/em>() to compute the distance between every pair of objects in a data set. The result of this computation is known as a distance or dissimilarity matrix.<\/p>\n<p>By default, the function <em>dist<\/em>() computes the Euclidean distance between objects; however, it\u2019s possible to indicate other metrics using the argument method. See ?dist for more information.<\/p>\n<p>For example, for the R base data set USArrests, you can compute the distance matrix as follows:<\/p>\n<pre class=\"r\"><code># Compute the dissimilarity matrix\r\n# df = the standardized data\r\nres.dist &lt;- dist(df, method = \"euclidean\")<\/code><\/pre>\n<div class=\"success\">\n<p>Note that the function <em>dist<\/em>() computes the distance between the rows of a data matrix using the specified distance measure method.<\/p>\n<\/div>\n<p>To easily see the distance information between objects, we reformat the results of the function <em>dist<\/em>() into a matrix using the <em>as.matrix<\/em>() function. In this matrix, the value in the cell at row i and column j represents the distance between object i and object j in the original data set. For instance, element 1,1 represents the distance between object 1 and itself (which is zero). 
Element 1,2 represents the distance between object 1 and object 2, and so on.<\/p>\n<p>The R code below displays the first 6 rows and columns of the distance matrix:<\/p>\n<pre class=\"r\"><code>as.matrix(res.dist)[1:6, 1:6]<\/code><\/pre>\n<pre><code>##            Alabama Alaska Arizona Arkansas California Colorado\r\n## Alabama       0.00   2.70    2.29     1.29       3.26     2.65\r\n## Alaska        2.70   0.00    2.70     2.83       3.01     2.33\r\n## Arizona       2.29   2.70    0.00     2.72       1.31     1.37\r\n## Arkansas      1.29   2.83    2.72     0.00       3.76     2.83\r\n## California    3.26   3.01    1.31     3.76       0.00     1.29\r\n## Colorado      2.65   2.33    1.37     2.83       1.29     0.00<\/code><\/pre>\n<\/div>\n<div id=\"linkage\" class=\"section level3\">\n<h3>Linkage<\/h3>\n<p>The linkage function takes the distance information, returned by the function <em>dist<\/em>(), and groups pairs of objects into clusters based on their similarity. Next, these newly formed clusters are linked to each other to create bigger clusters. This process is iterated until all the objects in the original data set are linked together in a hierarchical tree.<\/p>\n<p>For example, given a distance matrix \u201cres.dist\u201d generated by the function <em>dist<\/em>(), the R base function <em>hclust<\/em>() can be used to create the hierarchical tree.<\/p>\n<p><em>hclust<\/em>() can be used as follows:<\/p>\n<pre class=\"r\"><code>res.hc &lt;- hclust(d = res.dist, method = \"ward.D2\")<\/code><\/pre>\n<ul>\n<li><strong>d<\/strong>: a dissimilarity structure as produced by the <strong>dist()<\/strong> function.<\/li>\n<li><strong>method<\/strong>: the agglomeration (linkage) method to be used for computing the distance between clusters. 
Allowed values are \u201cward.D\u201d, \u201cward.D2\u201d, \u201csingle\u201d, \u201ccomplete\u201d, \u201caverage\u201d, \u201cmcquitty\u201d, \u201cmedian\u201d and \u201ccentroid\u201d.<\/li>\n<\/ul>\n<p>There are many cluster agglomeration methods (i.e., linkage methods). The most common linkage methods are described below.<\/p>\n<div class=\"block\">\n<ul>\n<li>Maximum or <em>complete linkage<\/em>: The distance between two clusters is defined as the maximum value of all pairwise distances between the elements in cluster 1 and the elements in cluster 2. It tends to produce more compact clusters.<\/li>\n<li>Minimum or <em>single linkage<\/em>: The distance between two clusters is defined as the minimum value of all pairwise distances between the elements in cluster 1 and the elements in cluster 2. It tends to produce long, \u201cloose\u201d clusters.<\/li>\n<li>Mean or <em>average linkage<\/em>: The distance between two clusters is defined as the average distance between the elements in cluster 1 and the elements in cluster 2.<\/li>\n<li><em>Centroid linkage<\/em>: The distance between two clusters is defined as the distance between the centroid for cluster 1 (a mean vector of length p, the number of variables) and the centroid for cluster 2.<\/li>\n<li><em>Ward\u2019s minimum variance method<\/em>: It minimizes the total within-cluster variance. At each step, the pair of clusters with the minimum between-cluster distance is merged.<\/li>\n<\/ul>\n<\/div>\n<p>Note that, at each stage of the clustering process, the two clusters that have the smallest linkage distance are linked together.<\/p>\n<div class=\"success\">\n<p>Complete linkage and Ward\u2019s method are generally preferred.<\/p>\n<\/div>\n<\/div>\n<div id=\"dendrogram\" class=\"section level3\">\n<h3>Dendrogram<\/h3>\n<p>Dendrograms correspond to the graphical representation of the hierarchical tree generated by the function <em>hclust<\/em>(). 
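<\/p>
<p>Before plotting, it can be useful to inspect the fitted tree object itself: an hclust object records the full agglomeration history in its standard components. A short exploratory sketch:<\/p>

```r
# Rebuild the tree from the standardized USArrests data
res.hc <- hclust(dist(scale(USArrests)), method = "ward.D2")

# $merge: one row per fusion; negative entries are original
# observations (leaves), positive entries refer to earlier merges
head(res.hc$merge, 3)

# $height: the linkage distance at which each fusion occurred
head(res.hc$height, 3)
```

<p>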
A dendrogram can be produced in R using the base function <em>plot<\/em>(res.hc), where res.hc is the output of <em>hclust<\/em>(). Here, we\u2019ll use the function <em>fviz_dend<\/em>() [in the <em>factoextra<\/em> R package] to produce a beautiful dendrogram.<\/p>\n<p>First, install factoextra by typing install.packages(\"factoextra\"); next, visualize the dendrogram as follows:<\/p>\n<pre class=\"r\"><code># cex: label size\r\nlibrary(\"factoextra\")\r\nfviz_dend(res.hc, cex = 0.5)<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/003-hierarchical-clustering-in-r\/figures\/002-agglomerative-clustering-visualize-dendrogram-1.png\" width=\"518.4\" \/><\/p>\n<p>In the dendrogram displayed above, each leaf corresponds to one object. As we move up the tree, objects that are similar to each other are combined into branches, which are themselves fused at a greater height.<\/p>\n<p>The height of the fusion, shown on the vertical axis, indicates the (dis)similarity\/distance between two objects\/clusters. The greater the height of the fusion, the less similar the objects are. This height is known as the <em>cophenetic distance<\/em> between the two objects.<\/p>\n<div class=\"notice\">\n<p>Note that conclusions about the proximity of two objects can be drawn only from the height at which the branches containing those two objects are first fused. 
We cannot use the proximity of two objects along the horizontal axis as a criterion of their similarity.<\/p>\n<\/div>\n<p>In order to identify sub-groups, we can cut the dendrogram at a certain height, as described in the next sections.<\/p>\n<\/div>\n<\/div>\n<div id=\"verify-the-cluster-tree\" class=\"section level2\">\n<h2>Verify the cluster tree<\/h2>\n<p>After linking the objects in a data set into a hierarchical cluster tree, you might want to assess whether the distances (i.e., heights) in the tree reflect the original distances accurately.<\/p>\n<p>One way to measure how well the cluster tree generated by the <em>hclust<\/em>() function reflects your data is to compute the correlation between the <em>cophenetic<\/em> distances and the original distances generated by the <em>dist<\/em>() function. If the clustering is valid, the linking of objects in the cluster tree should have a strong correlation with the distances between objects in the original distance matrix.<\/p>\n<p>The closer the value of the correlation coefficient is to 1, the more accurately the clustering solution reflects your data. Values above 0.75 are generally considered good. The \u201caverage\u201d linkage method appears to produce high values of this statistic, which may be one reason it is so popular.<\/p>\n<p>The R base function <em>cophenetic<\/em>() can be used to compute the cophenetic distances for hierarchical clustering.<\/p>\n<pre class=\"r\"><code># Compute cophenetic distance\r\nres.coph &lt;- cophenetic(res.hc)\r\n\r\n# Correlation between cophenetic distance and\r\n# the original distance\r\ncor(res.dist, res.coph)<\/code><\/pre>\n<pre><code>## [1] 0.698<\/code><\/pre>\n<p>Execute the <em>hclust<\/em>() function again using the average linkage method. 
Next, call <em>cophenetic<\/em>() to evaluate the clustering solution.<\/p>\n<pre class=\"r\"><code>res.hc2 &lt;- hclust(res.dist, method = \"average\")\r\n\r\ncor(res.dist, cophenetic(res.hc2))<\/code><\/pre>\n<pre><code>## [1] 0.718<\/code><\/pre>\n<p>The correlation coefficient shows that using a different linkage method creates a tree that represents the original distances slightly better.<\/p>\n<\/div>\n<div id=\"cut-the-dendrogram-into-different-groups\" class=\"section level2\">\n<h2>Cut the dendrogram into different groups<\/h2>\n<p>One of the problems with hierarchical clustering is that it does not tell us how many clusters there are, or where to cut the dendrogram to form clusters.<\/p>\n<p>You can cut the hierarchical tree at a given height in order to partition your data into clusters. The R base function <em>cutree<\/em>() can be used to cut a tree, generated by the <em>hclust<\/em>() function, into several groups, either by specifying the desired number of groups or the cut height. It returns a vector containing the cluster number of each observation.<\/p>\n<pre class=\"r\"><code># Cut tree into 4 groups\r\ngrp &lt;- cutree(res.hc, k = 4)\r\nhead(grp, n = 4)<\/code><\/pre>\n<pre><code>##  Alabama   Alaska  Arizona Arkansas \r\n##        1        2        2        3<\/code><\/pre>\n<pre class=\"r\"><code># Number of members in each cluster\r\ntable(grp)<\/code><\/pre>\n<pre><code>## grp\r\n##  1  2  3  4 \r\n##  7 12 19 12<\/code><\/pre>\n<pre class=\"r\"><code># Get the names for the members of cluster 1\r\nrownames(df)[grp == 1]<\/code><\/pre>\n<pre><code>## [1] \"Alabama\"        \"Georgia\"        \"Louisiana\"      \"Mississippi\"   \r\n## [5] \"North Carolina\" \"South Carolina\" \"Tennessee\"<\/code><\/pre>\n<p>The result of the cuts can be visualized easily using the function <em>fviz_dend<\/em>() [in factoextra]:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/003-hierarchical-clustering-in-r\/figures\/002-agglomerative-clustering-cutree-cut-dendrogram-1.png\" width=\"518.4\" \/><\/p>\n<p>Using the function <em>fviz_cluster<\/em>() [in <em>factoextra<\/em>], we can also visualize the result in a scatter plot. Observations are represented by points in the plot, using principal components. A frame is drawn around each cluster.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/003-hierarchical-clustering-in-r\/figures\/002-agglomerative-clustering-cluster-plot-1.png\" width=\"576\" \/><\/p>\n<\/div>\n<div id=\"cluster-r-package\" class=\"section level2\">\n<h2>Cluster R package<\/h2>\n<p>The R package <em>cluster<\/em> makes it easy to perform cluster analysis in R. It provides the functions <em>agnes<\/em>() and <em>diana<\/em>() for computing agglomerative and divisive clustering, respectively. These functions perform all the necessary steps for you. 
You don\u2019t need to execute the <em>scale<\/em>(), <em>dist<\/em>() and <em>hclust<\/em>() functions separately.<\/p>\n<p>The functions can be executed as follows:<\/p>\n<pre class=\"r\"><code>library(\"cluster\")\r\n# Agglomerative Nesting (Hierarchical Clustering)\r\nres.agnes &lt;- agnes(x = USArrests, # data matrix\r\n                   stand = TRUE, # Standardize the data\r\n                   metric = \"euclidean\", # metric for distance matrix\r\n                   method = \"ward\" # Linkage method\r\n                   )\r\n\r\n# DIvisive ANAlysis Clustering\r\nres.diana &lt;- diana(x = USArrests, # data matrix\r\n                   stand = TRUE, # standardize the data\r\n                   metric = \"euclidean\" # metric for distance matrix\r\n                   )<\/code><\/pre>\n<p>After running <em>agnes<\/em>() and <em>diana<\/em>(), you can use the function <em>fviz_dend<\/em>() [in <em>factoextra<\/em>] to visualize the output:<\/p>\n<pre class=\"r\"><code>fviz_dend(res.agnes, cex = 0.6, k = 4)<\/code><\/pre>\n<\/div>\n<div id=\"application-of-hierarchical-clustering-to-gene-expression-data-analysis\" class=\"section level2\">\n<h2>Application of hierarchical clustering to gene expression data analysis<\/h2>\n<p>In <em>gene expression data analysis<\/em>, <em>clustering<\/em> is generally used as one of the first steps to explore the data. We are interested in whether there are groups of genes or groups of samples that have similar gene expression patterns.<\/p>\n<p>Several <a href=\"https:\/\/www.datanovia.com\/en\/lessons\/clustering-distance-measures\/\">clustering distance measures<\/a> have been described for assessing the similarity or the dissimilarity between items, in order to decide which items should be grouped together. These measures can be used to cluster genes or samples that are similar.<\/p>\n<p>For most clustering software, the default distance measure is the Euclidean distance. 
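<\/p>
<p>For expression data, a correlation-based distance is often preferred; it can be derived from the Pearson correlation between rows. A minimal sketch (the toy matrix and its dimensions are invented for illustration):<\/p>

```r
# Toy "expression" matrix: 5 genes x 4 samples
set.seed(123)
expr <- matrix(rnorm(20), nrow = 5,
               dimnames = list(paste0("gene", 1:5),
                               paste0("sample", 1:4)))

# Correlation distance between genes: d = 1 - r, which lies in [0, 2]
res.cor <- as.dist(1 - cor(t(expr), method = "pearson"))

# Cluster the genes with complete linkage
hc.genes <- hclust(res.cor, method = "complete")
```

<p>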
The most popular approach for gene expression data is to use a log2(expression + 0.25) transformation, a correlation distance and complete linkage agglomerative clustering.<\/p>\n<p>Single and complete linkage give the same dendrogram whether you use the raw data, the log of the data or any other transformation that preserves the order of the distances, because what matters is only which pairs have the smallest distance. The other methods are sensitive to the measurement scale.<\/p>\n<div class=\"notice\">\n<p>Note that, when the data are scaled, the squared Euclidean distance between the z-scores is proportional to the correlation distance (1 - r), so the two lead to the same ordering of the objects.<\/p>\n<p>Pearson\u2019s correlation is quite sensitive to outliers. When clustering genes, it is important to be aware of the possible impact of outliers. An alternative option is to use Spearman\u2019s correlation instead of Pearson\u2019s correlation.<\/p>\n<\/div>\n<p>In principle, it is possible to cluster all the genes, although visualizing a huge dendrogram might be problematic. Usually, some type of preliminary analysis, such as differential expression analysis, is used to select genes for clustering.<\/p>\n<p>Selecting genes based on differential expression analysis removes genes which are likely to have only chance patterns. This should enhance the patterns found in the gene clusters.<\/p>\n<\/div>\n<div id=\"summary\" class=\"section level2\">\n<h2>Summary<\/h2>\n<p>Hierarchical clustering is a cluster analysis method which produces a tree-based representation (i.e., a dendrogram) of the data. Objects in the dendrogram are linked together based on their similarity.<\/p>\n<p>To perform hierarchical cluster analysis in R, the first step is to calculate the pairwise distance matrix using the function <em>dist<\/em>(). Next, the result of this computation is used by the <em>hclust<\/em>() function to produce the hierarchical tree. 
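<\/p>
<p>In code, the whole workflow, from standardization to a 4-group partition, condenses to a few lines (using only the base R functions shown above):<\/p>

```r
# Standardize, compute distances, build the tree, cut into 4 groups
df <- scale(USArrests)
res.dist <- dist(df, method = "euclidean")
res.hc <- hclust(res.dist, method = "ward.D2")
grp <- cutree(res.hc, k = 4)
table(grp)  # cluster sizes
```

<p>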
Finally, you can use the function <em>fviz_dend<\/em>() [in the factoextra R package] to easily plot a beautiful dendrogram.<\/p>\n<p>It\u2019s also possible to cut the tree at a given height for partitioning the data into multiple groups (R function <em>cutree<\/em>()).<\/p>\n<\/div>\n<\/div>\n<p><!--end rdoc--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we start by describing the agglomerative clustering algorithm. Next, we provide R lab sections with many examples for computing and visualizing hierarchical clustering. We continue by explaining how to interpret dendrograms. Finally, we provide R codes for cutting dendrograms into groups.<\/p>\n","protected":false},"author":1,"featured_media":8011,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","class_list":["post-7687","dt_lessons","type-dt_lessons","status-publish","has-post-thumbnail","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Agglomerative Hierarchical Clustering - Datanovia<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Agglomerative Hierarchical Clustering - Datanovia\" \/>\n<meta property=\"og:description\" content=\"In this article, we start by describing the agglomerative clustering algorithms. Next, we provide R lab sections with many examples for computing and visualizing hierarchical clustering. We continue by explaining how to interpret dendrogram. 
Finally, we provide R codes for cutting dendrograms into groups.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/\" \/>\n<meta property=\"og:site_name\" content=\"Datanovia\" \/>\n<meta property=\"article:modified_time\" content=\"2018-10-20T15:59:03+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_0066.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/\",\"name\":\"Agglomerative Hierarchical Clustering - 
Datanovia\",\"isPartOf\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_0066.jpg\",\"datePublished\":\"2018-10-18T20:40:39+00:00\",\"dateModified\":\"2018-10-20T15:59:03+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#primaryimage\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_0066.jpg\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_0066.jpg\",\"width\":1024,\"height\":512},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.datanovia.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Lessons\",\"item\":\"https:\/\/www.datanovia.com\/en\/lessons\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Agglomerative Hierarchical Clustering\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"name\":\"Datanovia\",\"description\":\"Data Mining and Statistics for Decision 
Support\",\"publisher\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.datanovia.com\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\",\"name\":\"Datanovia\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"width\":98,\"height\":99,\"caption\":\"Datanovia\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Agglomerative Hierarchical Clustering - Datanovia","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/","og_locale":"en_US","og_type":"article","og_title":"Agglomerative Hierarchical Clustering - Datanovia","og_description":"In this article, we start by describing the agglomerative clustering algorithms. Next, we provide R lab sections with many examples for computing and visualizing hierarchical clustering. We continue by explaining how to interpret dendrogram. 
Finally, we provide R codes for cutting dendrograms into groups.","og_url":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/","og_site_name":"Datanovia","article_modified_time":"2018-10-20T15:59:03+00:00","og_image":[{"width":1024,"height":512,"url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_0066.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/","url":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/","name":"Agglomerative Hierarchical Clustering - Datanovia","isPartOf":{"@id":"https:\/\/www.datanovia.com\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#primaryimage"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#primaryimage"},"thumbnailUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_0066.jpg","datePublished":"2018-10-18T20:40:39+00:00","dateModified":"2018-10-20T15:59:03+00:00","breadcrumb":{"@id":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clustering\/#primaryimage","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_0066.jpg","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/IMG_0066.jpg","width":1024,"height":512},{"@type":"BreadcrumbList","@id":"https:\/\/www.datanovia.com\/en\/lessons\/agglomerative-hierarchical-clus
tering\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.datanovia.com\/en\/"},{"@type":"ListItem","position":2,"name":"Lessons","item":"https:\/\/www.datanovia.com\/en\/lessons\/"},{"@type":"ListItem","position":3,"name":"Agglomerative Hierarchical Clustering"}]},{"@type":"WebSite","@id":"https:\/\/www.datanovia.com\/en\/#website","url":"https:\/\/www.datanovia.com\/en\/","name":"Datanovia","description":"Data Mining and Statistics for Decision Support","publisher":{"@id":"https:\/\/www.datanovia.com\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.datanovia.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.datanovia.com\/en\/#organization","name":"Datanovia","url":"https:\/\/www.datanovia.com\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","width":98,"height":99,"caption":"Datanovia"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/"}}]}},"multi-rating":{"mr_rating_results":[]},"_links":{"self":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/7687","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons"}],"about":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/types\/dt_lessons"}],"author":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/comments?post=7687"}],"version-histor
y":[{"count":1,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/7687\/revisions"}],"predecessor-version":[{"id":7688,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/7687\/revisions\/7688"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media\/8011"}],"wp:attachment":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media?parent=7687"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}