{"id":7645,"date":"2018-10-14T15:03:28","date_gmt":"2018-10-14T15:03:28","guid":{"rendered":"https:\/\/www.datanovia.com\/en\/?post_type=dt_lessons&#038;p=7645"},"modified":"2018-10-20T14:42:48","modified_gmt":"2018-10-20T12:42:48","slug":"clustering-distance-measures","status":"publish","type":"dt_lessons","link":"https:\/\/www.datanovia.com\/en\/lessons\/clustering-distance-measures\/","title":{"rendered":"Clustering Distance Measures"},"content":{"rendered":"<p>&nbsp;<\/p>\n<div id=\"rdoc\">\n<p>The classification of observations into groups requires some methods for computing the <strong>distance<\/strong> or the (dis)<strong>similarity<\/strong> between each pair of observations. The result of this computation is known as a dissimilarity or <strong>distance matrix<\/strong>.<\/p>\n<div class=\"block\">\n<p>There are many methods to calculate this distance information. In this article, we describe the common <strong>distance measures<\/strong> and provide R codes for computing and visualizing distances.<\/p>\n<\/div>\n<p>Contents:<\/p>\n<div id=\"TOC\">\n<ul>\n<li><a href=\"#methods-for-measuring-distances\">Methods for measuring distances<\/a><\/li>\n<li><a href=\"#what-type-of-distance-measures-should-we-choose\">What type of distance measures should we choose?<\/a><\/li>\n<li><a href=\"#data-standardization\">Data standardization<\/a><\/li>\n<li><a href=\"#distance-matrix-computation\">Distance matrix computation<\/a>\n<ul>\n<li><a href=\"#data-preparation\">Data preparation<\/a><\/li>\n<li><a href=\"#r-functions-and-packages\">R functions and packages<\/a><\/li>\n<li><a href=\"#computing-euclidean-distance\">Computing euclidean distance<\/a><\/li>\n<li><a href=\"#computing-correlation-based-distances\">Computing correlation based distances<\/a><\/li>\n<li><a href=\"#computing-distances-for-mixed-data\">Computing distances for mixed data<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"#visualizing-distance-matrices\">Visualizing distance 
matrices<\/a><\/li>\n<li><a href=\"#summary\">Summary<\/a><\/li>\n<\/ul>\n<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div class='dt-sc-ico-content type1'><div class='custom-icon' ><a href='https:\/\/www.datanovia.com\/en\/product\/practical-guide-to-cluster-analysis-in-r\/' target='_blank'><span class='fa fa-book'><\/span><\/a><\/div><h4><a href='https:\/\/www.datanovia.com\/en\/product\/practical-guide-to-cluster-analysis-in-r\/' target='_blank'> Related Book <\/a><\/h4>Practical Guide to Cluster Analysis in R<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div id=\"methods-for-measuring-distances\" class=\"section level2\">\n<h2>Methods for measuring distances<\/h2>\n<p>The choice of distance measures is a critical step in clustering. It defines how the similarity of two elements (x, y) is calculated and it will influence the shape of the clusters.<\/p>\n<p>The classical methods for distance measures are <em>Euclidean<\/em> and <em>Manhattan distances<\/em>, which are defined as follow:<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li><em>Euclidean distance<\/em>:<\/li>\n<\/ol>\n<p><span class=\"math display\">\\[<br \/>\nd_{euc}(x,y) = \\sqrt{\\sum_{i=1}^n(x_i - y_i)^2}<br \/>\n\\]<\/span><\/p>\n<ol style=\"list-style-type: decimal;\" start=\"2\">\n<li><em>Manhattan distance<\/em>:<\/li>\n<\/ol>\n<p><span class=\"math display\">\\[<br \/>\nd_{man}(x,y) = \\sum_{i=1}^n |{(x_i - y_i)|}<br \/>\n\\]<\/span><\/p>\n<p>Where, <em>x<\/em> and <em>y<\/em> are two vectors of length <em>n<\/em>.<\/p>\n<p>Other dissimilarity measures exist such as <strong>correlation-based distances<\/strong>, which is widely used for gene expression data analyses. Correlation-based distance is defined by subtracting the correlation coefficient from 1. 
Several correlation methods can be used, such as:<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li><strong>Pearson correlation distance<\/strong>:<\/li>\n<\/ol>\n<p><span class=\"math display\">\\[<br \/>\nd_{cor}(x, y) = 1 - \\frac{\\sum\\limits_{i=1}^n (x_i - \\bar{x})(y_i - \\bar{y})}{\\sqrt{\\sum\\limits_{i=1}^n(x_i - \\bar{x})^2 \\sum\\limits_{i=1}^n(y_i -\\bar{y})^2}}<br \/>\n\\]<\/span><\/p>\n<div class=\"notice\">\n<p>Pearson correlation measures the degree of a linear relationship between two profiles.<\/p>\n<\/div>\n<ol style=\"list-style-type: decimal;\" start=\"2\">\n<li><strong>Eisen cosine correlation distance<\/strong> (Eisen et al., 1998):<\/li>\n<\/ol>\n<p>It\u2019s a special case of Pearson\u2019s correlation with <span class=\"math inline\">\\(\\bar{x}\\)<\/span> and <span class=\"math inline\">\\(\\bar{y}\\)<\/span> both replaced by zero:<\/p>\n<p><span class=\"math display\">\\[<br \/>\nd_{eisen}(x, y) = 1 - \\frac{\\left|\\sum\\limits_{i=1}^n x_iy_i\\right|}{\\sqrt{\\sum\\limits_{i=1}^n x^2_i \\sum\\limits_{i=1}^n y^2_i}}<br \/>\n\\]<\/span><\/p>\n<ol style=\"list-style-type: decimal;\" start=\"3\">\n<li><strong>Spearman correlation distance<\/strong>:<\/li>\n<\/ol>\n<p>The Spearman correlation method computes the correlation between the ranks of the x values and the ranks of the y values.<\/p>\n<p><span class=\"math display\">\\[<br \/>\nd_{spear}(x, y) = 1 - \\frac{\\sum\\limits_{i=1}^n (x'_i - \\bar{x'})(y'_i - \\bar{y'})}{\\sqrt{\\sum\\limits_{i=1}^n(x'_i - \\bar{x'})^2 \\sum\\limits_{i=1}^n(y'_i -\\bar{y'})^2}}<br \/>\n\\]<\/span><\/p>\n<p>Where <span class=\"math inline\">\\(x'_i = rank(x_i)\\)<\/span> and <span class=\"math inline\">\\(y'_i = rank(y_i)\\)<\/span>.<\/p>\n<ol style=\"list-style-type: decimal;\" start=\"4\">\n<li><strong>Kendall correlation distance<\/strong>:<\/li>\n<\/ol>\n<p>The Kendall correlation method measures the correspondence between the rankings of the x and y variables. 
The total number of possible pairings of x with y observations is <span class=\"math inline\">\\(n(n-1)\/2\\)<\/span>, where n is the size of x and y. Begin by ordering the pairs by the x values. If x and y are correlated, then they would have the same relative rank orders. Now, for each <span class=\"math inline\">\\(y_i\\)<\/span>, consider only the values that come after it in this ordering (<span class=\"math inline\">\\(j &gt; i\\)<\/span>) and count the number of <span class=\"math inline\">\\(y_j &gt; y_i\\)<\/span> (concordant pairs (c)) and the number of <span class=\"math inline\">\\(y_j &lt; y_i\\)<\/span> (discordant pairs (d)).<\/p>\n<p>Kendall correlation distance is defined as follows:<\/p>\n<p><span class=\"math display\">\\[<br \/>\nd_{kend}(x, y) = 1 - \\frac{n_c - n_d}{\\frac{1}{2}n(n-1)}<br \/>\n\\]<\/span><\/p>\n<p>Where,<\/p>\n<ul>\n<li><span class=\"math inline\">\\(n_c\\)<\/span>: total number of concordant pairs<\/li>\n<li><span class=\"math inline\">\\(n_d\\)<\/span>: total number of discordant pairs<\/li>\n<li><span class=\"math inline\">\\(n\\)<\/span>: size of x and y<\/li>\n<\/ul>\n<div class=\"notice\">\n<p>Note that,<\/p>\n<ul>\n<li>Pearson correlation analysis is the most commonly used method. It is also known as a parametric correlation, which depends on the distribution of the data.<\/li>\n<li>Kendall and Spearman correlations are non-parametric and are used to perform rank-based correlation analysis.<\/li>\n<\/ul>\n<\/div>\n<div class=\"success\">\n<p>In the formulas above, <span class=\"math inline\"><em>x<\/em><\/span> and <span class=\"math inline\"><em>y<\/em><\/span> are two vectors of length <span class=\"math inline\"><em>n<\/em><\/span>, with means <span class=\"math inline\"><span class=\"math inline\">\\(\\bar{x}\\)<\/span><\/span> and <span class=\"math inline\"><span class=\"math inline\">\\(\\bar{y}\\)<\/span><\/span>, respectively. 
The distance between x and y is denoted <span class=\"math inline\"><em>d<\/em>(<em>x<\/em>,\u2006<em>y<\/em>)<\/span>.<\/p>\n<\/div>\n<\/div>\n<div id=\"what-type-of-distance-measures-should-we-choose\" class=\"section level2\">\n<h2>What type of distance measures should we choose?<\/h2>\n<p>The choice of distance measures is very important, as it has a strong influence on the clustering results. For most common clustering software, the default distance measure is the Euclidean distance.<\/p>\n<p>Depending on the type of the data and the research question, other dissimilarity measures might be preferred. For example, correlation-based distance is often used in gene expression data analysis.<\/p>\n<p>Correlation-based distance considers two objects to be similar if their features are highly correlated, even though the observed values may be far apart in terms of Euclidean distance. The distance between two objects is 0 when they are perfectly positively correlated. Pearson\u2019s correlation is quite sensitive to outliers. This matters less when clustering samples, because the correlation is computed over thousands of genes. When clustering genes, it is important to be aware of the possible impact of outliers. This can be mitigated by using Spearman\u2019s correlation instead of Pearson\u2019s correlation.<\/p>\n<p>If we want to identify clusters of observations with the same overall profiles regardless of their magnitudes, then we should go with <em>correlation-based distance<\/em> as a dissimilarity measure. This is particularly the case in gene expression data analysis, where we might want to consider genes similar when they are \u201cup\u201d and \u201cdown\u201d together. It is also the case in marketing, if we want to identify groups of shoppers with the same preferences in terms of items, regardless of the volume of items they bought.<\/p>\n<p>If Euclidean distance is chosen, then observations with high values of features will be clustered together. 
The same holds true for observations with low values of features.<\/p>\n<\/div>\n<div id=\"data-standardization\" class=\"section level2\">\n<h2>Data standardization<\/h2>\n<p>The value of distance measures is intimately related to the scale on which measurements are made. Therefore, variables are often scaled (i.e.\u00a0standardized) before measuring the inter-observation dissimilarities. This is particularly recommended when variables are measured on different scales (e.g., kilograms, kilometers, centimeters, \u2026); otherwise, the dissimilarity measures obtained will be severely affected.<\/p>\n<p>The goal is to make the variables comparable. Generally, variables are scaled to have i) standard deviation one and ii) mean zero.<\/p>\n<p>The standardization of data is an approach widely used in the context of gene expression data analysis before clustering. We might also want to scale the data when the means and\/or the standard deviations of the variables differ substantially.<\/p>\n<p>When scaling variables, the data can be transformed as follows:<\/p>\n<p><span class=\"math display\">\\[<br \/>\n\\frac{x_i - center(x)}{scale(x)}<br \/>\n\\]<\/span><\/p>\n<p>Where <span class=\"math inline\">\\(center(x)\\)<\/span> can be the mean or the median of x values, and <span class=\"math inline\">\\(scale(x)\\)<\/span> can be the standard deviation (SD), the interquartile range, or the MAD (median absolute deviation).<\/p>\n<p>The R base function <em>scale<\/em>() can be used to standardize the data. 
It takes a numeric matrix as an input and performs the scaling on the columns.<\/p>\n<div class=\"block\">\n<p>Standardization makes the four distance measure methods - Euclidean, Manhattan, Correlation and Eisen - more similar than they would be with non-transformed data.<\/p>\n<p>Note that, when the data are standardized, there is a functional relationship between the Pearson correlation coefficient <span class=\"math inline\"><em>r<\/em>(<em>x<\/em>,\u2006<em>y<\/em>)<\/span> and the Euclidean distance.<\/p>\n<p>With some maths, the relationship can be defined as follows:<\/p>\n<p><span class=\"math display\"><span class=\"math display\">\\[<br \/>\nd_{euc}(x, y) = \\sqrt{2m[1 - r(x, y)]}<br \/>\n\\]<\/span><\/span><\/p>\n<p>Where x and y are two standardized m-vectors with zero mean and unit variance (variance computed with divisor m).<\/p>\n<p>Therefore, the results obtained with Pearson correlation measures and standardized Euclidean distances are comparable.<\/p>\n<\/div>\n<\/div>\n<div id=\"distance-matrix-computation\" class=\"section level2\">\n<h2>Distance matrix computation<\/h2>\n<div id=\"data-preparation\" class=\"section level3\">\n<h3>Data preparation<\/h3>\n<p>We\u2019ll use the USArrests data as a demo data set. We\u2019ll use only a subset of the data by taking 15 random rows among the 50 rows in the data set. This is done by using the function <em>sample<\/em>(). 
Next, we standardize the data using the function <em>scale<\/em>():<\/p>\n<pre class=\"r\"><code># Subset of the data\r\nset.seed(123)\r\nss &lt;- sample(1:50, 15)   # Take 15 random rows\r\ndf &lt;- USArrests[ss, ]    # Subset the 15 rows\r\ndf.scaled &lt;- scale(df)   # Standardize the variables<\/code><\/pre>\n<\/div>\n<div id=\"r-functions-and-packages\" class=\"section level3\">\n<h3>R functions and packages<\/h3>\n<p>There are many R functions for computing distances between pairs of observations:<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li><em>dist<\/em>() R base function [<em>stats<\/em> package]: Accepts only numeric data as an input.<\/li>\n<li><em>get_dist<\/em>() function [<em>factoextra<\/em> package]: Accepts only numeric data as an input. Compared to the standard dist() function, it supports correlation-based distance measures including \u201cpearson\u201d, \u201ckendall\u201d and \u201cspearman\u201d methods.<\/li>\n<li><em>daisy()<\/em> function [<em>cluster<\/em> package]: Able to handle other variable types (e.g.\u00a0nominal, ordinal, (a)symmetric binary). In that case, the Gower\u2019s coefficient will be automatically used as the metric. It\u2019s one of the most popular measures of proximity for mixed data types. 
For more details, read the R documentation of the <em>daisy<\/em>() function (<em>?daisy<\/em>).<\/li>\n<\/ol>\n<div class=\"success\">\n<p>All these functions compute distances between rows of the data.<\/p>\n<\/div>\n<\/div>\n<div id=\"computing-euclidean-distance\" class=\"section level3\">\n<h3>Computing euclidean distance<\/h3>\n<p>To compute Euclidean distance, you can use the R base <em>dist<\/em>() function, as follows:<\/p>\n<pre class=\"r\"><code>dist.eucl &lt;- dist(df.scaled, method = \"euclidean\")<\/code><\/pre>\n<p>Note that allowed values for the option method include one of: \u201ceuclidean\u201d, \u201cmaximum\u201d, \u201cmanhattan\u201d, \u201ccanberra\u201d, \u201cbinary\u201d, \u201cminkowski\u201d.<\/p>\n<p>To make it easier to see the distance information generated by the <em>dist<\/em>() function, you can reformat the distance vector into a matrix using the <em>as.matrix<\/em>() function.<\/p>\n<pre class=\"r\"><code># Reformat as a matrix\r\n# Subset the first 3 columns and rows and Round the values\r\nround(as.matrix(dist.eucl)[1:3, 1:3], 1)<\/code><\/pre>\n<pre><code>##              Iowa Rhode Island Maryland\r\n## Iowa          0.0          2.8      4.1\r\n## Rhode Island  2.8          0.0      3.6\r\n## Maryland      4.1          3.6      0.0<\/code><\/pre>\n<p>In this matrix, the values represent the distances between objects. The values on the diagonal of the matrix represent the distances between objects and themselves (which are zero).<\/p>\n<div class=\"warning\">\n<p>In this data set, the columns are variables. Hence, if we want to compute pairwise distances between variables, we must start by transposing the data to have variables in the rows of the data set before using the <em>dist<\/em>() function. 
The function <em>t<\/em>() is used for transposing the data.<\/p>\n<\/div>\n<\/div>\n<div id=\"computing-correlation-based-distances\" class=\"section level3\">\n<h3>Computing correlation based distances<\/h3>\n<p>Correlation-based distances are commonly used in gene expression data analysis.<\/p>\n<p>The function <em>get_dist<\/em>()[<em>factoextra<\/em> package] can be used to compute correlation-based distances. The correlation method can be <em>pearson<\/em>, <em>spearman<\/em> or <em>kendall<\/em>.<\/p>\n<pre class=\"r\"><code># Compute\r\nlibrary(\"factoextra\")\r\ndist.cor &lt;- get_dist(df.scaled, method = \"pearson\")\r\n\r\n# Display a subset\r\nround(as.matrix(dist.cor)[1:3, 1:3], 1)<\/code><\/pre>\n<pre><code>##              Iowa Rhode Island Maryland\r\n## Iowa          0.0          0.4      1.9\r\n## Rhode Island  0.4          0.0      1.5\r\n## Maryland      1.9          1.5      0.0<\/code><\/pre>\n<\/div>\n<div id=\"computing-distances-for-mixed-data\" class=\"section level3\">\n<h3>Computing distances for mixed data<\/h3>\n<p>The function <em>daisy<\/em>() [<em>cluster<\/em> package] provides a solution (<em>Gower\u2019s metric<\/em>) for computing the distance matrix when the data contain non-numeric columns.<\/p>\n<p>The R code below applies the <em>daisy<\/em>() function to the <em>flower<\/em> data, which contains <em>factor<\/em>, <em>ordered<\/em> and <em>numeric<\/em> variables:<\/p>\n<pre class=\"r\"><code>library(cluster)\r\n# Load data\r\ndata(flower)\r\nhead(flower, 3)<\/code><\/pre>\n<pre><code>##   V1 V2 V3 V4 V5 V6  V7 V8\r\n## 1  0  1  1  4  3 15  25 15\r\n## 2  1  0  0  2  1  3 150 50\r\n## 3  0  1  0  3  3  1 150 50<\/code><\/pre>\n<pre class=\"r\"><code># Data structure\r\nstr(flower)<\/code><\/pre>\n<pre><code>## 'data.frame':    18 obs. 
of  8 variables:\r\n##  $ V1: Factor w\/ 2 levels \"0\",\"1\": 1 2 1 1 1 1 1 1 2 2 ...\r\n##  $ V2: Factor w\/ 2 levels \"0\",\"1\": 2 1 2 1 2 2 1 1 2 2 ...\r\n##  $ V3: Factor w\/ 2 levels \"0\",\"1\": 2 1 1 2 1 1 1 2 1 1 ...\r\n##  $ V4: Factor w\/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 4 2 3 4 5 4 4 2 3 5 ...\r\n##  $ V5: Ord.factor w\/ 3 levels \"1\"&lt;\"2\"&lt;\"3\": 3 1 3 2 2 3 3 2 1 2 ...\r\n##  $ V6: Ord.factor w\/ 18 levels \"1\"&lt;\"2\"&lt;\"3\"&lt;\"4\"&lt;..: 15 3 1 16 2 12 13 7 4 14 ...\r\n##  $ V7: num  25 150 150 125 20 50 40 100 25 100 ...\r\n##  $ V8: num  15 50 50 50 15 40 20 15 15 60 ...<\/code><\/pre>\n<pre class=\"r\"><code># Distance matrix\r\ndd &lt;- daisy(flower)\r\nround(as.matrix(dd)[1:3, 1:3], 2)<\/code><\/pre>\n<pre><code>##      1    2    3\r\n## 1 0.00 0.89 0.53\r\n## 2 0.89 0.00 0.51\r\n## 3 0.53 0.51 0.00<\/code><\/pre>\n<\/div>\n<\/div>\n<div id=\"visualizing-distance-matrices\" class=\"section level2\">\n<h2>Visualizing distance matrices<\/h2>\n<p>A simple solution for visualizing the distance matrices is to use the function <em>fviz_dist<\/em>() [<em>factoextra<\/em> package]. 
Other specialized methods, such as agglomerative hierarchical clustering or heatmaps, are described in detail in the dedicated courses.<\/p>\n<p>To use <em>fviz_dist<\/em>() type this:<\/p>\n<pre class=\"r\"><code>library(factoextra)\r\nfviz_dist(dist.eucl)<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/001-data-clustering-basics\/figures\/005-clustering-distance-measures-visualize-distance-measures-1.png\" width=\"432\" \/><\/p>\n<ul>\n<li><strong>Red<\/strong>: high similarity (i.e., low dissimilarity) | <strong>Blue<\/strong>: low similarity<\/li>\n<\/ul>\n<p>The color level is proportional to the value of the dissimilarity between observations: pure red corresponds to <span class=\"math inline\">\\(dist(x_i, x_j) = 0\\)<\/span> and pure blue corresponds to the highest value of Euclidean distance computed. Objects belonging to the same cluster are displayed in consecutive order.<\/p>\n<\/div>\n<div id=\"summary\" class=\"section level2\">\n<h2>Summary<\/h2>\n<p>We described how to compute distance matrices using either Euclidean or correlation-based measures. It\u2019s generally recommended to standardize the variables before distance matrix computation. Standardization makes variables comparable when they are measured on different scales.<\/p>\n<\/div>\n<\/div>\n<p><!--end rdoc--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we describe the common distance measures used to compute a distance matrix for cluster analysis. We also provide R code for computing and visualizing distances. 
<\/p>\n","protected":false},"author":1,"featured_media":8019,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","class_list":["post-7645","dt_lessons","type-dt_lessons","status-publish","has-post-thumbnail","hentry"]}