{"id":8131,"date":"2018-11-04T09:54:34","date_gmt":"2018-11-04T07:54:34","guid":{"rendered":"https:\/\/www.datanovia.com\/en\/?p=8131"},"modified":"2019-12-25T11:27:02","modified_gmt":"2019-12-25T09:27:02","slug":"types-of-clustering-methods-overview-and-quick-start-r-code","status":"publish","type":"post","link":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/","title":{"rendered":"Types of Clustering Methods: Overview and Quick Start R Code"},"content":{"rendered":"<p>&nbsp;<\/p>\n<div id=\"rdoc\">\n<p><strong>Clustering methods<\/strong> are used to identify groups of similar objects in multivariate data sets collected from fields such as marketing, biomedicine and geospatial analysis. There are different <strong>types of clustering<\/strong> methods, including:<\/p>\n<ul>\n<li>Partitioning methods<\/li>\n<li>Hierarchical clustering<\/li>\n<li>Fuzzy clustering<\/li>\n<li>Density-based clustering<\/li>\n<li>Model-based clustering<\/li>\n<\/ul>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/images\/types-of-clustering-methods.png\" alt=\"Types of clustering methods\" \/><\/p>\n<div class=\"block\">\n<p>In this article, we provide an overview of clustering methods and quick-start R code for cluster analysis:<\/p>\n<ul>\n<li>we start by presenting the required R packages and the data format for cluster analysis and visualization.<\/li>\n<li>next, we describe the two standard <em>clustering techniques<\/em> [partitioning methods (k-means, PAM, CLARA) and hierarchical clustering] as well as how to assess the quality of clustering results.<\/li>\n<li>finally, we describe advanced clustering approaches that find patterns of any shape in large data sets with noise and outliers.<\/li>\n<\/ul>\n<\/div>\n<p>Contents:<\/p>\n<div id=\"TOC\">\n<ul>\n<li><a href=\"#installing-and-loading-required-r-packages\">Installing and loading required 
R packages<\/a><\/li>\n<li><a href=\"#data-preparation\">Data preparation<\/a><\/li>\n<li><a href=\"#distance-measures\">Distance measures<\/a><\/li>\n<li><a href=\"#partitioning-clustering\">Partitioning clustering<\/a><\/li>\n<li><a href=\"#hierarchical-clustering\">Hierarchical clustering<\/a><\/li>\n<li><a href=\"#clustering-validation-and-evaluation\">Clustering validation and evaluation<\/a>\n<ul>\n<li><a href=\"#assessing-clustering-tendency\">Assessing clustering tendency<\/a><\/li>\n<li><a href=\"#determining-the-optimal-number-of-clusters\">Determining the optimal number of clusters<\/a><\/li>\n<li><a href=\"#clustering-validation-statistics\">Clustering validation statistics<\/a><\/li>\n<li><a href=\"#see-also\">See also:<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"#advanced-clustering-methods\">Advanced clustering methods<\/a>\n<ul>\n<li><a href=\"#hybrid-clustering-methods\">Hybrid clustering methods<\/a><\/li>\n<li><a href=\"#fuzzy-clustering\">Fuzzy clustering<\/a><\/li>\n<li><a href=\"#model-based-clustering\">Model-based clustering<\/a><\/li>\n<li><a href=\"#dbscan-density-based-clustering\">DBSCAN: Density-based clustering<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"#references\">References<\/a><\/li>\n<\/ul>\n<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div class='dt-sc-ico-content type1'><div class='custom-icon' ><a href='https:\/\/www.datanovia.com\/en\/product\/practical-guide-to-cluster-analysis-in-r\/' target='_blank'><span class='fa fa-book'><\/span><\/a><\/div><h4><a href='https:\/\/www.datanovia.com\/en\/product\/practical-guide-to-cluster-analysis-in-r\/' target='_blank'> Related Book <\/a><\/h4>Practical Guide to Cluster Analysis in R<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div id=\"installing-and-loading-required-r-packages\" class=\"section level2\">\n<h2>Installing and loading required R packages<\/h2>\n<p>We\u2019ll mainly use two R packages:<\/p>\n<ul>\n<li>cluster package: for computing 
clustering<\/li>\n<li>factoextra package: for elegant ggplot2-based data visualization. Online documentation at: <a class=\"uri\" href=\"https:\/\/rpkgs.datanovia.com\/factoextra\/\">https:\/\/rpkgs.datanovia.com\/factoextra\/<\/a><\/li>\n<\/ul>\n<p>Accessory packages:<\/p>\n<ul>\n<li>magrittr for piping: %&gt;%<\/li>\n<\/ul>\n<p>Install:<\/p>\n<pre class=\"r\"><code>install.packages(\"factoextra\")\r\ninstall.packages(\"cluster\")\r\ninstall.packages(\"magrittr\")<\/code><\/pre>\n<p>Load packages:<\/p>\n<pre class=\"r\"><code>library(\"cluster\")\r\nlibrary(\"factoextra\")\r\nlibrary(\"magrittr\")<\/code><\/pre>\n<\/div>\n<div id=\"data-preparation\" class=\"section level2\">\n<h2>Data preparation<\/h2>\n<ul>\n<li>Demo data set: the built-in R dataset named USArrests<\/li>\n<li>Remove missing data<\/li>\n<li>Scale variables to make them comparable<\/li>\n<\/ul>\n<p>Read more: <a href=\"\/?p=7644\">Data Preparation and Essential R Packages for Cluster Analysis<\/a><\/p>\n<pre class=\"r\"><code># Load and prepare the data\r\ndata(\"USArrests\")\r\n\r\nmy_data &lt;- USArrests %&gt;%\r\n  na.omit() %&gt;%          # Remove missing values (NA)\r\n  scale()                # Scale variables\r\n\r\n# View the first 3 rows\r\nhead(my_data, n = 3)<\/code><\/pre>\n<pre><code>##         Murder Assault UrbanPop     Rape\r\n## Alabama 1.2426   0.783   -0.521 -0.00342\r\n## Alaska  0.5079   1.107   -1.212  2.48420\r\n## Arizona 0.0716   1.479    0.999  1.04288<\/code><\/pre>\n<\/div>\n<div id=\"distance-measures\" class=\"section level2\">\n<h2>Distance measures<\/h2>\n<p>Classifying objects into clusters requires a method for measuring the distance, or (dis)similarity, between them. 
Chapter <a href=\"\/?p=7645\">Clustering Distance Measures Essentials<\/a> covers the common distance measures used for assessing similarity between observations.<\/p>\n<p>It\u2019s simple to compute and visualize a distance matrix using the functions <a href=\"https:\/\/rpkgs.datanovia.com\/factoextra\/reference\/dist.html\">get_dist() and fviz_dist()<\/a> [factoextra R package]:<\/p>\n<ul>\n<li><code>get_dist()<\/code>: for computing a distance matrix between the rows of a data matrix. Compared to the standard <code>dist()<\/code> function, it supports correlation-based distance measures including \u201cpearson\u201d, \u201ckendall\u201d and \u201cspearman\u201d methods.<\/li>\n<li><code>fviz_dist()<\/code>: for visualizing a distance matrix<\/li>\n<\/ul>\n<pre class=\"r\"><code>res.dist &lt;- get_dist(USArrests, stand = TRUE, method = \"pearson\")\r\n\r\nfviz_dist(res.dist, \r\n   gradient = list(low = \"#00AFBB\", mid = \"white\", high = \"#FC4E07\"))<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-distance-matrix-1.png\" width=\"518.4\" \/><\/p>\n<p>Read more: <a href=\"\/?p=7645\">Clustering Distance Measures Essentials<\/a><\/p>\n<\/div>\n<div id=\"partitioning-clustering\" class=\"section level2\">\n<h2>Partitioning clustering<\/h2>\n<p>Partitioning algorithms are clustering techniques that subdivide a data set into k groups, where k is the number of groups pre-specified by the analyst.<\/p>\n<p>There are different types of partitioning clustering methods. The most popular is <a href=\"\/?p=7674\">K-means clustering<\/a> <span class=\"citation\">(MacQueen 1967)<\/span>, in which each cluster is represented by its center, that is, the mean of the data points belonging to the cluster. 
Because each center is a mean, the K-means method is sensitive to outliers.<\/p>\n<p>An alternative to k-means clustering is <a href=\"\/?p=7676\">K-medoids clustering<\/a> or PAM (Partitioning Around Medoids, Kaufman &amp; Rousseeuw, 1990), which represents each cluster by an actual data point (a medoid) and is therefore less sensitive to outliers than k-means.<\/p>\n<p>Read more: <a href=\"\/?p=7673\">Partitioning Clustering methods<\/a>.<\/p>\n<p>The following R code shows how to determine the optimal number of clusters and how to compute k-means and PAM clustering in R.<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li><a href=\"\/?p=8062\">Determining the optimal number of clusters<\/a>: use <code>factoextra::fviz_nbclust()<\/code><\/li>\n<\/ol>\n<pre class=\"r\"><code>library(\"factoextra\")\r\nfviz_nbclust(my_data, kmeans, method = \"gap_stat\")<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-optimal-number-of-clusters-1.png\" width=\"384\" \/><\/p>\n<p>Suggested number of clusters: 3<\/p>\n<ol style=\"list-style-type: decimal;\" start=\"2\">\n<li><a href=\"\/?p=7674\">Compute and visualize k-means clustering<\/a><\/li>\n<\/ol>\n<pre class=\"r\"><code>set.seed(123)\r\nkm.res &lt;- kmeans(my_data, 3, nstart = 25)\r\n# Visualize\r\nlibrary(\"factoextra\")\r\nfviz_cluster(km.res, data = my_data,\r\n             ellipse.type = \"convex\",\r\n             palette = \"jco\",\r\n             ggtheme = theme_minimal())<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-k-means-plot-ggplot2-factoextra-1.png\" width=\"480\" \/><\/p>\n<p>Similarly, <a href=\"\/?p=7676\">k-medoids\/PAM clustering<\/a> can be computed as follows:<\/p>\n<pre class=\"r\"><code># Compute PAM\r\nlibrary(\"cluster\")\r\npam.res &lt;- pam(my_data, 3)\r\n# Visualize\r\nfviz_cluster(pam.res)<\/code><\/pre>\n<\/div>\n<div id=\"hierarchical-clustering\" class=\"section level2\">\n<h2>Hierarchical 
clustering<\/h2>\n<p>Hierarchical clustering is an alternative approach to partitioning clustering for identifying groups in a data set. It does not require the number of clusters to be pre-specified.<\/p>\n<p>The result of hierarchical clustering is a tree-based representation of the objects, also known as a dendrogram. Observations can be subdivided into groups by cutting the dendrogram at a desired similarity level.<\/p>\n<p>R code to compute and visualize hierarchical clustering:<\/p>\n<pre class=\"r\"><code># Compute hierarchical clustering\r\nres.hc &lt;- USArrests %&gt;%\r\n  scale() %&gt;%                    # Scale the data\r\n  dist(method = \"euclidean\") %&gt;% # Compute dissimilarity matrix\r\n  hclust(method = \"ward.D2\")     # Compute hierarchical clustering\r\n\r\n# Visualize using factoextra\r\n# Cut in 4 groups and color by groups\r\nfviz_dend(res.hc, k = 4, # Cut in four groups\r\n          cex = 0.5, # label size\r\n          k_colors = c(\"#2E9FDF\", \"#00AFBB\", \"#E7B800\", \"#FC4E07\"),\r\n          color_labels_by_k = TRUE, # color labels by groups\r\n          rect = TRUE # Add rectangle around groups\r\n          )<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-hierarchical-clustering-r-1.png\" width=\"518.4\" \/><\/p>\n<p>Read more: <a href=\"\/?p=7685\">Hierarchical clustering<\/a><\/p>\n<p>See also:<\/p>\n<ul>\n<li><a href=\"\/?p=7689\">Divisive Clustering<\/a><\/li>\n<li><a href=\"\/?p=7690\">Compare Dendrograms<\/a><\/li>\n<li><a href=\"\/?p=7691\">Visualize Dendrograms<\/a><\/li>\n<li><a href=\"\/?p=7693\">Heatmap: Static and Interactive<\/a><\/li>\n<\/ul>\n<\/div>\n<div id=\"clustering-validation-and-evaluation\" class=\"section level2\">\n<h2>Clustering validation and evaluation<\/h2>\n<p>Clustering validation and evaluation strategies consist of measuring the goodness of clustering 
results. Before applying any clustering algorithm to a data set, the first thing to do is to assess the <em>clustering tendency<\/em>, that is, whether the data contain any inherent grouping structure.<\/p>\n<p>If so, determine how many clusters there are. Then perform hierarchical clustering or partitioning clustering (with a pre-specified number of clusters). Finally, use the measures described in this article to evaluate the goodness of the clustering results.<\/p>\n<p>Read more: <a href=\"\/?p=8058\">Cluster Validation Essentials<\/a><\/p>\n<div id=\"assessing-clustering-tendency\" class=\"section level3\">\n<h3>Assessing clustering tendency<\/h3>\n<p>To assess the clustering tendency, the Hopkins statistic and a visual approach can be used. Both are available through the function <code>get_clust_tendency()<\/code> [factoextra package], which computes the Hopkins statistic and creates an ordered dissimilarity image (ODI).<\/p>\n<ul>\n<li><em>Hopkins statistic<\/em>: If the value of the Hopkins statistic is close to 1 (far above 0.5), then we can conclude that the dataset is significantly clusterable.<\/li>\n<li><em>Visual approach<\/em>: The visual approach detects the clustering tendency by counting the number of square-shaped dark (or colored) blocks along the diagonal of the ordered dissimilarity image.<\/li>\n<\/ul>\n<p>R code:<\/p>\n<pre class=\"r\"><code>gradient.color &lt;- list(low = \"steelblue\",  high = \"white\")\r\n\r\niris[, -5] %&gt;%    # Remove column 5 (Species)\r\n  scale() %&gt;%     # Scale variables\r\n  get_clust_tendency(n = 50, gradient = gradient.color)<\/code><\/pre>\n<pre><code>## $hopkins_stat\r\n## [1] 0.8\r\n## \r\n## $plot<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-clustering-tendency-1.png\" width=\"432\" \/><\/p>\n<p>Read more: <a href=\"\/?p=8060\">Assessing Clustering Tendency<\/a><\/p>\n<\/div>\n<div 
id=\"determining-the-optimal-number-of-clusters\" class=\"section level3\">\n<h3>Determining the optimal number of clusters<\/h3>\n<p>There are different methods for <a href=\"\/?p=8062\">determining the optimal number of clusters<\/a>.<\/p>\n<p>In the R code below, we\u2019ll use the <code>NbClust<\/code> R package, which provides 30 indices for determining the best number of clusters. First, install it using <code>install.packages(\"NbClust\")<\/code>, then type this:<\/p>\n<pre class=\"r\"><code>set.seed(123)\r\n\r\n# Compute\r\nlibrary(\"NbClust\")\r\nres.nbclust &lt;- USArrests %&gt;%\r\n  scale() %&gt;%\r\n  NbClust(distance = \"euclidean\",\r\n          min.nc = 2, max.nc = 10, \r\n          method = \"complete\", index =\"all\") <\/code><\/pre>\n<pre class=\"r\"><code># Visualize\r\nlibrary(factoextra)\r\nfviz_nbclust(res.nbclust, ggtheme = theme_minimal())<\/code><\/pre>\n<pre><code>## Among all indices: \r\n## ===================\r\n## * 2 proposed  0 as the best number of clusters\r\n## * 1 proposed  1 as the best number of clusters\r\n## * 9 proposed  2 as the best number of clusters\r\n## * 4 proposed  3 as the best number of clusters\r\n## * 6 proposed  4 as the best number of clusters\r\n## * 2 proposed  5 as the best number of clusters\r\n## * 1 proposed  8 as the best number of clusters\r\n## * 1 proposed  10 as the best number of clusters\r\n## \r\n## Conclusion\r\n## =========================\r\n## * According to the majority rule, the best number of clusters is  2 .<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-determine-the-number-of-clusters-nbclust-1.png\" width=\"518.4\" \/><\/p>\n<p>Read more: <a href=\"\/?p=8062\">Determining the Optimal Number of Clusters<\/a><\/p>\n<\/div>\n<div id=\"clustering-validation-statistics\" class=\"section level3\">\n<h3>Clustering validation statistics<\/h3>\n<p>A variety of 
measures have been proposed in the literature for evaluating clustering results. The term clustering validation refers to the procedure of evaluating the results of a clustering algorithm.<\/p>\n<p>The <em>silhouette plot<\/em> is one of the many measures for inspecting and validating clustering results. The silhouette width (<span class=\"math inline\">\\(S_i\\)<\/span>) measures how similar an object <span class=\"math inline\">\\(i\\)<\/span> is to the other objects in its own cluster versus those in the nearest neighboring cluster. <span class=\"math inline\">\\(S_i\\)<\/span> values range from -1 to 1:<\/p>\n<ul>\n<li>A value of <span class=\"math inline\">\\(S_i\\)<\/span> close to 1 indicates that the object is well clustered. In other words, the object <span class=\"math inline\">\\(i\\)<\/span> is similar to the other objects in its group.<\/li>\n<li>A value of <span class=\"math inline\">\\(S_i\\)<\/span> close to -1 indicates that the object is poorly clustered, and that assignment to some other cluster would probably improve the overall results.<\/li>\n<\/ul>\n<p>In the following R code, we\u2019ll compute and evaluate the result of hierarchical clustering.<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li>Compute and visualize hierarchical clustering:<\/li>\n<\/ol>\n<pre class=\"r\"><code>set.seed(123)\r\n# Enhanced hierarchical clustering, cut in 3 groups\r\nres.hc &lt;- iris[, -5] %&gt;%\r\n  scale() %&gt;%\r\n  eclust(\"hclust\", k = 3, graph = FALSE)\r\n\r\n# Visualize with factoextra\r\nfviz_dend(res.hc, palette = \"jco\",\r\n          rect = TRUE, show_labels = FALSE)<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-hierarchical-clustering-1.png\" width=\"518.4\" \/><\/p>\n<ol style=\"list-style-type: decimal;\" start=\"2\">\n<li>Inspect the silhouette plot:<\/li>\n<\/ol>\n<pre 
class=\"r\"><code>fviz_silhouette(res.hc)<\/code><\/pre>\n<pre><code>##   cluster size ave.sil.width\r\n## 1       1   49          0.63\r\n## 2       2   30          0.44\r\n## 3       3   71          0.32<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-silhouette-plot-1.png\" width=\"518.4\" \/><\/p>\n<ol style=\"list-style-type: decimal;\" start=\"3\">\n<li>Which samples have a negative silhouette width, and which cluster are they closer to?<\/li>\n<\/ol>\n<pre class=\"r\"><code># Silhouette width of observations\r\nsil &lt;- res.hc$silinfo$widths[, 1:3]\r\n\r\n# Objects with negative silhouette\r\nneg_sil_index &lt;- which(sil[, 'sil_width'] &lt; 0)\r\nsil[neg_sil_index, , drop = FALSE]<\/code><\/pre>\n<pre><code>##     cluster neighbor sil_width\r\n## 84        3        2   -0.0127\r\n## 122       3        2   -0.0179\r\n## 62        3        2   -0.0476\r\n## 135       3        2   -0.0530\r\n## 73        3        2   -0.1009\r\n## 74        3        2   -0.1476\r\n## 114       3        2   -0.1611\r\n## 72        3        2   -0.2304<\/code><\/pre>\n<p>Read more: <a href=\"\/?p=8058\">Cluster Validation Statistics<\/a><\/p>\n<\/div>\n<div id=\"see-also\" class=\"section level3\">\n<h3>See also:<\/h3>\n<ul>\n<li><a href=\"\/?p=8066\">Choosing the Best Clustering Algorithms<\/a><\/li>\n<li><a href=\"\/?p=8067\">Computing p-value for Hierarchical Clustering<\/a><\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div id=\"advanced-clustering-methods\" class=\"section level2\">\n<h2>Advanced clustering methods<\/h2>\n<div id=\"hybrid-clustering-methods\" class=\"section level3\">\n<h3>Hybrid clustering methods<\/h3>\n<ul>\n<li><a href=\"\/?p=8078\">Hierarchical K-means Clustering<\/a>: a hybrid approach for improving k-means results<\/li>\n<li><a 
href=\"http:\/\/www.sthda.com\/english\/articles\/22-principal-component-methods\/74-hcpc-hierarchical-clustering-on-principal-components\/\">HCPC: Hierarchical clustering on principal components<\/a><\/li>\n<\/ul>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-hcpc-1.png\" width=\"518.4\" \/><\/p>\n<\/div>\n<div id=\"fuzzy-clustering\" class=\"section level3\">\n<h3>Fuzzy clustering<\/h3>\n<p><em>Fuzzy clustering<\/em> is also known as soft clustering. Standard clustering approaches (K-means, PAM) produce partitions in which each observation belongs to only one cluster. This is known as hard clustering.<\/p>\n<p>In <em>fuzzy clustering<\/em>, items can be members of more than one cluster. Each item has a set of membership coefficients corresponding to its degree of membership in each cluster. The <em>Fuzzy c-means<\/em> method is the most popular fuzzy clustering algorithm.<\/p>\n<p>Read more: <a href=\"\/?p=8079\">Fuzzy Clustering<\/a>.<\/p>\n<\/div>\n<div id=\"model-based-clustering\" class=\"section level3\">\n<h3>Model-based clustering<\/h3>\n<p>In <em>model-based clustering<\/em>, the data are viewed as coming from a distribution that is a mixture of two or more clusters. The method finds the best fit of models to the data and estimates the number of clusters.<\/p>\n<p>Read more: <a href=\"\/?p=8080\">Model-Based Clustering<\/a>.<\/p>\n<\/div>\n<div id=\"dbscan-density-based-clustering\" class=\"section level3\">\n<h3>DBSCAN: Density-based clustering<\/h3>\n<p>DBSCAN is a density-based clustering algorithm <span class=\"citation\">(Ester et al. 1996)<\/span>. It can find clusters of different shapes and sizes in data containing noise and outliers. 
The basic idea behind the density-based clustering approach is derived from the way humans intuitively identify clusters: as dense regions of points separated by regions of lower density.<\/p>\n<p>The description and implementation of DBSCAN in R are provided at this link: <a href=\"\/?p=8081\">DBSCAN: Density-Based Clustering<\/a>.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/images\/dbscan-idea.png\" alt=\"Density based clustering\" \/><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/cluster-analysis\/figures\/060-types-of-clustering-methods-dbscan-1.png\" width=\"518.4\" \/><\/p>\n<\/div>\n<\/div>\n<div id=\"references\" class=\"section level2 unnumbered\">\n<h2>References<\/h2>\n<div id=\"refs\" class=\"references\">\n<div id=\"ref-ester1996\">\n<p>Ester, Martin, Hans-Peter Kriegel, J\u00f6rg Sander, and Xiaowei Xu. 1996. \u201cA Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.\u201d In <em>Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)<\/em>, 226\u201331. AAAI Press.<\/p>\n<\/div>\n<div id=\"ref-macqueen1967\">\n<p>MacQueen, J. 1967. \u201cSome Methods for Classification and Analysis of Multivariate Observations.\u201d In <em>Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics<\/em>, 281\u201397. Berkeley, Calif.: University of California Press. <a class=\"uri\" href=\"http:\/\/projecteuclid.org:443\/euclid.bsmsp\/1200512992\">http:\/\/projecteuclid.org:443\/euclid.bsmsp\/1200512992<\/a>.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><!--end rdoc--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Clustering methods are used to identify groups of similar objects in multivariate data sets collected from fields such as marketing, bio-medical and geo-spatial. 
There are different types of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":7802,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"rating_form_position":"","rating_results_position":"","mr_structured_data_type":"","footnotes":""},"categories":[123],"tags":[],"class_list":["post-8131","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cluster-analysis"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>5 Amazing Types of Clustering Methods You Should Know - Datanovia<\/title>\n<meta name=\"description\" content=\"We provide an overview of clustering methods and quick start R codes. You will also learn how to assess the quality of clustering analysis.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"5 Amazing Types of Clustering Methods You Should Know - Datanovia\" \/>\n<meta property=\"og:description\" content=\"We provide an overview of clustering methods and quick start R codes. 
You will also learn how to assess the quality of clustering analysis.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/\" \/>\n<meta property=\"og:site_name\" content=\"Datanovia\" \/>\n<meta property=\"article:published_time\" content=\"2018-11-04T07:54:34+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-12-25T09:27:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Alboukadel\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Alboukadel\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/\"},\"author\":{\"name\":\"Alboukadel\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/person\/7767cf2bd5c91a1610c6eb53a0ff069e\"},\"headline\":\"Types of Clustering Methods: Overview and Quick Start R Code\",\"datePublished\":\"2018-11-04T07:54:34+00:00\",\"dateModified\":\"2019-12-25T09:27:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/\"},\"wordCount\":1339,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg\",\"articleSection\":[\"Cluster Analysis\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/\",\"name\":\"5 Amazing Types of Clustering Methods You Should Know - 
Datanovia\",\"isPartOf\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg\",\"datePublished\":\"2018-11-04T07:54:34+00:00\",\"dateModified\":\"2019-12-25T09:27:02+00:00\",\"description\":\"We provide an overview of clustering methods and quick start R codes. You will also learn how to assess the quality of clustering analysis.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#primaryimage\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg\",\"width\":1024,\"height\":512},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.datanovia.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Types of Clustering Methods: Overview and Quick Start R 
Code\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"name\":\"Datanovia\",\"description\":\"Data Mining and Statistics for Decision Support\",\"publisher\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.datanovia.com\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\",\"name\":\"Datanovia\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"width\":98,\"height\":99,\"caption\":\"Datanovia\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/person\/7767cf2bd5c91a1610c6eb53a0ff069e\",\"name\":\"Alboukadel\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ed3108646c5c7c3d188324ab972f96ad7d9975b41b94014d7f68257791be395a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ed3108646c5c7c3d188324ab972f96ad7d9975b41b94014d7f68257791be395a?s=96&d=mm&r=g\",\"caption\":\"Alboukadel\"},\"url\":\"https:\/\/www.datanovia.com\/en\/blog\/author\/kassambara\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"5 Amazing Types of Clustering Methods You Should Know - Datanovia","description":"We provide an overview of clustering methods and quick start R codes. You will also learn how to assess the quality of clustering analysis.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/","og_locale":"en_US","og_type":"article","og_title":"5 Amazing Types of Clustering Methods You Should Know - Datanovia","og_description":"We provide an overview of clustering methods and quick start R codes. You will also learn how to assess the quality of clustering analysis.","og_url":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/","og_site_name":"Datanovia","article_published_time":"2018-11-04T07:54:34+00:00","article_modified_time":"2019-12-25T09:27:02+00:00","og_image":[{"width":1024,"height":512,"url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg","type":"image\/jpeg"}],"author":"Alboukadel","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Alboukadel","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#article","isPartOf":{"@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/"},"author":{"name":"Alboukadel","@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/person\/7767cf2bd5c91a1610c6eb53a0ff069e"},"headline":"Types of Clustering Methods: Overview and Quick Start R Code","datePublished":"2018-11-04T07:54:34+00:00","dateModified":"2019-12-25T09:27:02+00:00","mainEntityOfPage":{"@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/"},"wordCount":1339,"commentCount":4,"publisher":{"@id":"https:\/\/www.datanovia.com\/en\/#organization"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#primaryimage"},"thumbnailUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg","articleSection":["Cluster Analysis"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/","url":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/","name":"5 Amazing Types of Clustering Methods You Should Know - 
Datanovia","isPartOf":{"@id":"https:\/\/www.datanovia.com\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#primaryimage"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#primaryimage"},"thumbnailUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg","datePublished":"2018-11-04T07:54:34+00:00","dateModified":"2019-12-25T09:27:02+00:00","description":"We provide an overview of clustering methods and quick start R codes. You will also learn how to assess the quality of clustering analysis.","breadcrumb":{"@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#primaryimage","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/P1030370.jpg","width":1024,"height":512},{"@type":"BreadcrumbList","@id":"https:\/\/www.datanovia.com\/en\/blog\/types-of-clustering-methods-overview-and-quick-start-r-code\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.datanovia.com\/en\/"},{"@type":"ListItem","position":2,"name":"Types of Clustering Methods: Overview and Quick Start R Code"}]},{"@type":"WebSite","@id":"https:\/\/www.datanovia.com\/en\/#website","url":"https:\/\/www.datanovia.com\/en\/","name":"Datanovia","description":"Data Mining and Statistics for Decision 
Support","publisher":{"@id":"https:\/\/www.datanovia.com\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.datanovia.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.datanovia.com\/en\/#organization","name":"Datanovia","url":"https:\/\/www.datanovia.com\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","width":98,"height":99,"caption":"Datanovia"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/person\/7767cf2bd5c91a1610c6eb53a0ff069e","name":"Alboukadel","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/ed3108646c5c7c3d188324ab972f96ad7d9975b41b94014d7f68257791be395a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ed3108646c5c7c3d188324ab972f96ad7d9975b41b94014d7f68257791be395a?s=96&d=mm&r=g","caption":"Alboukadel"},"url":"https:\/\/www.datanovia.com\/en\/blog\/author\/kassambara\/"}]}},"multi-rating":{"mr_rating_results":[{"adjusted_star_result":0,"star_result":0,"total_max_option_value":5,"adjusted_score_result":0,"score_result":0,"percentage_result":0,"adjusted_percentage_result":0,"count":0,"post_id":8131}]},"_links":{"self":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/posts\/8131","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.
datanovia.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/comments?post=8131"}],"version-history":[{"count":3,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/posts\/8131\/revisions"}],"predecessor-version":[{"id":11645,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/posts\/8131\/revisions\/11645"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media\/7802"}],"wp:attachment":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media?parent=8131"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/categories?post=8131"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/tags?post=8131"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}