Cluster Validation Essentials

Cluster validation consists of measuring the goodness of clustering results. Before applying any clustering algorithm to a data set, the first step is to assess the clustering tendency, that is, whether the data contain non-random structure that makes clustering suitable. If so, the next question is how many clusters there are. You can then perform hierarchical clustering or partitioning clustering (with a pre-specified number of clusters). Finally, you can use the measures described in this part to evaluate the goodness of the clustering results.
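The last two steps of this workflow (cluster, then evaluate) can be sketched in a few lines of R. This is only a minimal illustration, not the course's own script: the built-in iris data, the choice of Ward linkage and the choice of k = 3 are all assumptions made for the example.

```r
library(cluster)  # provides silhouette(); a recommended package shipped with R

# Assumption: the four numeric iris columns stand in for "your data"
df <- scale(iris[, 1:4])   # standardize so no variable dominates the distances
d  <- dist(df)             # Euclidean distance matrix

# Hierarchical clustering, then cut the tree into a pre-specified number of groups
hc     <- hclust(d, method = "ward.D2")
groups <- cutree(hc, k = 3)            # k = 3 is an illustrative choice

# Evaluate the result: silhouette width of each observation,
# summarized as the average over all observations
sil     <- silhouette(groups, d)
avg_sil <- mean(sil[, "sil_width"])
avg_sil
```

An average silhouette width close to 1 suggests well-separated clusters, while values near 0 or below suggest overlapping or poorly assigned observations.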

In this course, you will learn the following topics, with practical examples in R:

  • Assessing clustering tendency using visual and statistical methods
  • Determining the optimal number of clusters using the elbow method, silhouette analysis and the gap statistic
  • Computing cluster validation statistics using internal measures (such as silhouette coefficients and the Dunn index) and external measures
  • Choosing the best clustering algorithm. We’ll present different measures for comparing clustering algorithms and choosing the best one
  • Computing p-values for hierarchical clustering using the pvclust() R function
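As a rough preview of the second topic, the three criteria for choosing the number of clusters can be computed with base R and the cluster package. This is a sketch under stated assumptions rather than the course's own code: the iris data, the k-means algorithm, the range of k and the number of bootstrap samples (B = 50) are all illustrative choices.

```r
library(cluster)  # silhouette(), clusGap(), maxSE()

set.seed(123)                # k-means and the gap statistic both use random numbers
df <- scale(iris[, 1:4])     # assumption: iris stands in for "your data"
d  <- dist(df)
ks <- 2:6                    # candidate numbers of clusters

# Elbow method: total within-cluster sum of squares for each k;
# look for the k after which the decrease levels off
wss <- sapply(ks, function(k) kmeans(df, centers = k, nstart = 25)$tot.withinss)

# Silhouette analysis: average silhouette width for each k; higher is better
avg_sil <- sapply(ks, function(k) {
  cl <- kmeans(df, centers = k, nstart = 25)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})
best_k_sil <- ks[which.max(avg_sil)]

# Gap statistic: compares log(W_k) with its expectation under a reference
# distribution, estimated from B simulated data sets
gap <- clusGap(df, FUNcluster = kmeans, K.max = 6, B = 50,
               nstart = 25, verbose = FALSE)
best_k_gap <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")

data.frame(k = ks, wss = wss, avg_sil = avg_sil)
c(silhouette = best_k_sil, gap = best_k_gap)
```

The criteria do not always agree, so it is usually worth inspecting all of them (for example by plotting wss and avg_sil against k) rather than relying on a single number.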

Related Book

Practical Guide to Cluster Analysis in R


  1. In this chapter, we start by describing why we should evaluate the clustering tendency before applying any clustering method to a data set. Next, we provide statistical and visual methods for assessing the clustering tendency in R.
  2. In this article, we start by describing the different methods for clustering validation. Next, we'll demonstrate how to compare the quality of clustering results obtained with different clustering algorithms. Finally, we'll provide R scripts for validating clustering results.
  3. In this article, we’ll start by describing the different measures in the clValid R package for comparing clustering algorithms. Next, we’ll present the function clValid(). Finally, we’ll provide R scripts for validating clustering results and comparing clustering algorithms.
