Cluster Validation Essentials

5 Lessons

2 hours

Free

Cluster validation consists of measuring the goodness of clustering results. Before applying any clustering algorithm to a data set, the first thing to do is to assess the clustering tendency, that is, whether clustering is suitable for the data at all. If it is, the next step is to determine how many clusters there are. You can then perform hierarchical clustering or partitioning clustering (with a pre-specified number of clusters). Finally, you can use a number of measures, described in this course, to evaluate the goodness of the clustering results.

In this course, you will learn the following topics, with practical examples in R:

Assessing clustering tendency using visual and statistical methods
Determining the optimal number of clusters using the elbow method, silhouette analysis and the gap statistic
Cluster validation statistics using internal measures (such as silhouette coefficients and the Dunn index) and external measures
Choosing the best clustering algorithm: measures for comparing clustering algorithms and selecting the best one
Computing p-values for hierarchical clustering using the pvclust() R function

Related Book

Practical Guide to Cluster Analysis in R

Lessons

Choosing the Best Clustering Algorithms
15 mins
Alboukadel Kassambara

In this article, we’ll start by describing the different measures in the clValid R package for comparing clustering algorithms. Next, we’ll present the function clValid(). Finally, we’ll provide R scripts for validating clustering results and comparing clustering algorithms.
Computing P-value for Hierarchical Clustering
15 mins
Alboukadel Kassambara

This article describes the pvclust R package, which uses bootstrap resampling to compute a p-value for each cluster in a hierarchical clustering.
Assessing Clustering Tendency
30 mins
Alboukadel Kassambara

In this chapter, we start by describing why the clustering tendency should be evaluated before applying any clustering method to a data set. Next, we provide statistical and visual methods for assessing clustering tendency in R.
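One statistical way to assess clustering tendency is the Hopkins statistic, which compares nearest-neighbor distances of real data points with those of points drawn uniformly at random over the data's range. The sketch below is a plain Python illustration of one common variant, not the R code used in the lesson. It assumes Euclidean distance and a uniform reference over the data's bounding box; the function name, its parameters and the toy data are hypothetical, and other implementations may use a different convention (for example reporting 1 - H).

```python
import math
import random


def hopkins(data, n_samples, seed=0):
    """Hopkins statistic H for a list of equal-length numeric tuples.

    Convention assumed here: H near 0.5 suggests uniformly spread data,
    H close to 1 suggests a clustering tendency.
    """
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(p[j] for p in data) for j in range(dims)]
    highs = [max(p[j] for p in data) for j in range(dims)]

    # w: distance from a sampled real point to its nearest *other* real point
    w_sum = 0.0
    for i in rng.sample(range(len(data)), n_samples):
        w_sum += min(math.dist(data[i], q) for k, q in enumerate(data) if k != i)

    # u: distance from a uniform random point in the bounding box to its nearest real point
    u_sum = 0.0
    for _ in range(n_samples):
        r = tuple(rng.uniform(lows[j], highs[j]) for j in range(dims))
        u_sum += min(math.dist(r, q) for q in data)

    return u_sum / (u_sum + w_sum)


# Hypothetical example data: two tight blobs versus points spread uniformly.
gen = random.Random(1)
blobs = [(gen.gauss(0, 0.1), gen.gauss(0, 0.1)) for _ in range(50)] + \
        [(gen.gauss(10, 0.1), gen.gauss(10, 0.1)) for _ in range(50)]
uniform = [(gen.random(), gen.random()) for _ in range(100)]

print(round(hopkins(blobs, 20), 3))    # expected: close to 1
print(round(hopkins(uniform, 20), 3))  # expected: around 0.5
```

Because real points in tight blobs sit much closer to each other than random points sit to the data, the clustered set scores near 1, while the uniform set stays near 0.5.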
Determining The Optimal Number Of Clusters: 3 Must Know Methods
30 mins
Alboukadel Kassambara

In this article, we'll describe different methods for determining the optimal number of clusters for k-means, k-medoids (PAM) and hierarchical clustering.
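The elbow method runs k-means for a range of k, records the total within-cluster sum of squares (WSS) for each, and looks for the "elbow" where adding another cluster stops reducing WSS substantially. Below is a minimal pure-Python sketch, not the R code from the lesson, assuming Euclidean distance, k-means++ seeding and ten restarts per k; all function names and the three-blob toy data are hypothetical.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (centers, WSS)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: next center drawn with probability proportional to squared distance
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points))
    for _ in range(max_iter):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # update step: move each center to the mean of its points (keep it if empty)
        new_centers = [mean(cl) if cl else centers[j] for j, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wss


def elbow_curve(points, k_max, n_init=10, seed=0):
    """Best WSS over n_init runs, for each k = 1..k_max."""
    rng = random.Random(seed)
    return {k: min(kmeans(points, k, rng)[1] for _ in range(n_init))
            for k in range(1, k_max + 1)}


# Hypothetical data: three well-separated blobs of 30 points each.
gen = random.Random(42)
points = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 5), (10, 0)] for _ in range(30)]

for k, wss in elbow_curve(points, 6).items():
    print(k, round(wss, 1))
```

On data like this, WSS falls sharply up to k = 3 and only slightly afterwards, so the elbow sits at k = 3. Restarting several times per k is a simple guard against k-means converging to a poor local minimum.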
Cluster Validation Statistics: Must Know Methods
30 mins
Alboukadel Kassambara

In this article, we start by describing the different methods for clustering validation. Next, we'll demonstrate how to compare the quality of clustering results obtained with different clustering algorithms. Finally, we'll provide R scripts for validating clustering results.
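Two internal measures listed in this course, the average silhouette coefficient and the Dunn index, can be sketched in plain Python from their textbook definitions. This is an illustration, not the R functions used in the lesson. It assumes Euclidean distance, sets the silhouette of a point in a singleton cluster to 0, and defines the Dunn index as the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The function names and the four-point example are hypothetical; both functions expect at least two clusters.

```python
import math
from itertools import combinations


def silhouette_mean(points, labels):
    """Average silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of p's cluster (dist(p, p) adds 0)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    pairs = list(combinations(zip(points, labels), 2))
    separation = min(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    diameter = max(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq)
    return separation / diameter


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]  # groups points that are close together
bad = [0, 1, 0, 1]   # groups points that are far apart

print(round(silhouette_mean(points, good), 3), dunn_index(points, good))  # 0.9 10.0
print(round(silhouette_mean(points, bad), 3), dunn_index(points, bad))    # -0.448 0.1
```

A higher average silhouette (at most 1) and a higher Dunn index both indicate compact, well-separated clusters; the deliberately bad labeling gets a negative average silhouette, meaning its points are on average closer to another cluster than to their own.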