Cluster Validation Essentials

Cluster validation consists of measuring the goodness of clustering results. Before applying any clustering algorithm to a data set, first assess its clustering tendency, that is, whether the data contain non-random structure that makes clustering appropriate. If they do, determine how many clusters there are. Next, perform hierarchical clustering or partitioning clustering (the latter with a pre-specified number of clusters). Finally, use the measures described in this part to evaluate the goodness of the clustering results.

This course covers the following topics, with practical examples in R:

• Assessing clustering tendency using visual and statistical methods
• Determining the optimal number of clusters using the elbow method, average silhouette analysis and the gap statistic
• Cluster validation statistics using internal measures (such as the silhouette coefficient and the Dunn index) and external measures (which compare a clustering against known class labels)
• Choosing the best clustering algorithm. We’ll present different measures for comparing clustering algorithms and choosing the best one
• Computing p-values for hierarchical clustering using the pvclust() R function

Related Book

Practical Guide to Cluster Analysis in R

1. Assessing Clustering Tendency

In this chapter, we start by describing why the clustering tendency of a data set should be evaluated before applying any clustering method to it. Next, we provide statistical and visual methods for assessing clustering tendency in R.
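
One common statistical check of clustering tendency is the Hopkins statistic. The sketch below is a minimal base-R illustration under stated assumptions, not the chapter's own code: hopkins_stat() is a hypothetical helper written for this example, the toy data are simulated, and the convention used is H = sum(u) / (sum(u) + sum(w)), so values near 0.5 suggest random data and values near 1 suggest clustered data. Some packaged implementations report the complementary quantity or use powered distances, so check the convention of whichever package you use.

```r
# Hopkins statistic sketch in base R (hopkins_stat is a hypothetical helper,
# not a function from any package).
hopkins_stat <- function(x, m = ceiling(nrow(x) / 10)) {
  x <- as.matrix(x)
  n <- nrow(x)

  # Distance from each row of `pts` to its nearest row of `ref`
  nearest <- function(pts, ref) {
    apply(pts, 1, function(p) min(sqrt(colSums((t(ref) - p)^2))))
  }

  # w: distances from m sampled real points to their nearest OTHER real point
  idx <- sample(n, m)
  w <- vapply(idx, function(i) {
    nearest(x[i, , drop = FALSE], x[-i, , drop = FALSE])
  }, numeric(1))

  # u: distances from m uniform random points (drawn inside the bounding
  # box of the data) to their nearest real point
  rng <- apply(x, 2, range)
  u_pts <- sapply(seq_len(ncol(x)), function(j) runif(m, rng[1, j], rng[2, j]))
  u <- nearest(matrix(u_pts, nrow = m), x)

  sum(u) / (sum(u) + sum(w))
}

set.seed(42)
# Simulated data (an assumption for illustration): two tight blobs vs. uniform noise
blobs   <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))
uniform <- matrix(runif(200), ncol = 2)

h_clustered <- hopkins_stat(blobs, m = 20)
h_uniform   <- hopkins_stat(uniform, m = 20)
c(clustered = h_clustered, uniform = h_uniform)
```

Because the statistic depends on random sampling, it is usually worth repeating the computation a few times before drawing a conclusion.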

2. Determining the Optimal Number of Clusters: 3 Must-Know Methods

In this article, we'll describe different methods for determining the optimal number of clusters for k-means, k-medoids (PAM) and hierarchical clustering.
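
As a rough illustration of the elbow method for k-means, the sketch below computes the total within-cluster sum of squares for a range of k using only base R. The simulated data set and the range 1 to 8 are assumptions made for this example; the article itself may use different data and helper functions.

```r
set.seed(123)

# Simulated data (an assumption for illustration): three Gaussian blobs in 2-D
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(100, mean = 4, sd = 0.4), ncol = 2),
  cbind(rnorm(50, mean = 0, sd = 0.4), rnorm(50, mean = 4, sd = 0.4))
)

# Total within-cluster sum of squares for k = 1..8
ks  <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which adding clusters stops reducing WSS sharply
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The elbow is read off the plot by eye, which is why it is usually cross-checked against the silhouette and gap statistic methods.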

3. Cluster Validation Statistics: Must-Know Methods

In this article, we start by describing the different methods for clustering validation. Next, we'll demonstrate how to compare the quality of clustering results obtained with different clustering algorithms. Finally, we'll provide R scripts for validating clustering results.
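
To make the idea of internal validation concrete, the sketch below compares k-means and average-linkage hierarchical clustering on the same simulated data using the average silhouette width and the Dunn index. It assumes the cluster package (for silhouette()) is available; avg_sil() and dunn_index() are hypothetical helpers written for this example, and the Dunn index here uses the minimum between-cluster distance divided by the maximum within-cluster diameter.

```r
library(cluster)  # provides silhouette()

set.seed(123)
# Simulated data (an assumption for illustration): two Gaussian blobs in 2-D
toy <- rbind(matrix(rnorm(100, mean = 0, sd = 0.4), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.4), ncol = 2))
d <- dist(toy)

# Two clusterings of the same data
km_labels <- kmeans(toy, centers = 2, nstart = 25)$cluster
hc_labels <- cutree(hclust(d, method = "average"), k = 2)

# Average silhouette width: closer to 1 means better-separated clusters
avg_sil <- function(labels, d) {
  mean(silhouette(labels, d)[, "sil_width"])
}

# Dunn index: smallest between-cluster distance / largest cluster diameter
dunn_index <- function(labels, d) {
  dm   <- as.matrix(d)
  same <- outer(labels, labels, "==")
  min(dm[!same]) / max(dm[same])
}

scores <- rbind(
  kmeans = c(silhouette = avg_sil(km_labels, d), dunn = dunn_index(km_labels, d)),
  hclust = c(silhouette = avg_sil(hc_labels, d), dunn = dunn_index(hc_labels, d))
)
scores
```

For both measures, larger values indicate more compact and better-separated clusters.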

4. Choosing the Best Clustering Algorithms

In this article, we’ll start by describing the different measures in the clValid R package for comparing clustering algorithms. Next, we’ll present the function clValid(). Finally, we’ll provide R scripts for validating clustering results and comparing clustering algorithms.

5. Computing P-values for Hierarchical Clustering

This article describes the R package pvclust, which uses bootstrap resampling techniques to compute a p-value for each cluster in a hierarchical clustering.
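
A minimal usage sketch follows, assuming the pvclust package is installed. The simulated data, the linkage and distance choices, and the small number of bootstrap replicates are assumptions made to keep the example fast. Note that pvclust() clusters the columns of its input, so observations to be clustered should be placed in columns.

```r
# Assumes pvclust is installed: install.packages("pvclust")
library(pvclust)

set.seed(123)
# Simulated data (an assumption): 30 rows, 6 columns forming two correlated groups
g1 <- rnorm(30)
g2 <- rnorm(30)
toy <- data.frame(
  a1 = g1 + rnorm(30, sd = 0.2), a2 = g1 + rnorm(30, sd = 0.2),
  a3 = g1 + rnorm(30, sd = 0.2),
  b1 = g2 + rnorm(30, sd = 0.2), b2 = g2 + rnorm(30, sd = 0.2),
  b3 = g2 + rnorm(30, sd = 0.2)
)

# Bootstrap the hierarchical clustering of the COLUMNS of `toy`
# (nboot is kept small here for speed; use a larger value in practice)
res <- pvclust(toy, method.hclust = "average",
               method.dist = "correlation", nboot = 100)

# Dendrogram annotated with AU (approximately unbiased) and BP (bootstrap
# probability) values; boxes mark clusters with AU >= 0.95
plot(res)
pvrect(res, alpha = 0.95)

# AU and BP values per dendrogram edge
res$edges[, c("au", "bp")]
```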