Advanced Clustering

Hierarchical K-Means Clustering: Optimize Clusters

K-means is one of the most popular clustering algorithms. However, it has some limitations: it requires the user to specify the number of clusters in advance, and it selects the initial centroids randomly. The final k-means solution is very sensitive to this initial random selection of cluster centers, so the result might be (slightly) different each time you compute k-means.
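A quick way to observe this sensitivity is to run kmeans() several times with a single random start (nstart = 1) and compare the total within-cluster sum of squares of each run. The sketch below uses base R only; the number of clusters (4) and the seeds are arbitrary illustration choices, and how many distinct solutions appear depends on them.

```r
# Standardize the data (zero mean, unit variance per column)
df <- scale(USArrests)

# One random start per seed; record the total within-cluster
# sum of squares of each run (lower is better)
seeds <- 1:10
wss <- sapply(seeds, function(s) {
  set.seed(s)
  kmeans(df, centers = 4, nstart = 1)$tot.withinss
})

# Different random starts can converge to different local optima
round(wss, 2)
length(unique(round(wss, 6)))  # number of distinct solutions found
```

Raising nstart keeps the best of several random starts; hierarchical k-means instead replaces the random starts with data-driven ones.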

In this chapter, we describe a hybrid method, named hierarchical k-means clustering (hkmeans), for improving k-means results.



The algorithm is summarized as follows:

  1. Compute hierarchical clustering and cut the tree into k clusters
  2. Compute the center (i.e., the mean) of each cluster
  3. Compute k-means using the set of cluster centers (defined in step 2) as the initial cluster centers

Note that the k-means step (step 3) refines the initial partitioning obtained by cutting the tree at step 1. Hence, the final partitioning can be slightly different from the initial one.
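The three steps can be sketched in base R as follows. This is a minimal illustration, not the factoextra implementation: the function name hkmeans_sketch() is hypothetical, and the choice of Euclidean distance with Ward linkage ("ward.D2") is our assumption. With R's default Hartigan-Wong algorithm, the k-means step started from the tree-cut centers should only keep or lower the within-cluster sum of squares of the initial partition; the cross-tabulation shows which observations were moved.

```r
# Hypothetical base-R sketch of hierarchical k-means (not factoextra's code).
# Assumptions: Euclidean distance and Ward linkage ("ward.D2").
hkmeans_sketch <- function(x, k, linkage = "ward.D2", iter.max = 10) {
  x <- as.matrix(x)

  # Step 1: hierarchical clustering, then cut the tree into k clusters
  hc <- hclust(dist(x, method = "euclidean"), method = linkage)
  init_cluster <- cutree(hc, k = k)

  # Step 2: center (column means) of each initial cluster, one row per cluster
  init_centers <- apply(x, 2, function(col) tapply(col, init_cluster, mean))

  # Within-cluster sum of squares of the tree-cut partition
  init_wss <- sum((x - init_centers[init_cluster, ])^2)

  # Step 3: k-means started from the hierarchical centers
  km <- kmeans(x, centers = init_centers, iter.max = iter.max)

  list(km = km, hclust = hc, init_cluster = init_cluster,
       init_centers = init_centers, init_wss = init_wss)
}

# Usage on the standardized USArrests data
df <- scale(USArrests)
res <- hkmeans_sketch(df, k = 4)

# Tree-cut labels vs final k-means labels; label j in both
# refers to the cluster seeded by center j
table(initial = res$init_cluster, final = res$km$cluster)

# Number of states reassigned during the k-means step
sum(res$init_cluster != res$km$cluster)

# Within-cluster sum of squares before and after refinement
c(tree_cut = res$init_wss, hkmeans = res$km$tot.withinss)
```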

R code

The R function hkmeans() [in factoextra] provides an easy way to compute hierarchical k-means clustering. The format of the result is similar to the one returned by the standard kmeans() function (see the k-means clustering chapter).

To install factoextra, type this: install.packages("factoextra").

We’ll use the USArrests data set, and we start by standardizing the data:

df <- scale(USArrests)
# Compute hierarchical k-means clustering
library(factoextra)
res.hk <- hkmeans(df, 4)
# Elements returned by hkmeans()
names(res.hk)
##  [1] "cluster"      "centers"      "totss"        "withinss"    
##  [5] "tot.withinss" "betweenss"    "size"         "iter"        
##  [9] "ifault"       "data"         "hclust"
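Because these elements follow kmeans() conventions, the reported sums of squares satisfy totss = tot.withinss + betweenss, and the ratio betweenss / totss is often quoted as the share of total variance explained by the partition. The sketch below illustrates this on a plain kmeans() fit of the same standardized data (k = 4, nstart = 25 and the seed are arbitrary choices, not hkmeans() internals). It also averages the raw USArrests columns by cluster, since centers is expressed in standardized units.

```r
df <- scale(USArrests)
set.seed(123)
km <- kmeans(df, centers = 4, nstart = 25)

# Total sum of squares splits into within- and between-cluster parts
all.equal(km$totss, km$tot.withinss + km$betweenss)

# Share of the total variance explained by the clustering
km$betweenss / km$totss

# Cluster sizes and centers (centers are in standardized units)
km$size
km$centers

# Cluster profiles in the original units
# (Murder, Assault, Rape per 100,000 residents; UrbanPop in %)
aggregate(USArrests, by = list(cluster = km$cluster), FUN = mean)
```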

To print all the results, type this:

# Print the results
res.hk
# Visualize the tree
fviz_dend(res.hk, cex = 0.6, palette = "jco", 
          rect = TRUE, rect_border = "jco", rect_fill = TRUE)

# Visualize the hkmeans final clusters
fviz_cluster(res.hk, palette = "jco", repel = TRUE,
             ggtheme = theme_classic())
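As we understand it, when the data have more than two variables, fviz_cluster() draws the observations on the first two principal components. The following base-R sketch approximates both plots without factoextra: a dendrogram with boxes around the 4 groups (via rect.hclust()), and a scatter plot of principal-component scores colored by cluster. The clusters come from k-means seeded with Ward ("ward.D2") tree-cut centers, which is our assumption about the setup; colors and axis labels are our own choices, not factoextra's.

```r
df <- scale(USArrests)

# Hierarchical clustering (Ward linkage assumed) and tree cut into 4 groups
hc <- hclust(dist(df), method = "ward.D2")
hc_cut <- cutree(hc, k = 4)

# Seed k-means with the mean of each tree-cut group
init_centers <- as.matrix(aggregate(df, by = list(hc_cut), FUN = mean)[, -1])
cl <- kmeans(df, centers = init_centers)$cluster

# Dendrogram with a box around each of the 4 tree-cut groups
plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 4, border = 2:5)

# Project observations onto the first two principal components
pca <- prcomp(df)
scores <- pca$x[, 1:2]
var_pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Final clusters in the PC1/PC2 plane, labeled by state
plot(scores, col = cl, pch = 19,
     xlab = paste0("Dim1 (", var_pct[1], "%)"),
     ylab = paste0("Dim2 (", var_pct[2], "%)"))
text(scores, labels = rownames(df), col = cl, cex = 0.6, pos = 3)
```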


We described hybrid hierarchical k-means clustering for improving k-means results.


Comments ( 2 )

  • Bryan Hutchinson

    Dear Alboukadel

    If I were to use hierarchical k-means clustering, what function method should I use in the eclust function when doing cluster validation to initialise the function?


Alboukadel Kassambara
Role : Founder of Datanovia