Data Clustering Basics

3 Lessons

1 hour 0 mins

Free

Data clustering consists of data mining methods for identifying groups of similar objects in multivariate data sets collected from fields such as marketing, biomedicine and geospatial analysis.

Similarity between observations (or individuals) is defined using an inter-observation distance measure, such as the Euclidean distance or a correlation-based distance.
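A minimal base R sketch, using a hypothetical four-observation toy matrix, shows how the two kinds of measure can disagree. Rows `a` and `b` are far apart in Euclidean terms, yet their correlation-based distance is 0 because their profiles rise together. The correlation-based distance is assumed here to be the Pearson version, 1 - r; other variants exist.

```r
# Hypothetical toy data: 4 observations (rows) x 3 variables (columns)
x <- rbind(
  a = c(1, 2, 3),
  b = c(2, 4, 6),  # same profile shape as a, twice the magnitude
  c = c(3, 2, 1),  # profile opposite to a
  d = c(1, 2, 4)
)

# Euclidean distance between rows (observations)
d_euc <- dist(x, method = "euclidean")

# Pearson correlation-based distance: d(i, j) = 1 - cor(x_i, x_j).
# cor() correlates columns, so transpose x to correlate observations.
d_cor <- as.dist(1 - cor(t(x), method = "pearson"))

print(round(d_euc, 3))
print(round(d_cor, 3))
```

Euclidean distance reflects differences in magnitude, while correlation-based distance reflects differences in profile shape, so the choice of measure changes which observations are treated as similar.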

There are different types of data clustering techniques, including:

Partitioning clustering approaches, which subdivide the data into a set of k groups. One of the most popular partitioning methods is k-means clustering.
Hierarchical clustering approaches, which build a tree of nested groups and do not require the number of groups to be specified in advance.
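The two approaches can be sketched with the base R functions `kmeans()` and `hclust()` on simulated data. The simulated data, the seed, `k = 2`, `nstart = 25` and complete linkage are illustrative assumptions, not settings prescribed by the course.

```r
set.seed(123)

# Simulated data: two well-separated groups of 10 points each in 2 dimensions
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

# Partitioning: k-means needs the number of groups k up front;
# nstart = 25 runs 25 random starts and keeps the best solution
km <- kmeans(x, centers = 2, nstart = 25)

# Hierarchical: build the full tree from a Euclidean distance matrix,
# then cut it afterwards into the desired number of groups
hc <- hclust(dist(x, method = "euclidean"), method = "complete")
hc_groups <- cutree(hc, k = 2)

# Cross-tabulate the two partitions (cluster labels are arbitrary,
# so "1" in one method may correspond to "2" in the other)
print(table(kmeans = km$cluster, hclust = hc_groups))
```

The design difference shows in the calls: `kmeans()` takes `centers` as an input, whereas `hclust()` returns a whole tree that `cutree()` can slice at any number of groups.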

This course presents the basics of cluster analysis in R. You will learn:

Data preparation and essential R packages for cluster analysis
Essentials of clustering distance measures
Quick-start R code to perform k-means and hierarchical clustering

Related Book

Practical Guide to Cluster Analysis in R

Lessons

Data Preparation and R Packages for Cluster Analysis
5 mins
Alboukadel Kassambara

This chapter shows how to prepare your data for cluster analysis and describes the essential R packages for cluster analysis.

Clustering Distance Measures
35 mins
Alboukadel Kassambara

In this article, we describe the common distance measures used to compute a distance matrix for cluster analysis. We also provide R code for computing and visualizing distances.

Cluster Analysis Example: Quick Start R Code
20 mins
Alboukadel Kassambara

This chapter describes a cluster analysis example using R software. We provide quick-start R code to compute and visualize k-means and hierarchical clustering.