Comments on: Data Preparation and R Packages for Cluster Analysis

By: Julian

Julian — Sun, 12 May 2019 06:45:25 +0000

Dear Dr Kassambara,
As many others have already said, thank you very much for this great site! It is indeed a resource I see myself coming back to again and again.
Regarding data preprocessing, I have been wondering how to deal with skewed data – should some form of power transformation be applied to get them into a more “Gaussian” shape, or are different distance metrics better suited than the Euclidean distance, or does it not matter in the end?

By: PS

PS — Wed, 23 Jan 2019 12:33:24 +0000

Hi,

I love this site. It really is helping me out in a “Cluster Analysis” project.

I wanted to understand what kinds of techniques should be used to perform clustering on very large datasets (my dataset has about 3 million rows). I am stuck because functions like “get_clust_tendency”, and even the kmeans and hclust algorithms, throw a “cannot allocate vector of 17000 Gb” error.

Is there a better way to approach clustering on big datasets?

By: Alexis Idlette-Wilson

Alexis Idlette-Wilson — Sun, 02 Dec 2018 01:38:16 +0000

This site is awesome. Thank you!