<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>
	Comments on: Advanced Clustering	</title>
	<atom:link href="https://www.datanovia.com/en/courses/advanced-clustering/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.datanovia.com/en/courses/advanced-clustering/</link>
	<description>Data Mining and Statistics for Decision Support</description>
	<lastBuildDate>Wed, 20 Jan 2021 06:41:25 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.2</generator>
	<item>
		<title>
		By: Hema Latha Krishna Nair		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-21600</link>

		<dc:creator><![CDATA[Hema Latha Krishna Nair]]></dc:creator>
		<pubDate>Wed, 20 Jan 2021 06:41:25 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-21600</guid>

					<description><![CDATA[Hi,
It would be helpful if anyone could explain how I may use k-means clustering when I have more than 2 dimensions/variables for evaluation. I would like to group the observations into 5 clusters (k = 5), but I am afraid basic k-means only takes 2 dimensions for the distance measure. Any best practices?]]></description>
			<content:encoded><![CDATA[<p>Hi,<br />
It would be helpful if anyone could explain how I may use k-means clustering when I have more than 2 dimensions/variables for evaluation. I would like to group the observations into 5 clusters (k = 5), but I am afraid basic k-means only takes 2 dimensions for the distance measure. Any best practices?</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Yulin		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-2226</link>

		<dc:creator><![CDATA[Yulin]]></dc:creator>
		<pubDate>Sat, 24 Aug 2019 23:42:36 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-2226</guid>

					<description><![CDATA[Excellent course! Many thanks for sharing the knowledge!]]></description>
			<content:encoded><![CDATA[<p>Excellent course! Many thanks for sharing the knowledge!</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Noven		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-1665</link>

		<dc:creator><![CDATA[Noven]]></dc:creator>
		<pubDate>Wed, 30 Jan 2019 00:54:51 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-1665</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/courses/advanced-clustering/#comment-1561&quot;&gt;kassambara&lt;/a&gt;.

Hi, Kassambara. Please make a post/tutorial about k-prototypes clustering for mixed attributes and how to assess the cluster accuracy.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/courses/advanced-clustering/#comment-1561">kassambara</a>.</p>
<p>Hi, Kassambara. Please make a post/tutorial about k-prototypes clustering for mixed attributes and how to assess the cluster accuracy.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: poorwa_kunwar		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-1633</link>

		<dc:creator><![CDATA[poorwa_kunwar]]></dc:creator>
		<pubDate>Wed, 16 Jan 2019 21:18:23 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-1633</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/courses/advanced-clustering/#comment-1632&quot;&gt;kassambara&lt;/a&gt;.

Thank you for your reply. But the number of observations is 90 lakhs (9 million), not 90.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/courses/advanced-clustering/#comment-1632">kassambara</a>.</p>
<p>Thank you for your reply. But the number of observations is 90 lakhs (9 million), not 90.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-1632</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Wed, 16 Jan 2019 21:10:13 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-1632</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/courses/advanced-clustering/#comment-1629&quot;&gt;poorwa_kunwar&lt;/a&gt;.

You can also try the CLARA algorithm (https://www.datanovia.com/en/lessons/clara-in-r-clustering-large-applications/) for large data sets.

To me, 90 observations is not a big dataset... but it also depends on the number of variables in the dataset.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/courses/advanced-clustering/#comment-1629">poorwa_kunwar</a>.</p>
<p>You can also try the CLARA algorithm (<a href="https://www.datanovia.com/en/lessons/clara-in-r-clustering-large-applications/" rel="ugc">https://www.datanovia.com/en/lessons/clara-in-r-clustering-large-applications/</a>) for large data sets.</p>
<p>To me, 90 observations is not a big dataset&#8230; but it also depends on the number of variables in the dataset.</p>
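<p>As a minimal sketch of how clara() can be called (using the built-in iris data purely for illustration; k = 3 and samples = 50 are example settings, not recommendations):</p>
<pre class = "r_code">
library(cluster)

# CLARA (Clustering Large Applications) works on the raw data matrix,
# drawing repeated sub-samples instead of computing the full distance
# matrix, which is what makes it usable on large data sets
cl <- clara(iris[, -5], k = 3, samples = 50)

# Cluster sizes
table(cl$clustering)
</pre>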
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: poorwa_kunwar		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-1629</link>

		<dc:creator><![CDATA[poorwa_kunwar]]></dc:creator>
		<pubDate>Wed, 16 Jan 2019 12:05:23 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-1629</guid>

					<description><![CDATA[I am working on a very large dataset (over 90 lakh observations), and the dataset has both categorical and continuous variables. I tried using the Gower distance with PAM, but it simply fails to work because the dataset is too large. I&#039;m thinking of using the k-prototypes algorithm in the clustMixType package. Do you have any suggestions? Thanks.]]></description>
			<content:encoded><![CDATA[<p>I am working on a very large dataset (over 90 lakh observations), and the dataset has both categorical and continuous variables. I tried using the Gower distance with PAM, but it simply fails to work because the dataset is too large. I&#8217;m thinking of using the k-prototypes algorithm in the clustMixType package. Do you have any suggestions? Thanks.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-1561</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Sat, 08 Dec 2018 06:47:54 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-1561</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/courses/advanced-clustering/#comment-1558&quot;&gt;Connie&lt;/a&gt;.

Hi Connie,

My previous comment shows just one example of how to perform clustering on mixed data. Note that the CLARA algorithm doesn&#039;t take a distance matrix as input, so you can&#039;t apply it to the Gower distance.

For soft clustering, I would suggest the fuzzy clustering method using the fanny() R function [in the cluster R package]. It supports a distance matrix as input.

You might also be interested in &lt;a href=&quot;http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/117-hcpc-hierarchical-clustering-on-principal-components-essentials/&quot; target=&quot;_blank&quot; rel=&quot;noopener nofollow&quot;&gt;Hierarchical Clustering on Principal Components (HCPC)&lt;/a&gt;, which can also be used for clustering mixed data.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/courses/advanced-clustering/#comment-1558">Connie</a>.</p>
<p>Hi Connie,</p>
<p>My previous comment shows just one example of how to perform clustering on mixed data. Note that the CLARA algorithm doesn&#8217;t take a distance matrix as input, so you can&#8217;t apply it to the Gower distance.</p>
<p>For soft clustering, I would suggest the fuzzy clustering method using the fanny() R function [in the cluster R package]. It supports a distance matrix as input.</p>
<p>You might also be interested in <a href="http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/117-hcpc-hierarchical-clustering-on-principal-components-essentials/" target="_blank" rel="noopener nofollow">Hierarchical Clustering on Principal Components (HCPC)</a>, which can also be used for clustering mixed data.</p>
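<p>For instance, a minimal sketch of fanny() applied to a Gower distance matrix (using the flower data from the cluster package for illustration; k = 3 is just an example value):</p>
<pre class = "r_code">
library(cluster)

# Gower distance on mixed (categorical + numeric) data
data(flower)
gower.dist <- daisy(flower, metric = "gower")

# Fuzzy clustering directly on the distance matrix
fz <- fanny(gower.dist, k = 3, diss = TRUE)

# Membership degrees (each row sums to 1) and the closest hard partition
head(fz$membership, 3)
fz$clustering
</pre>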
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Connie		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-1558</link>

		<dc:creator><![CDATA[Connie]]></dc:creator>
		<pubDate>Thu, 06 Dec 2018 02:38:01 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-1558</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/courses/advanced-clustering/#comment-1555&quot;&gt;kassambara&lt;/a&gt;.

Kassambara, thank you for the quick reply. Your explanation is always clear and straightforward. As I have 20,000 observations, my first thought was to use CLARA. I will adopt hierarchical clustering as you suggested. As a beginner to cluster analysis, may I ask why hierarchical clustering is better than CLARA in my case? That is the question I need to answer when I write the method section of the paper.

When I read Fraley&#039;s paper (2002), I liked the idea of &#039;soft&#039; clustering, which, however, has some limitations, such as with large datasets. I don&#039;t want to make things complicated in the first place. But in the future, after running the basic method, would it be possible to apply &#039;soft&#039; clustering in my case? Which soft clustering method would you recommend? Thank you!]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/courses/advanced-clustering/#comment-1555">kassambara</a>.</p>
<p>Kassambara, thank you for the quick reply. Your explanation is always clear and straightforward. As I have 20,000 observations, my first thought was to use CLARA. I will adopt hierarchical clustering as you suggested. As a beginner to cluster analysis, may I ask why hierarchical clustering is better than CLARA in my case? That is the question I need to answer when I write the method section of the paper.</p>
<p>When I read Fraley&#8217;s paper (2002), I liked the idea of &#8216;soft&#8217; clustering, which, however, has some limitations, such as with large datasets. I don&#8217;t want to make things complicated in the first place. But in the future, after running the basic method, would it be possible to apply &#8216;soft&#8217; clustering in my case? Which soft clustering method would you recommend? Thank you!</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-1555</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Wed, 05 Dec 2018 14:00:28 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-1555</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/courses/advanced-clustering/#comment-1554&quot;&gt;Connie&lt;/a&gt;.

For mixed data, you can first compute a distance matrix between observations using the daisy() R function [in the cluster package].

Next, you can apply hierarchical clustering on the computed distance matrix.

For example:

&lt;pre class = &quot;r_code&quot;&gt;
library(cluster)
library(factoextra)

# Load data
data(flower)
head(flower, 3)

# Compute the Gower distance matrix and visualize it
gower.dist &lt;- daisy(flower, metric = &quot;gower&quot;)
fviz_dist(gower.dist)

# Perform agglomerative hierarchical clustering
hc.clust &lt;- agnes(gower.dist)
fviz_dend(hc.clust)
&lt;/pre&gt;]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/courses/advanced-clustering/#comment-1554">Connie</a>.</p>
<p>For mixed data, you can first compute a distance matrix between observations using the daisy() R function [in the cluster package].</p>
<p>Next, you can apply hierarchical clustering on the computed distance matrix.</p>
<p>For example:</p>
<pre class = "r_code">
library(cluster)
library(factoextra)

# Load data
data(flower)
head(flower, 3)

# Compute the Gower distance matrix and visualize it
gower.dist <- daisy(flower, metric = "gower")
fviz_dist(gower.dist)

# Perform agglomerative hierarchical clustering
hc.clust <- agnes(gower.dist)
fviz_dend(hc.clust)
</pre>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Connie		</title>
		<link>https://www.datanovia.com/en/courses/advanced-clustering/#comment-1554</link>

		<dc:creator><![CDATA[Connie]]></dc:creator>
		<pubDate>Wed, 05 Dec 2018 13:17:46 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_courses&#038;p=8077#comment-1554</guid>

					<description><![CDATA[Thank you so much for the very clear and excellent teaching on cluster analysis! I am wondering: if I want to cluster observations based on three ordered categorical variables and one continuous variable in panel data, which method should I use? I would appreciate it if you could answer my question.]]></description>
			<content:encoded><![CDATA[<p>Thank you so much for the very clear and excellent teaching on cluster analysis! I am wondering: if I want to cluster observations based on three ordered categorical variables and one continuous variable in panel data, which method should I use? I would appreciate it if you could answer my question.</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
