<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: Model Based Clustering Essentials	</title>
	<atom:link href="https://www.datanovia.com/en/lessons/model-based-clustering-essentials/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/</link>
	<description>Data Mining and Statistics for Decision Support</description>
	<lastBuildDate>Thu, 22 Oct 2020 12:31:31 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.2</generator>
	<item>
		<title>
		By: Tayesh		</title>
		<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-21184</link>

		<dc:creator><![CDATA[Tayesh]]></dc:creator>
		<pubDate>Thu, 22 Oct 2020 12:31:31 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=8080#comment-21184</guid>

					<description><![CDATA[Hi!

Thank you very much for a very clear and hands-on post. If I understand correctly, model-based clustering is based on the assumption that the covariates are normally distributed. What if one or more of your variables follow another distribution, say Poisson? What do you do? Looking forward to hearing from you.

thanks,
Tayesh]]></description>
			<content:encoded><![CDATA[<p>Hi!</p>
<p>Thank you very much for a very clear and hands-on post. If I understand correctly, model-based clustering is based on the assumption that the covariates are normally distributed. What if one or more of your variables follow another distribution, say Poisson? What do you do? Looking forward to hearing from you.</p>
<p>thanks,<br />
Tayesh</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Mahesh		</title>
		<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-2212</link>

		<dc:creator><![CDATA[Mahesh]]></dc:creator>
		<pubDate>Tue, 20 Aug 2019 13:25:17 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=8080#comment-2212</guid>

					<description><![CDATA[Hi Kas,

How can I find the observations that fall outside all of the cluster ellipses in the fviz_clust classification plot? Can we get the index numbers or a data frame of those observations? Also, how can we find the boundary of each cluster ellipse?

Thanks,
Mahesh]]></description>
			<content:encoded><![CDATA[<p>Hi Kas,</p>
<p>How can I find the observations that fall outside all of the cluster ellipses in the fviz_clust classification plot? Can we get the index numbers or a data frame of those observations? Also, how can we find the boundary of each cluster ellipse?</p>
<p>Thanks,<br />
Mahesh</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-2167</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Wed, 31 Jul 2019 17:18:18 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=8080#comment-2167</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-2164&quot;&gt;Teadora Tyler&lt;/a&gt;.

Thank you for your positive feedback, highly appreciated!

Yes, you can use model-based clustering on two-dimensional data sets:

&lt;pre class = &quot;r_code&quot;&gt;
# Data preparation
data(&quot;geyser&quot;, package = &quot;MASS&quot;)
df &lt;- scale(geyser) 

# Model-based clustering
library(mclust)
library(factoextra)
mc &lt;- Mclust(df) 
fviz_mclust(mc, &quot;uncertainty&quot;, palette = &quot;jco&quot;)
&lt;/pre&gt;

You might find the following article useful for evaluating and validating clustering: https://www.datanovia.com/en/courses/cluster-validation-essentials/]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-2164">Teadora Tyler</a>.</p>
<p>Thank you for your positive feedback, highly appreciated!</p>
<p>Yes, you can use model-based clustering on two-dimensional data sets:</p>
<pre class = "r_code">
# Data preparation
data("geyser", package = "MASS")
df <- scale(geyser) 

# Model-based clustering
library(mclust)
library(factoextra)
mc <- Mclust(df) 
fviz_mclust(mc, "uncertainty", palette = "jco")
</pre>
<p>You might find the following article useful for evaluating and validating clustering: <a href="https://www.datanovia.com/en/courses/cluster-validation-essentials/" rel="ugc">https://www.datanovia.com/en/courses/cluster-validation-essentials/</a></p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Teadora Tyler		</title>
		<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-2164</link>

		<dc:creator><![CDATA[Teadora Tyler]]></dc:creator>
		<pubDate>Wed, 31 Jul 2019 15:50:26 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=8080#comment-2164</guid>

					<description><![CDATA[Thank you SO much for this to-the-point, beautiful post! 
It really helped me with my first steps.

I rarely come across such an easy-to-understand and useful post on this topic that helps beginners too!

Do you think using this mclust approach is adequate for a dataset that contains only X and Y coordinates of objects? I mean it works beautifully, but is it the proper way? I can get very lost in all the possible spatial clustering methods. I was also looking at DBSCAN but mclust gives much better plots (I think).

Thanks again:)
Teadora]]></description>
			<content:encoded><![CDATA[<p>Thank you SO much for this to-the-point, beautiful post!<br />
It really helped me with my first steps.</p>
<p>I rarely come across such an easy-to-understand and useful post on this topic that helps beginners too!</p>
<p>Do you think using this mclust approach is adequate for a dataset that contains only X and Y coordinates of objects? I mean it works beautifully, but is it the proper way? I can get very lost in all the possible spatial clustering methods. I was also looking at DBSCAN but mclust gives much better plots (I think).</p>
<p>Thanks again:)<br />
Teadora</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-1520</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Mon, 12 Nov 2018 05:38:00 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=8080#comment-1520</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-1517&quot;&gt;San Emmanuel&lt;/a&gt;.

Hi San Emmanuel,

Thank you for your feedback!


The choice of the (dis)similarity metric should be based on the research question and the type of dataset.

For example, Euclidean distance is well suited to continuous variables, while Bray-Curtis is better suited to count (abundance) data.

Particularly for continuous data, it is expected that all variables are on the &quot;same&quot; scale and have the &quot;same&quot; distribution. If your variables are not, you will need to standardize or normalize them.

If you want to reflect ecological differences, then Bray-Curtis will do a much better job, since it is used to quantify the compositional dissimilarity between two different sites, based on counts at each site.

The Bray-Curtis dissimilarity is often erroneously called a distance. It is not a distance, since it does not satisfy the triangle inequality, and should always be called a dissimilarity to avoid confusion.

The choice between Euclidean (a metric distance) and Bray-Curtis (a semi-metric) depends on your data and the way you want to handle it. Metric distances comply with the triangle inequality (the length of any one side of a triangle is at most the sum of the lengths of the other two sides), while semi-metrics don&#039;t.

This is particularly relevant when zeros are not true absences (e.g., when you sample species from a site, you&#039;ll never know for sure whether a species is truly absent or you failed to sample it although it is present; the same applies to metals in your case).

This is very important because if your zeros aren&#039;t true absences and you use Euclidean distance, the dissimilarities among sites won&#039;t be a good description of your data: two sites with a bunch of shared zeros can appear more similar to each other than two sites with a few shared observations. This is why, when dealing with compositional data, it is more appropriate to use Bray-Curtis than Euclidean distance.


See also:

- Clustering Distance Measures, https://www.datanovia.com/en/lessons/clustering-distance-measures/]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-1517">San Emmanuel</a>.</p>
<p>Hi San Emmanuel,</p>
<p>Thank you for your feedback!</p>
<p>The choice of the (dis)similarity metric should be based on the research question and the type of dataset.</p>
<p>For example, Euclidean distance is well suited to continuous variables, while Bray-Curtis is better suited to count (abundance) data.</p>
<p>Particularly for continuous data, it is expected that all variables are on the &#8220;same&#8221; scale and have the &#8220;same&#8221; distribution. If your variables are not, you will need to standardize or normalize them.</p>
<p>If you want to reflect ecological differences, then Bray-Curtis will do a much better job, since it is used to quantify the compositional dissimilarity between two different sites, based on counts at each site.</p>
<p>The Bray-Curtis dissimilarity is often erroneously called a distance. It is not a distance, since it does not satisfy the triangle inequality, and should always be called a dissimilarity to avoid confusion.</p>
<p>The choice between Euclidean (a metric distance) and Bray-Curtis (a semi-metric) depends on your data and the way you want to handle it. Metric distances comply with the triangle inequality (the length of any one side of a triangle is at most the sum of the lengths of the other two sides), while semi-metrics don&#8217;t.</p>
<p>This is particularly relevant when zeros are not true absences (e.g., when you sample species from a site, you&#8217;ll never know for sure whether a species is truly absent or you failed to sample it although it is present; the same applies to metals in your case).</p>
<p>This is very important because if your zeros aren&#8217;t true absences and you use Euclidean distance, the dissimilarities among sites won&#8217;t be a good description of your data: two sites with a bunch of shared zeros can appear more similar to each other than two sites with a few shared observations. This is why, when dealing with compositional data, it is more appropriate to use Bray-Curtis than Euclidean distance.</p>
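<p>As a rough illustration of the shared-zeros point above, here is a minimal sketch in base R. The count table, the site and species names, and the helper <code>bray_curtis()</code> are made up for this example and are not taken from any real data set or package:</p>
<pre class = "r_code">
# Hypothetical count data (made-up numbers): rows are sites, columns are species
counts <- rbind(
  site_A = c(sp1 = 5,  sp2 = 3,  sp3 = 0, sp4 = 0, sp5 = 0, sp6 = 0),
  site_B = c(sp1 = 0,  sp2 = 0,  sp3 = 4, sp4 = 2, sp5 = 0, sp6 = 0),
  site_C = c(sp1 = 50, sp2 = 30, sp3 = 0, sp4 = 0, sp5 = 0, sp6 = 0)
)

# Euclidean distance between sites (base R)
euc <- as.matrix(dist(counts, method = "euclidean"))

# Illustrative helper: Bray-Curtis dissimilarity between two count vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill a site-by-site Bray-Curtis matrix
n <- nrow(counts)
bc <- matrix(0, n, n, dimnames = list(rownames(counts), rownames(counts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- bray_curtis(counts[i, ], counts[j, ])
  }
}

round(euc, 2)
round(bc, 2)
</pre>
<p>In this toy table, site_A and site_B have no species in common, yet Euclidean distance puts them closer together than site_A and site_C, which contain the same species at different abundances. Bray-Curtis ranks the pairs the other way round. In practice you would probably use a package function such as <code>vegdist(counts, method = "bray")</code> from the vegan package rather than a hand-written helper.</p>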
<p>See also:</p>
<p>&#8211; Clustering Distance Measures, <a href="https://www.datanovia.com/en/lessons/clustering-distance-measures/" rel="ugc">https://www.datanovia.com/en/lessons/clustering-distance-measures/</a></p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: San Emmanuel		</title>
		<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-1517</link>

		<dc:creator><![CDATA[San Emmanuel]]></dc:creator>
		<pubDate>Sun, 11 Nov 2018 18:15:12 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=8080#comment-1517</guid>

					<description><![CDATA[Hi Kas, 

Thanks for sharing. Like Chafia, I agree that this is very helpful.

I have a quick question about the intuition around scaling and using a distance matrix (or method). I find that in certain instances the data is scaled, and in others a distance method such as Bray-Curtis or Jaccard is used. I am referring to microbiome studies. What are your thoughts?

Thanks,]]></description>
			<content:encoded><![CDATA[<p>Hi Kas, </p>
<p>Thanks for sharing. Like Chafia, I agree that this is very helpful.</p>
<p>I have a quick question about the intuition around scaling and using a distance matrix (or method). I find that in certain instances the data is scaled, and in others a distance method such as Bray-Curtis or Jaccard is used. I am referring to microbiome studies. What are your thoughts?</p>
<p>Thanks,</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-1493</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Sun, 28 Oct 2018 19:22:11 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=8080#comment-1493</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-1492&quot;&gt;Chafia&lt;/a&gt;.

Hi Chafia,

Thank you very much for the feedback.
This kind of appreciation really helps and motivates us to keep delivering better content.

Thank you again.

Best regards]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-1492">Chafia</a>.</p>
<p>Hi Chafia,</p>
<p>Thank you very much for the feedback.<br />
This kind of appreciation really helps and motivates us to keep delivering better content.</p>
<p>Thank you again.</p>
<p>Best regards</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Chafia		</title>
		<link>https://www.datanovia.com/en/lessons/model-based-clustering-essentials/#comment-1492</link>

		<dc:creator><![CDATA[Chafia]]></dc:creator>
		<pubDate>Sun, 28 Oct 2018 15:52:45 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=8080#comment-1492</guid>

					<description><![CDATA[MERCI BEAUCOUP
THANK YOU SO MUCH

FOR THE COLORS YOU PUT ON THE DATA TO MAKE THEM 
REALLY TALK TO US
I ENJOY PLOTTING AND MODELLING AND CLUSTERING
LEARNING FROM YOU HOW TO DO A BEAUTIFUL DATA ANALYSIS

THANK YOU, SIR, FOR SHARING THIS JOY
WISH YOU THE BEST
REGARDS]]></description>
			<content:encoded><![CDATA[<p>MERCI BEAUCOUP<br />
THANK YOU SO MUCH</p>
<p>FOR THE COLORS YOU PUT ON THE DATA TO MAKE THEM<br />
REALLY TALK TO US<br />
I ENJOY PLOTTING AND MODELLING AND CLUSTERING<br />
LEARNING FROM YOU HOW TO DO A BEAUTIFUL DATA ANALYSIS</p>
<p>THANK YOU, SIR, FOR SHARING THIS JOY<br />
WISH YOU THE BEST<br />
REGARDS</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
