<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: Identify and Remove Duplicate Data in R	</title>
	<atom:link href="https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/</link>
	<description>Data Mining and Statistics for Decision Support</description>
	<lastBuildDate>Fri, 22 May 2020 10:21:56 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.2</generator>
	<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-19887</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Fri, 22 May 2020 10:21:56 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-19887</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-19884&quot;&gt;Hasan Ayouby&lt;/a&gt;.

&lt;div class=&quot;rdoc&quot;&gt;
&lt;p&gt;To overwrite your original data frame, assign the result back to it:&lt;/p&gt;
&lt;pre class=&quot;r&quot;&gt;&lt;code&gt;my_data = my_data %&#062;% 
  distinct(Sepal.Length, .keep_all = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-19884">Hasan Ayouby</a>.</p>
<div class="rdoc">
<p>To overwrite your original data frame, assign the result back to it:</p>
<pre class="r"><code>my_data = my_data %&gt;% 
  distinct(Sepal.Length, .keep_all = TRUE)</code></pre>
</div>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Hasan Ayouby		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-19884</link>

		<dc:creator><![CDATA[Hasan Ayouby]]></dc:creator>
		<pubDate>Thu, 21 May 2020 16:46:24 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-19884</guid>

					<description><![CDATA[How can I permanently remove the duplicates? When I use this function, it acts only like a filter, and the original table stays intact.]]></description>
			<content:encoded><![CDATA[<p>How can I permanently remove the duplicates? When I use this function, it acts only like a filter, and the original table stays intact.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-19880</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Wed, 20 May 2020 13:49:24 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-19880</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-19879&quot;&gt;Moses&lt;/a&gt;.

Thank you for your feedback; it is highly appreciated. Outlier detection is another check worth adding: see &lt;a href=&quot;https://rpkgs.datanovia.com/rstatix/reference/outliers.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer nofollow ugc&quot;&gt;Outlier identifications&lt;/a&gt;.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-19879">Moses</a>.</p>
<p>Thank you for your feedback; it is highly appreciated. Outlier detection is another check worth adding: see <a href="https://rpkgs.datanovia.com/rstatix/reference/outliers.html" target="_blank" rel="noopener noreferrer nofollow ugc">Outlier identifications</a>.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Moses		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-19879</link>

		<dc:creator><![CDATA[Moses]]></dc:creator>
		<pubDate>Wed, 20 May 2020 13:44:33 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-19879</guid>

					<description><![CDATA[You are always on point! A quick one...
What are the major checkpoints in data management? I know there are duplicates, missing data, ...]]></description>
			<content:encoded><![CDATA[<p>You are always on point! A quick one&#8230;<br />
What are the major checkpoints in data management? I know there are duplicates, missing data, &#8230;</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-11267</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Wed, 15 Apr 2020 18:42:23 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-11267</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-11266&quot;&gt;Zbig&lt;/a&gt;.

&lt;div class=&quot;rdoc&quot;&gt;
&lt;p&gt;If you want to keep distinct rows based on multiple columns, you can proceed as follows:&lt;/p&gt;
&lt;pre class=&quot;r&quot;&gt;&lt;code&gt;library(dplyr)
df &#060;- data.frame(id=c(1,1,1,2,2,2), time=rep(1:3, 2), place=c(1,2,1,1,1,2))
df %&#062;% distinct(id, time, place, .keep_all = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-11266">Zbig</a>.</p>
<div class="rdoc">
<p>If you want to keep distinct rows based on multiple columns, you can proceed as follows:</p>
<pre class="r"><code>library(dplyr)
df &lt;- data.frame(id=c(1,1,1,2,2,2), time=rep(1:3, 2), place=c(1,2,1,1,1,2))
df %&gt;% distinct(id, time, place, .keep_all = TRUE)</code></pre>
</div>
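<div class="rdoc">
<p>The call above drops rows only when <code>id</code>, <code>time</code> and <code>place</code> all repeat. For the case in the question, where only immediately consecutive repeats of <code>place</code> within each <code>id</code> should be removed, one possible approach is to compare each row with the previous one. This is a minimal sketch rather than a tested recipe: it assumes rows should be ordered by <code>time</code> within <code>id</code> and that <code>place</code> has no missing values; <code>paths</code> is just an illustrative name.</p>
<pre class="r"><code>library(dplyr)
df &lt;- data.frame(id=c(1,1,1,2,2,2), time=rep(1:3, 2), place=c(1,2,1,1,1,2))
# Keep the first row of each id, plus every row whose place differs
# from the place in the previous row of the same id
paths &lt;- df %&gt;%
  arrange(id, time) %&gt;%
  group_by(id) %&gt;%
  filter(row_number() == 1 | place != lag(place)) %&gt;%
  ungroup()</code></pre>
<p>With the example data, object 1 keeps all three rows (places 1, 2, 1), while object 2 loses only its repeated stay at place 1.</p>
</div>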
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Zbig		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-11266</link>

		<dc:creator><![CDATA[Zbig]]></dc:creator>
		<pubDate>Wed, 15 Apr 2020 18:23:02 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-11266</guid>

					<description><![CDATA[Now I have a slightly harder task:
What should I do if I want to remove only immediately consecutive duplicates, but preserve repeats that are separated by something else?
Example: you have a data frame with object id, time and the place where it happened:
df &#060;- data.frame(id=c(1,1,1,2,2,2), time=rep(1:3, 2), place=c(1,2,1,1,1,2)) 
I would like to extract the paths of these objects. For example, object 1 was at place 1, then 2, then back to 1, and I would like to preserve that in the data so that later I can see that it moved from 1 to 2 and then from 2 to 1.
Any ideas?]]></description>
			<content:encoded><![CDATA[<p>Now I have a slightly harder task:<br />
What should I do if I want to remove only immediately consecutive duplicates, but preserve repeats that are separated by something else?<br />
Example: you have a data frame with object id, time and the place where it happened:<br />
df &lt;- data.frame(id=c(1,1,1,2,2,2), time=rep(1:3, 2), place=c(1,2,1,1,1,2))<br />
I would like to extract the paths of these objects. For example, object 1 was at place 1, then 2, then back to 1, and I would like to preserve that in the data so that later I can see that it moved from 1 to 2 and then from 2 to 1.<br />
Any ideas?</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Andreas Rybicki		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-7027</link>

		<dc:creator><![CDATA[Andreas Rybicki]]></dc:creator>
		<pubDate>Mon, 20 Jan 2020 07:31:08 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-7027</guid>

					<description><![CDATA[Kassambara,

the lesson &quot;Identify and Remove Duplicate Data in R&quot; was extremely helpful for my task.

Question:
I have two data frames like &quot;iris&quot;, say iris for Country A and B.
The data frames are quite large, up to 1 million rows and &#062; 10 columns.
I&#039;d like to check whether a row in B also appears in A.
E.g. in &#039;iris&#039;, row 102 is identical to row 143;
let&#039;s assume row 102 is in iris country_A and row 143 in iris...._B. How could I identify any duplicates across these two data frames?
I searched on Stack Exchange but didn&#039;t find any helpful solution.
Thanks]]></description>
			<content:encoded><![CDATA[<p>Kassambara,</p>
<p>the lesson &#8220;Identify and Remove Duplicate Data in R&#8221; was extremely helpful for my task.</p>
<p>Question:<br />
I have two data frames like &#8220;iris&#8221;, say iris for Country A and B.<br />
The data frames are quite large, up to 1 million rows and &gt; 10 columns.<br />
I&#8217;d like to check whether a row in B also appears in A.<br />
E.g. in &#8216;iris&#8217;, row 102 is identical to row 143;<br />
let&#8217;s assume row 102 is in iris country_A and row 143 in iris&#8230;._B. How could I identify any duplicates across these two data frames?<br />
I searched on Stack Exchange but didn&#8217;t find any helpful solution.<br />
Thanks</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-2137</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Tue, 23 Jul 2019 16:58:44 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-2137</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-2133&quot;&gt;robyn&lt;/a&gt;.

You can use the following R code:

&lt;pre class = &quot;r_code&quot;&gt;
library(dplyr)
Jan_19 %&gt;% distinct(CON, .keep_all = TRUE)
&lt;/pre&gt;]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-2133">robyn</a>.</p>
<p>You can use the following R code:</p>
<pre class = "r_code">
library(dplyr)
Jan_19 %>% distinct(CON, .keep_all = TRUE)
</pre>
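<p>Note that <code>distinct()</code> keeps one row per <code>CON</code> value. If the goal is instead to keep only the rows whose <code>CON</code> value occurs more than once, a possible sketch is to count rows per group. The small <code>Jan_19</code> below is a hypothetical stand-in so the example runs on its own, and <code>dup_rows</code> is just an illustrative name; with real data, skip the stand-in line.</p>
<pre class = "r_code">
library(dplyr)
# Hypothetical stand-in for Jan_19, only to make the example runnable
Jan_19 &lt;- data.frame(CON = c("a", "b", "a", "c"), value = 1:4)
# Keep every row whose CON value appears at least twice
dup_rows &lt;- Jan_19 %&gt;%
  group_by(CON) %&gt;%
  filter(n() &gt; 1) %&gt;%
  ungroup()
</pre>
<p>Unlike <code>Jan_19[duplicated(Jan_19$CON), ]</code>, which omits the first occurrence of each repeated value, this keeps all occurrences.</p>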
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: robyn		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-2133</link>

		<dc:creator><![CDATA[robyn]]></dc:creator>
		<pubDate>Fri, 19 Jul 2019 17:16:45 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-2133</guid>

					<description><![CDATA[Hi, I&#039;m trying to KEEP ONLY duplicate rows based on a column. I first checked the unique rows:
unique(Jan_19)
# A tibble: 178,492 x 22

then the number of duplicates based on my CON column:
Jan_19[duplicated(Jan_19$CON), ]
# A tibble: 251 x 22

then I tried to drop the rows where CON was not duplicated:
Jan_19 %&#062;% !distinct(CON, .keep_all = TRUE)
Error in distinct(CON, .keep_all = TRUE) : object &#039;CON&#039; not found

Any advice? Thanks for the code, it is quite useful.]]></description>
			<content:encoded><![CDATA[<p>Hi, I&#8217;m trying to KEEP ONLY duplicate rows based on a column. I first checked the unique rows:<br />
unique(Jan_19)<br />
# A tibble: 178,492 x 22</p>
<p>then the number of duplicates based on my CON column:<br />
Jan_19[duplicated(Jan_19$CON), ]<br />
# A tibble: 251 x 22</p>
<p>then I tried to drop the rows where CON was not duplicated:<br />
Jan_19 %&gt;% !distinct(CON, .keep_all = TRUE)<br />
Error in distinct(CON, .keep_all = TRUE) : object &#8216;CON&#8217; not found</p>
<p>Any advice? Thanks for the code, it is quite useful.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: kassambara		</title>
		<link>https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-2077</link>

		<dc:creator><![CDATA[kassambara]]></dc:creator>
		<pubDate>Wed, 03 Jul 2019 19:57:57 +0000</pubDate>
		<guid isPermaLink="false">https://www.datanovia.com/en/?post_type=dt_lessons&#038;p=7235#comment-2077</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-2076&quot;&gt;stonemonroy&lt;/a&gt;.

Thank you for your positive feedback!]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.datanovia.com/en/lessons/identify-and-remove-duplicate-data-in-r/#comment-2076">stonemonroy</a>.</p>
<p>Thank you for your positive feedback!</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
