{"id":7224,"date":"2018-10-01T19:48:25","date_gmt":"2018-10-01T19:48:25","guid":{"rendered":"https:\/\/www.datanovia.com\/en\/?post_type=dt_courses&#038;p=7224"},"modified":"2020-01-13T22:29:51","modified_gmt":"2020-01-13T20:29:51","slug":"data-manipulation-in-r","status":"publish","type":"dt_courses","link":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/","title":{"rendered":"Data Manipulation in R"},"content":{"rendered":"<div id=\"rdoc\">\n<div id=\"course-description\" class=\"section level2\">\n<h2>Course description<\/h2>\n<p>In this course, you will learn how to easily perform <strong>data manipulation<\/strong> using <strong>R software<\/strong>. We\u2019ll cover the following data manipulation techniques:<\/p>\n<ul>\n<li>filtering and ordering rows,<\/li>\n<li>renaming and adding columns,<\/li>\n<li>computing summary statistics.<\/li>\n<\/ul>\n<p>We\u2019ll mainly use the popular <strong>dplyr<\/strong> R package, which contains important R functions for carrying out data manipulation easily. In the final section, we\u2019ll show you how to group your data by a grouping variable and then compute summary statistics on each subset. 
You will also learn how to chain your data manipulation operations.<\/p>\n<p>By the end of this course, you will be familiar with data manipulation tools and approaches that allow you to manipulate data efficiently.<\/p>\n<\/div>\n<div id=\"required-r-packages\" class=\"section level2\">\n<h2>Required R packages<\/h2>\n<p>We recommend installing the <code>tidyverse<\/code> packages, which include the <code>dplyr<\/code> package (for data manipulation) and additional R packages for easily reading (<code>readr<\/code>), transforming (<code>tidyr<\/code>) and visualizing (<code>ggplot2<\/code>) datasets.<\/p>\n<ul>\n<li>Install:<\/li>\n<\/ul>\n<pre class=\"r\"><code>install.packages(\"tidyverse\")<\/code><\/pre>\n<ul>\n<li>Load the <code>tidyverse<\/code> packages, which also include the <code>dplyr<\/code> package:<\/li>\n<\/ul>\n<pre class=\"r\"><code>library(\"tidyverse\")<\/code><\/pre>\n<\/div>\n<div id=\"demo-datasets\" class=\"section level2\">\n<h2>Demo datasets<\/h2>\n<p>We\u2019ll mainly use the built-in R <code>iris<\/code> data set, which we start by converting into a tibble data frame (<code>tbl_df<\/code>) for easier data analysis. 
A <code>tbl_df<\/code> object is a data frame with a nicer printing method, which is useful when working with large data sets.<\/p>\n<pre class=\"r\"><code>library(\"tidyverse\")\r\nmy_data &lt;- as_tibble(iris)\r\nmy_data<\/code><\/pre>\n<pre><code>## # A tibble: 150 x 5\r\n##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species\r\n##          &lt;dbl&gt;       &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt; &lt;fct&gt;  \r\n## 1          5.1         3.5          1.4         0.2 setosa \r\n## 2          4.9         3            1.4         0.2 setosa \r\n## 3          4.7         3.2          1.3         0.2 setosa \r\n## 4          4.6         3.1          1.5         0.2 setosa \r\n## 5          5           3.6          1.4         0.2 setosa \r\n## 6          5.4         3.9          1.7         0.4 setosa \r\n## # ... with 144 more rows<\/code><\/pre>\n<div class=\"success\">\n<p>Note that the type of data in each column is specified. Common types include:<\/p>\n<ul>\n<li><em>int<\/em>: integers<\/li>\n<li><em>dbl<\/em>: double (real numbers)<\/li>\n<li><em>chr<\/em>: character vectors, strings, texts<\/li>\n<li><em>fct<\/em>: factor<\/li>\n<li><em>dttm<\/em>: date-times (date + time)<\/li>\n<li><em>lgl<\/em>: logical (TRUE or FALSE)<\/li>\n<li><em>date<\/em>: dates<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div id=\"main-data-manipulation-functions\" class=\"section level2\">\n<h2>Main data manipulation functions<\/h2>\n<p>There are 8 fundamental data manipulation verbs that you will use to do most of your data manipulations. 
These functions are included in the <code>dplyr<\/code> package:<\/p>\n<ul>\n<li><code>filter()<\/code>: Pick rows (observations\/samples) based on their values.<\/li>\n<li><code>distinct()<\/code>: Remove duplicate rows.<\/li>\n<li><code>arrange()<\/code>: Reorder the rows.<\/li>\n<li><code>select()<\/code>: Select columns (variables) by their names.<\/li>\n<li><code>rename()<\/code>: Rename columns.<\/li>\n<li><code>mutate()<\/code> and <code>transmute()<\/code>: Add\/create new variables.<\/li>\n<li><code>summarise()<\/code>: Compute statistical summaries (e.g., computing the mean or the sum).<\/li>\n<\/ul>\n<div class=\"success\">\n<p>It\u2019s also possible to combine each of these verbs with the function <strong>group_by<\/strong>() to operate on subsets of the data set (<strong>group-by-group<\/strong>).<\/p>\n<\/div>\n<p>All these functions work in a similar way:<\/p>\n<ul>\n<li>The first argument is a data frame.<\/li>\n<li>The subsequent arguments are a comma-separated list of unquoted variable names and the specification of what you want to do.<\/li>\n<li>The result is a new data frame.<\/li>\n<\/ul>\n<p>You will learn how to use these functions, as well as how to chain your data manipulation operations using the pipe operator (<code>%&gt;%<\/code>).<\/p>\n<div class=\"success\">\n<p>Note that the dplyr package lets you use the forward-pipe operator (%&gt;%) to combine multiple operations. For example, x %&gt;% f is equivalent to f(x). With the pipe (%&gt;%), the output of each operation is passed as the first argument to the next operation, which makes chained code easier to read.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<p><!--end rdoc--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this course, you will learn important R functions and techniques for easily manipulating your data. 
These include: 1) filtering and ordering rows; 2) renaming and adding columns and 3) computing summary statistics<\/p>\n","protected":false},"author":1,"featured_media":7948,"menu_order":5,"comment_status":"open","ping_status":"closed","template":"","class_list":["post-7224","dt_courses","type-dt_courses","status-publish","has-post-thumbnail","hentry","course_category-data-manipulation"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Data Manipulation in R - Datanovia<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Manipulation in R - Datanovia\" \/>\n<meta property=\"og:description\" content=\"In this course, you will learn important R functions and techniques for manipulating easily your data. These include: 1) filtering and ordering rows; 2) renaming and adding columns and 3) computing summary statistics\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/\" \/>\n<meta property=\"og:site_name\" content=\"Datanovia\" \/>\n<meta property=\"article:modified_time\" content=\"2020-01-13T20:29:51+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/2013-12-27_15.25.18.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/\",\"name\":\"Data Manipulation in R - Datanovia\",\"isPartOf\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/2013-12-27_15.25.18.jpg\",\"datePublished\":\"2018-10-01T19:48:25+00:00\",\"dateModified\":\"2020-01-13T20:29:51+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#primaryimage\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/2013-12-27_15.25.18.jpg\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/2013-12-27_15.25.18.jpg\",\"width\":1024,\"height\":512},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.datanovia.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Courses\",\"item\":\"https:\/\/www.datanovia.com\/en\/courses\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Data Manipulation 
in R\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#website\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"name\":\"Datanovia\",\"description\":\"Data Mining and Statistics for Decision Support\",\"publisher\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.datanovia.com\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#organization\",\"name\":\"Datanovia\",\"url\":\"https:\/\/www.datanovia.com\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"contentUrl\":\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png\",\"width\":98,\"height\":99,\"caption\":\"Datanovia\"},\"image\":{\"@id\":\"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Manipulation in R - Datanovia","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/","og_locale":"en_US","og_type":"article","og_title":"Data Manipulation in R - Datanovia","og_description":"In this course, you will learn important R functions and techniques for manipulating easily your data. 
These include: 1) filtering and ordering rows; 2) renaming and adding columns and 3) computing summary statistics","og_url":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/","og_site_name":"Datanovia","article_modified_time":"2020-01-13T20:29:51+00:00","og_image":[{"width":1024,"height":512,"url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/2013-12-27_15.25.18.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/","url":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/","name":"Data Manipulation in R - Datanovia","isPartOf":{"@id":"https:\/\/www.datanovia.com\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#primaryimage"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#primaryimage"},"thumbnailUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/2013-12-27_15.25.18.jpg","datePublished":"2018-10-01T19:48:25+00:00","dateModified":"2020-01-13T20:29:51+00:00","breadcrumb":{"@id":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#primaryimage","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/2013-12-27_15.25.18.jpg","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/10\/2013-12-27_15.25.18.jpg","width":1024,"height":512},{"@type":"BreadcrumbList","@id":"https:\/\/www.datanovia.com\/en\/courses\/data-manipulation-in-r\/#breadcrumb","itemListElement":[{"@type":"ListItem
","position":1,"name":"Home","item":"https:\/\/www.datanovia.com\/en\/"},{"@type":"ListItem","position":2,"name":"Courses","item":"https:\/\/www.datanovia.com\/en\/courses\/"},{"@type":"ListItem","position":3,"name":"Data Manipulation in R"}]},{"@type":"WebSite","@id":"https:\/\/www.datanovia.com\/en\/#website","url":"https:\/\/www.datanovia.com\/en\/","name":"Datanovia","description":"Data Mining and Statistics for Decision Support","publisher":{"@id":"https:\/\/www.datanovia.com\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.datanovia.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.datanovia.com\/en\/#organization","name":"Datanovia","url":"https:\/\/www.datanovia.com\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","contentUrl":"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/2018\/09\/datanovia-logo.png","width":98,"height":99,"caption":"Datanovia"},"image":{"@id":"https:\/\/www.datanovia.com\/en\/#\/schema\/logo\/image\/"}}]}},"multi-rating":{"mr_rating_results":[{"adjusted_star_result":4.33,"star_result":4.33,"total_max_option_value":5,"adjusted_score_result":4.33,"score_result":4.33,"percentage_result":86.67,"adjusted_percentage_result":86.67,"count":6,"post_id":7224}]},"_links":{"self":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_courses\/7224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_courses"}],"about":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/types\/dt_courses"}],"author":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/
users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/comments?post=7224"}],"version-history":[{"count":0,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_courses\/7224\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media\/7948"}],"wp:attachment":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media?parent=7224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}