How to Easily Manipulate Files and Directories in R



This article presents the fs R package, which provides a cross-platform, uniform interface to file system operations.

fs functions are divided into four main categories:

  • path_ for manipulating and constructing paths
  • file_ for files
  • dir_ for directories
  • link_ for links

Prerequisites

Install the package from CRAN with install.packages("fs"), or install the development version from GitHub with devtools::install_github("r-lib/fs").

Load required packages:

library("fs")  # File manipulations
library(tidyverse)  # Data manipulation

Some Key R functions

File manipulation:

  • file_copy(), dir_copy(), link_copy(): Copy files, directories or links
  • file_create(), dir_create(), link_create(): Create files, directories, or links
  • file_delete(), dir_delete(), link_delete(): Delete files, directories, or links
  • file_access(), file_exists(), dir_exists(), link_exists(): Query for existence and access permissions
  • file_chmod(): Change file permissions
  • file_chown(): Change owner or group of a file
  • file_info(): Query file metadata
  • file_move(): Move or rename files
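The functions listed above can be combined into a short round trip. The following is a minimal sketch, not a reference implementation: it assumes a throwaway temporary directory, and the file names (notes.txt and so on) are made up for illustration.

```r
library(fs)

# Work inside a throwaway directory so nothing outside it is touched
tmp <- dir_create(file_temp())

# Create an empty file, copy it, then rename the copy
original <- file_create(path(tmp, "notes.txt"))
backup   <- file_copy(original, path(tmp, "notes-backup.txt"))
renamed  <- file_move(backup, path(tmp, "notes-old.txt"))

# The copy no longer exists under its first name after file_move()
exists_now <- file_exists(c(original, backup, renamed))
unname(exists_now)  # TRUE FALSE TRUE

# file_info() returns a data frame with one row per path
info <- file_info(original)
info$type           # file
info$size           # 0

# Deleting the directory also removes the files inside it
dir_delete(tmp)
dir_exists(tmp)     # FALSE
```

Each create, copy, and move function returns the new path invisibly, which is why the results can be assigned and reused in the next step.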

Path manipulation:

  • path(), path_wd(): Construct path to a file or directory
  • file_temp(), path_temp(): Create names for temporary files
  • path_expand(), path_expand_r(), path_home(), path_home_r(): Find the user's home directory
  • path_file(), path_dir(), path_ext(), path_ext_remove(), path_ext_set(): Manipulate file paths
    • path_file() returns the filename portion of the path,
    • path_dir() returns the directory portion,
    • path_ext() returns the last extension (if any) for a path,
    • path_ext_remove() removes the last extension and returns the rest of the path,
    • path_ext_set() replaces the extension with a new extension. If there is no existing extension the new extension is appended.
  • path_filter(): Filter paths
  • path_real(), path_split(), path_join(), path_abs(), path_norm(), path_rel(), path_common(), path_has_parent(): Path computations
    • path_real() returns the canonical path, resolving any symbolic links,
    • path_split() splits paths into parts,
    • path_join() joins parts together (the inverse of path_split()),
    • path_abs() returns a normalized, absolute version of a path,
    • path_norm() eliminates . references and rationalizes up-level .. references, so A/./B and A/foo/../B both become A/B, but ../B is not changed. If one of the paths is a symbolic link, this may change the meaning of the path, so consider using path_real() instead,
    • path_rel() computes the path relative to a start path,
    • path_common() finds the common parts of two (or more) paths,
    • path_has_parent() determines whether a path has a given parent.
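Most of the path functions above work on the text of the path alone, so they can be tried without creating any files. The sketch below uses a hypothetical path (project/data/raw/report.csv) chosen only for illustration.

```r
library(fs)

# A hypothetical path, used only for illustration; no file needs to exist
p <- path("project", "data", "raw", "report.csv")

path_file(p)            # report.csv
path_dir(p)             # project/data/raw
path_ext(p)             # csv
path_ext_remove(p)      # project/data/raw/report
path_ext_set(p, "tsv")  # project/data/raw/report.tsv

# path_norm() works on the text of the path only
path_norm(c("A/./B", "A/foo/../B", "../B"))  # A/B A/B ../B

# Split a path into parts, then join the parts back together
parts <- path_split(p)[[1]]
parts                   # "project" "data" "raw" "report.csv"
path_join(parts)        # project/data/raw/report.csv

# Common leading part of several paths, and a parent test
path_common(c("project/data/raw", "project/data/clean"))  # project/data
path_has_parent("project/data/raw", "project")            # TRUE

# Keep only the paths that match a glob pattern
path_filter(c("a.csv", "b.txt", "c.csv"), glob = "*.csv")  # a.csv c.csv
```

Note that path_split() returns a list with one element per input path, hence the [[1]] to extract the parts of a single path.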

Helpers:

  • is_file(), is_dir(), is_link(): Functions to test for file types
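As a quick illustration of the type helpers, the sketch below creates one directory and one file in a temporary location and tests each of them. The file name example.txt is made up for the example.

```r
library(fs)

tmp <- dir_create(file_temp())
f <- file_create(path(tmp, "example.txt"))

# Each helper returns a logical vector; unname() keeps the printout short
checks <- c(
  dir_is_dir   = unname(is_dir(tmp)),
  file_is_file = unname(is_file(f)),
  file_is_dir  = unname(is_dir(f)),
  file_is_link = unname(is_link(f))
)
checks  # TRUE TRUE FALSE FALSE

# Clean up
dir_delete(tmp)
```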

Basic usage

  • Construct a path to a file
  • List the files in a directory
  • Create and delete files and directories

# Construct a path to a file with `path()`
path("foo", "bar", letters[1:3], ext = "txt")
## foo/bar/a.txt foo/bar/b.txt foo/bar/c.txt
# list files in the current directory
dir_ls()
## 002-create-icon.html
## 003-r-histogram-example.html
## _output.yaml
## _settings.R
## _settings.Rmd
## book.bib
## correlation-matrix-analysis-in-r-using-corrr.html
## correlation-network-using-corrr.html
## figures
## file-and-directory-manipulation.Rmd
## gganimate.html
## gghighlight.html
## include
## interactive-data-summary.html
## libs
## mathjax.Rmd
## packages.bib
## plot-all-variables-in-a-dataset.html
## plot-one-variable-against-multiples-others.html
## wp-content
# create a new directory
tmp <- dir_create(file_temp())
tmp
## /var/folders/xm/8p6yj4bj6s57n4v_51714lwm0000gp/T/Rtmp6lCt2d/filed958126c105c
# create new files in that directory
file_create(path(tmp, "my-file.txt"))
dir_ls(tmp)
## /var/folders/xm/8p6yj4bj6s57n4v_51714lwm0000gp/T/Rtmp6lCt2d/filed958126c105c/my-file.txt
# remove files from the directory
file_delete(path(tmp, "my-file.txt"))
dir_ls(tmp)
## character(0)
# remove the directory
dir_delete(tmp)

Filter files

Filter files by type, permissions, and size:

# `recurse` supersedes the deprecated `recursive` argument in current fs releases
dir_info(path = ".", recurse = FALSE) %>%
  filter(type == "file", permissions == "u+r", size > "10KB") %>%
  arrange(desc(size)) %>%
  select(path, permissions, size, modification_time)
## # A tibble: 2 x 4
##   path                                              permissions  size
##   <fs::path>                                        <fs::perms> <fs:>
## 1 correlation-matrix-analysis-in-r-using-corrr.html rw-r--r--   20.3K
## 2 gganimate.html                                    rw-r--r--   15.1K
## # … with 1 more variable: modification_time <dttm>

Tabulate and display the total size of each folder:

# `recurse` supersedes the deprecated `recursive` argument in current fs releases
dir_info(path = ".", recurse = TRUE) %>%
  group_by(directory = path_dir(path)) %>%
  tally(wt = size, sort = TRUE)
## # A tibble: 37 x 2
##   directory                                                    n
##   <fs::path>                                         <fs::bytes>
## 1 https://www.datanovia.com/en/wp-content/uploads/dn-tutorials/r-tutorial/images       11.76M
## 2 https://www.datanovia.com/en/wp-content/uploads/dn-tutorials/r-tutorial/figures       2.36M
## 3 libs/bootstrap-3.3.5/css                                 2.31M
## 4 libs/plotlyjs-1.16.3                                     1.66M
## 5 libs/bootstrap-3.3.5/css/fonts                         953.23K
## 6 libs/font-awesome-4.1.0/fonts                          611.77K
## # … with 31 more rows

Read a collection of files into one data frame

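dir_ls() returns a named vector of paths, so it can be passed directly to purrr::map_df() with the .id argument, which stores the name of each element (here, the file path) in an extra column.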

# Create separate files for each species
iris %>%
  split(.$Species) %>%
  map(select, -Species) %>%
  iwalk(~ write_tsv(.x, paste0(.y, ".tsv")))
  
# Show the files
iris_files <- dir_ls(glob = "*.tsv")
iris_files
## setosa.tsv     versicolor.tsv virginica.tsv
# Read the data into a single table, including the filenames
iris_files %>%
  map_df(read_tsv, .id = "file", col_types = cols(), n_max = 2)
## # A tibble: 6 x 5
##   file           Sepal.Length Sepal.Width Petal.Length Petal.Width
##   <chr>                 <dbl>       <dbl>        <dbl>       <dbl>
## 1 setosa.tsv              5.1         3.5          1.4         0.2
## 2 setosa.tsv              4.9         3            1.4         0.2
## 3 versicolor.tsv          7           3.2          4.7         1.4
## 4 versicolor.tsv          6.4         3.2          4.5         1.5
## 5 virginica.tsv           6.3         3.3          6           2.5
## 6 virginica.tsv           5.8         2.7          5.1         1.9
# Remove the files created above
file_delete(iris_files)
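To see why the named vector matters, it can help to inspect the names directly. The sketch below is an illustration under assumptions: it works in a temporary directory with empty placeholder files (a.tsv, b.tsv, notes.txt) rather than the iris files, and it renames the elements so that only the file name, not the full path, would end up in an .id column.

```r
library(fs)

tmp <- dir_create(file_temp())
file_create(path(tmp, c("a.tsv", "b.tsv", "notes.txt")))

# Only the .tsv files are returned; notes.txt is filtered out by the glob
tsv_files <- dir_ls(tmp, glob = "*.tsv")

# The names are the paths themselves, which is what `.id` picks up
all(names(tsv_files) == tsv_files)  # TRUE

# Replace the names with bare file names if the full path is not wanted
names(tsv_files) <- as.character(path_file(tsv_files))
names(tsv_files)  # "a.tsv" "b.tsv"

# Clean up
dir_delete(tmp)
```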

