Data Manipulation in R

Rename Data Frame Columns in R

In this tutorial, you will learn how to rename the columns of a data frame in R. This can be done easily using the function rename() [dplyr package]. It’s also possible to use R base functions, but they require more typing.

Required packages

Load the tidyverse packages, which include dplyr:

library(tidyverse)

Demo dataset

We’ll use the R built-in iris data set, which we start by converting into a tibble data frame (tbl_df) for easier data analysis.

my_data <- as_tibble(iris)
my_data
## # A tibble: 150 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
## 1          5.1         3.5          1.4         0.2 setosa 
## 2          4.9         3            1.4         0.2 setosa 
## 3          4.7         3.2          1.3         0.2 setosa 
## 4          4.6         3.1          1.5         0.2 setosa 
## 5          5           3.6          1.4         0.2 setosa 
## 6          5.4         3.9          1.7         0.4 setosa 
## # ... with 144 more rows

Renaming columns with dplyr::rename()

Rename the column Sepal.Length to sepal_length and Sepal.Width to sepal_width:

my_data %>% 
  rename(
    sepal_length = Sepal.Length,
    sepal_width = Sepal.Width
    )
## # A tibble: 150 x 5
##   sepal_length sepal_width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
## 1          5.1         3.5          1.4         0.2 setosa 
## 2          4.9         3            1.4         0.2 setosa 
## 3          4.7         3.2          1.3         0.2 setosa 
## 4          4.6         3.1          1.5         0.2 setosa 
## 5          5           3.6          1.4         0.2 setosa 
## 6          5.4         3.9          1.7         0.4 setosa 
## # ... with 144 more rows
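
When many columns follow the same naming pattern, listing each pair in rename() becomes tedious. The sketch below uses dplyr's rename_with(), which applies a function to the column names; it assumes dplyr 1.0.0 or later, and the object names snake and sepal_only are only illustrative.

```r
library(dplyr)

my_data <- as_tibble(iris)

# Lower-case every column name and replace "." with "_"
snake <- my_data %>%
  rename_with(~ tolower(gsub(".", "_", .x, fixed = TRUE)))
names(snake)

# Restrict the renaming to a subset of columns with a tidyselect helper
sepal_only <- my_data %>%
  rename_with(tolower, starts_with("Sepal"))
names(sepal_only)
```

The second call leaves the columns that are not selected untouched, which is useful when only part of a wide data frame needs new names.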

Renaming columns with R base functions

To rename the column Sepal.Length to sepal_length, the procedure is as follows:

  1. Get column names using the function names() or colnames()
  2. Change the column name where the name is "Sepal.Length"

# get column names
colnames(my_data)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"
# Rename the columns named "Sepal.Length" and "Sepal.Width"
names(my_data)[names(my_data) == "Sepal.Length"] <- "sepal_length"
names(my_data)[names(my_data) == "Sepal.Width"] <- "sepal_width"
my_data
## # A tibble: 150 x 5
##   sepal_length sepal_width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
## 1          5.1         3.5          1.4         0.2 setosa 
## 2          4.9         3            1.4         0.2 setosa 
## 3          4.7         3.2          1.3         0.2 setosa 
## 4          4.6         3.1          1.5         0.2 setosa 
## 5          5           3.6          1.4         0.2 setosa 
## 6          5.4         3.9          1.7         0.4 setosa 
## # ... with 144 more rows

It’s also possible to rename by index in the names vector, as follows:

names(my_data)[1] <- "sepal_length"
names(my_data)[2] <- "sepal_width"
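
Renaming by position breaks silently if the column order changes, and renaming one name at a time does not scale to wide data frames. As a minimal sketch using only base R, several columns can be renamed in one step from a lookup vector, and placeholder names can be generated rather than typed out. The names iris_copy, lookup, wide and the "var" prefix are illustrative assumptions, not part of the tutorial.

```r
# Work on a copy so the built-in iris data set is left untouched
iris_copy <- iris

# Lookup vector: names are the old column names, values are the new ones
lookup <- c(Sepal.Length = "sepal_length", Petal.Width = "petal_width")

# Find the position of each old name, then overwrite those positions
idx <- match(names(lookup), names(iris_copy))
names(iris_copy)[idx] <- unname(lookup)
names(iris_copy)

# Generate sequential names instead of typing each one
wide <- iris
names(wide) <- paste0("var", seq_along(wide))
names(wide)
```

match() returns NA for an old name that is not present, so checking anyNA(idx) before assigning is a reasonable safeguard.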

Summary

In this chapter, we described how to rename data frame columns using the function rename() [in the dplyr package] and R base functions such as names() and colnames().


Comments ( 13 )

  • Suhani

    what should i do if i want to change setosa to Setosa

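    • It’s possible to use the function mutate() as follows:

      library("tidyverse")
      # Species is a factor: convert it to character so that ifelse()
      # returns the labels rather than the underlying integer codes
      iris.modified <- iris %>%
        mutate(Species = ifelse(Species == "setosa", "Setosa", as.character(Species)))
      head(iris.modified)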
      
      • Norman Munyengwa

        How do I add the letter “V” to row names in R? For example, the row name codes are 1023, 1024, 1025 and I want to change them to V1023, V1024, V1025.

        Thank you.

  • You can proceed as follows:

    rownames(mydata) <- paste0("V", rownames(mydata))
    
  • Anil Kumar

    What if I have quite a big data set, say 200+ columns?

    • The functions described here still work, even if you have a large number of columns

      • Thomas

        Hi Kassambara,

        You seem to be really on top of how to rename columns, and I’ve been struggling with writing code that can rename columns based on their names. I have many different datasets where a number of columns start with “alt” (e.g. alt1.price, alt1.pol, alt1.x, alt2.price, alt2.pol, alt2.x) and I would like to rename these columns to price_1, pol_1, x_1, price_2, pol_2, x_2.

        Essentially, I would like to select columns starting with alt, add an underscore, delete the ‘alt’ and move the number to the end of the column name. Is that possible in any way?

        Kind regards, Thomas

        • Hi Thomas,

          you need to perform some string manipulations as shown below.

          library(tidyverse)
          library(stringr)

          # Demo data preparation
          iris <- as_tibble(iris)
          colnames(iris) <- c("alt1.price", "alt2.price", "alt2.pol", "alt2.x", "y")
          iris

          # Helper function to rename columns starting with "alt"
          rename_column <- function(x){
            alt <- x %>% str_extract("^alt[0-9]+\\.")
            if(is.na(alt)){
              # stop here and return x, because it doesn't start with "alt"
              return(x)
            }
            # fixed() matches the prefix literally instead of as a regex
            suffix <- x %>% str_replace(pattern = fixed(alt), replacement = "")
            number <- alt %>% str_replace_all(pattern = "alt|\\.", replacement = "")
            new.name <- paste(suffix, number, sep = "_")
            return(new.name)
          }

          # Renaming columns; map_chr() returns a character vector, not a list
          columns <- colnames(iris)
          colnames(iris) <- columns %>% map_chr(rename_column)
          iris
          
          • Thomas

            Kassambara – you are a hero. Thanks a million for your extremely detailed answer. I was hoping for some hints and get a full code – much appreciated.
            /T

          • Moses

            You are goooood!

  • Felix Kennith Chan

    If I have a large data set with 200+ columns?
    is there a way where I don’t do each column manually one by one? could you possibly create a forloop or something to do it? if you can how would that work and what would it look like?

    Thanks

    • You can also assign all the names at once, supplying one name per column, as follows:

      colnames(my_data) = c("newname1", "newname2", "newname3")
      
      • Felix Kennith Chan

        Is there a way where I don’t do c(“newname1”, “newname2”, “newname3”, … , “newname200”)?


Teacher
Alboukadel Kassambara
Role : Founder of Datanovia