Data Reshaping with tidyr

Data reshaping is the process of transforming the structure of a dataset to make it more suitable for analysis. In many cases, raw data is not in a tidy format, which makes it difficult to analyze or visualize. The tidyr package in R provides simple and consistent functions to reshape data into a tidy structure.

A tidy dataset follows three main principles: each variable forms a column, each observation forms a row, and each type of observational unit forms a table. The tidyr package helps convert data into this format.

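To use tidyr, the package must first be installed and loaded. tidyr re-exports the %>% pipe operator from magrittr, so the examples below need no other packages.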

# Install once; later sessions only need the library() call
install.packages("tidyr")
library(tidyr)

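One common reshaping task is converting data from wide format to long format. In wide format, values of the same kind are spread across several columns, one per category (in the example below, one column of scores per subject). In long format, the category names are collected into a single column and the corresponding values into another, so each row holds one observation.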

# Example wide dataset
data_wide <- data.frame(
  id = 1:3,
  math = c(80, 90, 85),
  science = c(75, 88, 82)
)

The pivot_longer() function is used to convert wide data into long format.

data_long <- data_wide %>%
  pivot_longer(
    cols = c(math, science),
    names_to = "subject",
    values_to = "score"
  )
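
As a quick check on what pivot_longer() produces, the sketch below rebuilds the example data and inspects the result. It assumes tidyr is installed and reuses the variable names from the example above.

```r
library(tidyr)

# Rebuild the wide example: one row per id, one column per subject
data_wide <- data.frame(
  id = 1:3,
  math = c(80, 90, 85),
  science = c(75, 88, 82)
)

# Collect the two subject columns into name/value pairs
data_long <- pivot_longer(
  data_wide,
  cols = c(math, science),
  names_to = "subject",
  values_to = "score"
)

# 3 ids x 2 subjects gives 6 rows; the columns are id, subject, score
dim(data_long)
names(data_long)
print(data_long)
```

The cols argument also accepts negative selection, so cols = -id picks the same two columns here without naming them.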

Another common task is converting long format data back to wide format, which is what the pivot_wider() function does.

data_wide_again <- data_long %>%
  pivot_wider(
    names_from = subject,
    values_from = score
  )
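
One way to confirm that the two functions undo each other is a round-trip check. The sketch below is only an illustration on the small example dataset; the round_trip_ok variable is a name chosen here, not part of tidyr.

```r
library(tidyr)

data_wide <- data.frame(
  id = 1:3,
  math = c(80, 90, 85),
  science = c(75, 88, 82)
)

# Wide -> long
data_long <- pivot_longer(
  data_wide,
  cols = c(math, science),
  names_to = "subject",
  values_to = "score"
)

# Long -> wide again
data_wide_again <- pivot_wider(
  data_long,
  names_from = subject,
  values_from = score
)

# The reshaped result may be a tibble, so convert it to a plain
# data.frame before comparing it with the original
round_trip_ok <- isTRUE(all.equal(as.data.frame(data_wide_again), data_wide))
print(round_trip_ok)
```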

The table below shows the difference between wide and long data formats.

| Format | Description | Example Structure |
|---|---|---|
| Wide Format | Multiple variables stored in separate columns | math, science as separate columns |
| Long Format | Variables stored in a single column with values | subject column with math and science values |

Data reshaping is an important step in data cleaning and preparation. The tidyr package makes it easy to transform datasets into tidy formats, which improves compatibility with analysis and visualization tools in R.
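
To illustrate the compatibility point, the sketch below computes the mean score per subject from the long version of the example data. It uses base R's aggregate() as one possible analysis tool; mean_by_subject is a name chosen for this illustration.

```r
library(tidyr)

data_wide <- data.frame(
  id = 1:3,
  math = c(80, 90, 85),
  science = c(75, 88, 82)
)

data_long <- pivot_longer(
  data_wide,
  cols = c(math, science),
  names_to = "subject",
  values_to = "score"
)

# With one score per row, a grouped summary is a single formula call
mean_by_subject <- aggregate(
  score ~ subject,
  data = as.data.frame(data_long),
  FUN = mean
)
print(mean_by_subject)
```

With the wide layout, the same summary would need a separate call for each subject column, which is why many grouping and plotting functions are easier to use on long data.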