Data Reshaping with tidyr
Data reshaping is the process of transforming the structure of a dataset to make it more suitable for analysis. In many cases, raw data is not in a tidy format, which makes it difficult to analyze or visualize. The tidyr package in R provides simple and consistent functions to reshape data into a tidy structure.
A tidy dataset follows three main principles: each variable forms a column, each observation forms a row, and each type of observational unit forms a table. The tidyr package helps convert data into this format.
To use tidyr, install the package once and then load it at the start of each R session.
install.packages("tidyr")
library(tidyr)
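In a script that is run repeatedly, reinstalling the package on every run is unnecessary. A minimal sketch, assuming a CRAN mirror is configured, installs tidyr only when it is missing and then loads it:

```r
# Install tidyr only if it is not already available, then load it
if (!requireNamespace("tidyr", quietly = TRUE)) {
  install.packages("tidyr")
}
library(tidyr)
```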
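One common reshaping task is converting data from wide format to long format. In wide format, the values of a single variable are spread across several columns, one per category (for example, one score column per subject). In long format, the former column names are stored as values in one column and the corresponding measurements in another, so each row holds a single observation.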
# Example wide dataset: one row per student, one column per subject
data_wide <- data.frame(
  id = 1:3,
  math = c(80, 90, 85),
  science = c(75, 88, 82)
)
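The pivot_longer() function converts wide data into long format. The cols argument selects the columns to gather, names_to names the new column that will hold the old column names, and values_to names the new column that will hold their values. The %>% pipe used here is re-exported by tidyr, so no additional package needs to be loaded.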
data_long <- data_wide %>%
  pivot_longer(
    cols = c(math, science),
    names_to = "subject",
    values_to = "score"
  )
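To make the result concrete, the following sketch rebuilds the same example data and inspects the shape of the long table. The second call, stored in the hypothetical name `data_long_alt`, is an illustrative alternative that is not part of the original example: it selects every column except `id` with `cols = !id`, which can be convenient when there are many columns to gather.

```r
library(tidyr)

# Same example data as above: one row per student, one column per subject
data_wide <- data.frame(
  id = 1:3,
  math = c(80, 90, 85),
  science = c(75, 88, 82)
)

# Gather the two subject columns into name/value pairs
data_long <- pivot_longer(
  data_wide,
  cols = c(math, science),
  names_to = "subject",
  values_to = "score"
)

# Illustrative alternative: select every column except id
data_long_alt <- pivot_longer(
  data_wide,
  cols = !id,
  names_to = "subject",
  values_to = "score"
)

# 3 students x 2 subjects gives 6 rows; the columns are id, subject, score
dim(data_long)
names(data_long)
```

Each student now appears in two rows, one per subject, rather than in a single row with two score columns.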
Another common task is converting long format data back to wide format.
data_wide_again <- data_long %>%
  pivot_wider(
    names_from = subject,
    values_from = score
  )
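One way to check that the two functions are inverses for this example is a round trip: pivot the wide data to long format, pivot it back, and compare with the original. The sketch below is self-contained and assumes only the example data from this section. Because pivot_wider() returns a tibble rather than a base data frame, the result is converted with as.data.frame() before comparing.

```r
library(tidyr)

# Original wide data
data_wide <- data.frame(
  id = 1:3,
  math = c(80, 90, 85),
  science = c(75, 88, 82)
)

# Wide -> long
data_long <- pivot_longer(
  data_wide,
  cols = c(math, science),
  names_to = "subject",
  values_to = "score"
)

# Long -> wide again
data_wide_again <- pivot_wider(
  data_long,
  names_from = subject,
  values_from = score
)

# Compare the round-trip result with the original as plain data frames
all.equal(as.data.frame(data_wide_again), data_wide)
```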
The table below shows the difference between wide and long data formats.
| Format | Description | Example Structure |
|---|---|---|
| Wide Format | Values of one variable are spread across several columns, one per category | One row per id, with math and science as separate columns |
| Long Format | Former column names are stored in one column and their values in another | One row per id and subject, with a subject column and a score column |
Data reshaping is an important step in data cleaning and preparation. The tidyr package makes it easy to transform datasets into tidy formats, which improves compatibility with analysis and visualization tools in R.
