Data Frames and Tibbles
In R, data frames and tibbles are used to store tabular data. They are among the most important data structures for data analysis because they organize data in rows and columns, similar to a spreadsheet or database table. Each column usually represents a variable, and each row represents an observation.
A data frame is the traditional tabular structure in R. It can store different types of data in different columns, such as numeric, character, or logical values. For example, a data frame for students might contain columns for name, age, marks, and pass status. Data frames are created using the data.frame() function. Each column must have the same number of rows.
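The student example above can be sketched in base R as follows. The names, marks, and the `students` and `mismatch` variables are made up for illustration, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# Hypothetical student records; all values are invented for illustration.
students <- data.frame(
  name   = c("Amit", "Ravi", "Neha"),
  age    = c(20, 22, 21),
  marks  = c(78.5, 45.0, 91.0),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Each column keeps its own type: character, numeric, numeric, logical.
col_types <- sapply(students, class)
print(col_types)

# 3 observations (rows) of 4 variables (columns).
print(dim(students))

# Columns of incompatible length are rejected with an error.
# tryCatch() captures the error message instead of stopping the script.
mismatch <- tryCatch(
  data.frame(name = c("Amit", "Ravi", "Neha"), age = c(20, 22)),
  error = function(e) conditionMessage(e)
)
print(mismatch)
```

Printing `students` or calling `str(students)` is a quick way to check that each column has the type you expect.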
A tibble is a modern version of the data frame provided by the tibble package, which is part of the tidyverse and works closely with packages such as dplyr. Tibbles are designed to be more user-friendly and consistent than traditional data frames. They display data more neatly in the console and avoid some common issues found in data frames, such as the automatic conversion of character columns into factors, which was the default behavior of data.frame() before R 4.0.0.
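A minimal sketch of these two points, assuming the tibble package is installed. The variable names (`old_style`, `tib`, `big`) and the values are illustrative only.

```r
library(tibble)

# Base R: opting in to the pre-4.0.0 default turns text into a factor.
old_style <- data.frame(name = c("Amit", "Ravi"), stringsAsFactors = TRUE)
print(class(old_style$name))

# tibble() never converts character vectors to factors.
tib <- tibble(name = c("Amit", "Ravi"))
print(class(tib$name))

# A larger tibble prints only its first rows, together with the
# dimensions and the type of each column.
big <- tibble(id = 1:100, value = id * 2)
print(big)
```

Note that `tibble()` lets a later column refer to an earlier one, as `value = id * 2` does here.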
Below is a table showing the main differences between data frames and tibbles:
| Feature | Data Frame | Tibble |
|---|---|---|
| Package | Base R | Tidyverse (tibble package) |
| Creation Function | data.frame() | tibble() |
| Default Behavior | Converts character columns to factors by default before R 4.0.0 | Keeps characters as characters |
| Display | Prints all rows and columns (up to the console print limit) | Prints only the first few rows and the columns that fit on screen |
| Column Name Handling | Modifies invalid names by default (check.names = TRUE) | Keeps names as they are |
| Accessing Columns | $ or [ ] ($ allows partial name matching; [ ] may return a vector) | $ or [ ] (no partial name matching; [ ] always returns a tibble) |
For example, you can create a data frame using:
df <- data.frame(name = c("Amit", "Ravi"), age = c(20, 22))
A tibble can be created with tibble() once the tibble package has been loaded with library(tibble):
tib <- tibble(name = c("Amit", "Ravi"), age = c(20, 22))
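The differences in column access and name handling can be seen by running the same operations on both objects. This is a rough sketch, assuming the tibble package is installed. The `exam score` column is a hypothetical name chosen because it contains a space.

```r
library(tibble)

df  <- data.frame(name = c("Amit", "Ravi"), age = c(20, 22))
tib <- tibble(name = c("Amit", "Ravi"), age = c(20, 22))

# Selecting one column with [ ]: the data frame drops to a plain vector,
# while the tibble stays a tibble.
df_age  <- df[, "age"]
tib_age <- tib[, "age"]
print(class(df_age))
print(class(tib_age))

# Partial matching with $: df$na silently matches the "name" column,
# while tib$na returns NULL (with a warning, suppressed here).
df_partial  <- df$na
tib_partial <- suppressWarnings(tib$na)
print(df_partial)
print(tib_partial)

# Invalid column names: data.frame() rewrites them, tibble() keeps them.
df_names  <- names(data.frame(`exam score` = 1))
tib_names <- names(tibble(`exam score` = 1))
print(df_names)
print(tib_names)

# The two structures convert into each other.
df_as_tib <- as_tibble(df)
tib_as_df <- as.data.frame(tib)
```

Because a tibble is still a data frame underneath, `is.data.frame(tib)` returns `TRUE`, so most functions written for data frames accept tibbles as well.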
Data frames are widely used in base R, while tibbles are commonly used in modern data analysis workflows with tidyverse packages. Both structures are essential for storing and analyzing structured data in R. Understanding how to work with data frames and tibbles is important for performing data manipulation, visualization, and statistical analysis.
