R Programming for Data Analysis and Pharmaceutical Research Tutorial

Selecting and Filtering Data

Selecting and filtering data are two of the most common tasks performed during data analysis. Before analyzing or visualizing data, it is often necessary to focus only on the relevant columns and rows. The dplyr package provides simple and readable functions to perform these operations efficiently.

Selecting data refers to choosing specific columns from a dataset. This is useful when a dataset contains many variables, but only a few of them are needed for analysis. Filtering data refers to extracting only those rows that meet certain conditions, such as values greater than a threshold, matching categories, or falling within a range.

In dplyr, the select() function chooses columns, and the filter() function keeps only the rows that satisfy one or more conditions. Both take a data frame as their first argument and return a new data frame, leaving the original unchanged, which keeps the code clear and easy to read.

To begin, the dplyr package must be loaded into the R session. If it is not yet installed, install it once with install.packages("dplyr").

library(dplyr)

Suppose we have a dataset called employees that contains the columns name, age, department, and salary.
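To make the examples on this page runnable, a small employees data frame can be built by hand. The names and values below are invented for illustration only and are not part of any real dataset; only the column names come from the description above.

```r
library(dplyr)

# Hypothetical sample data: five made-up employees with the four
# columns described above (name, age, department, salary).
employees <- data.frame(
  name       = c("Asha", "Ben", "Chloe", "Dev", "Elena"),
  age        = c(28, 35, 42, 31, 25),
  department = c("Sales", "Sales", "Research", "Finance", "Research"),
  salary     = c(52000, 61000, 75000, 58000, 49000)
)

# Inspect the column names and types before working with the data.
str(employees)
```

With this data frame in the session, each of the snippets that follow can be run as written.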

Selecting specific columns can be done using the select() function. For example, if only the name and salary columns are needed:

employees %>%
  select(name, salary)
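Beyond listing columns by name, select() accepts a few other forms that are often handy. The sketch below assumes the same invented employees data (hypothetical values) and shows dropping a column, selecting a contiguous range, and renaming while selecting.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration.
employees <- data.frame(
  name       = c("Asha", "Ben", "Chloe", "Dev", "Elena"),
  age        = c(28, 35, 42, 31, 25),
  department = c("Sales", "Sales", "Research", "Finance", "Research"),
  salary     = c(52000, 61000, 75000, 58000, 49000)
)

# Drop a column by prefixing it with a minus sign.
without_salary <- employees %>%
  select(-salary)

# Select a contiguous range of columns, from name through department.
first_three <- employees %>%
  select(name:department)

# Rename a column while selecting it (new_name = old_name).
renamed <- employees %>%
  select(employee = name, salary)

names(without_salary)  # "name" "age" "department"
names(renamed)         # "employee" "salary"
```

In every case the original employees data frame is left untouched; each call returns a new data frame.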

Filtering rows is done using the filter() function. For example, to see employees older than 30 years:

employees %>%
  filter(age > 30)

Multiple conditions can be passed to filter(), separated by commas; a row is kept only if it satisfies all of them. For example, to see employees who are older than 30 and work in Sales:

employees %>%
  filter(age > 30, department == "Sales")
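The other kinds of condition mentioned earlier (matching categories, falling within a range) can be written in a similar way. The following is a minimal sketch on invented data (hypothetical values), using the `|` operator for "or", `%in%` for matching a set of categories, and dplyr's between() for an inclusive range.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration.
employees <- data.frame(
  name       = c("Asha", "Ben", "Chloe", "Dev", "Elena"),
  age        = c(28, 35, 42, 31, 25),
  department = c("Sales", "Sales", "Research", "Finance", "Research"),
  salary     = c(52000, 61000, 75000, 58000, 49000)
)

# A comma between conditions means AND, the same as using `&`.
older_sales <- employees %>%
  filter(age > 30 & department == "Sales")

# Use `|` when a row should be kept if EITHER condition holds.
older_or_sales <- employees %>%
  filter(age > 30 | department == "Sales")

# Match any of several categories with %in%.
sales_or_finance <- employees %>%
  filter(department %in% c("Sales", "Finance"))

# Keep values within a range; between() includes both endpoints.
mid_salary <- employees %>%
  filter(between(salary, 50000, 60000))

older_sales$name  # "Ben"
mid_salary$name   # "Asha" "Dev"
```

Note that filter() keeps only rows where the condition evaluates to TRUE, so rows where it evaluates to NA are dropped as well.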

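Selecting and filtering can also be combined in a single pipeline. Here the rows are filtered first, and then only the name and salary columns are kept: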

employees %>%
  filter(age > 30) %>%
  select(name, salary)
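The order of the two steps matters when the filter condition uses a column that select() would remove. The sketch below, again on invented data (hypothetical values), runs the pipeline in both orders; the reversed version is wrapped in tryCatch() so the error can be inspected instead of stopping the script.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration.
employees <- data.frame(
  name       = c("Asha", "Ben", "Chloe", "Dev", "Elena"),
  age        = c(28, 35, 42, 31, 25),
  department = c("Sales", "Sales", "Research", "Finance", "Research"),
  salary     = c(52000, 61000, 75000, 58000, 49000)
)

# Filter first, then select: age is still available when filter() runs.
result <- employees %>%
  filter(age > 30) %>%
  select(name, salary)

result$name  # "Ben" "Chloe" "Dev"

# Select first, then filter: age has already been dropped, so filter()
# cannot find it and raises an error (assuming no other object named
# age exists in the session).
reversed <- tryCatch(
  employees %>%
    select(name, salary) %>%
    filter(age > 30),
  error = function(e) e
)

inherits(reversed, "error")  # TRUE
```

A simple rule of thumb is to filter on every column you need before dropping columns with select().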

These operations are essential in data manipulation because they let analysts focus only on the relevant parts of a dataset. They are often among the first steps in preparing data for deeper analysis, visualization, or modeling.
