Generating Statistical Summaries
Generating statistical summaries is an essential step in data analysis. Statistical summaries provide a quick overview of the main characteristics of a dataset without examining every individual data point. These summaries help analysts understand the distribution, central tendency, and variability of the data.
In clinical and pharmaceutical data analysis, statistical summaries are commonly used to describe patient characteristics, treatment outcomes, laboratory values, and other important variables.
```r
# Example dataset
data <- data.frame(
  patient_id = 1:6,
  age = c(45, 52, 37, 60, 49, 55),
  weight = c(70, 65, 80, 68, 75, 72),
  treatment = c("Drug", "Drug", "Placebo", "Drug", "Placebo", "Drug")
)
```
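The summary() function gives a quick overview of every column in the data frame. For numeric columns it reports the minimum, first quartile, median, mean, third quartile, and maximum. For character columns it reports only the length, class, and mode.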
```r
summary(data)
```
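In R 4.0.0 and later, data.frame() no longer converts text to factors by default. As a result, treatment is stored as a character column, and summary() shows only its length, class, and mode.

The following is a minimal sketch that assumes you want per-group counts instead. It converts treatment to a factor. It also leaves patient_id out of the summary, since the mean or quartiles of an ID number carry no information. The column selection is an illustrative choice, not part of the original example.

```r
# Recreate the example dataset so this block runs on its own
data <- data.frame(
  patient_id = 1:6,
  age = c(45, 52, 37, 60, 49, 55),
  weight = c(70, 65, 80, 68, 75, 72),
  treatment = c("Drug", "Drug", "Placebo", "Drug", "Placebo", "Drug")
)

# Store the treatment label as a factor so summary() counts each level
data$treatment <- factor(data$treatment)

# Summarise only the analysis variables (patient_id is just an identifier)
summary(data[, c("age", "weight", "treatment")])

# Number of patients in each treatment group on its own
table(data$treatment)
```

With the factor in place, the treatment column of the summary lists Drug: 4 and Placebo: 2 instead of the character length, class, and mode.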
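The table below lists common summary measures and the base R functions that compute them.
| Measure | Description | R Function |
|---|---|---|
| Mean | Arithmetic average of the values | mean(data$age) |
| Median | Middle value of the sorted data (average of the two middle values when the count is even) | median(data$age) |
| Minimum | Smallest value | min(data$age) |
| Maximum | Largest value | max(data$age) |
| Standard Deviation | Spread of the values around the mean (sample standard deviation, divisor n - 1) | sd(data$age) |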
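As a quick check on the table, the sketch below applies each function to the age column of the example dataset. It also computes the standard deviation by hand to show that sd() divides by n - 1 rather than n. The names age_stats, x, and manual_sd are illustrative only.

```r
# Recreate the example dataset so this block runs on its own
data <- data.frame(
  patient_id = 1:6,
  age = c(45, 52, 37, 60, 49, 55),
  weight = c(70, 65, 80, 68, 75, 72),
  treatment = c("Drug", "Drug", "Placebo", "Drug", "Placebo", "Drug")
)

# Collect the measures from the table into one named vector
age_stats <- c(
  mean   = mean(data$age),
  median = median(data$age),
  min    = min(data$age),
  max    = max(data$age),
  sd     = sd(data$age)
)
round(age_stats, 2)

# sd() uses the sample formula: sqrt(sum of squared deviations / (n - 1))
x <- data$age
manual_sd <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
all.equal(manual_sd, sd(x))
```

For these six ages:

- The mean is about 49.67.
- The median is 50.5, the average of the two middle values 49 and 52.
- The standard deviation is about 8.04.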
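Clinical studies often require group-wise summaries to compare treatment groups. In dplyr, group_by() splits the data by treatment, and summarise() returns one row of statistics per group. Within summarise(), n() counts the patients in each group.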
```r
library(dplyr)

# Summarise age separately for each treatment group
data %>%
  group_by(treatment) %>%
  summarise(
    average_age = mean(age),
    median_age = median(age),
    sd_age = sd(age),
    patient_count = n()
  )
```
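If dplyr is not available, the same group-wise statistics can be computed in base R. This is a swapped-in technique, not the article's method. The sketch splits the ages by treatment with split() and applies one summary function to each group with sapply(). The name group_stats is illustrative.

```r
# Recreate the example dataset so this block runs on its own
data <- data.frame(
  patient_id = 1:6,
  age = c(45, 52, 37, 60, 49, 55),
  weight = c(70, 65, 80, 68, 75, 72),
  treatment = c("Drug", "Drug", "Placebo", "Drug", "Placebo", "Drug")
)

# Split ages by treatment group, then compute the same four statistics per group
group_stats <- sapply(
  split(data$age, data$treatment),
  function(x) c(average_age = mean(x), median_age = median(x),
                sd_age = sd(x), patient_count = length(x))
)

# Transpose so each row is a treatment group, as in the dplyr output
t(group_stats)
```

The two groups give the following results:

- **Drug** (four patients): mean age 53, median age 53.5.
- **Placebo** (two patients): mean and median age both 43.

The dplyr version returns a tibble, which is convenient for further pipeline steps. The sapply() version returns a plain numeric matrix.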
Statistical summaries provide a concise description of the dataset and help analysts quickly understand key characteristics before performing more advanced statistical analysis.
