Exploring Clinical Trial Data with clintrialx
Indraneel Chakraborty
2024-09-03
Introduction
Welcome to the clintrialx vignette! This package simplifies fetching and analyzing clinical trial data. In this guide, we'll demonstrate how to use clintrialx alongside popular R packages to examine and visualize clinical trial data, focusing on cancer studies in India. 🚀
Setup
To start, load the necessary libraries. Wrapping the library() calls in suppressPackageStartupMessages() keeps the output clean:
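A minimal setup chunk, assuming the four packages whose functions appear later in this vignette are already installed: clintrialx for ctg_bulk_fetch(), ggplot2 for the plots, plotly for ggplotly(), and dplyr for %>%, count(), group_by() and mutate().

```r
# Attach the packages used throughout this vignette without printing
# their startup messages (assumes all four are already installed)
suppressPackageStartupMessages({
  library(clintrialx)  # ctg_bulk_fetch()
  library(ggplot2)     # ggplot(), geom_*(), theme_*()
  library(plotly)      # ggplotly()
  library(dplyr)       # %>%, count(), group_by(), mutate()
})
```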
Fetching Data
Retrieve ClinicalTrials.gov records for cancer studies in India with the ctg_bulk_fetch() function, which returns them as a data frame:
# Fetch cancer study data in India
df <- ctg_bulk_fetch(condition = "cancer", location = "India")
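The sections below rely on specific columns of df, such as "Study Status", "Phases" and "Enrollment". Confirming they are present before plotting gives a clearer error than a failed ggplot() call. The helper check_columns() is hypothetical (it is not part of clintrialx), and the one-row toy data frame is made up to stand in for the real ctg_bulk_fetch() result.

```r
# Hypothetical helper (not part of clintrialx): stop early with a clear
# message if any column the later plots rely on is missing
check_columns <- function(data, required) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Columns referenced later in this vignette
required_cols <- c("Study Status", "Phases", "Enrollment", "Start Date",
                   "Completion Date", "Funder Type", "Study Type")

# Made-up toy row; check.names = FALSE keeps the spaces in column names
toy <- data.frame(`Study Status` = "COMPLETED", Phases = "PHASE2",
                  Enrollment = 120, `Start Date` = "2019-03-01",
                  `Completion Date` = "2021-06-30", `Funder Type` = "OTHER",
                  `Study Type` = "INTERVENTIONAL", check.names = FALSE)

check_columns(toy, required_cols)
```

On the fetched data, check_columns(df, required_cols) either returns silently or names the missing columns.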
Visualizing Study Status Distribution
To see how studies are distributed across statuses, count them and draw a bar plot:
# Create a table of study statuses
status_counts <- table(df$`Study Status`)
# Convert the table to a data frame
status_df <- data.frame(status = names(status_counts), count = as.numeric(status_counts))
# Generate the bar plot
ggplotly(
  ggplot(status_df, aes(x = reorder(status, -count), y = count)) +
    geom_col(fill = "orange") +
    theme_minimal() +
    labs(title = "Distribution of Study Statuses",
         x = "Study Status",
         y = "Count") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    geom_text(aes(label = count), vjust = -0.5)
)
This plot shows how many studies fall into each status category, with bars sorted from most to least frequent and labeled with their counts. 📉
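The counting step can be checked on its own. A minimal base-R sketch on a made-up vector of statuses: as.data.frame() turns the table() result into a status/count data frame, and sorting by decreasing count reproduces the bar order that reorder(status, -count) gives.

```r
# Made-up statuses standing in for df$`Study Status`
statuses <- c("COMPLETED", "RECRUITING", "COMPLETED",
              "UNKNOWN", "COMPLETED", "RECRUITING")

# table() counts each distinct value; as.data.frame() turns the result
# into a two-column data frame with the counts in a column named "count"
status_df <- as.data.frame(table(statuses), stringsAsFactors = FALSE,
                           responseName = "count")
names(status_df)[1] <- "status"

# Sort from most to least frequent, the order reorder(status, -count) gives
status_df <- status_df[order(-status_df$count), ]
print(status_df)
```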
Analyzing Enrollment by Study Phase
Compare enrollment numbers across different study phases using an interactive box plot:
# Create an interactive box plot of enrollment by study phase
ggplotly(
  ggplot(df, aes(x = Phases, y = Enrollment)) +
    geom_boxplot(fill = "lightblue", outlier.colour = "red", outlier.shape = 1) +
    geom_jitter(color = "darkblue", size = 0.5, alpha = 0.5, width = 0.2) +
    theme_minimal(base_size = 14) +
    labs(title = "Enrollment by Study Phase",
         x = "Study Phase",
         y = "Enrollment") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
          plot.title = element_text(hjust = 0.5))
)
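Each box summarizes one phase (the line marks the median, the box spans the interquartile range), and the jittered points show individual studies; hovering over a point in the interactive version shows its values. This makes it easy to compare typical enrollment sizes across phases and to spot unusually large trials. 🔍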
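To see the numbers behind the boxes, tapply() can compute the median enrollment for each phase. This base-R sketch uses a made-up toy data frame; the phase labels and enrollment values are for illustration only.

```r
# Made-up stand-in for the Phases and Enrollment columns
toy <- data.frame(
  Phases     = c("PHASE1", "PHASE1", "PHASE2", "PHASE2", "PHASE2", "PHASE3"),
  Enrollment = c(20, 40, 60, 100, 500, 800)
)

# Median enrollment per phase (the line inside each box);
# na.rm = TRUE is passed through to median()
median_by_phase <- tapply(toy$Enrollment, toy$Phases, median, na.rm = TRUE)

# Number of studies per phase, useful context when a group is small
n_by_phase <- table(toy$Phases)

print(median_by_phase)
print(n_by_phase)
```

Reporting the group sizes alongside the medians helps, since a box drawn from only a handful of studies can look misleadingly precise.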
Visualizing Study Duration Timeline
Examine study timelines by plotting each study's start date against its completion date:
# Convert date strings to Date objects
df$start_date <- as.Date(df$`Start Date`, format = "%Y-%m-%d")
df$completion_date <- as.Date(df$`Completion Date`, format = "%Y-%m-%d")
# Scatter plot of start vs. completion date, colored by status,
# with a dashed reference line at 1 January 2024
ggplot(df, aes(x = start_date, y = completion_date, color = `Study Status`)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = as.Date("2024-01-01"), linetype = "dashed", color = "blue") +
  theme_minimal() +
  labs(title = "Study Duration Timeline",
       x = "Start Date",
       y = "Completion Date") +
  scale_color_brewer(palette = "Set1")
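Each point is one study, colored by its status. The further a point sits above the y = x diagonal, the longer the study ran (or is planned to run), and points above the dashed line have completion dates after 1 January 2024. ⏳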
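as.Date() with format = "%Y-%m-%d" returns NA for any value that lacks a day. If some dates in your data are recorded only to the month (for example "2020-05"; this is an assumption worth checking), those studies silently drop out of the plot. The sketch below uses a hypothetical helper, parse_ct_date(), that pads month-only values with the first day of the month, then computes each study's duration in days. All dates are made up.

```r
# Hypothetical helper: parse "YYYY-MM-DD" and month-only "YYYY-MM" strings.
# Month-only values are assumed to mean the first day of that month.
parse_ct_date <- function(x) {
  x <- trimws(as.character(x))
  month_only <- !is.na(x) & grepl("^[0-9]{4}-[0-9]{2}$", x)
  x[month_only] <- paste0(x[month_only], "-01")
  as.Date(x, format = "%Y-%m-%d")
}

# Made-up start and completion dates mixing full and month-only values
start      <- parse_ct_date(c("2019-03-15", "2020-05", NA))
completion <- parse_ct_date(c("2021-03-15", "2020-11", "2022-01-01"))

# Duration in days (NA wherever either date is missing)
duration_days <- as.numeric(difftime(completion, start, units = "days"))
print(duration_days)
```

On the fetched data, df$start_date <- parse_ct_date(df$`Start Date`) would take the place of the plain as.Date() call.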
Analyzing Funding Sources and Study Types
Examine the relationship between funding sources and study types using a grouped bar plot:
# Summarize and plot funding sources by study type
df_summary <- df %>%
  count(`Funder Type`, `Study Type`) %>%
  group_by(`Funder Type`) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
ggplotly(
  ggplot(df_summary, aes(x = `Funder Type`, y = prop, fill = `Study Type`)) +
    geom_col(position = "dodge") +
    theme_minimal() +
    labs(title = "Funding Sources and Study Types",
         x = "Funder Type",
         y = "Proportion") +
    scale_fill_brewer(palette = "Set2") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
)
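Because prop is computed within each funder type, the bars for a given funder sum to 1. This lets you compare the mix of study types across funders regardless of how many studies each one funds. 💡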
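The prop column is a within-group proportion: each study-type count is divided by its funder's total. The same calculation in base R uses prop.table() with margin = 1, which divides each row of a cross-tabulation by its row total. The funder and study-type labels below are made up for illustration.

```r
# Made-up labels standing in for the "Funder Type" and "Study Type" columns
funder     <- c("INDUSTRY", "INDUSTRY", "INDUSTRY", "OTHER", "OTHER")
study_type <- c("INTERVENTIONAL", "INTERVENTIONAL", "OBSERVATIONAL",
                "OBSERVATIONAL", "OBSERVATIONAL")

# Cross-tabulate: rows are funder types, columns are study types
counts <- table(funder, study_type)

# margin = 1 divides each row by its row total, matching
# group_by(`Funder Type`) followed by mutate(prop = n / sum(n))
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

One difference from the dplyr pipeline: table() keeps zero-count combinations (here OTHER with INTERVENTIONAL), while count() only returns combinations that actually occur.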
Conclusion
Using the clintrialx package together with visualization tools like ggplot2 and plotly, you can turn raw clinical trial records into clear summaries. This vignette illustrated these techniques on cancer clinical trials in India; the same steps apply to other queries, for example by changing the condition and location arguments of ctg_bulk_fetch(). Happy analyzing! 😊