| Title: | Datasets and Functions for the Class "Modelling and Data Analysis for Pharmaceutical Sciences" |
|---|---|
| Description: | Provides datasets and functions for the class "Modelling and Data Analysis for Pharmaceutical Sciences". The datasets can be used to present various methods of data analysis and statistical modeling. Functions for data visualization are also implemented. |
| Authors: | Lionel Voirol [aut, cre], Stéphane Guerrier [aut], Yuming Zhang [aut], Luca Insolia [aut] |
| Maintainer: | Lionel Voirol |
| License: | AGPL-3 |
| Version: | 0.0.6 |
| Built: | 2026-05-11 21:18:25 UTC |
| Source: | https://github.com/cran/idarps |
boxplot_w_points
    boxplot_w_points(
      ...,
      col_points = "#9033FF3F",
      col_boxplot = "#d2d2d2",
      horizontal = FALSE,
      main = "",
      names = NULL,
      las = 0,
      xlab = "",
      ylab = "",
      seed = 123,
      jitter_param = 0.25
    )
| Argument | Description |
|---|---|
| ... | data vectors to be visualized. |
| col_points | color of the points added to the boxplot. |
| col_boxplot | color of the boxplot. |
| horizontal | logical indicating whether the boxplots should be horizontal; the default FALSE draws vertical boxes. |
| main | string indicating the title of the plot. |
| names | vector of strings giving the group labels printed under each boxplot. |
| las | a numeric value indicating the orientation of the tick mark labels and any other text added to a plot after its initialization. The options are: always parallel to the axis (the default, 0), always horizontal (1), always perpendicular to the axis (2), and always vertical (3). |
| xlab | a string indicating the x label. |
| ylab | a string indicating the y label. |
| seed | an integer specifying the seed for the random jitter of the boxplot points. |
| jitter_param | a double specifying the amount of jittering applied to the points. |
No return value. Draws a boxplot with the individual data points overlaid.
    x <- rnorm(20, mean = 5)
    y <- rnorm(20, mean = 10)
    z <- rnorm(20, mean = 15)
    boxplot_w_points(x, main = "test")
    boxplot_w_points(x, y, names = c("x", "y"), las = 1, main = "Data")
    boxplot_w_points(x, y, z,
      names = c("x", "y", "z"),
      horizontal = TRUE, las = 1, main = "Data"
    )
    boxplot_w_points(x, y, z,
      names = c("x", "y", "z"),
      horizontal = FALSE, las = 1, main = "Data"
    )
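To illustrate what the seed and jitter_param arguments control, the following base-R sketch overlays jittered points on a boxplot. It is only an approximation written for this note and is not the package's implementation; the use of `jitter()` with a fixed `amount` and the simulated `groups` list are assumptions.

```r
# Hypothetical base-R sketch; NOT the implementation used by boxplot_w_points().
groups <- list(x = rnorm(20, mean = 5), y = rnorm(20, mean = 10))

# Draw the boxes first. Outliers are hidden so that they are not drawn twice,
# and ylim is set explicitly so every raw point stays inside the plot region.
boxplot(groups,
  col = "#d2d2d2", outline = FALSE, las = 1, main = "Data",
  ylim = range(unlist(groups))
)

# Fixing the seed makes the random horizontal offsets reproducible.
set.seed(123)
for (i in seq_along(groups)) {
  # Spread the points of group i uniformly within +/- 0.25 of box position i.
  x_pos <- jitter(rep(i, length(groups[[i]])), amount = 0.25)
  points(x_pos, groups[[i]], pch = 16, col = "#9033FF3F")
}
```

A larger jitter amount spreads the points further from the centre of each box, and rerunning the loop with the same seed reproduces the same offsets.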
This dataset consists of several clinical features observed or measured for 116 participants in a study of breast cancer.
BreastCancer
Age in years
Body mass index in kg/m²
Glucose in mg/dL
Insulin in µU/mL
Homeostasis model assessment
Presence of breast cancer (0 if no cancer, 1 if with cancer)
https://link.springer.com/article/10.1186/s12885-017-3877-1
Patricio, Miguel, et al. "Using Resistin, glucose, age and BMI to predict the presence of breast cancer", BMC Cancer, (2018).
Data collected in a study to assess the effects of smoking and pollution on being diagnosed with bronchitis. This dataset is based on 212 subjects.
bronchitis
Presence of bronchitis (0 for no and 1 for yes)
Average daily number of smoked cigarettes
Pollution index
This dataset consists of variables that are potentially related to blood pressure measurements. It contains one group of patients aged between 52 and 89 years who live in urban areas, and another group of 50 centenarian women aged between 101 and 121 who live on the island of Okinawa, which is known for its high number of centenarians. The dataset lists the following variables:
centenarian
Age in years
Chin skinfold in cm
Forearm skinfold in cm
Calf skinfold in cm
Resting pulse rate
The Body Mass Index (BMI) of the participant
A dummy variable indicating whether the participant is a centenarian
Systolic blood pressure
This dataset is based on an observational study conducted at Geneva University Hospitals to assess the impact of weight on the pharmacokinetics of dexamethasone in normal-weight versus obese patients hospitalized for COVID-19.
codex
ID of the patient
Gender (0 for men and 1 for women)
Age
Body mass index
Weight in kg
Number of doses of the dexamethasone (DEX) drug
The time it takes for the drug to reach its maximum concentration (i.e. Cmax) after administration, in hours (h)
The maximum concentration reached in the blood after the drug has been administered (ng/mL)
t1_2 is the time required for the drug concentration within the body to decrease by one-half during elimination, in hours (h)
The integral (from 0 to 8 hours) of the curve describing the drug concentration in the blood as a function of time after administration (ng·h/mL)
Number of days the patient was hospitalized
Number of days the patient was hospitalized in the intermediate and intensive care unit
crp
Presence of comorbidity type e
Presence of comorbidity type p
Presence of comorbidity type v
Presence of comorbidity type c
Presence of comorbidity type r
Indicator variable based on whether the subject is obese (i.e. with BMI > 30), 0 for no and 1 for yes.
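The pharmacokinetic quantities described in this list (time to Cmax, Cmax, half-life, area under the curve from 0 to 8 hours, and the obesity indicator) can be illustrated on a made-up concentration profile. The sketch below is a generic noncompartmental calculation under stated assumptions, not the method used in the study; all values and object names are hypothetical.

```r
# Hypothetical concentration-time profile (NOT taken from the codex dataset).
time_h <- c(0, 0.5, 1, 2, 4, 6, 8)       # sampling times in hours
conc   <- c(0, 40, 60, 50, 30, 18, 10)   # concentrations in ng/mL

cmax <- max(conc)                  # maximum observed concentration (ng/mL)
tmax <- time_h[which.max(conc)]    # time at which Cmax is observed (h)

# Linear trapezoidal rule for the area under the curve from 0 to 8 h (ng.h/mL).
auc_0_8 <- sum(diff(time_h) * (head(conc, -1) + tail(conc, -1)) / 2)

# Half-life from a log-linear fit on the last three points
# (assumption: these points lie in the elimination phase).
idx    <- 5:7
fit    <- lm(log(conc[idx]) ~ time_h[idx])
lambda <- -unname(coef(fit)[2])    # elimination rate constant (1/h)
t_half <- log(2) / lambda          # half-life (h)

# Obesity indicator as described above: 1 if BMI > 30, 0 otherwise.
bmi   <- c(24.3, 31.8, 30.0)
obese <- as.integer(bmi > 30)
```

With these made-up numbers, Cmax is 60 ng/mL at 1 hour and the area under the curve is 246 ng·h/mL. A BMI of exactly 30 is coded 0 because the rule uses a strict inequality.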
This dataset contains measured biomarkers in pigs fed with various diets.
cortisol
A data frame with 61 rows and 9 variables:
the id of the pig
the diet fed to the pig (chipped diet or non-chipped diet)
the gender of the pig
urine cortisol in pg/ml
serum acth in pg/ml
serum crh in pg/ml
testosterone in ng/ml
LH in ng/ml
daily caloric intake in kcal
Data from Parisi et al. (2021), who studied the applicability of predictive models for intensive care admission of COVID-19 patients in a secondary care hospital in Belgium. The study is based on data from patients admitted to an emergency department with a positive RT-PCR SARS-CoV-2 test.
covid
A data frame with 64 rows and 5 variables:
admission to an Intensive Care Unit (0 for no, 1 for yes)
sex (men, women)
age in years
lactate dehydrogenase in U/L
oxygen saturation in percentage
https://jeccm.amegroups.org/article/view/6927/html
Parisi, Nicolas, et al. "Non applicability of validated predictive models for intensive care admission and death of COVID-19 patients in a secondary care hospital in Belgium", Journal of Emergency and Critical Care Medicine, (2021).
Data from the COVID-19 Data Hub joined with spatial features for Switzerland.
data_covid_switzerland_spatial
Country
3-letter code of the country according to the standard ISO 3166-1 Alpha-3
Date
Cumulative number of confirmed cases
Total population
Cumulative number of tests
Daily number of confirmed cases
Daily number of tests
Number of daily confirmed cases divided by the country's population
Moving average of confirmed_per_pop with a window of 7 days
'sf' geometry list of country
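The derived variables in this list (daily counts, cases per population, and the 7-day moving average) can be sketched on a short made-up series. The trailing window used here is an assumption, since the source does not say whether the average is trailing or centered; the numbers and object names are hypothetical.

```r
# Hypothetical cumulative counts for 10 consecutive days (NOT real data).
cumulative <- c(10, 12, 15, 15, 20, 28, 30, 37, 45, 50)
population <- 1000

# Daily counts are the first differences of the cumulative series;
# the first day has no previous value, hence NA.
daily <- c(NA, diff(cumulative))

# Daily confirmed cases relative to the population.
per_pop <- daily / population

# Trailing 7-day moving average (assumption: the window ends on the current day).
# Windows that contain an NA yield NA.
ma7 <- as.numeric(stats::filter(per_pop, rep(1 / 7, 7), sides = 1))
```

Setting `sides = 2` in `stats::filter()` would give a centered window instead.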
This dataset contains reports of diabetes symptoms from 520 individuals, encompassing symptoms potentially associated with the condition. It was compiled through a questionnaire aimed at recently diagnosed diabetics or individuals displaying one or more symptoms. Data collection took place via direct questionnaire at Sylhet Diabetes Hospital in Bangladesh.
diabetes
Age of the patient in years
Gender of the patient (Male, Female)
Presence of polyuria (excessive urination) (Yes, No)
Presence of polydipsia (excessive thirst) (Yes, No)
Presence of sudden weight loss (Yes, No)
Presence of weakness (Yes, No)
Presence of polyphagia (excessive hunger) (Yes, No)
Presence of genital thrush (Yes, No)
Presence of visual blurring (Yes, No)
Presence of itching (Yes, No)
Presence of irritability (Yes, No)
Presence of delayed healing (Yes, No)
Presence of partial paresis (Yes, No)
Presence of muscle stiffness (Yes, No)
Presence of alopecia (Yes, No)
Presence of obesity (Yes, No)
Diagnosis class (1 if presence of diabetes, 0 otherwise)
https://link.springer.com/chapter/10.1007/978-981-13-8798-2_12
Islam, M. M. F., et al. "Likelihood prediction of diabetes at early stage using data mining techniques", Computer vision and machine intelligence in medical image analysis, (2020).
Diet
diet
ID
Gender (male or female)
Age in years
Height in m
Type of diet (A, B or C)
Initial weight in kg
Final weight in kg
This dataset is based on a study conducted in suburban Boston in the late 1970s to investigate the relationship between forced expiratory volume and smoking behavior in 654 youths between the ages of 3 and 19.
fev
forced expiratory volume or FEV, which measures the amount of air a person can exhale during a forced breath.
age in years
gender of the person (0 for males and 1 for females)
height in cm
smoking behavior (0 for non-smokers and 1 for smokers)
hist_compare_to_normal
    hist_compare_to_normal(
      x,
      col = "lightgray",
      main = "",
      xlab = "",
      ylab = "",
      lwd_line = 1.5,
      col_line1 = "#ff160e",
      col_line2 = "#335bff",
      add_legend = TRUE,
      legend_position = "topleft",
      delta = 0.2,
      ...
    )
| Argument | Description |
|---|---|
| x | data vector to be visualized. |
| col | color of the histogram. |
| main | string indicating the title of the plot. |
| xlab | a string indicating the x label. |
| ylab | a string indicating the y label. |
| lwd_line | width of the density lines. |
| col_line1 | color of the density line based on classical MLE estimation. |
| col_line2 | color of the density line based on robust estimation. |
| add_legend | a Boolean indicating whether the estimated parameters of the Normal distribution should be displayed. |
| legend_position | a string specifying the position of the legend. |
| delta | graphical parameter determining the shrinkage of the axis. |
| ... | extra graphical arguments. |
No return value. Draws a histogram with the fitted Normal density curves overlaid.
    n <- 1000
    x <- rnorm(n = n)
    hist_compare_to_normal(x)
    x2 <- rexp(n, rate = 25)
    hist_compare_to_normal(x2, legend_position = "topright")
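The difference between the two density lines can be illustrated in base R by fitting a Normal distribution once with classical estimates and once with robust ones. The source does not state which robust estimator the function uses, so the median and the scaled MAD below are an assumption, and the sketch is not the package's implementation.

```r
# Hypothetical base-R sketch; the robust estimator (median / MAD) is an assumption.
set.seed(1)
x <- c(rnorm(95), rep(8, 5))   # mostly standard normal, plus 5 gross outliers

# Classical maximum likelihood estimates (the MLE of the variance divides by n).
mu_mle <- mean(x)
sd_mle <- sqrt(mean((x - mu_mle)^2))

# Robust alternatives: median for location, scaled MAD for spread.
mu_rob <- median(x)
sd_rob <- mad(x)

# Histogram on the density scale with both fitted Normal curves overlaid.
hist(x, freq = FALSE, col = "lightgray", main = "", xlab = "")
curve(dnorm(x, mu_mle, sd_mle), add = TRUE, col = "#ff160e", lwd = 1.5)
curve(dnorm(x, mu_rob, sd_rob), add = TRUE, col = "#335bff", lwd = 1.5)
legend("topright",
  legend = c("MLE", "Robust"),
  col = c("#ff160e", "#335bff"), lwd = 1.5, bty = "n"
)
```

With contaminated data such as this, the classical curve is pulled toward the outliers and flattened, while the robust curve stays close to the bulk of the observations.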
Data from an experiment on rats comparing the HP13C bicarbonate signal intensities, normalized to the total sum of metabolites, and the corresponding initial reaction rate as a function of the injected dose of HP1-13C pyruvate. Two groups of rats were compared (fed and overnight-fasted). Dataset from Can et al. (2022).
HP13Cbicarbonate
HP13C bicarbonate signal intensities normalized to the total sum of metabolites
initial reaction rate as a function of the injected dose of HP13C pyruvate
fed and overnight-fasted
https://www.nature.com/articles/s42003-021-02978-2
This dataset contains a collection of variables believed to be potentially associated with the blood pressure measurements of 213 individuals from Kuwait. The dataset lists the following variables:
kuwait_bp
Age in years
Weight in kg
Height in mm
Chin skinfold in cm
Forearm skinfold in cm
Calf skinfold in cm
Resting pulse rate
Whether or not the participant is left-handed
The Body Mass Index (BMI) of the participant
Systolic blood pressure
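Since this dataset records weight in kg and height in mm alongside the BMI, the usual BMI formula (weight in kg divided by squared height in m) needs a unit conversion. The sketch below uses made-up values and hypothetical object names, not columns of kuwait_bp.

```r
# Hypothetical values (NOT taken from kuwait_bp).
weight_kg <- c(70, 85)
height_mm <- c(1750, 1800)

# Convert height from millimetres to metres before applying the formula.
height_m <- height_mm / 1000

# BMI = weight (kg) / height (m)^2
bmi <- weight_kg / height_m^2
round(bmi, 1)   # 22.9 26.2
```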
This dataset consists of variables possibly relating to blood pressures of 39 Peruvians who have moved from rural high-altitude areas to urban lower-altitude areas.
PeruvianBP
Age in years
Years in urban area
Weight in kg
Height in mm
Chin skinfold
Forearm skinfold
Calf skinfold
Resting pulse rate
Systolic blood pressure
This dataset contains the number of clients in a pharmacy for each hour over two years.
pharmacy
A data frame with 17520 rows and 4 variables:
the date
the hour of the day
the week day
the recorded number of clients
This dataset contains demographic, environmental and respiratory data for 200 male adults (ages 40–75) to analyze factors affecting lung capacity. It contrasts populations across three distinct air-quality environments: a pristine high-altitude rural town, a suburban area with moderate air quality, and a highly industrialized urban region.
pm_exposure
A data frame with 200 rows and 7 variables:
Age of the participant in years
Height of the participant in cm
Weight of the participant in kg
Estimated 5-year average exposure to fine particulate matter (PM2.5), measured in micrograms per cubic meter
A binary indicator of residential location (1 = Rural, 0 = Non-rural)
A factor indicating the specific type of residential area ("rural", "suburban", or "urban")
Forced Expiratory Volume in 1 second (measured in Liters), representing lung function
This dataset is based on a study of the effectiveness of directed reading activities for elementary school students (6-12 years old).
reading
Student id
Degree of Reading Power (DRP) test score
Age of the students
Binary variable indicating whether a student participated in the directed reading activities (Treatment if the student participated, Control otherwise)
This dataset is based on a study on the physical and behavioral characteristics of snorers.
snoring
gender of the person (0 for males and 1 for females)
age in years
height in cm
weight in kg
smoking behavior (0 for non-smokers and 1 for smokers)
number of glasses drunk per day (in red wine equivalent)
snoring diagnosis (0 for not snoring, 1 for snoring)