Reproducibility Diagnostics for Statistical Modeling
ReproStat helps you answer a practical question that standard model summaries do not answer well:
If the data changed a little, would the substantive result still look the same?
The package repeatedly perturbs a dataset, refits the model, and measures how stable the outputs remain. It summarizes that behavior through:

- coefficient stability
- p-value (significance) stability
- selection stability
- prediction stability
- a composite reproducibility index (RI)
This makes ReproStat useful when you want to move beyond single-fit inference and assess how sensitive your modeling conclusions are to ordinary data variation.
ReproStat is designed for analysts who want to know whether a model result is stable under ordinary data variation, or fragile enough that small changes in the data would alter the substantive conclusion.

Typical use cases include checking whether coefficient signs and significance decisions survive resampling, assessing how sensitive variable selection is to sample composition, and comparing competing model specifications for ranking stability under cross-validation.
ReproStat does not claim to prove scientific reproducibility on its own. It quantifies the stability of model outputs under controlled perturbation schemes so you can assess how fragile or dependable the modeling result appears.
If the package is available on CRAN:

```r
install.packages("ReproStat")
```

To install the development version from GitHub:

```r
remotes::install_github("ntiGideon/ReproStat")
```

| Package | Purpose |
|---|---|
| `MASS` | Robust M-estimation backend via `backend = "rlm"` |
| `glmnet` | Penalized regression backend via `backend = "glmnet"` |
| `ggplot2` | ggplot-based plotting helpers |
The package follows a simple workflow:
original data -> perturb data many times -> refit the model each time -> measure how much results change -> summarize the stability of those changes
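The perturb-refit-measure loop can be sketched in a few lines of base R. This is an illustration of the general idea only, not ReproStat's internal implementation; it uses a bootstrap perturbation, an `lm()` refit, and coefficient-sign agreement as the stability measure:

```r
# Illustration only: a hand-rolled perturb-refit-measure loop,
# NOT ReproStat's internals.
set.seed(1)
B <- 200
fits <- replicate(B, {
  idx <- sample(nrow(mtcars), replace = TRUE)   # bootstrap perturbation
  coef(lm(mpg ~ wt + hp + disp, data = mtcars[idx, ]))  # refit
})
# Measure: how often does each coefficient keep its original sign?
orig_sign <- sign(coef(lm(mpg ~ wt + hp + disp, data = mtcars)))
rowMeans(sign(fits) == orig_sign)
```

A proportion near 1 for every coefficient indicates sign-stable estimates; values well below 1 flag coefficients whose direction depends on which rows happen to be sampled.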
If coefficient signs, significance decisions, selected variables, predictions, and model rankings remain similar across perturbations, the analysis is more reproducible in the sense ReproStat measures. If they vary substantially, the result may be fragile.
```r
library(ReproStat)
set.seed(1)

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

print(diag_obj)
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)
reproducibility_index(diag_obj)
ri_confidence_interval(diag_obj, R = 500, seed = 1)
```

`run_diagnostics()` is the main entry point. It fits the original model, perturbs the data `B` times, refits the model, and stores the results for downstream summaries.
```r
diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)
```

```r
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)
```

These functions answer different questions:

- `coef_stability()`: Do coefficient magnitudes fluctuate a lot?
- `pvalue_stability()`: Do significance decisions stay consistent?
- `selection_stability()`: Do predictors keep the same sign or selection pattern?
- `prediction_stability()`: Do predictions stay similar across perturbations?

```r
ri <- reproducibility_index(diag_obj)
ri
```

The RI is a compact summary of multiple stability components, scaled to 0-100.
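To make "compact summary of multiple stability components" concrete, here is a sketch of the general idea of a composite index: averaging several 0-100 component scores, optionally with weights. The component values and equal weighting below are invented for illustration; ReproStat's actual formula may differ.

```r
# Sketch of a composite index: weighted mean of 0-100 component
# stabilities. Values and weights are illustrative, not ReproStat's.
components <- c(coef = 88, pvalue = 75, selection = 92, prediction = 81)
weights   <- c(coef = 1,  pvalue = 1,  selection = 1,  prediction = 1)
sum(weights * components) / sum(weights)  # 84
```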
```r
oldpar <- par(mfrow = c(2, 2))
plot_stability(diag_obj, "coefficient")
plot_stability(diag_obj, "pvalue")
plot_stability(diag_obj, "selection")
plot_stability(diag_obj, "prediction")
par(oldpar)
```

| Method | What it tests | Good when you want to assess |
|---|---|---|
| `"bootstrap"` | resampling variability | ordinary data-driven stability |
| `"subsample"` | sample composition sensitivity | robustness to who enters the sample |
| `"noise"` | measurement perturbation | sensitivity to noisy recorded values |
The RI is best treated as an interpretive summary, not as a hard universal threshold.
| RI range | Interpretation |
|---|---|
| 90-100 | Highly stable under the chosen perturbation design |
| 70-89 | Moderately stable; overall pattern is fairly dependable |
| 50-69 | Mixed stability; some conclusions may be sensitive |
| < 50 | Low stability; investigate model dependence and data fragility |
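The bands in the table above can be encoded in a small helper for labeling batches of RI values. `ri_band()` is a hypothetical convenience function, not part of the package:

```r
# Hypothetical helper mapping RI values to the interpretation bands
# above; not a ReproStat function.
ri_band <- function(ri) {
  cut(ri,
      breaks = c(-Inf, 50, 70, 90, Inf),
      labels = c("low", "mixed", "moderate", "high"),
      right  = FALSE)
}
ri_band(c(95, 72, 55, 30))  # high, moderate, mixed, low
```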
Two important cautions:

- The RI is relative to the chosen perturbation design, so treat the ranges above as interpretive guides rather than universal cutoffs.
- RI values are not directly comparable across backends such as `glmnet`, because not all components are defined the same way.

ReproStat supports four model-fitting backends through the same interface:
| Backend | Model family | Notes |
|---|---|---|
| `"lm"` | ordinary least squares | default |
| `"glm"` | generalized linear models | use `family = ...` |
| `"rlm"` | robust regression | requires MASS |
| `"glmnet"` | penalized regression | requires glmnet |
Examples:

```r
# Logistic regression
diag_glm <- run_diagnostics(
  am ~ wt + hp + qsec,
  data = mtcars,
  B = 100,
  family = stats::binomial()
)

# Robust regression (requires MASS)
if (requireNamespace("MASS", quietly = TRUE)) {
  diag_rlm <- run_diagnostics(
    mpg ~ wt + hp,
    data = mtcars,
    B = 100,
    backend = "rlm"
  )
}

# Penalized regression (requires glmnet)
if (requireNamespace("glmnet", quietly = TRUE)) {
  diag_lasso <- run_diagnostics(
    mpg ~ wt + hp + disp + qsec,
    data = mtcars,
    B = 100,
    backend = "glmnet",
    en_alpha = 1
  )
}
```

ReproStat also helps evaluate model-selection stability, not just single-model stability.
`cv_ranking_stability()` repeatedly runs cross-validation across competing formulas and records how often each model ranks best.

```r
models <- list(
  baseline = mpg ~ wt + hp,
  medium   = mpg ~ wt + hp + disp,
  full     = mpg ~ wt + hp + disp + qsec
)

cv_obj <- cv_ranking_stability(models, mtcars, v = 5, R = 50)
cv_obj$summary

plot_cv_stability(cv_obj, metric = "top1_frequency")
plot_cv_stability(cv_obj, metric = "mean_rank")
```

This is useful when the model with the lowest average error and the most consistently top-ranked model are not the same.
```r
library(ReproStat)
set.seed(42)

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

# Individual summaries
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)$mean_variance

# Composite summary
ri <- reproducibility_index(diag_obj)
cat(sprintf("RI = %.1f\n", ri$index))
ri_confidence_interval(diag_obj, R = 500, seed = 42)

# Visuals
oldpar <- par(mfrow = c(2, 2))
plot_stability(diag_obj, "coefficient")
plot_stability(diag_obj, "pvalue")
plot_stability(diag_obj, "selection")
plot_stability(diag_obj, "prediction")
par(oldpar)
```

The pkgdown site is organized for different user needs.
Key pages include the introductory vignette:

```r
vignette("ReproStat-intro")
```

If you use ReproStat in published work, cite the package and any associated manuscript or software record that accompanies the release you used.
License: GPL (>= 3)