Reproducibility Diagnostics for Statistical Modeling
ReproStat helps you answer a practical question that standard model summaries do not answer well:
If the data changed a little, would the substantive result still look the same?
The package repeatedly perturbs a dataset, refits the model, and measures how stable the outputs remain. It summarizes that behavior through:

- coefficient stability
- p-value (significance) stability
- selection stability
- prediction stability
- a composite reproducibility index (RI)
This makes ReproStat useful when you want to move beyond single-fit inference and assess how sensitive your modeling conclusions are to ordinary data variation.
ReproStat is designed for analysts who want to know whether a model result is stable under ordinary data variation, or fragile enough that small changes in the data would alter the substantive conclusion.

Typical use cases include checking whether coefficient signs and significance decisions survive resampling, assessing how sensitive variable selection is to sample composition, and comparing competing model specifications for ranking stability under cross-validation.
ReproStat does not claim to prove scientific reproducibility on its own. It quantifies the stability of model outputs under controlled perturbation schemes so you can assess how fragile or dependable the modeling result appears.
If the package is available on CRAN:

```r
install.packages("ReproStat")
```

To install the development version from GitHub:

```r
remotes::install_github("ntiGideon/ReproStat")
```

| Package | Purpose |
|---|---|
| `MASS` | Robust M-estimation backend via `backend = "rlm"` |
| `glmnet` | Penalized regression backend via `backend = "glmnet"` |
| `ggplot2` | ggplot-based plotting helpers |
The package follows a simple workflow:
original data -> perturb data many times -> refit the model each time -> measure how much results change -> summarize the stability of those changes
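The perturb-refit-measure loop can be sketched in a few lines of base R. This is an illustration of the general idea only, not ReproStat's internal implementation; it uses a bootstrap perturbation, an `lm()` refit, and coefficient-sign agreement as the stability measure:

```r
# Illustration only: a hand-rolled perturb-refit-measure loop,
# NOT ReproStat's internals.
set.seed(1)
B <- 200
fits <- replicate(B, {
  idx <- sample(nrow(mtcars), replace = TRUE)   # bootstrap perturbation
  coef(lm(mpg ~ wt + hp + disp, data = mtcars[idx, ]))  # refit
})
# Measure: how often does each coefficient keep its original sign?
orig_sign <- sign(coef(lm(mpg ~ wt + hp + disp, data = mtcars)))
rowMeans(sign(fits) == orig_sign)
```

A proportion near 1 for every coefficient indicates sign-stable estimates; values well below 1 flag coefficients whose direction depends on which rows happen to be sampled.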
If coefficient signs, significance decisions, selected variables, predictions, and model rankings remain similar across perturbations, the analysis is more reproducible in the sense ReproStat measures. If they vary substantially, the result may be fragile.
```r
library(ReproStat)
set.seed(1)

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

print(diag_obj)
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)
reproducibility_index(diag_obj)
ri_confidence_interval(diag_obj, R = 500, seed = 1)
```

`run_diagnostics()` is the main entry point. It fits the original model, perturbs the data `B` times, refits the model, and stores the results for downstream summaries.
```r
diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)
```

```r
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)
```

These functions answer different questions:

- `coef_stability()`: Do coefficient magnitudes fluctuate a lot?
- `pvalue_stability()`: Do significance decisions stay consistent?
- `selection_stability()`: Do predictors keep the same sign or selection pattern?
- `prediction_stability()`: Do predictions stay similar across perturbations?

```r
ri <- reproducibility_index(diag_obj)
ri
```

The RI is a compact summary of multiple stability components, scaled to 0-100.
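To make "compact summary of multiple stability components" concrete, here is a sketch of the general idea of a composite index: averaging several 0-100 component scores, optionally with weights. The component values and equal weighting below are invented for illustration; ReproStat's actual formula may differ.

```r
# Sketch of a composite index: weighted mean of 0-100 component
# stabilities. Values and weights are illustrative, not ReproStat's.
components <- c(coef = 88, pvalue = 75, selection = 92, prediction = 81)
weights   <- c(coef = 1,  pvalue = 1,  selection = 1,  prediction = 1)
sum(weights * components) / sum(weights)  # 84
```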
```r
oldpar <- par(mfrow = c(2, 2))
plot_stability(diag_obj, "coefficient")
plot_stability(diag_obj, "pvalue")
plot_stability(diag_obj, "selection")
plot_stability(diag_obj, "prediction")
par(oldpar)
```

| Method | What it tests | Good when you want to assess |
|---|---|---|
| `"bootstrap"` | resampling variability | ordinary data-driven stability |
| `"subsample"` | sample composition sensitivity | robustness to who enters the sample |
| `"noise"` | measurement perturbation | sensitivity to noisy recorded values |
The RI is best treated as an interpretive summary, not as a hard universal threshold.
| RI range | Interpretation |
|---|---|
| 90-100 | Highly stable under the chosen perturbation design |
| 70-89 | Moderately stable; overall pattern is fairly dependable |
| 50-69 | Mixed stability; some conclusions may be sensitive |
| < 50 | Low stability; investigate model dependence and data fragility |
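The bands in the table above can be encoded in a small helper for labeling batches of RI values. `ri_band()` is a hypothetical convenience function, not part of the package:

```r
# Hypothetical helper mapping RI values to the interpretation bands
# above; not a ReproStat function.
ri_band <- function(ri) {
  cut(ri,
      breaks = c(-Inf, 50, 70, 90, Inf),
      labels = c("low", "mixed", "moderate", "high"),
      right  = FALSE)
}
ri_band(c(95, 72, 55, 30))  # high, moderate, mixed, low
```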
Two important cautions:

- The RI is relative to the chosen perturbation design, so treat the ranges above as interpretive guides rather than universal cutoffs.
- RI values are not directly comparable across backends such as `glmnet`, because not all components are defined the same way.

ReproStat supports four model-fitting backends through the same interface:
| Backend | Model family | Notes |
|---|---|---|
| `"lm"` | ordinary least squares | default |
| `"glm"` | generalized linear models | use `family = ...` |
| `"rlm"` | robust regression | requires MASS |
| `"glmnet"` | penalized regression | requires glmnet |
Examples:

```r
# Logistic regression
diag_glm <- run_diagnostics(
  am ~ wt + hp + qsec,
  data = mtcars,
  B = 100,
  family = stats::binomial()
)

# Robust regression (requires MASS)
if (requireNamespace("MASS", quietly = TRUE)) {
  diag_rlm <- run_diagnostics(
    mpg ~ wt + hp,
    data = mtcars,
    B = 100,
    backend = "rlm"
  )
}

# Penalized regression (requires glmnet)
if (requireNamespace("glmnet", quietly = TRUE)) {
  diag_lasso <- run_diagnostics(
    mpg ~ wt + hp + disp + qsec,
    data = mtcars,
    B = 100,
    backend = "glmnet",
    en_alpha = 1
  )
}
```

ReproStat also helps evaluate model-selection stability, not just single-model stability.
`cv_ranking_stability()` repeatedly runs cross-validation across competing formulas and records how often each model ranks best.

```r
models <- list(
  baseline = mpg ~ wt + hp,
  medium   = mpg ~ wt + hp + disp,
  full     = mpg ~ wt + hp + disp + qsec
)

cv_obj <- cv_ranking_stability(models, mtcars, v = 5, R = 50)
cv_obj$summary

plot_cv_stability(cv_obj, metric = "top1_frequency")
plot_cv_stability(cv_obj, metric = "mean_rank")
```

This is useful when the model with the lowest average error and the most consistently top-ranked model are not the same.
```r
library(ReproStat)
set.seed(42)

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

# Individual summaries
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)$mean_variance

# Composite summary
ri <- reproducibility_index(diag_obj)
cat(sprintf("RI = %.1f\n", ri$index))
ri_confidence_interval(diag_obj, R = 500, seed = 42)

# Visuals
oldpar <- par(mfrow = c(2, 2))
plot_stability(diag_obj, "coefficient")
plot_stability(diag_obj, "pvalue")
plot_stability(diag_obj, "selection")
plot_stability(diag_obj, "prediction")
par(oldpar)
```

The pkgdown site is organized for different user needs.
Key pages include the introductory vignette:

```r
vignette("ReproStat-intro")
```

If you use ReproStat in published work, cite the package and any associated manuscript or software record that accompanies the release you used.
License: GPL (>= 3)