Bounded Outcome Risk Guard for Model Evaluation
BORG catches data leakage that inflates your model’s performance — before you report the wrong number.
```r
library(BORG)

# You scaled the data, then split it. Looks fine?
data_scaled <- scale(iris[, 1:4])
train_idx <- 1:100
test_idx <- 101:150

borg_inspect(data_scaled, train_idx = train_idx, test_idx = test_idx)
#> INVALID — Hard violation: preprocessing_leak
#> "Normalization parameters were computed on data beyond training set"
```

The test set means leaked into the scaler, so your reported accuracy is wrong. BORG finds this automatically for scaling, PCA, recipes, caret pipelines, and more.
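The leak-free version of that pipeline fits the scaler on the training rows only and then reuses those parameters for the test rows. A minimal base-R sketch (illustrative, not BORG's API):

```r
train_idx <- 1:100
test_idx  <- 101:150

# Fit scaling parameters on the training rows only
train_scaled <- scale(iris[train_idx, 1:4])
centers <- attr(train_scaled, "scaled:center")
sds     <- attr(train_scaled, "scaled:scale")

# Apply the *training* parameters to the test rows
test_scaled <- scale(iris[test_idx, 1:4], center = centers, scale = sds)

# The test set never influenced the parameters
stopifnot(isTRUE(all.equal(centers, colMeans(iris[train_idx, 1:4]))))
```

The same discipline applies to PCA and recipes: fit the transformation on the training set, then apply it unchanged to the test set.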
A model shows 95% accuracy on test data, then drops to 60% in production. The usual cause: data leakage. Information from the test set contaminated training, and the reported metrics were wrong.
A Princeton meta-analysis found leakage errors in 648 published papers across 30 fields. In civil war prediction research, correcting leakage revealed that “complex ML models do not perform substantively better than decades-old Logistic Regression.” The reported gains were artifacts.
BORG addresses this problem by automatically detecting six categories of leakage — index overlap, duplicate rows, preprocessing leakage, target leakage, group leakage, and temporal violations — across common R frameworks (base R, caret, tidymodels, mlr3). Beyond detection, BORG diagnoses data dependencies (spatial, temporal, clustered), generates appropriate cross-validation schemes, and produces publication-ready methods paragraphs with test statistics.
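The two simplest of these checks can be sketched in a few lines of base R; `check_split()` below is an illustrative stand-in, not BORG's implementation:

```r
# Illustrative sketch of the index_overlap and duplicate_rows checks
check_split <- function(data, train_idx, test_idx) {
  violations <- character(0)

  # index_overlap: the same row number appears in both sets
  if (length(intersect(train_idx, test_idx)) > 0)
    violations <- c(violations, "index_overlap")

  # duplicate_rows: identical observations on both sides of the split
  if (nrow(merge(data[train_idx, , drop = FALSE],
                 data[test_idx, , drop = FALSE])) > 0)
    violations <- c(violations, "duplicate_rows")

  violations
}

check_split(iris, train_idx = 1:100, test_idx = 51:150)
```

BORG performs the same kind of checks across all six categories, plus framework-specific introspection of fitted preprocessing objects.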
These features make the package useful in domains where observations are not independent, such as spatial ecology, clinical studies with repeated patient measurements, and time-series analysis.

- `borg()`: Main entry point for all validation
- `borg_inspect()`: Detailed inspection of specific objects (`caret::preProcess`, `recipes::recipe`, `prcomp`, `rsample` resampling objects, fitted models such as `lm`, `glm`, `ranger`, etc.)
- `borg_diagnose()`: Analyze data for dependency structure
- `borg_compare_cv()`: Run random and blocked CV side by side on the same data, with `plot()` for visual comparison
- `borg_power()`: Estimate power loss from switching to blocked CV
- `summary()`: Generate publication-ready methods paragraphs, including `borg_compare_cv()` inflation estimates when available
- `borg_certificate()` / `borg_export()`: Machine-readable validation certificates in YAML/JSON for audit trails

| Category | Impact | Response |
|---|---|---|
| Hard Violation | Results invalid | Blocks evaluation |
| Soft Inflation | Results biased | Warns, allows with caution |
Hard Violations:

- `index_overlap`: Same row in train and test
- `duplicate_rows`: Identical observations across sets
- `preprocessing_leak`: Scaler/PCA fitted on full data
- `target_leakage`: Feature with |r| > 0.99 with target
- `group_leakage`: Same group in train and test
- `temporal_leak`: Test data predates training

Soft Inflation:

- `proxy_leakage`: Feature with |r| 0.95–0.99 with target
- `spatial_proximity`: Test points close to training
- `spatial_overlap`: Test points inside training convex hull
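The two correlation-based checks reduce to a threshold rule on |r|. A base-R sketch using the cut-offs above (the helper name is hypothetical, not BORG's API):

```r
# Classify each feature by its absolute correlation with the target,
# using the hard (> 0.99) and soft (0.95-0.99) thresholds above
screen_correlations <- function(data, target) {
  y <- data[[target]]
  feats <- setdiff(names(data), target)
  vapply(feats, function(f) {
    r <- abs(cor(data[[f]], y))
    if (r > 0.99) "target_leakage"
    else if (r >= 0.95) "proxy_leakage"
    else "ok"
  }, character(1))
}

set.seed(42)
d <- data.frame(x = rnorm(100), outcome = rnorm(100))
d$leaked <- d$outcome + rnorm(100, sd = 0.001)
screen_correlations(d, "outcome")
```

BORG's actual checks add context (e.g. distinguishing direct from proxy leakage in its reports), but the threshold logic is the core idea.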
```r
# Install from GitHub
# install.packages("pak")
pak::pak("gcol33/BORG")

# Or using devtools
# install.packages("devtools")
devtools::install_github("gcol33/BORG")
```

```r
library(BORG)

# Clean split — passes validation
result <- borg(iris, train_idx = 1:100, test_idx = 101:150)
result
#> Status: VALID
#> Hard violations: 0
#> Soft inflations: 0

# Overlapping indices — caught immediately
borg(iris, train_idx = 1:100, test_idx = 51:150)
#> INVALID — index_overlap: Train and test indices overlap (50 shared indices)
```

```r
# caret preProcess fitted on ALL data (common mistake)
library(caret)
pp <- preProcess(mtcars, method = c("center", "scale"))

borg_inspect(pp, train_idx = 1:25, test_idx = 26:32, data = mtcars)
#> Hard violation: preprocessing_leak
#> "preProcess centering parameters were computed on data beyond training set"
```

```r
# Feature highly correlated with outcome
leaky_data <- data.frame(
  x = rnorm(100),
  outcome = rnorm(100)
)
leaky_data$leaked <- leaky_data$outcome + rnorm(100, sd = 0.01)

borg_inspect(leaky_data, train_idx = 1:70, test_idx = 71:100, target = "outcome")
#> Hard violation: target_leakage_direct
```

```r
# Clinical data with patient IDs
clinical <- data.frame(
  patient_id = rep(1:10, each = 10),
  measurement = rnorm(100)
)

# Random split ignoring patients
set.seed(123)
idx <- sample(100)
train_idx <- idx[1:70]
test_idx <- idx[71:100]

borg_inspect(clinical, train_idx, test_idx, groups = "patient_id")
#> Hard violation: group_leakage
```

```r
spatial_data <- data.frame(
  lon = runif(200, -10, 10),
  lat = runif(200, -10, 10),
  response = rnorm(200)
)

# Let BORG diagnose and generate appropriate CV folds
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response", v = 5)
result$diagnosis@recommended_cv
#> "spatial_block"
```

```r
# Prove to reviewers that random CV inflates metrics
comparison <- borg_compare_cv(
  spatial_data,
  formula = response ~ lon + lat,
  coords = c("lon", "lat"),
  repeats = 10
)

print(comparison)
plot(comparison)
```

```r
# summary() writes a publication-ready methods paragraph
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
summary(result)
#> Model performance was evaluated using spatial block cross-validation
#> (k = 5 folds). Spatial autocorrelation was detected in the data
#> (Moran's I = 0.12, p < 0.001)...

# Three citation styles
summary(result, style = "nature")
summary(result, style = "ecology")
```

BORG works with common ML frameworks:

```r
# caret
library(caret)
pp <- preProcess(mtcars[, -1], method = c("center", "scale"))
borg_inspect(pp, train_idx = 1:25, test_idx = 26:32, data = mtcars)

# tidymodels
library(recipes)
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  prep()
borg_inspect(rec, train_idx = 1:25, test_idx = 26:32, data = mtcars)
```

| Function | Purpose |
|---|---|
| `borg()` | Main entry point: diagnose data or validate splits |
| `borg_inspect()` | Detailed inspection of objects |
| `borg_diagnose()` | Analyze data dependencies |
| `borg_validate()` | Validate complete workflow |
| `borg_assimilate()` | Assimilate leaky pipelines into compliance |
| `borg_compare_cv()` | Empirical random vs blocked CV comparison |
| `borg_power()` | Power analysis after blocking |
| `plot()` | Visualize results |
| `summary()` | Generate methods text for papers |
| `borg_certificate()` | Create validation certificate |
| `borg_export()` | Export certificate to YAML/JSON |
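For the `group_leakage` case shown earlier (the same patient landing in both train and test), the standard remedy is to split at the group level rather than the row level. A base-R sketch of a leak-free split:

```r
# Split at the patient level so no patient appears in both sets
clinical <- data.frame(
  patient_id  = rep(1:10, each = 10),
  measurement = rnorm(100)
)

set.seed(123)
train_patients <- sample(unique(clinical$patient_id), 7)
train_idx <- which(clinical$patient_id %in% train_patients)
test_idx  <- which(!clinical$patient_id %in% train_patients)

# No patient straddles the split
stopifnot(length(intersect(clinical$patient_id[train_idx],
                           clinical$patient_id[test_idx])) == 0)
```

This split passes BORG's group check by construction; `borg_diagnose()` can recommend the equivalent grouped CV scheme automatically.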
“Software is like sex: it’s better when it’s free.” — Linus Torvalds
I’m a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.
If this package saved you some time, buying me a coffee is a nice way to say thanks. It helps with my coffee addiction.
MIT (see the LICENSE.md file)
```bibtex
@software{BORG,
  author = {Colling, Gilles},
  title  = {BORG: Bounded Outcome Risk Guard for Model Evaluation},
  year   = {2026},
  url    = {https://github.com/gcol33/BORG}
}
```