This vignette documents rformat’s stress testing methodology and results against 126 CRAN and base R packages.
rformat is designed around three invariants:
Parse preservation. Formatted output must always parse. If input parses, output parses. This is the minimum correctness bar for any formatter.
Semantic preservation. Formatting changes only
whitespace and style tokens. Assignment conversion (= to
<-) and brace insertion use R’s parse tree token types
(e.g., EQ_ASSIGN vs EQ_SUB), ensuring they
never change program semantics.
Idempotency. Formatting is a fixed point:
rformat(rformat(x)) == rformat(x). A formatter that isn’t
idempotent produces different output on each run, which makes it
unusable in CI pipelines and pre-commit hooks.
A whitespace-only formatter could verify correctness by comparing
ASTs before and after formatting. But rformat also converts
= to <- and inserts braces around bare
control flow bodies, both of which modify the AST. These changes are
guided by R’s parser token types and are semantically safe, but they
raise the testing bar: we need both parse preservation and idempotency
as quality gates.
The parse gate is necessary but not sufficient. Code that parses may still have subtle formatting issues (e.g., continuation lines at the wrong indent level that happen to produce valid R). Idempotency catches these: if a second formatting pass produces different output, the first pass made a formatting decision that the second pass disagrees with. This signals an internal inconsistency.
The stress test script (stress_test.R) does the
following for each package:
R/ directory from cached source
tarballs.R files (some packages have files
that don’t parse)rformat() using
randomized style parameters (indent, wrap, brace_style,
control_braces, line_limit, expand_if, etc.)rformat(rformat(x)) == rformat(x)Each file is formatted with a randomly chosen parameter combination. This exercises all option interactions and catches edge cases that fixed parameters would miss. The 10 strategic combinations cover:
# Full 126-package run (parallel)
cd ~/rformat-lab
r -e 'cat(read.csv("stress_timing.csv")$package, sep="\n")' |
parallel -j16 --timeout 900 r stress_test.R {} 2>/dev/null |
grep -E "FAIL|IDEMP|_DONE_"
# Test specific packages
r stress_test.R dplyr
r stress_test.R ggplot2
# Test one file with all 10 strategic combos
parallel -j10 --timeout 120 r stress_one_combo.R {1} {2} \
::: path/to/file.R ::: $(seq 1 10)
The test suite covers 126 packages across multiple ecosystems:
| Category | Packages |
|---|---|
| Base R (15) | base, compiler, graphics, grDevices, grid, methods, parallel, splines, stats, stats4, tcltk, tools, utils, datasets, class |
| Recommended (15) | boot, cluster, codetools, foreign, KernSmooth, lattice, MASS, Matrix, mgcv, nlme, nnet, rpart, spatial, survival |
| Core / Infrastructure | Rcpp, rlang, vctrs, glue, cli, withr, lifecycle, magrittr, pillar, crayon |
| Data Manipulation | dplyr, tidyr, tibble, readr, stringr, forcats, lubridate, purrr, data.table, janitor |
| Visualization | ggplot2, scales, patchwork, cowplot, plotly, lattice, leaflet, ggridges, viridis, DT |
| Shiny / Web | shiny, shinydashboard, bslib, htmltools, httpuv, httr, httr2, plumber, rvest, xml2 |
| Modeling / Stats | caret, recipes, parsnip, workflows, yardstick, glmnet, lme4, survival, brms, xgboost |
| Devtools / Packaging | testthat, tinytest, devtools, usethis, roxygen2, pkgdown, covr, remotes, pak, renv |
| Data Import / Storage | readxl, writexl, DBI, RSQLite, duckdb, arrow, sparklyr, pins, qs2, fst |
| Time Series / Specialized | forecast, zoo, xts, tsibble, fable, igraph, sf, terra, sp, rmarkdown |
| String / Parsing / Lang | stringi, jsonlite, yaml, digest, R6, xmlparsedata, evaluate, callr, processx, here |
| Performance / Parallel | future, furrr, parallelly, RcppParallel, bench, profvis, memoise, progress, curl, openssl |
Results as of March 2026 with the C++ (Rcpp) backend.
| Metric | Value |
|---|---|
| Packages tested | 126 |
| Files formatted | 5,843 |
| Parse failures (FAIL) | 0 |
| Idempotency failures (IDEMP) | 0 |
| Wall clock time (16 cores) | ~58 seconds |
0 failures across 5,843 files with randomized style parameters.
The stress test has been instrumental in finding and fixing bugs. Each round of testing surfaced new failure patterns, all now covered by regression tests.
R’s getParseData() reports column numbers with tabs
expanded to 8-column tab stops, but substring() works on
character positions. Using col1/col2 directly
with substring() produces wrong results when source lines
contain tabs. Fixed with col_to_charpos() and
display_width() helper functions.
getParseData() truncates STR_CONST tokens
longer than ~1000 characters to a placeholder like
[1200 chars quoted with '"']. Fixed by detecting truncated
tokens and recovering the original text from source lines using the
token’s line/column positions.
reformat_one_inline_if searches forward from
x <- if (cond) value to find a matching
else. The search tracked if-else nesting but not brace
boundaries. For code like:
if (outer) {
space <- if (cond) padding
use(space)
} else {
other()
}
The search would cross the } and find the outer
else. Fixed by tracking brace depth and stopping when
brace_depth < 0.
When expanding
x <- if (cond) a else { stmt1; stmt2 }, the formatter
would try to inline the braced false body, producing
x <- { stmt1 stmt2 } with missing semicolons. Fixed by
skipping expansion when the false body starts with {.
When a long line contained nested function calls, the wrap pass would
try to wrap an inner call already inside another call’s parentheses on
the same line. Fixed by checking paren_depth_before > 0
and skipping nested calls.
Comments between a trailing operator and a value (e.g.,
else # comment with body on the next line) were treated as
statement boundaries. Fixed by tracking last_real_tok and
only checking non-comment tokens against statement-ending token
types.
Achieving idempotency required careful coordination between formatting passes. The key insight: two systems compute indentation, and they must agree.
format_tokens (the initial pass)
computes indentation from brace and bracket depth:
indent = (brace_depth + paren_depth) * indent_size. It
tracks (, ), [, ],
and [[ as part of paren depth.
Wrap functions (wrap_long_calls,
wrap_long_operators) split long lines and must produce
continuation indentation that format_tokens will preserve
on the next pass. They use depth-based indentation matching
format_tokens, rather than column-aligned indentation
(which format_tokens would normalize away):
base_indent + bracket_depth * indent_sizebase_indent + (bracket_depth_before + 1) * indent_sizeParen-column alignment (aligning continuation to the opening
parenthesis) is preserved for function definitions via
reformat_function_defs, which rebuilds function signatures
from scratch on each pass.