r4subtrace

r4subtrace is the traceability engine in the R4SUB ecosystem. It quantifies and explains end-to-end traceability between clinical submission artifacts – primarily ADaM outputs <-> derivations <-> SDTM sources <-> specs <-> code – and converts trace evidence into standardized R4SUB Evidence Table rows (from r4subcore).

It focuses on answering one question:

Can we prove where each analysis variable/value came from, and can a reviewer follow it?

Why r4subtrace?

In real submissions, issues are rarely “a single failed rule.” Many are trace failures:

- Missing or ambiguous derivation documentation
- ADaM variable not linkable to SDTM sources
- Mismatch between spec and what code produces
- Inconsistent naming across specs, define.xml, and datasets
- Reviewer cannot reproduce or validate lineage

r4subtrace formalizes traceability as evidence + measurable indicators.
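
To make two of these trace failures concrete, the following base R sketch flags ADaM variables with no mapping row and mapping rows that point at an SDTM variable missing from the SDTM metadata. It does not use r4subtrace and is not a description of its internals. All data values are hypothetical; the column names follow the metadata and mapping layout shown in the Quick start.

```r
# Hypothetical metadata and mapping; the values are illustrative only.
adam_meta <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADAE"),
  variable = c("AGE", "TRT01P", "AESEV"),
  stringsAsFactors = FALSE
)
sdtm_meta <- data.frame(
  dataset  = c("DM", "AE"),
  variable = c("AGE", "AESEV"),
  stringsAsFactors = FALSE
)
map <- data.frame(
  adam_dataset = c("ADSL", "ADSL"),
  adam_var     = c("AGE", "TRT01P"),
  sdtm_domain  = c("DM", "DM"),
  sdtm_var     = c("AGE", "ARM"),
  stringsAsFactors = FALSE
)

# Build comparable keys of the form "DATASET.VARIABLE".
adam_keys   <- paste(adam_meta$dataset, adam_meta$variable, sep = ".")
sdtm_keys   <- paste(sdtm_meta$dataset, sdtm_meta$variable, sep = ".")
mapped_adam <- paste(map$adam_dataset, map$adam_var, sep = ".")
mapped_sdtm <- paste(map$sdtm_domain, map$sdtm_var, sep = ".")

# ADaM variables that have no mapping row at all.
unmapped_adam <- setdiff(adam_keys, mapped_adam)

# Mapping rows whose SDTM target is absent from the SDTM metadata.
dangling_rows <- map[!(mapped_sdtm %in% sdtm_keys), ]

print(unmapped_adam)
print(dangling_rows)
```

Both checks are plain set comparisons on dataset/variable keys, so they only catch structural gaps; they say nothing about whether a documented derivation is correct.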

What r4subtrace measures

Traceability levels

Installation

pak::pak(c("R4SUB/r4subcore", "R4SUB/r4subtrace"))

Quick start

1) Create run context

library(r4subcore)
library(r4subtrace)

ctx <- r4sub_run_context(study_id = "ABC123", environment = "DEV")

2) Load metadata

adam_meta <- read.csv("adam_metadata.csv")  # columns: dataset, variable, label, type
sdtm_meta <- read.csv("sdtm_metadata.csv")  # same structure

map <- read.csv("trace_map.csv")
# recommended columns:
# adam_dataset, adam_var, sdtm_domain, sdtm_var, derivation_text (optional), confidence (optional)
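
Before building the trace model it can help to confirm that the mapping file has the expected shape. The sketch below is a hypothetical helper, `check_trace_map()`, that is not part of r4subtrace. It assumes the four columns not marked optional above are required, and it fills in the optional columns with `NA` when they are absent. The example mapping rows are invented for illustration.

```r
# Column names taken from the recommended list above.
# Treating the first four as required is an assumption of this sketch.
required_cols <- c("adam_dataset", "adam_var", "sdtm_domain", "sdtm_var")
optional_cols <- c("derivation_text", "confidence")

# Hypothetical helper, not an r4subtrace function.
check_trace_map <- function(map) {
  missing_cols <- setdiff(required_cols, names(map))
  if (length(missing_cols) > 0) {
    stop("trace map is missing columns: ",
         paste(missing_cols, collapse = ", "))
  }
  # Add absent optional columns as NA so later code can rely on them.
  for (col in setdiff(optional_cols, names(map))) {
    map[[col]] <- rep(NA, nrow(map))
  }
  map
}

# Illustrative mapping with only the required columns.
example_map <- data.frame(
  adam_dataset = c("ADSL", "ADAE"),
  adam_var     = c("AGE", "AESEV"),
  sdtm_domain  = c("DM", "AE"),
  sdtm_var     = c("AGE", "AESEV"),
  stringsAsFactors = FALSE
)

checked <- check_trace_map(example_map)
print(names(checked))
```

Failing early with a named list of missing columns is usually easier to debug than an error raised deeper in the pipeline.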

3) Build trace model and evidence

tm <- build_trace_model(
  adam_meta = adam_meta,
  sdtm_meta = sdtm_meta,
  mapping   = map
)

ev <- trace_model_to_evidence(
  tm,
  ctx            = ctx,
  source_name    = "r4subtrace",
  source_version = "0.1.0"
)

validate_evidence(ev)
evidence_summary(ev)

4) Compute trace coverage score

ind <- trace_indicator_scores(ev)
ind
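
For intuition about what a coverage score can express, here is a hand-rolled base R sketch that computes the share of ADaM variables with at least one mapping row, per dataset and overall. This is an assumed, simplified definition for illustration; it is not claimed to match what `trace_indicator_scores()` computes. The data values are hypothetical.

```r
# Hypothetical ADaM metadata and trace mapping.
adam_meta <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADAE", "ADAE"),
  variable = c("AGE", "TRT01P", "AESEV", "ASTDT"),
  stringsAsFactors = FALSE
)
map <- data.frame(
  adam_dataset = c("ADSL", "ADSL", "ADAE"),
  adam_var     = c("AGE", "TRT01P", "AESEV"),
  sdtm_domain  = c("DM", "DM", "AE"),
  sdtm_var     = c("AGE", "ARM", "AESEV"),
  stringsAsFactors = FALSE
)

# Flag each ADaM variable that appears in at least one mapping row.
adam_meta$mapped <- paste(adam_meta$dataset, adam_meta$variable) %in%
  paste(map$adam_dataset, map$adam_var)

# Coverage = proportion of mapped variables (mean of a logical vector).
coverage_by_dataset <- tapply(adam_meta$mapped, adam_meta$dataset, mean)
overall_coverage    <- mean(adam_meta$mapped)

print(coverage_by_dataset)
print(overall_coverage)
```

A per-dataset breakdown like this makes it easier to see where unmapped variables cluster than a single overall number does.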

Core objects

Trace Model

A list with:

Trace Evidence

Evidence rows are emitted for:

Indicators

Design principles

License

MIT