Clinical trials with time-to-event endpoints often rely on hazard-based methods such as the proportional hazards (PH) model and the hazard ratio (HR). The HR can be hard to interpret when treatment effects vary over time, and the proportional hazards assumption is frequently violated in practice.
The Restricted Mean Survival Time (RMST) is the expected event-free time up to a pre-specified follow-up point, \(L\) (Royston and Parmar 2013; Uno et al. 2014). It yields a treatment contrast on the time scale, such as an average gain of several months over a clinically relevant horizon.
Recent work models RMST directly as a function of baseline covariates instead of estimating it only from a survival curve or hazard model. Methods based on Inverse Probability of Censoring Weighting (IPCW) (Tian et al. 2014) now cover stratified studies (Wang et al. 2019; Zhang and Schaubel 2024) and covariate-dependent censoring (Wang and Schaubel 2018).
Most available software emphasizes estimation from existing data rather than study design. As a result, trial statisticians often need custom code for sample size and power calculations.
RMSTpowerBoost implements direct RMST methods for
power and sample size calculations:
analytical
calculation and a simulation-based boot method.This vignette outlines the main model families and shows how to use them.
To keep CRAN checks fast, this vignette uses precomputed outputs for
the long-running examples by default. Set
RMSTPOWERBOOST_FULL_VIGNETTES=true before knitting to
recompute every analytical and bootstrap result.
RMSTpowerBoost
PackageThe functions in this package are grounded in a regression-based formulation of the restricted mean survival time (RMST). For a given subject \(i\) with event time \(T_i\), covariate vector \(Z_i\) and treatment indicator \(\mathrm{Trt}_i\), the conditional RMST is modeled as
\[ \mathbb{E}[\min(T_i, L)\mid Z_i] \;=\; \beta_0 + \beta_{\text{effect}}\,\mathrm{Trt}_i + \beta_2^{\top} Z_i, \]
where \(L\) is the restriction time,
and \(\beta_{\text{effect}}\)
represents the modeled treatment contrast on the RMST scale, that is,
the expected difference in event-free time between treatment arms after
adjustment for the included covariates. The quantity \(\beta_{\text{effect}}\) therefore defines
the effect size used throughout the analytical power
and sample size functions in RMSTpowerBoost.
The analytical functions estimate power from a closed-form expression, so they are useful for rapid scenario exploration. The process is:
N, the power is calculated as: \[
\text{Power} = \Phi\left( \frac{|\beta_{\text{effect}}|}{\sigma_N} -
z_{1-\alpha/2} \right)
\] where:
N, which is scaled from the
reference data’s variance.The bootstrap functions estimate power empirically and therefore require more computation. The process is:
sample_size by resampling with replacement from
the reference data.n_sim).alpha. \[
\text{Power} = \frac{\text{Number of simulations with } p <
\alpha}{n_{\text{sim}}}
\]rmst.ss() uses an iterative search algorithm to find the
N required to achieve a target_power:
n_start.current_n using either the analytic
formula or a full bootstrap simulation.calculated_power >= target_power, the search
succeeds and returns current_n.current_n = current_n + n_step) and repeats the
process.max_n or, for bootstrap methods, if the power
fails to improve for a set number of patience steps.RMSTpowerBoost exposes all models through two top-level
functions that use a familiar Surv()-based formula
interface:
| Function | Purpose |
|---|---|
rmst.power(Surv(time, status) ~ covariates, data, arm, sample_sizes, L, ...) |
Power curve over a vector of sample sizes |
rmst.ss(Surv(time, status) ~ covariates, data, arm, target_power, L, ...) |
Minimum sample size to reach a power target |
Key arguments that control model selection:
| Argument | Values | Effect |
|---|---|---|
type |
"analytical" (default) /
"boot" |
Analytic vs. bootstrap method |
strata |
column name / ~col /
NULL |
Activates stratified model |
strata_type |
"additive" /
"multiplicative" |
Stratified model type |
dep_cens |
TRUE / FALSE |
Dependent-censoring model |
| Smooth terms | s(var) in formula |
Activates GAM (forces type = "boot") |
The underlying model-specific functions remain exported for direct use when needed.
Model choice depends on the study design and the assumptions you are willing to make. The table below summarizes typical settings for each approach:
| Model | Key Assumption / Scenario | Use when |
|---|---|---|
| Linear IPCW | Assumes a linear relationship between covariates and RMST. | there is no strong evidence of non-linear effects or complex stratification. |
| Additive Stratified | Assumes the treatment adds a constant amount of survival time across strata. | treatment effects are expected to be comparable across centers or other strata. |
| Multiplicative Stratified | Assumes the treatment multiplies survival time proportionally across strata. | treatment effects are better expressed as relative changes in RMST across strata. |
| Semiparametric GAM | Allows for non-linear covariate effects on RMST. | covariates such as age or biomarkers are likely to have non-linear associations with the outcome. |
| Dependent Censoring | Accounts for covariate-dependent censoring under a single censoring mechanism. | censoring depends on measured covariates and competing risks are not being modeled explicitly. |
These functions implement the foundational direct linear regression model for the RMST. This model is appropriate when a linear relationship between covariates and the RMST is assumed, and when censoring is independent of the event of interest.
Based on the methods of (Tian et al. 2014), these functions model the conditional RMST as a linear function of covariates: \[\mathbb{E}[\min(T_i, L) | Z_i] = \beta_0 + \beta_{\text{effect}} \text{Treatment}_i + \beta_2 \text{Covariate}_{i}\] In this model, the expected RMST up to a pre-specified time L for subject i is modeled as a linear combination of their treatment arm and other variables \(Z_i\).
To handle right-censoring, the method uses Inverse Probability of Censoring Weighting (IPCW). This is achieved through the following steps:
lm()) is then fitted
using these weights. The model only includes subjects who experienced
the event.The analytical functions use a formula based on the asymptotic variance of the regression coefficients to calculate power or sample size, making them fast to evaluate.
Scenario: We use the veteran dataset to
estimate power for a trial comparing standard vs. test chemotherapy
(trt), adjusting for the Karnofsky performance score
(karno).
First, let’s inspect the prepared veteran dataset.
Now, we calculate the power for a range of sample sizes using a truncation time of 9 months (270 days).
power_results_vet <- vignette_cache("veteran_power_calc", {
rmst.power(
Surv(time, status) ~ karno,
data = vet,
arm = "arm",
sample_sizes = c(100, 150, 200, 250),
L = 270
)
})
The results are returned as a data frame and a ggplot
object.
| N_per_Arm | Power |
|---|---|
| 100 | 0.1266 |
| 150 | 0.1687 |
| 200 | 0.2106 |
| 250 | 0.2521 |
We can also use the analytical method to find the required sample size to achieve a target power for a truncation time of one year (365 days).
ss_results_vet <- vignette_cache("veteran_ss_calc", {
rmst.ss(
Surv(time, status) ~ karno,
data = vet,
arm = "arm",
target_power = 0.40,
L = 365,
n_start = 1000, n_step = 250, max_n = 5000
)
})
| Statistic | Value |
|---|---|
| Assumed RMST Difference (from pilot) | -3.9666 |
Passing type = "boot" to rmst.power() or
rmst.ss() switches to a simulation-based approach. This
method repeatedly resamples from the reference data, fits the model on
each sample, and calculates power as the proportion of simulations where
the treatment effect is significant. It relies less on closed-form
large-sample approximations at the cost of greater computation time.
Here is how you would call the bootstrap method for power for the
linear model. The following examples use the same veteran
dataset, but with a smaller number of simulations for demonstration
purposes. In practice, a larger number of simulations (e.g., 1,000 or
more) is recommended to ensure stable results.
First we calculate the power for a range of sample sizes.
power_boot_vet <- vignette_cache("linear_boot_example", {
rmst.power(
Surv(time, status) ~ karno,
data = vet,
arm = "arm",
sample_sizes = c(150, 200, 250),
L = 365,
type = "boot",
n_sim = 50
)
})
Here is how you would call the bootstrap method for sample size calculation, targeting 50% power and truncating at 180 days (6 months).
ss_boot_vet <- vignette_cache("linear_boot_ss_example", {
rmst.ss(
Surv(time, status) ~ karno,
data = vet,
arm = "arm",
target_power = 0.5,
L = 180,
type = "boot",
n_sim = 100,
patience = 5
)
})
In multi-center clinical trials, the analysis is often stratified by a categorical variable with many levels, such as clinical center or a discretized biomarker. Estimating a separate parameter for each stratum can be inefficient when the number of strata is large. The additive stratified model removes the stratum-specific effects through conditioning.
The semiparametric additive model for RMST, as developed by (Zhang and Schaubel 2024), is defined as: \[\mu_{ij} = \mu_{0j} + \beta'Z_i\] This model assumes that the effect of the covariates \(Z_i\) (which includes the treatment arm) is additive and constant across all strata \(j\). Each stratum has its own baseline RMST, denoted by \(\mu_{0j}\).
The common treatment effect, \(\beta\), is estimated with a stratum-centering approach applied to IPCW-weighted data. This avoids direct estimation of the many \(\mu_{0j}\) parameters.
Scenario: We use the colon dataset to
design a trial stratified by the extent of local disease
(extent), a factor with 4 levels. We want to find the
sample size per stratum to achieve 60% power. Let’s inspect the prepared
colon dataset.
Now, we run the sample size search for 60% power, truncating at 5 years (1825 days).
ss_results_colon <- vignette_cache("colon_ss_calc", {
rmst.ss(
Surv(time, status) ~ 1,
data = colon_death,
arm = "arm",
strata = "strata",
strata_type = "additive",
target_power = 0.60,
L = 1825,
n_start = 100, n_step = 100, max_n = 10000
)
})
| Statistic | Value |
|---|---|
| Assumed RMST Difference (from pilot) | -36.7735 |
This example calculates the power for a given set of sample sizes in
a stratified additive model using the same colon
dataset.
power_results_colon <- vignette_cache("additive_power_calc", {
rmst.power(
Surv(time, status) ~ 1,
data = colon_death,
arm = "arm",
strata = "strata",
strata_type = "additive",
sample_sizes = c(1000, 3000, 5000),
L = 1825
)
})
| N_per_Stratum | Power |
|---|---|
| 1000 | 0.3259 |
| 3000 | 0.7432 |
| 5000 | 0.9213 |
Use the multiplicative model when treatment is expected to act on the RMST scale through a relative effect, such as a percentage increase or decrease in survival time.
The multiplicative model, based on the work of (Wang et al. 2019), is defined as: \[\mu_{ij} = \mu_{0j} \exp(\beta'Z_i)\] In this model, the covariates \(Z_i\) have a multiplicative effect on the baseline stratum-specific RMST, \(\mu_{0j}\). This structure is equivalent to a linear model on the log-RMST.
Formal estimation of \(\beta\)
requires an iterative solver. This package instead fits a weighted
log-linear model (lm(log(Y_rmst) ~ ...)) to approximate the
log-RMST ratio and its variance.
This function calculates the power for various sample sizes using the analytical method for the multiplicative stratified model.
power_ms_analytical <- vignette_cache("ms_power_analytical_example", {
rmst.power(
Surv(time, status) ~ 1,
data = colon_death,
arm = "arm",
strata = "strata",
strata_type = "multiplicative",
sample_sizes = c(300, 400, 500),
L = 1825
)
})
| N_per_Stratum | Power |
|---|---|
| 300 | 0.5062 |
| 400 | 0.6259 |
| 500 | 0.7225 |
The following example demonstrates the sample size calculation for the multiplicative model.
ms_ss_results_colon <- vignette_cache("ms_ss_analytical_example", {
rmst.ss(
Surv(time, status) ~ 1,
data = colon_death,
arm = "arm",
strata = "strata",
strata_type = "multiplicative",
target_power = 0.6,
L = 1825
)
})
| Statistic | Value |
|---|---|
| Assumed log(RMST Ratio) (from pilot) | -0.0898 |
The bootstrap approach provides a simulation-based analysis for the
multiplicative model. Pass type = "boot" together with
strata_type = "multiplicative".
power_ms_boot <- vignette_cache("ms_power_boot_example", {
rmst.power(
Surv(time, status) ~ 1,
data = colon_death,
arm = "arm",
strata = "strata",
strata_type = "multiplicative",
sample_sizes = c(100, 300, 500),
L = 1825,
type = "boot",
n_sim = 30,
parallel.cores = 1
)
})
| Statistic | Value |
|---|---|
| Mean RMST Ratio | 1.0001 |
| 95% CI Lower | 0.9579 |
| 95% CI Upper | 1.0501 |
ss_ms_boot <- vignette_cache("ms_ss_boot_example", {
rmst.ss(
Surv(time, status) ~ 1,
data = colon_death,
arm = "arm",
strata = "strata",
strata_type = "multiplicative",
target_power = 0.75,
L = 1825,
type = "boot",
n_sim = 30,
n_start = 100,
n_step = 50,
patience = 4,
parallel.cores = 1
)
})
| Statistic | Value |
|---|---|
| Mean RMST Ratio | 1.0081 |
| 95% CI Lower | 0.9538 |
| 95% CI Upper | 1.0684 |
When a covariate is expected to have a non-linear effect on the outcome, standard linear models may be misspecified. Generalized Additive Models (GAMs) handle this by fitting smooth functions.
These functions use a bootstrap simulation approach combined with a GAM. The method involves two main steps:
Jackknife Pseudo-Observations: The time-to-event outcome is first converted into jackknife pseudo-observations for the RMST. This technique (Andersen et al. 2004; Parner and Andersen 2010) creates a continuous, uncensored variable that represents each subject’s contribution to the RMST. This makes the outcome suitable for use in a standard regression framework.
GAM Fitting: A GAM is then fitted to these pseudo-observations. The model has the form: \[\mathbb{E}[\text{pseudo}_i] = \beta_0 + \beta_{\text{effect}} \cdot \text{Treatment}_i + \sum_{k=1}^{q} f_k(\text{Covariate}_{ik})\] Here, \(f_k()\) are the non-linear smooth functions (splines) that the GAM estimates from the data.
Because this is a bootstrap method, power is not calculated from a direct formula but is instead estimated empirically from the simulations: \[\text{Power} = \frac{1}{B} \sum_{b=1}^{B} \mathbb{I}(p_b < \alpha)\] Where:
\(B\) is the total number of
bootstrap simulations (n_sim).
\(p_b\) is the p-value for the treatment effect in the \(b\)-th simulation.
\(\mathbb{I}(\cdot)\) is the indicator function, which is 1 if the condition is true and 0 otherwise.
Wrap smooth covariates in s() in the formula — this
automatically routes to the GAM bootstrap engine regardless of the
type argument.
Scenario: We use the gbsg (German
Breast Cancer Study Group) dataset, suspecting that the progesterone
receptor count (pgr) has a non-linear effect on
recurrence-free survival. Here is a look at the prepared
gbsg data.
The following code shows how to calculate power. Wrapping
pgr in s() signals a smooth non-linear
effect.
power_gam <- vignette_cache("gbsg_power_calc", {
rmst.power(
Surv(rfstime, status) ~ s(pgr),
data = gbsg_prepared,
arm = "arm",
sample_sizes = c(50, 200, 400),
L = 2825,
n_sim = 50,
parallel.cores = 1
)
})
print(power_gam$results_plot)
Scenario: We want to find the sample size needed to
achieve 95% power for detecting an effect of pgr on
recurrence-free survival.
ss_gam <- vignette_cache("gbsg_ss_calc", {
rmst.ss(
Surv(rfstime, status) ~ s(pgr),
data = gbsg_prepared,
arm = "arm",
target_power = 0.95,
L = 182,
n_sim = 50,
patience = 5,
parallel.cores = 1
)
})
In many observational or longitudinal studies, the probability of being censored may depend on measured subject characteristics such as age, disease stage, or biomarker level, but not on the unobserved event time after conditioning on those covariates. This setting is referred to as covariate-dependent (or conditionally independent) censoring. Inverse-probability-of-censoring weighting (IPCW) can account for this dependence under the stated model and support valid estimation of the restricted mean survival time (RMST).
We assume a single censoring mechanism described by a Cox proportional-hazards model for the censoring time: \[ \text{Pr}(C \le t \mid X) = 1 - G(t \mid X), \qquad G(t \mid X) = \exp\{-\Lambda_C(t \mid X)\}, \] where \(X\) denotes the observed covariates (e.g., age, sex, treatment arm). Each subject receives an inverse-probability weight \[ W_i = \frac{1}{\widehat{G}(Y_i \mid X_i)}, \qquad Y_i = \min(T_i, L), \] with \(T_i\) the event time and \(L\) the truncation horizon for RMST. Weights are stabilized and truncated to prevent numerical instability. The weighted regression \[ E[Y_i \mid A_i, X_i] = \beta_0 + \beta_A A_i + X_i^\top\beta_X \] is then fitted using weighted least squares. The variance of \(\hat\beta_A\) is estimated with a sandwich-type estimator that treats the censoring model as known.
Analytic power is computed as \[ \text{Power} = \Phi\!\left( \frac{|\hat\beta_A|}{\sigma_N} - z_{1-\alpha/2} \right), \qquad \sigma_N^2 = \frac{\widehat{\mathrm{Var}}(\hat\beta_A)}{N}, \] where \(\widehat{\mathrm{Var}}(\hat\beta_A)\) is the asymptotic variance estimated from the reference data and \(N\) is the total sample size. Because the same IPCW model is used for all subjects, no cause-specific components appear in the variance.
We demonstrate this approach using the mgus2 dataset,
modeling censoring as a function of age and sex under a single censoring
mechanism.
Setting dep_cens = TRUE routes the call to the
dependent-censoring adjusted estimator.
dc_power_results <- vignette_cache("dc_power_calc", {
rmst.power(
Surv(time, status) ~ age,
data = mgus_prepared,
arm = "arm",
sample_sizes = c(100, 250, 500),
L = 120,
dep_cens = TRUE,
alpha = 0.05
)
})
| Statistic | Value |
|---|---|
| Assumed RMST Difference (from pilot) | -10.6135 |
We next estimate the per-arm sample size required to achieve 80% power at the same truncation time (10 years).
ss_dc_mgus <- vignette_cache("mgus2_ss_calc", {
rmst.ss(
Surv(time, status) ~ age,
data = mgus_prepared,
arm = "arm",
target_power = 0.80,
L = 120,
dep_cens = TRUE,
alpha = 0.05,
n_start = 100, n_step = 50, max_n = 5000
)
})
| Statistic | Value |
|---|---|
| Assumed RMST Difference (from pilot) | -10.6135 |
RMSTpowerBoost also includes a Shiny web application for
users who prefer a graphical interface.
There are two ways to access the application:
Live Web Version: Access the application directly in your browser without any installation.
Run Locally from the R Package: If you have
installed the RMSTpowerBoost-Package, you can run the
application on your own machine with the following command:
RMSTpowerBoost::run_app()
RMSTpowerBoost centers its interface on
rmst.power() and rmst.ss() for RMST-based
power and sample size calculations under linear, stratified,
semiparametric, and covariate-dependent censoring models. Both functions
use the familiar Surv(time, status) ~ covariates syntax,
with model choice controlled by a small set of arguments
(strata, strata_type, dep_cens,
type).
In practice, these procedures depend on representative reference data because effect sizes and variance components are estimated from that dataset. Bootstrap-based methods can also require substantial computation.
Future development could extend simulation-based procedures for covariate-dependent censoring and add model diagnostic tools for assessing reference-data assumptions.