| Title: | Parametric Time-to-Event Analysis with Variable Incubation Phases |
| Version: | 1.4.0 |
| Date: | 2026-03-20 |
| Description: | Fit parametric models for time-to-event data that show an initial 'incubation period', i.e., a variable delay phase where no events occur. The delayed Weibull distribution serves as the foundational data model. For parameter estimation, different flavours of maximum likelihood estimation ('MLE') and the method of maximum product of spacings estimation ('MPSE') are implemented. Bootstrap confidence intervals for parameters and significance tests in a two group setting are provided. |
| License: | LGPL (≥ 3) |
| Encoding: | UTF-8 |
| LazyData: | true |
| Imports: | future (≥ 1.21), future.apply (≥ 1.6), glue (≥ 1.4), MASS, minqa, purrr (≥ 0.3), rlang (≥ 0.4), stats, survival, tibble |
| Suggests: | boot, dplyr, future.callr, ggplot2 (≥ 3.3), knitr, numDeriv, patchwork, rmarkdown, spelling, testthat (≥ 3.0.0), tidyr, withr |
| URL: | https://gitlab.com/imb-dev/incubate/ |
| BugReports: | https://gitlab.com/imb-dev/incubate/-/issues/ |
| Language: | en-GB |
| RoxygenNote: | 7.3.3 |
| Config/testthat/edition: | 3 |
| Config/Needs/check: | spelling |
| Depends: | R (≥ 3.5.0) |
| Collate: | 'aaa.R' 'cpp11.R' 'data.R' 'utils.R' 'delay_estimation.R' 'delay.R' 'delay_test.R' 'incubate-package.R' 'integration.R' 'zzz.R' |
| VignetteBuilder: | knitr |
| LinkingTo: | cpp11 |
| NeedsCompilation: | yes |
| Packaged: | 2026-03-22 23:26:54 UTC; kuhnmat |
| Author: | Matthias Kuhn |
| Maintainer: | Matthias Kuhn <matthias.kuhn@tu-dresden.de> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-22 23:50:02 UTC |
Incubate package for parametric time-to-event analysis with delay
Description
Estimation and statistical tests on parameters in parametric time-to-event analyses with delay.
Author(s)
Maintainer: Matthias Kuhn matthias.kuhn@tu-dresden.de (ORCID) [copyright holder]
See Also
Useful links:
- https://gitlab.com/imb-dev/incubate/
- Report bugs at https://gitlab.com/imb-dev/incubate/-/issues/
Delayed Exponential Distribution
Description
Density, distribution function, quantile function, random generation and restricted mean survival time function for the delayed exponential distribution.
There is an initial delay phase (parameter delay1) where no events occur. After that, rate1 applies.
Optionally, a second phase is possible where the hazard rate might change (parameters delay2 and rate2).
Usage
dexp_delayed(
x,
delay1 = 0,
rate1 = 1,
delay2 = NULL,
rate2 = NULL,
delay = delay1,
rate = rate1,
log = FALSE
)
pexp_delayed(
q,
delay1 = 0,
rate1 = 1,
delay2 = NULL,
rate2 = NULL,
delay = delay1,
rate = rate1,
lower.tail = TRUE,
log.p = FALSE,
grad = FALSE
)
qexp_delayed(
p,
delay1 = 0,
rate1 = 1,
delay2 = NULL,
rate2 = NULL,
delay = delay1,
rate = rate1,
lower.tail = TRUE,
log.p = FALSE
)
rexp_delayed(
n,
delay1 = 0,
rate1 = 1,
delay2 = NULL,
rate2 = NULL,
delay = delay1,
rate = rate1,
cens = 0
)
mexp_delayed(
t = +Inf,
delay1 = 0,
rate1 = 1,
delay2 = NULL,
rate2 = NULL,
delay = delay1,
rate = rate1
)
Arguments
x |
A numeric vector of values for which to get the density. |
delay1 |
numeric. The first delay, must be non-negative. |
rate1 |
numeric. The event rate, must be non-negative. |
delay2 |
numeric. The second delay, must be non-negative. |
rate2 |
numeric. The second event rate, must be non-negative. |
delay |
numeric. Alias for first delay. |
rate |
numeric. Alias for first rate. |
log |
logical. Return value on log-scale? |
q |
A numeric vector of quantile values. |
lower.tail |
logical. Give cumulative probability of lower tail? |
log.p |
logical. P-value on log-scale? |
grad |
logical. Should the gradient be calculated at the given quantile values (and not the cumulative probabilities)? |
p |
A numeric vector of probabilities. |
n |
integer. Number of random observations requested. |
cens |
numeric. Expected proportion of random right-censored observations. |
t |
A numeric vector of times that restrict the mean survival. Default is +Inf, i.e. unrestricted. |
Details
If there is only a single initial delay phase, the numerical arguments other than n are recycled to the length of the result (as with the exponential distribution in stats).
With two phases, the arguments are not recycled: only the first elements of the delay and rate parameters are used, as it would otherwise become ambiguous which delay and rate apply to observations in the different phases.
Generally, only the first elements of the logical arguments are used.
Value
Functions pertaining to the delayed exponential distribution:
- dexp_delayed gives the density
- pexp_delayed gives the vector of cumulative probabilities or, if grad = TRUE, the gradient matrix (number of parameters x quantile times)
- qexp_delayed gives the quantile function
- rexp_delayed generates a pseudo-random sample
- mexp_delayed gives the restricted mean survival time
The length of the result is determined by n for rexp_delayed and is the maximum of the lengths of the numerical arguments for the other functions; R's recycling rules apply when only a single initial delay phase is used.
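As a plain base-R illustration of the single-phase case (function names with a _sketch suffix are hypothetical, not part of the package), the delayed exponential is simply an exponential shifted by the delay:

```r
# Minimal sketch of the single-phase delayed exponential; the package's
# dexp_delayed/pexp_delayed are the authoritative implementations and
# additionally cover the optional second phase.
dexp_delayed_sketch <- function(x, delay = 0, rate = 1) {
  ifelse(x >= delay, rate * exp(-rate * (x - delay)), 0)  # density is 0 before the delay
}
pexp_delayed_sketch <- function(q, delay = 0, rate = 1) {
  ifelse(q >= delay, 1 - exp(-rate * (q - delay)), 0)     # no events before the delay
}

dexp_delayed_sketch(c(2, 6), delay = 5)  # 0 before the delay, shifted exponential after
all.equal(pexp_delayed_sketch(7, delay = 5, rate = 0.5), pexp(2, rate = 0.5))
```

For q past the delay, the sketch agrees with the standard exponential evaluated at q - delay.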
Delayed Weibull Distribution
Description
Density, distribution function, quantile function and random generation for the delayed Weibull distribution.
Besides the additional delay parameter, the other two Weibull parameters are in principle retained as in R's stats-package:
- shape
- scale (as inverse of rate)
Usage
dweib_delayed(
x,
delay1,
shape1,
scale1 = 1,
delay2 = NULL,
shape2 = NULL,
scale2 = 1,
delay = delay1,
shape = shape1,
scale = scale1,
log = FALSE
)
pweib_delayed(
q,
delay1,
shape1,
scale1 = 1,
delay2 = NULL,
shape2 = NULL,
scale2 = 1,
delay = delay1,
shape = shape1,
scale = scale1,
lower.tail = TRUE,
log.p = FALSE,
grad = FALSE
)
qweib_delayed(
p,
delay1,
shape1,
scale1 = 1,
delay2 = NULL,
shape2 = NULL,
scale2 = 1,
delay = delay1,
shape = shape1,
scale = scale1,
lower.tail = TRUE,
log.p = FALSE
)
rweib_delayed(
n,
delay1,
shape1,
scale1 = 1,
delay2 = NULL,
shape2 = NULL,
scale2 = 1,
delay = delay1,
shape = shape1,
scale = scale1,
cens = 0
)
mweib_delayed(
t = +Inf,
delay1,
shape1,
scale1 = 1,
delay2 = NULL,
shape2 = NULL,
scale2 = 1,
delay = delay1,
shape = shape1,
scale = scale1
)
Arguments
x |
A numeric vector of values for which to get the density. |
delay1 |
numeric. The first delay, must be non-negative. |
shape1 |
numeric. First shape parameter, must be positive. |
scale1 |
numeric. First scale parameter (inverse of rate), must be positive. |
delay2 |
numeric. The second delay, must be non-negative. |
shape2 |
numeric. The second shape parameter, must be non-negative. |
scale2 |
numeric. The second scale parameter (inverse of rate), must be positive. |
delay |
numeric. Alias for first delay. |
shape |
numeric. Alias for first shape. |
scale |
numeric. Alias for first scale. |
log |
logical. Return value on log-scale? |
q |
A numeric vector of quantile values. |
lower.tail |
logical. Give cumulative probability of lower tail? |
log.p |
logical. P-value on log-scale? |
grad |
logical. Should the gradient be calculated at the given quantile values (and not the cumulative probabilities)? |
p |
A numeric vector of probabilities. |
n |
integer. Number of random observations requested. |
cens |
numeric. Proportion of random right-censored observations. For small values of shape1, on average fewer censorings are achieved. |
t |
A numeric vector of times that restrict the mean survival. Default is +Inf, i.e. unrestricted. |
Details
Additional arguments are forwarded via ... to the underlying functions of
the Weibull distribution in the stats-package.
The numerical arguments other than n are recycled to the length of the
result. Only the first elements of the logical arguments are used.
Value
Functions pertaining to the delayed Weibull distribution:
- dweib_delayed gives the density
- pweib_delayed gives the vector of cumulative probabilities or, if grad = TRUE, the gradient matrix (number of parameters x quantile times)
- qweib_delayed gives the quantile function
- rweib_delayed generates a pseudo-random sample
- mweib_delayed gives the restricted mean survival time
The length of the result is determined by n for rweib_delayed and is the maximum of the lengths of the numerical arguments for the other functions; R's recycling rules apply.
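Analogously to the exponential case, the single-phase delayed Weibull can be sketched in base R as a Weibull shifted by the delay (the _sketch names are illustrative, not the package's API):

```r
# Sketch of the single-phase delayed Weibull: shift a stats::dweibull /
# stats::rweibull by `delay`; the package's functions also handle two phases.
dweib_delayed_sketch <- function(x, delay, shape, scale = 1) {
  ifelse(x >= delay, dweibull(x - delay, shape = shape, scale = scale), 0)
}
rweib_delayed_sketch <- function(n, delay, shape, scale = 1) {
  delay + rweibull(n, shape = shape, scale = scale)  # shift a Weibull sample
}

set.seed(1)
r <- rweib_delayed_sketch(100, delay = 4, shape = 2)
min(r) >= 4  # no events before the delay
```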
Format a number as percentage.
Description
Internal helper function that is not exported.
Usage
as_percent(x, digits = 1)
Arguments
x |
numeric vector to be formatted as percentage |
digits |
requested number of decimal digits of the percentage |
Value
number formatted as percentage character
Generate the bootstrap distribution of model parameters for a fitted incubate model.
Description
Bootstrap data here are the estimated coefficients from models fitted to bootstrap samples.
In a second step, the bootstrap data are used for bootstrap inference.
This is an internal function; the main entry point is confint.incubate_fit().
Usage
bsDataStep(
object,
bs_data = c("parametric", "ordinary"),
R,
useBoot = FALSE,
smd_factor = 0.25
)
Arguments
object |
an |
bs_data |
character. Which type of bootstrap method to generate data? |
R |
integer. Number of bootstrapped model coefficient estimates |
useBoot |
flag. Do you want to use the boot-package? Default value is FALSE. |
smd_factor |
numeric. Smooth-delay factor: influences the amount of smoothing. 0 means no smoothing at all. Default is 0.25 (which was optimal in simulations for log-quantile together with log-delay-shift = 5) |
Value
bootstrap data, either as matrix or of class boot (depending on the useBoot-flag)
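The parametric-bootstrap idea behind this step can be sketched in base R for a plain (non-delayed) exponential rate; this illustrates the principle only, not the package's implementation:

```r
# Parametric bootstrap sketch: refit the model on samples drawn from the
# fitted distribution and collect the coefficient estimates.
set.seed(1)
x <- rexp(50, rate = 2)
rate_hat <- 1 / mean(x)                   # MLE of a plain exponential rate
boot_rates <- replicate(199, {
  xb <- rexp(length(x), rate = rate_hat)  # draw from the fitted model
  1 / mean(xb)                            # re-estimate on the bootstrap sample
})
ci_rate <- quantile(boot_rates, c(0.025, 0.975))  # basis for bootstrap inference
```

The collected `boot_rates` vector plays the role of the bootstrap data that bsDataStep() produces (there as a matrix of coefficients, or a boot object).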
Build control list for delay_model() and objFunFactory()
Description
Default values are set as default argument values in this constructor.
Usage
buildControl(
verbose = 0,
profiled,
pen_shape = FALSE,
MLEw_weight = "sdist_median",
MLEw_optim = "min",
ties = "density"
)
Arguments
verbose |
numeric. Degree of verbosity. Default value 0 means no extra output. |
profiled |
logical. Use objective function based on parameters that are profiled out? |
pen_shape |
logical. Should we penalize high shape parameters? |
MLEw_weight |
character. How to derive the weights? |
MLEw_optim |
character. How to find the optimum? Currently only "min" for minimization is supported. |
ties |
character. How to handle tied observations? |
Details
This is an internal function. It might change without prior notice.
Value
list. Control settings for fitting routine delay_model
Builds the distribution object
Description
This object contains all relevant information about the chosen distribution.
Usage
buildDist(distribution)
Arguments
distribution |
character(1). Which distribution? |
Value
distribution object. Currently, it is simply a list.
Coefficients of a delay-model fit.
Description
Coefficients of a delay-model fit.
Usage
## S3 method for class 'incubate_fit'
coef(object, transformed = FALSE, group = NULL, ...)
Arguments
object |
an incubate_fit object |
transformed |
flag. Do we request the transformed parameters as used within the optimization? |
group |
character string to request the canonical parameter for one group |
... |
further arguments, currently not used. |
Value
named coefficient vector
Confidence intervals for parameters of incubate-model fits.
Description
Bias-corrected bootstrap confidence limits (either quantile-based or normal-approximation based) are generated. Optionally, there are also variants that apply a log-transformation first. At least R=1000 bootstrap replications are recommended. The default is quantile-based confidence intervals that internally use a log-transformation.
Usage
## S3 method for class 'incubate_fit'
confint(
object,
parm,
level = 0.95,
R = 199L,
bs_data,
bs_infer = c("logquantile", "lognormal", "quantile", "quantile0", "normal", "normal0"),
useBoot = FALSE,
...
)
Arguments
object |
object of class incubate_fit |
parm |
character. Which parameters to get confidence interval for? |
level |
numeric. Which is the requested confidence level for the interval? Default value is 0.95 |
R |
number of bootstrap replications. Used only if bootstrap data are not directly provided via bs_data. |
bs_data |
character or bootstrap data object. If character, it specifies which type of bootstrap is requested and the bootstrap data will be generated. Data can also be provided here directly. If missing it uses parametric bootstrap. |
bs_infer |
character. Which type of bootstrap inference is requested to generate the confidence interval? |
useBoot |
logical. Delegate the bootstrap confint calculation to the boot-package? |
... |
further arguments, currently not used. |
Value
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter.
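The default "logquantile" inference can be illustrated with a stand-in vector of bootstrap estimates (the data and variable names here are purely illustrative): quantiles are taken on the log scale of a positive parameter and then back-transformed.

```r
# Sketch of log-quantile bootstrap confidence limits for a positive parameter.
set.seed(42)
bs_est <- rlnorm(1000, meanlog = log(2), sdlog = 0.1)  # stand-in bootstrap estimates
level <- 0.95
probs <- c((1 - level) / 2, 1 - (1 - level) / 2)
ci <- exp(quantile(log(bs_est), probs))                # back-transformed limits
```

Working on the log scale keeps the limits positive and often improves coverage for skewed, positive parameters such as rates.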
Fit optimal parameters according to the objective function (either MPSE or MLE-based).
Description
The objective function carries the given data in its environment and is to be minimized.
R's standard routine stats::optim performs the numerical optimization, using numerical derivatives;
if an analytical solution is available, it is returned directly.
Usage
delay_fit(objFun, optim_args = NULL, verbose = 0)
Arguments
objFun |
objective function to be minimized |
optim_args |
list of own arguments for the optimization. If NULL, default arguments are used. |
verbose |
integer that indicates the level of verboseness. Default 0 is quiet. |
Value
optimization object including a named parameter vector or NULL in case of errors during optimization
Fit a delayed Exponential or Weibull model to one or two given sample(s)
Description
Maximum product of spacings estimation is used by default to fit the
parameters. Estimation via naive maximum likelihood (method = "MLEn") is
available, too, but MLEn yields severely biased estimates for small samples.
MLEc is a corrected version of MLEn due to Cheng and Iles (1987).
Usage
delay_model(
x = stop("Specify observations for first group x=!", call. = FALSE),
y = NULL,
distribution = c("exponential", "weibull", "normal"),
twoPhase = FALSE,
bind = NULL,
method = c("MPSE", "MLEn", "MLEw", "MLEc"),
control = list()
)
Arguments
x |
numeric. observations of 1st group. Can also be a list of data from two groups. |
y |
numeric. observations from 2nd group |
distribution |
Which delayed distribution is assumed? Exponential or Weibull. Can be given as character or as distribution list-object. |
twoPhase |
logical. Allow for two phases? |
bind |
character. parameter names that are bound together in 2-group situation. |
method |
character. Which method is used to fit the model? 'MPSE' = maximum product of spacings estimation, 'MLEn' = naive maximum likelihood estimation, 'MLEw' = weighted MLE, or 'MLEc' = corrected MLE |
control |
list. Details that control the optimization. E.g., profiling, penalization. |
Details
The parameter control= allows setting the specifics of the optimization
process of the delay-model fit. Possible list entries are
- verbose: level of verboseness. Default 0 is quiet.
- ties: character. Strategy to handle ties for method = "MPSE": either 'density' (default), 'equispaced' or 'error'.
- profiled: aim to profile out a parameter
- pen_shape: logical. Should high values of shape (for the Weibull distribution) be penalized? Default is FALSE.
- MLEw_weight: character. Name of the method to build weights for the MLEw-method.
- MLEw_optim: character. Name of the strategy to find the extremum: either minimization of the L2-norm of the first partial derivatives (as stated in Cousineau, 2009) or root-finding.
Numerical minimization is normally done by stats::optim. If this
minimization attempt fails, minqa::bobyqa is used as a fall-back. For MLEw,
root-finding on the gradient function can be used instead of minimizing the
L2-norm of the gradient of the MLEw objective function.
Value
incubate_fit: the delay-model fit object, including the criterion to minimize.
Or NULL if the optimization failed (e.g., too few observations).
References
R. C. H. Cheng, T. C. Iles, Corrected Maximum Likelihood in Non-Regular Problems, Journal of the Royal Statistical Society: Series B (Methodological), Volume 49, Issue 1, September 1987, pp. 95–101, doi:10.1111/j.2517-6161.1987.tb01428.x
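A self-contained sketch of the MPSE idea for one group with a delayed exponential may help: the criterion maximizes the mean log spacing of the fitted CDF evaluated at the ordered observations. This is a simplified illustration; delay_model() is the full implementation and additionally handles ties, two groups, two phases and other methods.

```r
# MPSE sketch for a single-group delayed exponential model.
set.seed(7)
x <- 4 + rexp(40, rate = 0.5)            # sample with true delay 4, rate 0.5
negMPSE <- function(par) {
  delay <- par[1]; rate <- exp(par[2])   # log-parametrised rate stays positive
  Fdel <- function(t) ifelse(t >= delay, 1 - exp(-rate * (t - delay)), 0)
  u <- c(0, Fdel(sort(x)), 1)
  sp <- diff(u)                          # spacings of the fitted CDF
  if (any(sp <= 0)) return(Inf)          # invalid delay or tied spacings
  -mean(log(sp))                         # negative MPSE criterion, to minimize
}
fit <- optim(c(min(x) / 2, log(1 / mean(x))), negMPSE)
est <- c(delay1 = fit$par[1], rate1 = exp(fit$par[2]))
```

Unlike the naive MLE, the spacings criterion penalizes pushing the delay onto the smallest observation (that spacing would collapse to zero), which is why MPSE remains well-behaved in this non-regular model.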
Estimate the rounding error from a given sample of metric values
Description
Estimate the rounding error from a given sample of metric values. The idea is to check at which level of rounding the sample values do not change.
Usage
estimRoundingError(
obs,
roundDigits = seq.int(from = -4L, to = 6L),
n_obs = 100L
)
Arguments
obs |
numeric. Metric values from a sample to estimate the corresponding rounding error |
roundDigits |
integer. Which level of rounding to test? Negative numbers round to corresponding powers of 10 |
n_obs |
integer. How many observations to consider at most? If the provided sample has more observations a sub-sample is used. |
Value
estimated rounding error
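The underlying idea can be sketched in a few lines of base R (the variable names are illustrative): round at increasingly fine digit levels and pick the coarsest level that reproduces the data.

```r
# Sketch: find the coarsest rounding level that leaves the sample unchanged.
obs <- c(1.20, 3.40, 7.80)                # toy sample recorded to one decimal
digits <- 0:4                             # candidate digit levels to test
unchanged <- vapply(digits, function(d) all(round(obs, d) == obs), logical(1))
est_digits <- digits[which(unchanged)[1]] # coarsest level reproducing the data
est_digits                                # -> 1, i.e. rounding error ~ 0.05
```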
Get delay distribution function
Description
Get delay distribution function
Usage
getDist(
distribution = c("exponential", "weibull", "normal"),
type = c("cdf", "prob", "density", "random", "param"),
twoPhase = FALSE,
twoGroup = FALSE,
bind = NULL,
profiled = FALSE,
transformed = FALSE
)
Arguments
distribution |
character(1). Which distribution? |
type |
character(1). Type of function: 'cdf' for the cumulative distribution function, 'density' for the density, 'random' for random generation or 'param' for the parameter names |
twoPhase |
logical(1). For |
twoGroup |
logical(1). For |
bind |
character. For |
profiled |
logical(1). For |
transformed |
logical(1). For |
Value
selected distribution function or parameter names
Add a line fit to a plot of (another) incubate_fit object
Description
Carries the concept of base-plot's lines over to ggplot. This function returns a
ggplot-layer that can be added to an existing ggplot-object of an incubate fit. It allows
setting the aesthetics of the plot manually.
Usage
## S3 method for class 'incubate_fit'
lines(x, mapping = NULL, ...)
Arguments
x |
|
mapping |
an aesthetic mapping, e.g. created with ggplot2::aes() |
... |
further arguments to be passed to |
Value
a geom_function ggplot-layer
Extract Log-Likelihood
Description
The user can request different flavours of the log-likelihood. By default, the flavour matching the fitting method is used.
Usage
## S3 method for class 'incubate_fit'
logLik(object, method = NULL, ...)
Arguments
object |
an |
method |
Which flavour of the log-likelihood? By default, it uses the flavour from the model fit |
... |
further arguments passed on to the objective function of the model fit object |
Value
Log-likelihood value for the model fit object
Relapse-free survival of melanoma patients under adjuvant treatment
Description
Data stem from a double-blind, placebo-controlled phase 3 trial where n=870 patients with completely resected, stage III melanoma with BRAF V600E or V600K mutations were randomly assigned to receive oral dabrafenib plus trametinib (combination therapy, 438 patients) or two matched placebo tablets (432 patients) for 12 months.
Usage
long2017
Format
A data frame with 870 rows and 4 variables:
- ID
artificially generated patient ID
- time
Time to relapse-free survival in months
- status
Status of observation, encoded as 0 for right-censoring vs 1 for RFS-event
- trtmt
Treatment group: Dabrafenib+Trametinib vs Placebo
Details
Unfortunately, the data set of the clinical trial is not publicly available. Instead, the data were digitized by Sean Devlin based on the published survival plot (see Fig 1A in the original publication of the trial results). Therefore, the data given here do not claim to be a 100% faithful representation of the clinical trial. Some deviations are to be expected.
Source
Long GV, Hauschild A, Santinami M, et al. Adjuvant Dabrafenib plus Trametinib in Stage III BRAF-Mutated Melanoma. N Engl J Med. 2017;377(19):1813-1823. doi:10.1056/NEJMoa1708539
Devlin SM and O'Quigley J, The nph2ph-transform: applications to the statistical analysis of completed clinical trials, arXiv:2407.18905, 2024. doi:10.48550/arXiv.2407.18905.
Serial interval times for measles during a long journey on a sailing vessel
Description
Measles broke out during a sailing ship passage from England to Australia involving six persons on the ship. Serial interval times, i.e. the times of clinical onset between successive cases in a chain of transmission, were recorded. The interval times for the first three cases were not observed completely as they brought measles on board.
Usage
measles_sailer
Format
A data frame with 6 rows and 4 variables:
- generation
Disease generation on the sailing vessel
- symptomOnset
Days of first symptoms since the start of the journey
- serialInterval
Days between successive cases in chain of transmission, from symptom to symptom
- status
Status indicator for serial interval time: 0 right-censored vs 1 for observed
Details
In 1829, the British sailing vessel HMS America carried 176 prisoners from England to New South Wales, Australia. Alexander Stewart, the ship surgeon, recorded cases of measles during the passage in his medical journal. The journey started on 4 March 1829. A guard, who embarked from Chatham (England), was the first measles case when the ship berthed at Woolwich (England) on 28 March 1829. The measles began affecting children of the guards on 31 March 1829 and later spread to some of the soldiers.
In the medical journal, it is not clearly stated whether clinical onset is defined as fever or rash. The first generation of measles on the sailer comprises three cases, for which we assume the minimum serial interval time for measles, generally estimated to be six days (for both fever-to-fever and rash-to-rash).
Source
Records of the Admiralty, Naval Forces, Royal Marines, Coastguard,
and related bodies, ADM 101/2/3,
https://discovery.nationalarchives.gov.uk/details/r/C4106406
References
Paterson BJ, Kirk MD, Cameron AS, et al. Historical data and modern methods reveal insights in measles epidemiology, BMJ Open 2013;3:e002033. doi:10.1136/bmjopen-2012-002033
Fine PE. The interval between successive cases of an infectious disease. Am J Epidemiol. 2003;158(11):1039-1047. doi:10.1093/aje/kwg251
Minimize an objective function with alternative optimizer
Description
The primary optimization routine is BFGS from stats::optim.
If this fails for some reason, an alternative implemented here is tried:
either derivative-free minimization through minqa::bobyqa or the PORT routine stats::nlminb.
Usage
minObjFunAlt(
objFun,
start,
lower = -Inf,
upper = +Inf,
verbose = 0,
method = c("bobyqa", "nlminb")
)
Arguments
objFun |
function to minimize |
start |
vector of start values for parameters |
lower |
numeric. lower bound for parameters (boxed constraint) |
upper |
numeric. upper bound for parameters (boxed constraint) |
verbose |
numeric. Verbosity level |
method |
Specifies which optimizer to use |
Details
This is only a thin wrapper to the chosen alternative optimizer.
Value
optimization object with some common entries like parOpt, valOpt, convergence, methodOpt and counts. Or NULL in case of failure.
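The try-then-fall-back pattern can be sketched as follows. This sketch uses the base-R PORT routine nlminb as the fall-back (the package would try minqa::bobyqa first); the function and entry names follow the documented return value.

```r
# Sketch of a primary/fall-back minimization wrapper.
safe_minimize <- function(objFun, start) {
  res <- tryCatch(optim(start, objFun, method = "BFGS"),
                  error = function(e) NULL)
  if (!is.null(res) && res$convergence == 0) {
    return(list(parOpt = res$par, valOpt = res$value, methodOpt = "BFGS"))
  }
  # primary attempt failed or did not converge: try the PORT routine
  res2 <- tryCatch(nlminb(start, objFun), error = function(e) NULL)
  if (is.null(res2)) return(NULL)  # both attempts failed
  list(parOpt = res2$par, valOpt = res2$objective, methodOpt = "nlminb")
}

res_demo <- safe_minimize(function(p) sum((p - c(1, 2))^2), start = c(0, 0))
```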
Checks if arguments are numerically close.
Description
The function is vectorized and R's recycling rules apply.
Usage
near(x, y)
Arguments
x |
numeric first vector |
y |
numeric second vector |
Value
logical vector indicating whether the corresponding elements of x and y are numerically close
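A minimal sketch of such a helper (the tolerance choice here is an assumption, not necessarily the package's):

```r
# Vectorised numeric closeness check; R's recycling rules apply.
near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

near_sketch(sqrt(2)^2, 2)  # TRUE despite floating-point error
```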
Factory method for objective function
Description
Given the observed data this factory method produces an objective function
which is either the negative of the MPSE-criterion H or some flavour of the negative log-likelihood for MLE.
Implemented variants of MLE-objective functions are naive MLE ('MLEn'), corrected MLE ('MLEc') or weighted MLE ('MLEw').
In any case, the objective function is to be minimized.
Usage
objFunFactory(
x,
y = NULL,
distO,
method = c("MPSE", "MLEn", "MLEc", "MLEw"),
twoPhase = FALSE,
bind = NULL,
control
)
Arguments
x |
numeric. observations |
y |
numeric. observations in second group. |
distO |
distribution object |
method |
character(1). Specifies the method for which to build the objective function. Default value is |
twoPhase |
logical flag. Do we allow for two delay phases where event rate may change? Default is |
bind |
character. parameter names that are bound together (i.e. equated) between both groups |
control |
list. Fine-tune parameters for optimization. Needs to be set! |
Details
The objective function takes a vector of model parameters as its argument. During pre-processing, negative or infinite observations are discarded.
Profiling is implemented for single-phase models of the Weibull and exponential distributions: profiling allows estimating the scale1 parameter (resp. rate1= for the exponential) from delay1 (and shape1 for the Weibull). Except for MLEw, the formula is derived from the conventional log-likelihood by equating its partial derivative with respect to scale1 to zero, which yields the candidate value for scale1. This approach can also be used for the methods MPSE and MLEc. For MLEw (weighted MLE), a weighting factor additionally enters the formula for scale1.
Value
the objective function (e.g., the negative MPSE criterion) for given choice of model parameters or NULL upon errors
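For the delayed exponential under the naive MLE, the profiling step can be sketched in base R. The example also exposes the non-regularity of naive MLE in this model (the profiled likelihood pushes the delay towards the minimum observation), which is what motivates MPSE and MLEc.

```r
# Profiling sketch for the delayed exponential under naive MLE: for a fixed
# delay, the score equation for the rate has the closed form
# rate(delay) = n / sum(x - delay), so only the delay needs numerical search.
set.seed(3)
x <- 2 + rexp(60, rate = 1.5)
profNegLogLik <- function(delay) {
  if (delay >= min(x)) return(Inf)      # all observations must lie past the delay
  rate <- length(x) / sum(x - delay)    # profiled-out rate candidate
  -sum(log(rate) - rate * (x - delay))  # negative log-likelihood at (delay, rate(delay))
}
d_hat <- optimize(profNegLogLik, interval = c(0, min(x)))$minimum
c(delay1 = d_hat, rate1 = length(x) / sum(x - d_hat))
# note: the naive MLE drives delay1 towards min(x) -- the non-regularity
# that the MPSE and MLEc methods are designed to avoid
```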
Plot a fitted delay-model object of class incubate_fit
Description
The fitted delay-model is plotted: a Kaplan-Meier survival curve is shown together with the parametric model fit. Optionally, the fit of a second delay-model can be added. If a second model is given, some preliminary checks ensure that the two models match.
Usage
## S3 method for class 'incubate_fit'
plot(x, y, title, subtitle, xlim, ...)
Arguments
x |
a fitted delay-model |
y |
an optional second fitted delay-model |
title |
character. Optionally, provide a title to the plot. |
subtitle |
character. Optionally, provide a subtitle to the plot. By default the coefficients are shown. |
xlim |
numeric. Optionally, limits for the x-axis (time). If unspecified starts from 0 to last observation. |
... |
further arguments. Not used here (the argument is required by the generic plot function) |
Details
This function requires the ggplot2-package to be installed.
Power simulation function for a two-group comparison
Description
There are two modes of operation:
- power = NULL: simulate the power for a given sample size n
- n = NULL: search iteratively for a suitable sample size n that achieves a given power
Usage
power_diff(
distribution = c("exponential", "weibull"),
twoPhase = FALSE,
eff = stop("Provide parameters for both groups that reflect the effect!"),
param = "delay1",
test = c("bootstrap", "logrank", "logrank_pp", "LRT"),
method = c("MPSE", "MLEw", "MLEc", "MLEn"),
n = NULL,
r = 1,
sig.level = 0.05,
power = NULL,
nPowerSim = 1600,
R = 200,
nRange = c(5, 250),
verbose = 0
)
Arguments
distribution |
character. Which assumed distribution is used for the power calculation. Default is 'exponential'. |
twoPhase |
logical(1). Do we model two phases per group? Default is FALSE. |
eff |
list of length 2. The two list elements must be numeric vectors that contain the model parameters (as understood by the delay-distribution functions provided by this package) for the two groups. |
param |
character. Parameter name(s) which are to be tested for difference and for which to simulate the power. Default value is 'delay1'. |
test |
character. Which test to use for this power estimation? Defaults to 'bootstrap'. |
method |
character. Which fitting method to use in case of a parametric test. |
n |
integer. Number of observations per group for the power simulation, or NULL when the sample size is to be estimated for a given power. |
r |
numeric. Ratio of both groups sizes, ny / nx. Default value is 1, i.e. balanced group sizes. Must be positive. |
sig.level |
numeric. Significance level. Default is 0.05. |
power |
numeric. Requested power of the test when the sample size n is to be estimated; set to NULL to simulate the power for a given n. |
nPowerSim |
integer. Number of simulation rounds. Default value 1600 yields a standard error of 0.01 for power if the true power is 80%. |
R |
integer. Number of bootstrap samples for the test of difference
within each power simulation. It affects the resolution of the
P-value for each simulation round. |
nRange |
integer. Admissible range for sample size when power is specified and sample size is requested. The routine might not find the optimal sample size when this range is set too wide. |
verbose |
numeric. How many details are requested? Higher value means more details. 0=off, no details. |
Details
In both cases, the distribution, the parameters that are tested for, the type
of test and the effect size (eff=) need to be specified. Specify the effect as a list with two elements,
each element holds the parameter vector describing the distribution of the outcome per group.
The fitting method (method=) and the kind of significance test (test=) are handed down to test_diff().
The more power simulation rounds (parameter nPowerSim=) the more densely the space of data
according to the specified model is sampled.
Note that estimating sample size n is computationally intensive.
The iterative search uses heuristics to find a sample size n within the
provided range (nRange=), and the estimated sample size might yield a
slightly different power level. Hence, check the reported power in the
output. The search algorithm comes to better results when the admissible
range for sample size (nRange=) is chosen sensibly and not too wide.
In case the estimated sample size and the achieved power are too high, it might
pay off to rerun the function with a narrower admissible range for the sample
size in nRange=.
Value
List of results of power simulation. Or NULL in case of errors.
Examples
# Simulate power for a given sample size:
# test for any difference in a delay-exponential model using logrank test
# the assumed effect is given in terms of model parameters for both groups
power_diff(
eff = list(grA = c(delay1 = 5, rate1 = .09),
grB = c(delay1 = 7, rate1 = .12)),
test = "logrank",
n = 16, power = NULL, nPowerSim = 300)
## Not run:
# test for difference in delay in a delay-exponential model via a bootstrap test:
# the power is estimated based on nPowerSim = 420 simulated datasets
# for real applications, use a higher nPowerSim (e.g. 1600) for more
# precise power estimation and a higher R (e.g. 400) for more precise
# P-value estimation in each simulation round
set.seed(123) # for reproducibility
power_diff(
eff = list(grA = c(delay1 = 5.2, rate1 = .1),
grB = c(delay1 = 7, rate1 = .12)),
param = "delay1",
test = "bootstrap", method = "MPSE",
n = 16, power = NULL,
nPowerSim = 420, R = 160)
## End(Not run)
## Not run:
# test for difference in rate in a delay-exponential model via a bootstrap test:
# required sample size is estimated for a given power.
# provide a range for the sample size search, e.g. nRange = c(10, 30)
# this takes more time than the previous example, as the function
# iteratively searches for a sample size that yields the requested power
set.seed(1234) # for reproducibility
power_diff(
eff = list(grA = c(delay1 = 5, rate1 = .07),
grB = c(delay1 = 7, rate1 = .18)),
param = "rate1",
test = "bootstrap", method = "MPSE",
n = NULL, power = 0.8,
nPowerSim = 480, R = 150,
nRange = c(18, 48))
## End(Not run)
Check and prepare the survival response(s)
Description
Allowed censoring types are right-, left- and interval-censoring.
If y0 is not NULL, this function returns either both responses as plain numeric (non-Surv) or both as Surv-objects of the same type.
Usage
prepResponseVar(x0, y0 = NULL, simplify = TRUE)
Arguments
x0 |
response as numeric or survival::Surv using left, right or interval-coding |
y0 |
response as numeric or survival::Surv using left, right or interval-coding |
simplify |
logical. Should the result be as simple as possible? If |
Value
a list of the two responses, either both as survival::Surv or plain numeric (if no censorings and simplify=TRUE)
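The harmonisation idea can be sketched with the survival package (variable names here are illustrative): if one response comes as a Surv object and the other as plain numeric, the numeric one can be lifted to an all-events Surv object of the same type.

```r
# Sketch of lifting a plain numeric response to match a Surv response.
library(survival)
x0 <- Surv(time = c(5, 8, 11), event = c(1, 0, 1))  # right-censoring coding
y0 <- c(4, 6, 9)                                    # plain numeric: all events observed
y0S <- Surv(time = y0, event = rep(1L, length(y0))) # same Surv type as x0
attr(x0, "type")                                    # "right", matching attr(y0S, "type")
```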
Small data sets from miscellaneous publications
Description
Most data sets come from publications about parameter estimation in Weibull models. See the references below, in section "Source".
Usage
rockette74
fatigue
susquehanna
pollution
graphite
Format
An object of class numeric of length 4.
An object of class numeric of length 10.
An object of class numeric of length 20.
An object of class numeric of length 20.
An object of class numeric of length 41.
Details
The following small data sets are provided as numeric vectors.
- rockette74: Artificial sample of length 4 given by Rockette. The maximum likelihood function has two stationary points, neither of which is the global maximum.
- fatigue: Fatigue times of ten bearings of a specific type, in hours.
- susquehanna: Maximum flood levels (in millions of cubic feet per second) for the Susquehanna River at Harrisburg (Pennsylvania, USA) over 20 four-year periods.
- pollution: Beach pollution levels in South Wales (measured in number of coliforms per 100 ml) on 20 days over a 5-week period.
- graphite: Breaking stress (in MPa x 10^6) of 41 beam specimens cut from a single graphite H590 block, from a reliability study reported by Margetson & Cooper (1984), cited by Cheng & Stephens (1989).
Source
Different publications related to estimating Weibull data.
References
McCool, J.I., 1974. Inferential techniques for Weibull populations. Technical Report TR 74-0180, Wright Patterson Air Force Base, Ohio.
Rockette, H., 1974. Maximum Likelihood Estimation with the Weibull Model.
Dumonceaux, R. and Antle, C. E., 1973. Discrimination between the lognormal and the Weibull distributions. Technometrics, 15, 923-926.
Steen, P. J. and Stickler, D. J., 1976. A Sewage Pollution Study of Beaches from Cardiff to Ogmore. Report January 1976, Cardiff: Department of Applied Biology, UWIST.
Cheng, R.C.H. and Stephens, M.A., 1989. A Goodness of Fit Test Using Moran's Statistic with Estimated Parameters. Biometrika, 76, 386-392.
Calculate parameter scaling for optimization routine.
Description
The scale per parameter corresponds to the step width within the optimization path.
Usage
scalePars(parV, lowerB = 0.00001, upperB = 100000)
Arguments
parV |
named numeric parameter vector for optimization |
lowerB |
numeric. lower bound for parameter scales |
upperB |
numeric. upper bound for parameter scales |
Value
numeric. vector of parameter scaling
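A minimal sketch of the idea (an assumption about the general approach, not incubate's actual implementation): derive each parameter's scale from its order of magnitude, clamp it to the given bounds, and pass the result as parscale to stats::optim(), which then optimizes over par/parscale so that a unit step in the scaled space corresponds to one step width per parameter.

```r
# Hypothetical sketch of per-parameter scaling for an optimizer.
# scale_sketch() is an illustrative stand-in, not the package's scalePars().
scale_sketch <- function(parV, lowerB = 1e-5, upperB = 1e5) {
  # scale = nearest power of ten of the parameter's magnitude
  sc <- 10^round(log10(pmax(abs(parV), .Machine$double.eps)))
  pmin(pmax(sc, lowerB), upperB)  # clamp to [lowerB, upperB]
}

start <- c(delay1 = 5, rate1 = 0.07)
sc <- scale_sketch(start)
# usage idea (negLogLik is a placeholder objective function):
# optim(start, fn = negLogLik, control = list(parscale = sc))
sc
```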
Survival time of mice with glioma under different treatments
Description
This data set stems from an animal experiment described in Stankovic (2018). In particular, the data in question are shown in Figures 6J and 6K.
Usage
stankovic
Format
A data frame with 45 rows and 5 variables:
- Figure
The figure in the publication where the data is shown
- Time
Survival in days
- Status
Right-censor status: 1 means observed event
- Group
Experimental group identifier
- Colour
Colour used in the Stankovic publication to mark this group
Details
The data were read directly from the survival plots in the publication with the help of Plot Digitizer, version 2.6.9.
Source
Dudvarski Stankovic N, Bicker F, Keller S, et al. EGFL7 enhances surface expression of integrin α5β1 to promote angiogenesis in malignant brain tumors. EMBO Mol Med. 2018;10(9):e8420. doi:10.15252/emmm.201708420 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6127886/
Goodness-of-fit (GOF) test statistic (experimental!)
Description
The GOF-test is performed for a fitted delay-model that was fit using MPSE. Different GOF-tests are implemented:
- Moran GOF is based on spacings, like the MPSE criterion itself.
- Pearson GOF uses categories and compares observed to expected frequencies.
Usage
test_GOF(
delayFit,
method = c("moran", "pearson", "nikulin", "NRR"),
estimated = TRUE,
verbose = 0
)
Arguments
delayFit |
delay_model fit object |
method |
character(1). which method to use for GOF. Default is 'moran'. |
estimated |
logical flag. For the Moran test: were the parameters estimated (rather than fixed in advance)? |
verbose |
integer. Verbosity level. The higher the more verbose debugging output. |
Details
Note that the GOF-tests are currently only implemented for models fitted via maximum product of spacings estimation (MPSE). These tests (Moran & Pearson) are still experimental, so use them with caution.
Value
An htest-object containing the GOF-test result
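For intuition, the Moran statistic can be sketched in a few lines of base R. This is an assumed general form based on the description above (spacings of data transformed to the unit interval), not incubate's internal code:

```r
# Sketch of a Moran-type statistic: after the probability integral
# transform, the data live in (0, 1); the statistic is minus the sum of
# the logs of the n + 1 spacings (including both boundaries).
moran_stat <- function(u) {
  u <- sort(u)
  spacings <- diff(c(0, u, 1))
  -sum(log(spacings))
}

set.seed(1)
u <- runif(20)  # a well-fitting model yields roughly uniform values
moran_stat(u)
```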
Test the difference for model parameter(s) between two uncorrelated groups
Description
The test is in fact a model comparison between a null model where the
parameters are enforced to be equal and an unconstrained full model. The
model parameters can be fit with various methods: MPSE, MLEn, MLEc or MLEw.
Parametric bootstrap tests and likelihood ratio tests are supported. As test
statistic for the bootstrap test we use twice the difference in best
(= lowest) objective function value, i.e. 2 * (val_0 - val_1). The factor
2 does not matter for the test itself, but it makes the statistic reminiscent
of a likelihood ratio test statistic, although the objective function is not a
negative log-likelihood in all cases (e.g., for the MPSE method it is the
negative of the maximum product of spacings metric).
Usage
test_diff(
x,
y = stop("Provide data for group y!"),
distribution = c("exponential", "weibull"),
twoPhase = FALSE,
type = c("all", "bootstrap", "GOF", "moran", "pearson", "logrank", "LRT"),
param = "delay1",
method = c("MPSE", "MLEw", "MLEc", "MLEn"),
profiled = method != "MPSE",
ties = c("density", "equispaced", "error"),
doLogrank = TRUE,
R = 400,
chiSqApprox = FALSE,
verbose = 0
)
Arguments
x |
data from reference/control group. |
y |
data from the treatment group. |
distribution |
Name of the distribution to use or distribution object. |
twoPhase |
logical(1). Do we model two phases per group? Default is FALSE. |
type |
character. Which type of tests to perform? |
param |
character. Names of parameters to test difference for. Default value is "delay1". |
method |
character. Which method to fit the models. |
profiled |
logical. Use the profiled likelihood? |
ties |
character. How to handle ties in data vector of a group? |
doLogrank |
logical. Should non-parametric logrank tests also be performed? |
R |
numeric(1). Number of bootstrap samples to evaluate the distribution of the test statistic. |
chiSqApprox |
logical flag. In bootstrap, should we estimate the best degrees of freedom for chi-square to match the distribution of the test statistic under H0? |
verbose |
numeric. How many details are requested? Higher value means more details. 0=off, no details. |
Details
High values of this difference speak against the null model (i.e., a high
val_0 indicates a bad fit under the null model and/or a low val_1 indicates
a good fit under the more general full model). The test is implemented as
a parametric bootstrap test, i.e., we
- take the given null-model fit as ground truth,
- regenerate data according to this model fit,
- recalculate the test statistic, and
- appraise the observed test statistic in light of the generated distribution under H0.
Value
list with the results of the test. Element P contains the different P-values, for instance from the parametric bootstrap test.
Examples
set.seed(123)
# generate example data
grA <- rweib_delayed(n = 70, delay1 = 5, shape1 = 2, scale1 = 8)
grB <- rweib_delayed(n = 60, delay1 = 7, shape1 = 1.8, scale1 = 6)
# difference in delay parameter is significant at 5% level
test_diff(x = grA, y = grB,
distribution = "weibull", param = "delay1",
type = "bootstrap", method = "MPSE", R = 50)
# but the non-parametric logrank test is not significant
# no need to specify parameters
test_diff(x = grA, y = grB,
type = "logrank")
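The bootstrap scheme from the Details section can be illustrated with a self-contained toy example in base R. This is a deliberately crude sketch with made-up plug-in estimators for a delayed-exponential model, not incubate's fitting code:

```r
set.seed(42)
x <- 5 + rexp(40, rate = 0.2)    # group 1: delayed exponential
y <- 5 + rexp(40, rate = 0.35)   # group 2: same delay, different rate

# crude plug-in rate estimate (illustration only)
est_rate <- function(z) 1 / (mean(z) - min(z))

# observed statistic: absolute difference in estimated rates
stat_obs <- abs(est_rate(x) - est_rate(y))

# null fit: common delay and rate for the pooled data (ground truth under H0)
delay0 <- min(c(x, y)); rate0 <- est_rate(c(x, y))

stat_boot <- replicate(199, {
  xb <- delay0 + rexp(length(x), rate0)  # regenerate data under H0
  yb <- delay0 + rexp(length(y), rate0)
  abs(est_rate(xb) - est_rate(yb))       # recalculate the statistic
})

# appraise the observed statistic in light of the H0 distribution
p_value <- (1 + sum(stat_boot >= stat_obs)) / (1 + length(stat_boot))
p_value
```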
Transform observed data to unit interval
Description
The transformation used is the probability integral transform: the cumulative distribution function with the estimated parameters of the model fit takes the data into the 0-1 interval. All available data in the model fit is transformed. Censored observations lead to censored back-transformed observations as well.
Usage
## S3 method for class 'incubate_fit'
transform(`_data`, ...)
Arguments
_data |
a fitted model object of class incubate_fit |
... |
currently ignored |
Value
The transformed data, either a vector (for single group) or a list with entries x and y (in two group scenario)
Note
This S3-method implementation is quite different from its default
method, which allows for non-standard evaluation on data frames and is
primarily intended for interactive use. But the name transform fits the
intended purpose so nicely that it is re-used here for the probability
integral transform.
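A minimal base-R sketch of the probability integral transform for an uncensored single-group exponential fit (an illustration of the principle, not the S3 method itself):

```r
# Probability integral transform: applying the CDF with estimated
# parameters maps the observed data into the unit interval.
set.seed(7)
x <- rexp(100, rate = 0.4)
rate_hat <- 1 / mean(x)        # MLE of the exponential rate
u <- pexp(x, rate = rate_hat)  # transformed data, all values in (0, 1)
range(u)
```

For a well-fitting model, the transformed values are approximately uniform on (0, 1), which is what the GOF machinery exploits.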
Refit an incubate_fit-object with specified optimization arguments
Description
This function is useful when only an optimization argument is to be changed.
If more needs to be changed, go back to delay_model and start from scratch.
Usage
## S3 method for class 'incubate_fit'
update(object, optim_args = NULL, verbose = 0, ...)
Arguments
object |
a fitted model object of class incubate_fit to be updated |
optim_args |
optimization arguments |
verbose |
integer flag. Requested verbosity during the refit. |
... |
further arguments, currently not used. |
Value
The updated fitted object of class incubate_fit or NULL in case of failure.
Internal MLEw-weight W1 function according to sampling distribution
Description
Weight W1 for given sample sizes (of one group). For small nObs we use direct results from a Monte-Carlo simulation; for larger nObs we use an approximation based on the Wilson-Hilferty transformation.
Note that R's numeric routine stats::qgamma could also be used, as in stats::qgamma(p = .5, shape = 10, rate = 10) for n = 10.
Usage
w1Fint(nObs)
Arguments
nObs |
numeric. number of observations (vectorized) |
Value
numeric. W1-value corresponding to nObs; same length as nObs
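To illustrate the two branches mentioned above, the exact stats::qgamma median of a Gamma(shape = n, rate = n) variable can be compared against a Wilson-Hilferty-style approximation. This is a sketch of the idea; the package's fitted approximation may differ:

```r
# Median of Gamma(shape = n, rate = n): exact quantile vs. the
# Wilson-Hilferty approximation n * (1 - 1/(9n))^3 / n = (1 - 1/(9n))^3.
n <- 10
exact <- qgamma(0.5, shape = n, rate = n)
wh    <- (1 - 1 / (9 * n))^3
c(exact = exact, wilson_hilferty = wh)
abs(exact - wh)  # the two agree closely already for n = 10
```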
Internal MLEw weight function W2 according to sampling distribution
Description
W2 is either taken directly from the MCSS results (if available) or from the smooth approximation function otherwise.
Usage
w2Fint(nObs)
Arguments
nObs |
numeric. Sample size (vectorized) of one group |
Value
W2 (same size as nObs)
Internal factory method for W3-function based on sampling distribution
Description
Generally, the weight W3 depends on the sample size and the shape parameter. The sample size of a group is fixed; hence, we return a function that yields W3 for a provided shape parameter. If the sample size was used during the Monte-Carlo simulation study, the coefficients of the generalized logistic curve are returned directly. Otherwise, the coefficients stem from a natural cubic spline fit based on the Monte-Carlo simulations.
Usage
w3FFint(nObs)
Arguments
nObs |
sample size for which to build the W3-function |
Details
We run the cubic spline fit once when the package is loaded. This guarantees
that the spline function is built under the current user's R version,
hopefully with good performance. Function w3FFint is run repeatedly by
objFunFactory(), once per group.
Value
W3-function for the given sample size. It is a function of shape.
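The factory pattern described above can be sketched as a closure in base R. The generalized-logistic coefficients below are made-up placeholders, not the fitted Monte-Carlo values:

```r
# Hypothetical factory sketch: for a fixed sample size, look up (or
# spline-predict) the curve coefficients once, then return a function
# of the shape parameter that can be called repeatedly.
w3_factory_sketch <- function(nObs) {
  # placeholder coefficients, pretending they were retrieved for nObs
  asym <- 1; xmid <- 1; scal <- 0.5
  function(shape) asym / (1 + exp((xmid - shape) / scal))  # logistic in shape
}

w3F <- w3_factory_sketch(nObs = 20)  # built once per group
w3F(shape = 2)                        # then evaluated per shape value
```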