The delayed exponential distribution and the delayed Weibull distribution are the foundational data models in the incubate package.
The classical 2-parameter Weibull distribution has shape parameter \(γ>0\) and scale parameter \(β>0\). As an additional third parameter we use the delay \(α>0\); the density (PDF) and cumulative distribution function (CDF) then read, for \(x>α\): \[ \begin{align*} f(x) &= \frac{γ· (x-α)^{γ-1}}{β^γ} · \exp(-\big(\frac{x-α}{β}\big)^γ)\\ F(x) &= 1 - \exp(-\big(\frac{x-α}{β}\big)^γ) \qquad \text{for } x > α \text{ and 0 otherwise} \end{align*} \]
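As a quick numerical check (a sketch, not the package implementation; the helper names `dweib3` and `pweib3` are hypothetical), note that the delayed Weibull is just a shifted standard Weibull, so the formulas above can be validated against `stats::dweibull` and `stats::pweibull`:

```r
# Hypothetical helpers implementing the formulas above (not incubate's API)
dweib3 <- function(x, delay, shape, scale) {
  ifelse(x > delay,
         shape * (x - delay)^(shape - 1) / scale^shape *
           exp(-((x - delay) / scale)^shape),
         0)
}
pweib3 <- function(x, delay, shape, scale) {
  ifelse(x > delay, 1 - exp(-((x - delay) / scale)^shape), 0)
}

# The delay merely shifts the standard Weibull distribution:
x <- c(5.5, 7, 12)
all.equal(dweib3(x, delay = 5, shape = 1.7, scale = 3.5),
          dweibull(x - 5, shape = 1.7, scale = 3.5))
all.equal(pweib3(x, delay = 5, shape = 1.7, scale = 3.5),
          pweibull(x - 5, shape = 1.7, scale = 3.5))
```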
If we have a data sample \(\mathbf{x}\) of size \(n\) and apply maximum likelihood (ML) estimation for the parameters, we get the log-likelihood function
\[ l_\mathbf{x}(α, γ, β) = n \log(γ) - nγ \log(β) + (γ-1) · ∑_{i=1}^n \log(x_i-α) - β^{-γ} · ∑_{i=1}^n (x_i - α)^γ \]
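A quick sanity check of this closed form against base R (a sketch; the parameter values are arbitrary): summing `stats::dweibull` log-densities of the shifted sample must reproduce the formula.

```r
set.seed(1)
a <- 5; g <- 1.7; b <- 3.5                 # delay, shape, scale
x <- a + rweibull(25, shape = g, scale = b)
n <- length(x)

ll_formula <- n * log(g) - n * g * log(b) +
  (g - 1) * sum(log(x - a)) - b^(-g) * sum((x - a)^g)
ll_direct  <- sum(dweibull(x - a, shape = g, scale = b, log = TRUE))

all.equal(ll_formula, ll_direct)
#> [1] TRUE
```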
In the presence of right-censored observations, the log-likelihood function changes as well. Assume we have in total \(n=n_e + n_r\) observations, where \(n_e\) denotes the number of observed events (with index set \(O\)) and \(n_r\) the number of right-censored observations (with index set \(R\)). The log-likelihood function then reads
\[\begin{multline*} l_\mathbf{x}(α, γ, β) = n_e \log(γ) - n_e γ \log(β) + (γ-1) · ∑_{i ∈ O} \log(x_i-α) - β^{-γ} · ∑_{i ∈ O} (x_i - α)^γ\\ - β^{-γ} · ∑_{r ∈ R} (x_r - α)^γ \end{multline*}\]
This simplifies to \[ l_\mathbf{x}(α, γ, β) = n_e \log(γ) - n_e γ \log(β) + (γ-1) · ∑_{i ∈ O} \log(x_i-α) - β^{-γ} · ∑_{i=1}^n (x_i - α)^γ \]
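As a numerical sketch (arbitrary values; not package code), this censored log-likelihood can be cross-checked against base R, with events contributing the log-density and right-censored points the log-survival function:

```r
set.seed(2)
a <- 5; g <- 1.7; b <- 3.5                  # delay, shape, scale
x  <- a + rweibull(30, shape = g, scale = b)
ev <- rbinom(30, 1, 0.7) == 1               # TRUE = observed event
O_idx <- which(ev); R_idx <- which(!ev)     # event and censoring index sets
ne <- length(O_idx)

ll_formula <- ne * log(g) - ne * g * log(b) +
  (g - 1) * sum(log(x[O_idx] - a)) - b^(-g) * sum((x - a)^g)
ll_direct  <- sum(dweibull(x[O_idx] - a, shape = g, scale = b, log = TRUE)) +
  sum(pweibull(x[R_idx] - a, shape = g, scale = b,
               lower.tail = FALSE, log.p = TRUE))
all.equal(ll_formula, ll_direct)
#> [1] TRUE
```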
One can directly (numerically) maximize this log-likelihood function. Cousineau (2009) calls this the iterative approach.
We are able to profile out the scale parameter estimate \(\hat β\): the candidate value for a local extremum with respect to scale β is found by taking the 1st derivative and equating to 0: \[ \frac{\partial l}{\partial β}(α, γ, β) = -\frac{nγ}{β} + γ β^{-γ-1} · ∑_{i=1}^n (x_i - α)^γ \overset{!}{=}0 \] Hence (as \(γ>0\)), it must hold \(-n + β^{-γ} · ∑_{i=1}^n (x_i - α)^γ \overset{!}{=}0\) and further simplifying, \(\hat β = \sqrt[γ]{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ}\).
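We can verify this profiled estimate numerically (a sketch with arbitrary values): for fixed α and γ, the closed-form \(\hat β\) should agree with a one-dimensional numeric maximization over β.

```r
set.seed(3)
a <- 5; g <- 1.7
x <- a + rweibull(40, shape = g, scale = 3.5)

# log-likelihood as a function of the scale only (α, γ fixed)
ll_beta <- function(b) sum(dweibull(x - a, shape = g, scale = b, log = TRUE))
b_hat   <- mean((x - a)^g)^(1 / g)          # closed-form profile estimate

b_opt <- optimize(ll_beta, interval = c(0.1, 20),
                  maximum = TRUE, tol = 1e-8)$maximum
abs(b_opt - b_hat) < 1e-4                   # numeric optimum matches b_hat
```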
With right-censored observations present in the data, the result is similar:
\[ \frac{\partial l}{\partial β}(α, γ, β) = -\frac{n_e γ}{β} + γ β^{-γ-1} · ∑_{i=1}^n (x_i - α)^γ \overset{!}{=}0 \] Hence (as \(γ>0\)), it must hold that \(-n_e + β^{-γ} · ∑_{i=1}^n (x_i - α)^γ \overset{!}{=}0\). The solution is \(\hat β = \sqrt[γ]{\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ}\); note that we divide by the number of events \(n_e\) only, but sum over all \(x_i\) (observed and right-censored).
A problem arises only when right-censored observations occur earlier than the (estimated) delay α: such terms would contribute negatively to the radicand. It seems appropriate to set these terms to zero.
If we substitute the β-estimate into the log-likelihood equation we get a simplified function to maximize. \[ \begin{align} l^P_{\mathbf{x}}(α, γ) &= n \log(γ) - n \log(\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ) + (γ-1) · ∑_{i=1}^n \log(x_i-α) - n\\ &= n · \Big[ \log(γ) - \log\big(\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ\big) + (γ-1) · \frac{1}{n} ∑_{i=1}^n \log(x_i-α) - 1\Big] \end{align} \]
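Numerically (a sketch with arbitrary values), the profiled log-likelihood indeed agrees with the full log-likelihood evaluated at \(\hat β\):

```r
set.seed(4)
a <- 5; g <- 1.7
x <- a + rweibull(40, shape = g, scale = 3.5)
n <- length(x)

b_hat <- mean((x - a)^g)^(1 / g)            # profiled scale estimate
lP <- n * (log(g) - log(mean((x - a)^g)) +  # profiled log-likelihood
           (g - 1) * mean(log(x - a)) - 1)
ll_full <- sum(dweibull(x - a, shape = g, scale = b_hat, log = TRUE))
all.equal(lP, ll_full)
#> [1] TRUE
```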
In the presence of right-censored observations, the equation for \(l^P_{\mathbf{x}}(α, γ)\) is similar in structure: we have to use the number of observed events \(n_e\) (instead of \(n\)) in most places.
\[\begin{align*} l^P_{\mathbf{x}}(α, γ) &= n_e \log(γ) - n_e \log\big(\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ\big) + (γ-1) · ∑_{i∈O} \log(x_i-α) - n_e\\ &= n_e · \Big[ \log(γ) - \log\big(\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ\big) + (γ-1) · \frac{1}{n_e} ∑_{i∈O} \log(x_i-α) - 1 \Big] \end{align*}\]
The right-censored observations occur both in the simplified log-likelihood as well as in the scale parameter estimate \(\hat β\).
We can use derivatives of the simplified log-likelihood to find candidates for its local maxima. Cousineau (2009) calls this the two-step approach (as opposed to the non-profiled iterative approach above).
Forming the partial derivatives of \(l^P(α, γ)\) we get: \[ \begin{aligned} \frac{\partial l^P}{\partial α}(α, γ) &= n · \Big[ \frac{\frac{1}{n} ∑_{i=1}^n γ (x_i - α)^{γ-1}}{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ} - (γ-1) · \frac{1}{n} ∑_{i=1}^n \frac{1}{x_i-α}\Big]\\ &= γ · \frac{∑_{i=1}^n (x_i - α)^{γ-1}}{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ} - (γ-1) · ∑_{i=1}^n \frac{1}{x_i-α}\\ \frac{\partial l^P}{\partial γ}(α, γ) &= n · \Big[ \frac{1}{γ} - \frac{\frac{1}{n} ∑_i \log(x_i-α) · (x_i-α)^γ}{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n} ∑_{i=1}^n \log(x_i-α) \Big]\\ &= \frac{n}{γ} - \frac{∑_i \log(x_i-α) · (x_i-α)^γ}{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ} + ∑_{i=1}^n \log(x_i-α) \end{aligned} \]
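We can sanity-check these analytic derivatives with central finite differences (a sketch; values arbitrary, uncensored case):

```r
set.seed(5)
x <- 5 + rweibull(40, shape = 1.7, scale = 3.5)
n <- length(x)

# profiled log-likelihood and its analytic partial derivatives (from above)
lP <- function(a, g) n * (log(g) - log(mean((x - a)^g)) +
                          (g - 1) * mean(log(x - a)) - 1)
dlP_da <- function(a, g)
  g * sum((x - a)^(g - 1)) / mean((x - a)^g) - (g - 1) * sum(1 / (x - a))
dlP_dg <- function(a, g)
  n / g - sum(log(x - a) * (x - a)^g) / mean((x - a)^g) + sum(log(x - a))

# compare with central finite differences at an arbitrary point
a <- 4; g <- 1.5; h <- 1e-6
fd_a <- (lP(a + h, g) - lP(a - h, g)) / (2 * h)
fd_g <- (lP(a, g + h) - lP(a, g - h)) / (2 * h)
c(dlP_da(a, g), fd_a)   # should agree closely
c(dlP_dg(a, g), fd_g)   # should agree closely
```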
In the presence of right-censored observations, the derivatives become:
\[\begin{align*} \frac{\partial l^P}{\partial α}(α, γ) &= n_e · \Big[ \frac{\frac{1}{n_e} ∑_{i=1}^n γ (x_i - α)^{γ-1}}{\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ} - (γ-1) · \frac{1}{n_e} ∑_{i∈O} \frac{1}{x_i-α}\Big]\\ &= γ · \frac{∑_{i=1}^n (x_i - α)^{γ-1}}{\frac{1}{n_e}∑_{i=1}^n (x_i - α)^γ} - (γ-1) · ∑_{i∈O} \frac{1}{x_i-α}\\ \frac{\partial l^P}{\partial γ}(α, γ) &= n_e · \Big[ \frac{1}{γ} - \frac{\frac{1}{n_e} ∑_{i=1}^n \log(x_i-α) · (x_i-α)^γ}{\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n_e} ∑_{i∈O} \log(x_i-α) \Big]\\ &= \frac{n_e}{γ} - \frac{∑_{i=1}^n\log(x_i-α) · (x_i-α)^γ}{\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ} + ∑_{i∈O} \log(x_i-α) \end{align*}\]
Setting these two partial derivatives to 0 we get the score equations that lead to candidate points for local extrema:
\[\begin{align*} \frac{γ}{γ-1} · \frac{∑_{i=1}^n (x_i - α)^{γ-1}}{∑_{i=1}^n (x_i - α)^γ} - \frac{1}{n} ∑_{i=1}^n \frac{1}{x_i-α} &= 0\\ \frac{γ}{γ-1} - \frac{1}{n} \big(∑_{i=1}^n \frac{1}{x_i-α}\big) · \frac{∑_{i=1}^n (x_i - α)^γ}{∑_{i=1}^n (x_i - α)^{γ-1}} &= 0 \end{align*}\]
In the presence of right-censored observations, the solution equations are:
From the delay \(α\): \[ \frac{γ}{γ-1} - \frac{1}{n_e} \big(∑_{i∈O} \frac{1}{x_i-α}\big) · \frac{∑_{i=1}^n (x_i - α)^γ}{∑_{i=1}^n (x_i - α)^{γ-1}} = 0 \]
From the shape \(γ>0\): \[ \frac{1}{γ} - \frac{∑_{i=1}^n \log(x_i-α) · (x_i-α)^γ}{∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n_e} ∑_{i∈O} \log(x_i-α) = 0 \]
Cousineau (2009) proposes a slightly varied equation to solve for the parameter estimates. The scale parameter β gets estimated using a weighting factor \(W_1\), \[ \hat β = \sqrt[γ]{\frac{1}{n} ∑_{i=1}^n \frac{(x_i - α)^γ}{W_1}} \]
For the remaining parameters, we modify the partial-derivative equations of the profiled log-likelihood function. These appear to come from the following partial derivatives of a modified log-likelihood function, where \(W_1\), \(W_2\) and \(W_3(γ)\) are seen as weights that depend on the sample size \(n\). Strictly speaking, these weights are random variables that ultimately depend on the true parameters α and \(γ\), but we treat \(W_1\) and \(W_2\) as fixed for given \(n\) and model \(W_3\) as a function of the shape \(γ\) for given \(n\).
\[ \begin{aligned} \frac{\partial l_w^{P}}{\partial α}(α, γ) &= n · \Big[ W_3(γ) · (γ-1) · \frac{∑_{i=1}^n (x_i - α)^{γ-1}}{∑_{i=1}^n (x_i - α)^γ} - (γ-1) · \frac{1}{n} ∑_{i=1}^n \frac{1}{x_i-α}\Big]\\ \frac{\partial l_w^{P}}{\partial γ}(α, γ) &= n · \Big[ \frac{W_2}{γ} - \frac{∑_i \log(x_i-α) · (x_i-α)^γ}{∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n} ∑_{i=1}^n \log(x_i-α) \Big] \end{aligned} \]
Compared to the MLE-equations (partial derivative equation for delay), we see that \(W_3(γ)\) takes the role of \(\frac{γ}{γ-1}\).
The model fitting uses numeric optimization routines. We improve this process through suitable parameter transformations; in particular, we can avoid bound constraints on the parameters.
For instance, the initial delay parameter \(α_1\) is by definition bounded above by the first observation \(x_{(1)}\), and conventionally we assume \(α_1 ≥ 0\) as well, in particular when we model a survival time. This second assumption is not strictly necessary, however, as a model with negative delay might also describe survival data well.
To get rid of the bounds, one idea is to use the first observation as a normalization for \(α_1\) and find a transformation onto the whole real line ℝ:
For instance, assuming only an upper bound (\(α_1 ≤ x_{(1)}\), also negative values allowed) we get a transformation involving the logarithm: \[ \tilde{α}_1 := \log(\frac{x_{(1)}-α_1}{x_{(1)}}) = \log(1 - \frac{α_1}{x_{(1)}}) \]
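A tiny sketch of this transformation and its inverse, \(α_1 = x_{(1)}·(1-\exp(\tilde{α}_1))\) (the back-transformation follows directly by solving for \(α_1\); values are arbitrary):

```r
x1 <- 5.06                       # first observation x_(1)
a1 <- c(-2, 0, 2.5, 5)           # candidate delays, all below x1
a_tilde <- log(1 - a1 / x1)      # maps (-Inf, x1) onto the real line
a_back  <- x1 * (1 - exp(a_tilde))  # invert the transformation
all.equal(a_back, a1)
#> [1] TRUE
```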
Alternatively, if we also want to enforce the lower bound for \(α_1\), we can apply a logit-based transformation to rescale the delay from \([0,x_{(1)}]\) to the whole real line ℝ: \[ \tilde{α}_1 := \operatorname{logit}(\frac{α_1}{x_{(1)}}) = -\log(1 + \frac{x_{(1)} - 2α_1}{α_1}) \] In R, the logit function is implemented efficiently in stats::qlogis. The back-transformation is \(α_1 = \frac{x_{(1)}}{1+\exp(-\tilde{α}_1)}\) (cf. stats::plogis in R). We visualize this logit-based transformation:
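A minimal base-R sketch of such a visualization (assuming, for illustration, \(x_{(1)}=5\)), together with a round trip through `stats::qlogis` and `stats::plogis`:

```r
x1 <- 5                                    # assumed first observation x_(1)
curve(qlogis(x / x1), from = 0.02 * x1, to = 0.98 * x1,
      xlab = expression(alpha[1]), ylab = expression(tilde(alpha)[1]),
      main = "logit-based delay transformation")

# round trip: transform and back-transform recover the delay
a1 <- c(0.5, 2.5, 4.9)
all.equal(x1 * plogis(qlogis(a1 / x1)), a1)
#> [1] TRUE
```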
The functions for random generation of data, rexp_delayed() and rweib_delayed(), have an option to produce right-censored data. Currently, we implement random right censoring under a uniform censoring scheme, starting after the initial delay phase.¹ The censoring process \(C\) is independent of the event process \(X\). The observed time is \(\min(X,C)\) and it is an event if and only if \(X≤C\), i.e., when the event “makes it” before the censoring happens.
You provide the desired expected proportion of right censoring π via the parameter cens=. This information determines the upper limit \(Z\) for the censoring process \(U(α,Z)\). The expected proportion of censorings π is \(P(C<X)\). We compute this probability as an integral of the density, starting the integration after the initial delay. \[ \begin{equation*} \begin{split} π = P(C<X) &= ∫_α^∞ ∫_α^x f_{X,C}(x,c)\,dc\,dx\\ &= ∫_α^∞ ∫_α^x f_X(x) f_C(c)\,dc\,dx \qquad \text{(independence)}\\ &= ∫_α^∞ f_X(x) · \Big(∫_α^x f_C(c)\,dc\Big)\,dx\\ &= ∫_α^∞ f_X(x) · F_C(x)\,dx \end{split} \end{equation*} \]
For a uniform censoring process with lower bound α and upper bound \(Z\) we have \(F_C(x) = \frac{x-α}{Z-α}\) for \(α ≤ x ≤ Z\). Below we use this expression for all \(x > α\), although the true \(F_C(x)\) equals 1 beyond \(Z\); this overestimates the censoring probability, so the realized proportion of right censorings never exceeds the specified one.
Substituting the density \(f_X(x) = 𝟙_{x≥α} · λ \exp(-λ·(x-α))\) and using integration by parts: \[ \begin{equation*} \begin{split} π = P(C<X) &= ∫_α^∞ f_X(x) · F_C(x) dx\\ &= ∫_α^∞ λ \exp(-λ·(x-α)) \frac{x-α}{Z-α} dx\\ &= ∫_0^∞ λ \exp(-λ·t) \frac{t}{Z-α} dt\\ &= \frac{1}{Z-α} · \Big\{ [-\exp(-λ·t) t]_{t=0}^{t=\infty} - ∫_0^∞ -\exp(-λ·t) dt \Big\}\\ &= \frac{1}{Z-α} · \Big\{ [0-0] + ∫_0^∞ \exp(-λ·t) dt \Big\}\\ &= \frac{1}{Z-α} · \Big[ -\frac{1}{λ} \exp(-λ·t) \Big]_{t=0}^{t=\infty}\\ &= \frac{1}{λ (Z-α)} \end{split} \end{equation*} \]
We can solve for \(Z\): \[ π λ (Z-α) = 1 \qquad ⇔ \qquad Z-α = \frac{1}{π λ} \qquad ⇔ \qquad Z = α + \frac{1}{πλ} \]
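A Monte-Carlo sketch (arbitrary values) that with \(Z = α + \frac{1}{πλ}\) the realized censoring proportion stays close to, and slightly below, the requested π:

```r
set.seed(7)
a <- 5; lam <- 0.2; p <- 0.3
Z <- a + 1 / (p * lam)                   # upper limit from the formula above

X <- a + rexp(1e5, rate = lam)           # delayed exponential event times
C <- runif(1e5, min = a, max = Z)        # uniform censoring after the delay
mean(C < X)                              # close to, but not above, 0.3
```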
As an example, to draw a random sample from a delayed exponential that (on average) has 30% right censorings, we can use the following call:
(x <- rexp_delayed(n = 24, delay1 = 5, rate1 = .2, cens = .3))
#> [1] 13.897616 11.207098+ 8.345280 11.685812 6.610885 7.137528
#> [7] 7.783468 11.489477 7.716870 5.113803+ 5.565182 7.393699
#> [13] 9.248512 5.333573+ 6.785296 6.237849 8.673463+ 5.637047
#> [19] 6.228599+ 5.056433 11.546889 11.288894 11.056427 17.476316
class(x)
#> [1] "Surv"
sum(x[,2] == 0)/length(x)
#> [1] 0.2083333

The density of the delayed Weibull is \(f_X(x) = 𝟙_{x≥α} · \frac{γ}{σ} · (\frac{x-α}{σ})^{γ-1} \exp(-(\frac{x-α}{σ})^γ)\) with shape parameter \(γ\) and scale parameter \(σ\). Then, using integration by parts and the identity \(∫_0^∞ \exp(-a · x^b) dx = \frac{1}{b} a^{-1/b} · Γ(\frac{1}{b})\) with \(a=1\) and \(b=γ\), the shape parameter, we get \[ \begin{equation*} \begin{split} π = P(C<X) &= ∫_α^∞ f_X(x) · F_C(x) dx\\ &= ∫_α^∞ \frac{γ}{σ} · (\frac{x-α}{σ})^{γ-1} \exp(-(\frac{x-α}{σ})^γ) \frac{x-α}{Z-α} dx\\ &= ∫_0^∞ \frac{γ}{σ} · (\frac{t}{σ})^{γ-1} \exp(-(\frac{t}{σ})^γ) \frac{t}{Z-α} dt\\ &= \frac{γ}{σ(Z-α)} · ∫_0^∞ t · (\frac{t}{σ})^{γ-1} \exp(-(\frac{t}{σ})^γ) dt\\ &= \frac{γ}{σ(Z-α)} · \Big\{ \Big[-t · \frac{σ}{γ} · \exp(-(\frac{t}{σ})^γ)\Big]_{t=0}^{t=\infty} - ∫_0^∞ -\frac{σ}{γ} · \exp(-(\frac{t}{σ})^γ) dt \Big\}\\ &= \frac{γ}{σ(Z-α)} · \Big\{ [0 - 0] + \frac{σ}{γ} ∫_0^∞ \exp(-(\frac{t}{σ})^γ) dt \Big\}\\ &= \frac{1}{Z-α} · ∫_0^∞ \exp(-(\frac{t}{σ})^γ) dt\\ &= \frac{σ}{Z-α} · ∫_0^∞ \exp(-u^γ) du\\ &= \frac{σ}{Z-α} · \frac{1}{γ} · Γ(\frac{1}{γ}) \end{split} \end{equation*} \]
We can solve for \(Z\): \[ π · (Z-α) = σ · \frac{1}{γ} · Γ(\frac{1}{γ}) \qquad ⇔ \qquad Z = α + \frac{σ}{π γ} · Γ(\frac{1}{γ}) \]
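The same Monte-Carlo sketch for the delayed Weibull case (arbitrary values): with \(Z = α + \frac{σ}{πγ}·Γ(\frac{1}{γ})\) the realized censoring proportion comes close to the requested π.

```r
set.seed(8)
a <- 5; sc <- 3.5; sh <- 1.7; p <- 0.3
Z <- a + sc / (p * sh) * gamma(1 / sh)   # upper limit from the formula above

X <- a + rweibull(1e5, shape = sh, scale = sc)  # delayed Weibull event times
C <- runif(1e5, min = a, max = Z)               # uniform censoring after delay
mean(C < X)                              # close to the requested 0.3
```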
As an example, to draw a random sample from a delayed Weibull with different shape parameters that (on average) has 30% right censorings, we can use the following call:
(x1 <- rweib_delayed(n = 24, delay1 = 5, scale1 = 3.5, shape1 = 1.7, cens = .3))
#> [1] 6.359985 6.347200 5.597964 7.269995 9.600487+ 6.947495+
#> [7] 6.134365 5.887219 6.336762 7.226662 8.442201 10.976897
#> [13] 9.138677 9.633564+ 5.868864 5.777568+ 7.281201 6.605345
#> [19] 9.098217+ 9.688999 8.111162 7.381088 5.751070 8.302671+
class(x1)
#> [1] "Surv"
sum(x1[,2] == 0) / length(x1)
#> [1] 0.25
(x2 <- rweib_delayed(n = 24, delay1 = 5, scale1 = 3.5, shape1 = .4, cens = .3))
#> [1] 6.874758 5.030978+ 11.675836+ 6.369219+ 6.759326 5.027795
#> [7] 5.009805 5.001818 15.109905+ 19.540309+ 5.184137 6.764632+
#> [13] 5.166811 7.879559 5.772267 7.903502 19.519993+ 10.577621
#> [19] 33.496340 5.053817 6.239492 21.061737 11.729313 5.003492
sum(x2[,2] == 0) / length(x2)
#> [1] 0.2916667
survival::survfit(c(x1, x2) ~ rep(c("x1","x2"), each = 24)) |>
  plot(mark.time=T, conf.int=F, lty = 1:2, col = 1:2)

Corrected MLE (MLEc) was proposed by Cheng and Iles (1987). It uses the usual likelihood except for the first observation, which is instrumental for the delay estimation. For the first observation, MLEc uses the difference in the cumulative distribution function between the first two observations. Using \(z_i := x_i - α\), the log-likelihood function becomes
\[ \begin{align} l_\mathbf{x}(α, γ, β) &= \log\Big(\exp\big(-(\frac{z_1}{β})^γ\big) - \exp\big(-(\frac{z_2}{β})^γ\big)\Big) + (n-1) \log(γ) - (n-1) γ \log(β) + (γ-1) · ∑_{i=2}^n \log z_i - β^{-γ} · ∑_{i=2}^n z_i^γ\\ &= -(\frac{z_1}{β})^γ + \log\Big(1-\exp\big(-\frac{z_2^γ - z_1^γ}{β^γ}\big)\Big) + (n-1) \log(γ) - (n-1) γ \log(β) + (γ-1) · ∑_{i=2}^n \log z_i - β^{-γ} · ∑_{i=2}^n z_i^γ \end{align} \] We can use a first-order Taylor approximation \(\log(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} ∓ ... ≈ z\) for \(z\) close to zero, \(|z|<1\), to simplify: \[ l_\mathbf{x}(α, γ, β) ≈ -(\frac{z_1}{β})^γ - \exp(-\frac{z_2^γ - z_1^γ}{β^γ}) + (n-1) \log(γ) - (n-1) γ \log(β) + (γ-1) · ∑_{i=2}^n \log z_i - β^{-γ} · ∑_{i=2}^n z_i^γ \]
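A quick numerical check (a sketch; values arbitrary) that the factored form of the first-observation term equals the direct difference of CDF values:

```r
z1 <- 0.8; z2 <- 1.3; g <- 1.7; b <- 3.5   # z2 > z1 > 0
direct   <- log(exp(-(z1 / b)^g) - exp(-(z2 / b)^g))
factored <- -(z1 / b)^g + log(1 - exp(-(z2^g - z1^g) / b^g))
all.equal(direct, factored)
#> [1] TRUE
```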
We could use partial derivatives to find local maxima of this approximate log-likelihood.
The question is whether we allow \(C\) to occur during the incubation period. If we did and there were a long delay, we would see many censored observations during the incubation period, which might be odd. Therefore, we decided to model the censoring process as \(C \sim U(α, Z)\).↩︎