Chapter 2 - Mathematical Foundations

Slides

Lecture slides here. (To convert the HTML slides to PDF, press E \(\to\) Print \(\to\) Destination: Save as PDF.)

Chapter Summary

Time-to-event data require specialized techniques for analysis. These tend to focus on the survival function \(S(t)\) and the hazard function \(\lambda(t)\) of the latent (uncensored) event time \(T\). Some methods rely on parametric models, while others leverage counting processes and martingales for robust inference.

Notation and basic quantities

An outcome event time can be represented either as a random variable \(T\) or as a counting process \(N^*(t) = I(T \le t)\). The survival function is defined as \[ S(t) = \mathrm{pr}(T > t), \] while the hazard function \(\lambda(t) = \lim_{\mathrm{d}t \downarrow 0} \mathrm{pr}(t \le T < t + \mathrm{d}t \mid T \ge t)/\mathrm{d}t\) quantifies the instantaneous risk of experiencing the event at time \(t\), conditional on survival up to that point. The cumulative hazard function is obtained by integrating \(\lambda(t)\) over \([0, t]\): \(\Lambda(t) = \int_0^t \lambda(u)\,\mathrm{d}u\), and relates to \(S(t)\) via \[ \Lambda(t) = -\log\bigl\{S(t)\bigr\} \quad\Longleftrightarrow\quad S(t) = \exp\bigl\{-\Lambda(t)\bigr\}. \] This relationship is central to survival analysis.
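As a quick illustration, the identity can be checked numerically. The Python sketch below (the Weibull shape and scale values are arbitrary choices) integrates the hazard \(f(t)/S(t)\) of a Weibull distribution and compares \(\exp\{-\Lambda(t)\}\) with \(S(t)\):

```python
# A minimal numerical check of S(t) = exp{-Lambda(t)} for a Weibull distribution.
# The shape and scale values below are arbitrary illustrative choices.
import numpy as np
from scipy.integrate import quad
from scipy.stats import weibull_min

dist = weibull_min(c=1.5, scale=2.0)

def hazard(t):
    # lambda(t) = f(t) / S(t)
    return dist.pdf(t) / dist.sf(t)

t0 = 1.7
Lambda_t0, _ = quad(hazard, 0.0, t0)       # cumulative hazard by numerical integration
print(np.exp(-Lambda_t0), dist.sf(t0))     # exp{-Lambda(t0)} vs. S(t0): should agree
print(Lambda_t0, -np.log(dist.sf(t0)))     # equivalently, Lambda(t0) = -log S(t0)
```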

Simple parametric models for \(T\) include the exponential, Weibull, Gamma, and log-normal distributions, each with distinct hazard functions. The exponential model assumes constant hazard, while the Weibull model allows for time-varying (but still monotone) hazards.
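As a rough sketch (all parameter values below are illustrative only), the hazard \(f(t)/S(t)\) of each family can be evaluated on a grid to compare their shapes:

```python
# Hazard functions of the four parametric families, evaluated on a coarse grid.
# All parameter values are illustrative only.
import numpy as np
from scipy.stats import expon, gamma, lognorm, weibull_min

t = np.linspace(0.5, 4.0, 5)
models = {
    "exponential": expon(scale=2.0),               # constant hazard 1/2
    "weibull":     weibull_min(c=1.5, scale=2.0),  # monotone increasing (shape > 1)
    "gamma":       gamma(a=2.0, scale=1.0),
    "log-normal":  lognorm(s=0.5, scale=1.0),      # non-monotone hazard
}
for name, dist in models.items():
    haz = dist.pdf(t) / dist.sf(t)                 # lambda(t) = f(t) / S(t)
    print(f"{name:12s}", np.round(haz, 3))
```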

Observed data and likelihood

In practice, the event time is subject to right censoring by a censoring time \(C\). As a result, we only observe \(X = \min(T, C)\), along with the event indicator \(\delta = I(T \le C)\). Under independent censoring, the likelihood function for the observed data \((X, \delta)\) is given by \[ p\bigl(X,\delta\bigr) \;=\; \lambda\bigl(X\bigr)^{\delta}\,S\bigl(X\bigr). \] This means that, for a sample of size \(n\), the (scaled) log-likelihood is \[ \ell_n(\theta) = n^{-1}\sum_{i=1}^n \Bigl\{ \delta_i\,\log \lambda\bigl(X_i;\theta\bigr) \;-\; \int_{0}^{\infty} I\bigl(X_i \ge t\bigr)\,\lambda\bigl(t;\theta\bigr)\,\mathrm{d}t \Bigr\}, \] where \(\lambda(t;\theta)\) is the hazard function parametrized by \(\theta\). This general expression for the log-likelihood can be used to derive the maximum likelihood estimator under any parametric model.
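For example, under the exponential model \(\lambda(t;\theta) = \theta\), the integral term reduces to \(\theta X_i\), and maximizing \(\ell_n(\theta)\) gives the closed-form estimator \(\hat\theta = \sum_i \delta_i / \sum_i X_i\). The Python sketch below (sample size, rates, and seed are arbitrary) checks this against a direct numerical maximization of \(\ell_n\):

```python
# Maximum likelihood for the exponential model lambda(t; theta) = theta under
# independent censoring. Sample size, rates, and seed are arbitrary.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n, theta_true = 2000, 0.5
T = rng.exponential(1 / theta_true, n)        # latent event times
C = rng.exponential(2.0, n)                   # censoring times
X, delta = np.minimum(T, C), (T <= C).astype(float)

def neg_loglik(theta):
    # -l_n(theta) = -n^{-1} sum{ delta_i log(theta) - theta X_i }:
    # the integral term reduces to theta * X_i for a constant hazard
    return -np.mean(delta * np.log(theta) - theta * X)

fit = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method="bounded")
print(fit.x, delta.sum() / X.sum())           # numerical MLE vs. closed form
```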

Stochastic integrals and martingales

To count the observed event under censoring, define the observed counting process \(N(t) = N^*(t\wedge C)\), where \(x\wedge y = \min(x, y)\). The step function \(N(t)\) takes a jump of size 1, i.e., \(\mathrm{d}N(t)=1\), at \(t = X\) if \(\delta = 1\); it stays flat if \(\delta = 0\). Hence, any quantity of the form \(\delta h(X)\) can be rewritten as the stochastic integral \(\int_0^\infty h(t)\,\mathrm{d}N(t)\).
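A small numerical sanity check of this representation (the choice of \(h\) and the grid resolution are arbitrary): since \(N(t)\) has at most one jump, of size 1 at \(t = X\) when \(\delta = 1\), the integral simply evaluates \(h\) at the jump time.

```python
# Sanity check of delta * h(X) = \int h(t) dN(t): N(t) has at most one jump,
# of size 1 at t = X when delta = 1, so the integral picks off h at that time.
# The choice of h and the grid resolution are arbitrary.
import numpy as np

def integral_h_dN(h, X, delta, grid):
    N = (delta * (grid >= X)).astype(float)   # N(t) = I(X <= t, delta = 1)
    dN = np.diff(N, prepend=0.0)              # a single increment of 1, or none
    return np.sum(h(grid) * dN)

grid = np.linspace(0.0, 5.0, 100001)
X, delta = 1.3, 1
print(integral_h_dN(np.log1p, X, delta, grid), delta * np.log1p(X))  # agree up to grid resolution
```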

If we view \(\mathrm{d}N(t)\) as a Bernoulli-type response indicating event occurrence at time \(t\), then we can decompose it as \(\mathrm{d}N(t) = I(X\ge t)\lambda(t)\,\mathrm{d} t + \mathrm{d}M(t)\), where \[ \mathrm{d}M(t) \;=\; \underbrace{\mathrm{d}N(t)}_{\text{Observed}} \;-\; \underbrace{I\bigl(X \ge t\bigr)\,\lambda(t)\,\mathrm{d}t}_{\text{Expectation given data prior to $t$}} \] is the martingale increment.
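A simulation sketch may help fix ideas: under a constant hazard \(\theta\), the compensator \(\int_0^t I(X\ge u)\,\theta\,\mathrm{d}u\) equals \(\theta(X\wedge t)\), so averaging \(M(t) = N(t) - \theta(X\wedge t)\) over many independent subjects should give values near zero (all simulation settings below are arbitrary).

```python
# Simulation sketch: under a constant hazard theta, the compensator is
# theta * (X ^ t), so the average of M(t) = N(t) - theta * (X ^ t) over many
# subjects should be close to zero. All settings are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
n, theta = 5000, 0.5
T = rng.exponential(1 / theta, n)
C = rng.exponential(1.5, n)
X, delta = np.minimum(T, C), (T <= C).astype(float)

for t in (0.5, 1.0, 2.0):
    N_t = delta * (X <= t)                    # N(t) = I(X <= t, delta = 1)
    compensator_t = theta * np.minimum(X, t)  # \int_0^t I(X >= u) theta du
    print(t, round((N_t - compensator_t).mean(), 4))   # approximately zero
```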

The martingale property of \(M(t)\) implies that the expectation of \(\mathrm{d}M(t)\), given the event history up to time \(t\), is zero. It also implies that the increments at different times are uncorrelated. This property of uncorrelated increments simplifies the variance calculation for martingale integrals of the form \[ \int_0^t h(u)\,\mathrm{d}M(u). \] Many test statistics, estimators, and score functions can be expressed in this form, and the martingale properties facilitate their asymptotic analysis.
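Concretely, for a deterministic \(h\), the uncorrelated increments give \(\mathrm{var}\bigl\{\int_0^t h(u)\,\mathrm{d}M(u)\bigr\} = E\bigl\{\int_0^t h(u)^2 I(X\ge u)\lambda(u)\,\mathrm{d}u\bigr\}\). The simulation sketch below (exponential model, \(h(u) = u\), all settings arbitrary) compares the empirical variance of the martingale integral with the average of this predictable variation term:

```python
# Simulation check of var{ \int_0^tau h(u) dM(u) } = E{ \int_0^tau h(u)^2 I(X >= u) lambda(u) du }
# for the exponential model with h(u) = u. All settings are arbitrary.
import numpy as np

rng = np.random.default_rng(3)
n, theta, tau = 200000, 0.5, 2.0
T = rng.exponential(1 / theta, n)
C = rng.exponential(1.5, n)
X, delta = np.minimum(T, C), (T <= C).astype(float)

Xtau = np.minimum(X, tau)
mart_int = delta * (X <= tau) * X - theta * Xtau**2 / 2   # \int_0^tau u dM(u)
pred_var = theta * Xtau**3 / 3                            # \int_0^tau u^2 I(X >= u) theta du
print(round(mart_int.var(), 4), round(pred_var.mean(), 4))  # should be close
```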

Conclusion

Although \(T\) may not be directly observed, the partial information in \((X, \delta)\) still supports principled inferences about its distribution. Parametric models describe \(\lambda(t)\) or \(S(t)\) via a likelihood-based approach, while a martingale-based framework captures event processes through residuals and their properties. The latter approach will be utilized extensively in the non- and semi-parametric analysis presented in later chapters.