survive.NelsonAalen

class survive.NelsonAalen(*, conf_type='log', conf_level=0.95, var_type='aalen', tie_break='discrete')

Nelson-Aalen nonparametric cumulative hazard estimator.

This estimator was suggested by Nelson in [1] in the context of reliability, and it was rediscovered and generalized by Aalen in [2].

Parameters:
conf_type : {‘log’, ‘linear’}

Type of confidence interval for the cumulative hazard estimate to report.

conf_level : float

Confidence level of the confidence intervals.

var_type : {‘aalen’, ‘greenwood’}

Type of variance estimate to compute.

tie_break : {‘discrete’, ‘continuous’}

Specify how to handle tied event times.

Notes

Suppose we have observed right-censored and left-truncated event times. Let \(T_1 < T_2 < \cdots\) denote the ordered distinct event times. Let \(N(t)\) be the number of events observed up to time \(t\), let \(Y(t)\) denote the number of individuals at risk (under observation but not yet censored or “dead”) at time \(t\), and let

\[\begin{split}J(t) = \begin{cases} 1 & \text{if $Y(t) > 0$,} \\ 0 & \text{otherwise.} \end{cases}\end{split}\]

The Nelson-Aalen estimator estimates the cumulative hazard function of the time-to-event distribution by

\[\widehat{A}(t) = \int_0^t \frac{J(s)}{Y(s)} \, dN(s).\]

This formula, proposed in [2], is computed as a sum in one of two ways depending on how tied event times are handled (cf. Section 3.1.3 in [3]). This is governed by the tie_break parameter.

  • If tie_break is “discrete”, then it is assumed that tied events are possible, and we compute the integral defining the Nelson-Aalen estimator directly, leading to

    \[\widehat{A}(t) = \sum_{j : T_j \leq t} \frac{\Delta N(T_j)}{Y(T_j)}.\]

    Here \(\Delta N(T_j)\) is the number of events occurring at time \(T_j\).

  • If tie_break is “continuous”, then it is assumed that tied events only happen due to grouping or rounding, and the tied times are treated as if they happened in succession, each one immediately following the previous one. This leads to the estimator

    \[\widehat{A}(t) = \sum_{j : T_j \leq t} \sum_{k=0}^{\Delta N(T_j) - 1} \frac{1}{Y(T_j) - k}.\]
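
The two sums above can be sketched in plain Python as follows. This is a minimal illustration of the formulas, not the implementation used by this class: the function name nelson_aalen and the toy data are hypothetical, left truncation is ignored (every individual is assumed to be under observation from time zero), and exact fractions are used so the two tie-breaking rules are easy to compare.

```python
from collections import Counter
from fractions import Fraction


def nelson_aalen(times, events, tie_break="discrete"):
    """Return [(T_j, A_hat(T_j))] at the distinct event times.

    Hypothetical helper for illustration only. `times` are the observed
    times and `events` flags which of them are events (truthy) rather
    than censored observations (falsy). No left truncation is handled.
    """
    if tie_break not in ("discrete", "continuous"):
        raise ValueError("tie_break must be 'discrete' or 'continuous'")

    # Delta N(T_j): number of events at each distinct event time.
    deaths = Counter(t for t, e in zip(times, events) if e)

    estimate = Fraction(0)
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for s in times if s >= t)  # Y(T_j)
        d = deaths[t]
        if tie_break == "discrete":
            # Tied events are treated as genuinely simultaneous.
            estimate += Fraction(d, at_risk)
        else:
            # Tied events are treated as occurring one after another,
            # so the risk set shrinks by one after each of them.
            estimate += sum(Fraction(1, at_risk - k) for k in range(d))
        curve.append((t, estimate))
    return curve


# Toy data: two tied events at time 2 and one censored observation at time 3.
times = [1, 2, 2, 3, 4]
events = [1, 1, 1, 0, 1]

discrete = nelson_aalen(times, events, tie_break="discrete")
continuous = nelson_aalen(times, events, tie_break="continuous")
print(discrete[-1][1])    # 17/10
print(continuous[-1][1])  # 107/60
```

The two rules agree wherever there are no ties (here at times 1 and 4) and differ only in the contribution of the tied time 2, where "discrete" adds 2/4 and "continuous" adds 1/4 + 1/3.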

The variance of the Nelson-Aalen estimator is estimated by one of two estimators suggested by [4]. You can select the variance estimator by using the var_type parameter.

  • If var_type is “aalen”, then the variance estimator derived in [2] is used:

    \[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \int_0^t \frac{J(s)}{Y(s)^2} \, dN(s).\]

    This integral is computed in one of two ways depending on tie_break:

    • If tie_break is “discrete”, then the variance estimator is computed as

      \[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \sum_{j : T_j \leq t} \frac{\Delta N(T_j)}{Y(T_j)^2}.\]
    • If tie_break is “continuous”, then the variance estimator is computed as

      \[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \sum_{j : T_j \leq t} \sum_{k=0}^{\Delta N(T_j) - 1} \frac{1}{\left(Y(T_j) - k\right)^2}.\]

    This estimator of the variance was found to generally overestimate the true variance of the Nelson-Aalen estimator [4].

  • If var_type is “greenwood”, then the Greenwood-type estimator derived in [4] is used:

    \[\begin{split}\widehat{\mathrm{Var}}(\widehat{A}(t)) &= \int_0^t \frac{J(s) (Y(s) - \Delta N(s))}{Y(s)^3} \, dN(s) \\ &= \sum_{j : T_j \leq t} \frac{(Y(T_j) - \Delta N(T_j)) \Delta N(T_j)}{Y(T_j)^3}.\end{split}\]

    This estimator tends to have a uniformly lower mean squared error than the Aalen estimator, but it also tends to underestimate the true variance of the Nelson-Aalen estimator [4].

The difference between these two variance estimators is only significant at times when the risk set is small. Klein [4] recommends the Aalen estimator over the Greenwood-type estimator.
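
The variance formulas above can be sketched in the same way. Again this is only an illustration under stated assumptions: nelson_aalen_variance and the toy data are hypothetical, left truncation is ignored, and the code is not taken from this class.

```python
from collections import Counter
from fractions import Fraction


def nelson_aalen_variance(times, events, var_type="aalen", tie_break="discrete"):
    """Return [(T_j, Var_hat(A_hat(T_j)))] at the distinct event times.

    Hypothetical helper for illustration only; no left truncation.
    """
    if var_type not in ("aalen", "greenwood"):
        raise ValueError("var_type must be 'aalen' or 'greenwood'")
    if tie_break not in ("discrete", "continuous"):
        raise ValueError("tie_break must be 'discrete' or 'continuous'")

    deaths = Counter(t for t, e in zip(times, events) if e)

    variance = Fraction(0)
    curve = []
    for t in sorted(deaths):
        y = sum(1 for s in times if s >= t)  # Y(T_j)
        d = deaths[t]                        # Delta N(T_j)
        if var_type == "greenwood":
            # Greenwood-type term: (Y - dN) * dN / Y^3.
            variance += Fraction((y - d) * d, y ** 3)
        elif tie_break == "discrete":
            # Aalen term with simultaneous ties: dN / Y^2.
            variance += Fraction(d, y ** 2)
        else:
            # Aalen term with ties broken in succession.
            variance += sum(Fraction(1, (y - k) ** 2) for k in range(d))
        curve.append((t, variance))
    return curve


# Toy data: two tied events at time 2 and one censored observation at time 3.
times = [1, 2, 2, 3, 4]
events = [1, 1, 1, 0, 1]

aalen = nelson_aalen_variance(times, events, var_type="aalen")
greenwood = nelson_aalen_variance(times, events, var_type="greenwood")
print(aalen[-1][1])      # 233/200
print(greenwood[-1][1])  # 189/2000
```

In this toy data set the last event occurs when only one individual is at risk. The Aalen term for that time is 1, while the Greenwood-type term is 0, which shows how far the two estimators can diverge when the risk set is small.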

The two types of confidence intervals (“log” and “linear”) provided here are presented in [5]. They are based on the asymptotic normality of the Nelson-Aalen estimator and are derived from the delta method by suitable transformations of the estimator. The “log” intervals are more accurate for smaller sample sizes, but both methods are equivalent for large samples [5].
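
The formulas for the two interval types are not spelled out above. As a hedged sketch, the standard textbook forms are \(\widehat{A}(t) \pm z \, \widehat{\sigma}(t)\) for “linear” and \(\widehat{A}(t) \exp\left(\pm z \, \widehat{\sigma}(t) / \widehat{A}(t)\right)\) for “log”, where \(\widehat{\sigma}(t)\) is the estimated standard error and \(z\) is the appropriate standard normal quantile. Whether this class uses exactly these forms (for example, whether it clips bounds) is an assumption, and the function cumulative_hazard_ci below is hypothetical.

```python
import math
from statistics import NormalDist


def cumulative_hazard_ci(estimate, std_err, conf_level=0.95, conf_type="log"):
    """Pointwise confidence interval for a cumulative hazard estimate.

    Hypothetical helper using the standard textbook formulas; the "log"
    interval requires estimate > 0.
    """
    # Two-sided standard normal quantile, about 1.96 for conf_level=0.95.
    z = NormalDist().inv_cdf((1 + conf_level) / 2)

    if conf_type == "linear":
        # Symmetric interval; the lower bound may fall below zero.
        return estimate - z * std_err, estimate + z * std_err
    if conf_type == "log":
        # Delta method applied to log(A_hat); bounds are always positive.
        factor = math.exp(z * std_err / estimate)
        return estimate / factor, estimate * factor
    raise ValueError("conf_type must be 'log' or 'linear'")


linear_lower, linear_upper = cumulative_hazard_ci(0.7, 0.4, conf_type="linear")
log_lower, log_upper = cumulative_hazard_ci(0.7, 0.4, conf_type="log")
print(linear_lower < 0)  # True: the linear interval can leave the valid range
print(log_lower > 0)     # True: the log interval stays positive
```

A cumulative hazard is nonnegative, so the fact that the “log” interval cannot cross zero is one practical reason to prefer it when the estimate is small relative to its standard error.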

References

[1] Wayne Nelson. “Theory and Applications of Hazard Plotting for Censored Failure Data”. Technometrics, Volume 14, Number 4 (1972), pp. 945–966.
[2] Odd Aalen. “Nonparametric Inference for a Family of Counting Processes”. The Annals of Statistics, Volume 6, Number 4 (1978), pp. 701–726.
[3] Odd O. Aalen, Ørnulf Borgan, and Håkon K. Gjessing. Survival and Event History Analysis. A Process Point of View. Springer-Verlag, New York (2008), pp. xviii+540.
[4] John P. Klein. “Small sample moments of some estimators of the variance of the Kaplan-Meier and Nelson-Aalen estimators”. Scandinavian Journal of Statistics, Volume 18, Number 4 (1991), pp. 333–340.
[5] Ole Bie, Ørnulf Borgan, and Knut Liestøl. “Confidence Intervals and Confidence Bands for the Cumulative Hazard Rate Function and Their Small Sample Properties”. Scandinavian Journal of Statistics, Volume 14, Number 3 (1987), pp. 221–233.
Attributes:
conf_level

Confidence level of the confidence intervals.

conf_type

Type of confidence intervals to report.

data_

Survival data used to fit the estimator.

random_state

Seed for this model’s random number generator.

summary

Get a summary of this estimator.

tie_break

How to handle tied event times.

var_type

Type of variance estimate to compute.

Methods

check_fitted()                               Check whether this model is fitted.
fit(time, **kwargs)                          Fit the Nelson-Aalen estimator to survival data.
plot(*groups[, ci, ci_style, ci_kwargs, …])  Plot the estimates.
predict(time, *[, return_se, return_ci])     Compute estimates.
to_string([max_line_length])                 String representation of this model.
check_fitted()

Check whether this model is fitted. If not, raise an exception.

conf_level

Confidence level of the confidence intervals.

Returns:
conf_level : float

The confidence level.

conf_type

Type of confidence intervals to report.

Returns:
conf_type : str

The type of confidence interval.

data_

Survival data used to fit the estimator.

This property is only available after fitting.

Returns:
data : SurvivalData

The survive.SurvivalData instance used to fit the estimator.

fit(time, **kwargs)

Fit the Nelson-Aalen estimator to survival data.

Parameters:
time : one-dimensional array-like or str or SurvivalData

The observed times, or all the survival data. If this is a survive.SurvivalData instance, then it is used to fit the estimator and any other parameters are ignored. Otherwise, time and the keyword arguments in kwargs are used to initialize a survive.SurvivalData object on which this estimator is fitted.

**kwargs : keyword arguments

Any additional keyword arguments used to initialize a survive.SurvivalData instance.

Returns:
survive.nonparametric.NelsonAalen

This estimator.

See also

survive.SurvivalData
Structure used to store survival data.
plot(*groups, ci=True, ci_style='fill', ci_kwargs=None, mark_censor=True, mark_censor_kwargs=None, legend=True, legend_kwargs=None, colors=None, palette=None, ax=None, **kwargs)

Plot the estimates.

Parameters:
*groups : list of group labels

Specify the groups whose curves should be plotted. If none are given, the curves for all groups are plotted.

ci : bool, optional

If True, draw pointwise confidence intervals.

ci_style : {“fill”, “lines”}, optional

Specify how to draw the confidence intervals. If ci_style is “fill”, the region between the lower and upper confidence interval curves will be filled. If ci_style is “lines”, only the lower and upper curves will be drawn (this is inspired by the style of confidence intervals drawn by plot.survfit in the R package survival).

ci_kwargs : dict, optional

Additional keyword parameters to pass to fill_between() (if ci_style is “fill”) or step() (if ci_style is “lines”) when plotting the pointwise confidence intervals.

mark_censor : bool, optional

If True, indicate the censored times by markers on the plot.

mark_censor_kwargs : dict, optional

Additional keyword parameters to pass to scatter() when marking censored times.

legend : bool, optional

Indicates whether to display a legend for the plot.

legend_kwargs : dict, optional

Keyword parameters to pass to legend().

colors : list or tuple or dict or str, optional

Colors for each group. This is ignored if palette is provided. This can be a sequence of valid matplotlib colors to cycle through, or a dictionary mapping group labels to matplotlib colors, or the name of a matplotlib colormap.

palette : str, optional

Name of a seaborn color palette. Requires seaborn to be installed. Setting a color palette overrides the colors parameter.

ax : matplotlib.axes.Axes, optional

The axes on which to plot. If this is not specified, the current axes will be used.

**kwargs : keyword arguments

Additional keyword arguments to pass to step() when plotting the estimates.

Returns:
matplotlib.axes.Axes

The Axes on which the plot was drawn.

predict(time, *, return_se=False, return_ci=False)

Compute estimates.

Parameters:
time : array-like

One-dimensional array of times at which to make estimates.

return_se : bool, optional

If True, also return standard error estimates.

return_ci : bool, optional

If True, also return confidence intervals.

Returns:
estimate : pandas.DataFrame

DataFrame of estimates. Each column represents a group, and each row represents an entry of time.

std_err : pandas.DataFrame, optional

Standard errors of the estimates. Same shape as estimate. Returned only if return_se is True.

lower : pandas.DataFrame, optional

Lower confidence interval bounds. Same shape as estimate. Returned only if return_ci is True.

upper : pandas.DataFrame, optional

Upper confidence interval bounds. Same shape as estimate. Returned only if return_ci is True.
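
Because the Nelson-Aalen estimate is a sum over the event times \(T_j \leq t\), it is a right-continuous step function, so evaluating it at arbitrary times amounts to looking up the last event time at or before each query time. The sketch below illustrates that lookup for a single group using plain lists. The function evaluate_step is hypothetical, and the code does not reproduce the DataFrame output or any edge-case handling of this method.

```python
from bisect import bisect_right


def evaluate_step(event_times, values, query_times, initial=0.0):
    """Evaluate a right-continuous step function at the given times.

    Hypothetical helper for illustration only. `event_times` must be
    sorted in increasing order, and `values[j]` is the value taken on
    [event_times[j], event_times[j + 1]). Before the first event time
    the function equals `initial`.
    """
    results = []
    for t in query_times:
        # Number of event times that are <= t.
        i = bisect_right(event_times, t)
        results.append(values[i - 1] if i > 0 else initial)
    return results


event_times = [1, 2, 4]
cumulative_hazard = [0.2, 0.7, 1.7]
estimates = evaluate_step(event_times, cumulative_hazard, [0.5, 1, 3, 10])
print(estimates)  # [0.0, 0.2, 0.7, 1.7]
```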

random_state

Seed for this model’s random number generator. This may not be a numpy.random.RandomState instance. The internal RNG is not a public attribute and should not be used directly.

Returns:
random_state : object

The seed for this model’s RNG.

summary

Get a summary of this estimator.

Returns:
summary : NonparametricEstimatorSummary

The summary of this estimator.

tie_break

How to handle tied event times.

to_string(max_line_length=75)

String representation of this model.

Parameters:
max_line_length : int, optional

Specifies the maximum length of a line. If None, everything will be on one line.

Returns:
model_string : str

A string representation of this model that can be used to instantiate a new, identical model.

var_type

Type of variance estimate to compute.