`survive`.NelsonAalen

class survive.NelsonAalen(*, conf_type='log', conf_level=0.95, var_type='aalen', tie_break='discrete')

Nelson-Aalen nonparametric cumulative hazard estimator.

This estimator was suggested by Nelson in [1] in the context of reliability, and it was rediscovered and generalized by Aalen in [2].

Parameters:

conf_type : {‘log’, ‘linear’}: Type of confidence interval for the cumulative hazard estimate to report.
conf_level : float: Confidence level of the confidence intervals.
var_type : {‘aalen’, ‘greenwood’}: Type of variance estimate to compute.
tie_break : {‘discrete’, ‘continuous’}: Specify how to handle tied event times.

Notes

Suppose we have observed right-censored and left-truncated event times. Let $T_1 < T_2 < \cdots$ denote the ordered distinct event times. Let $N(t)$ be the number of events observed up to time $t$, let $Y(t)$ denote the number of individuals at risk (under observation but not yet censored or “dead”) at time $t$, and let

\[\begin{split}J(t) = \begin{cases} 1 & \text{if $Y(t) > 0$,} \\ 0 & \text{otherwise.} \end{cases}\end{split}\]
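The quantities $Y(t)$, $\Delta N(T_j)$, and $J(t)$ can be tabulated directly from raw observations. The following is a minimal sketch in plain Python, not the library's implementation. The helper names `at_risk` and `risk_table` are hypothetical, and the convention that a subject is at risk at time $t$ when its entry time is strictly before $t$ and its observed time is at least $t$ is an assumption.

```python
def at_risk(t, time, entry):
    """Y(t): subjects who entered before t and are still under observation at t."""
    # Assumed convention: subject i is at risk at t when entry_i < t <= time_i.
    return sum(1 for e, x in zip(entry, time) if e < t <= x)


def risk_table(time, status, entry=None):
    """Return (T_j, dN(T_j), Y(T_j)) for each distinct event time T_j."""
    if entry is None:
        entry = [float("-inf")] * len(time)  # no left truncation
    event_times = sorted({x for x, s in zip(time, status) if s})
    table = []
    for t_j in event_times:
        # Number of events observed exactly at T_j.
        d_j = sum(1 for x, s in zip(time, status) if s and x == t_j)
        table.append((t_j, d_j, at_risk(t_j, time, entry)))
    return table


# Toy data: status 1 = event, 0 = right-censored; the last two subjects enter late.
time = [1, 2, 2, 3, 4, 4, 5]
status = [1, 1, 1, 0, 1, 1, 0]
entry = [0, 0, 0, 0, 0, 3, 3]

table = risk_table(time, status, entry)
print(table)  # [(1, 1, 5), (2, 2, 4), (4, 2, 3)]

# J(6) is 0 because nobody remains at risk after the last observed time.
j_after_followup = int(at_risk(6, time, entry) > 0)
```

Note that $J(T_j) = 1$ at every event time, since the subject experiencing the event is itself at risk there.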

The Nelson-Aalen estimator estimates the cumulative hazard function of the time-to-event distribution by

\[\widehat{A}(t) = \int_0^t \frac{J(s)}{Y(s)} \, dN(s).\]

This formula, proposed in [2], is computed as a sum in one of two ways depending on how tied event times are handled (cf. Section 3.1.3 in [3]). This is governed by the tie_break parameter.

If tie_break is “discrete”, then it is assumed that tied events are possible, and we compute the integral defining the Nelson-Aalen estimator directly, leading to

\[\widehat{A}(t) = \sum_{j : T_j \leq t} \frac{\Delta N(T_j)}{Y(T_j)}.\]

Here $\Delta N(T_j)$ is the number of events occurring at time $T_j$.

If tie_break is “continuous”, then it is assumed that tied events only happen due to grouping or rounding, and the tied times are treated as if they happened in succession, each one immediately following the previous one. This leads to the estimator

\[\widehat{A}(t) = \sum_{j : T_j \leq t} \sum_{k=0}^{\Delta N(T_j) - 1} \frac{1}{Y(T_j) - k}.\]
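The two sums can be compared side by side with a short sketch. This is an illustration of the formulas above, not the library's code. The function name `nelson_aalen` and the input layout (a list of hypothetical `(T_j, dN_j, Y_j)` triples sorted by time) are assumptions made for the example.

```python
def nelson_aalen(table, t, tie_break="discrete"):
    """Cumulative hazard estimate at time t from (T_j, dN_j, Y_j) triples."""
    if tie_break not in ("discrete", "continuous"):
        raise ValueError(f"unknown tie_break: {tie_break!r}")
    total = 0.0
    for t_j, d_j, y_j in table:
        if t_j > t:
            break  # triples are sorted, so no later event time contributes
        if tie_break == "discrete":
            # All dN_j tied events share the same risk set of size Y_j.
            total += d_j / y_j
        else:
            # Tied events are treated as successive: the risk set shrinks by one each time.
            total += sum(1.0 / (y_j - k) for k in range(d_j))
    return total


# Hypothetical risk table: one event among 5 at risk, then two ties among 4, two among 3.
table = [(1, 1, 5), (2, 2, 4), (4, 2, 3)]

a_discrete = nelson_aalen(table, 2, tie_break="discrete")      # 1/5 + 2/4
a_continuous = nelson_aalen(table, 2, tie_break="continuous")  # 1/5 + 1/4 + 1/3
```

Without ties the two conventions coincide. With ties the “continuous” sum is the larger of the two, because each successive term divides by a smaller risk set.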

The variance of the Nelson-Aalen estimator is estimated by one of two estimators suggested by [4]. You can select the variance estimator by using the var_type parameter.

If var_type is “aalen”, then the variance estimator derived in [2] is used:

\[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \int_0^t \frac{J(s)}{Y(s)^2} \, dN(s).\]

This integral is computed in one of two ways depending on tie_break:
- If tie_break is “discrete”, then the variance estimator is computed as
  
  \[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \sum_{j : T_j \leq t} \frac{\Delta N(T_j)}{Y(T_j)^2}.\]
- If tie_break is “continuous”, then the variance estimator is computed as
  
  \[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \sum_{j : T_j \leq t} \sum_{k=0}^{\Delta N(T_j) - 1} \frac{1}{\left(Y(T_j) - k\right)^2}.\]
This estimator of the variance was found to generally overestimate the true variance of the Nelson-Aalen estimator [4].

If var_type is “greenwood”, then the Greenwood-type estimator derived in [4] is used:

\[\begin{split}\widehat{\mathrm{Var}}(\widehat{A}(t)) &= \int_0^t \frac{J(s) (Y(s) - \Delta N(s))}{Y(s)^3} \, dN(s) \\ &= \sum_{j : T_j \leq t} \frac{(Y(T_j) - \Delta N(T_j)) \Delta N(T_j)}{Y(T_j)^3}.\end{split}\]

This estimator tends to have a uniformly lower mean squared error than the Aalen estimator, but it also tends to underestimate the true variance of the Nelson-Aalen estimator [4].
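As a rough illustration, the variance formulas above can be written as one function. This is a sketch under assumptions: the name `na_variance` and the `(T_j, dN_j, Y_j)` input triples are hypothetical. Because only one Greenwood-type formula is given above, the sketch ignores `tie_break` in that branch, which may differ from what the library does.

```python
def na_variance(table, t, var_type="aalen", tie_break="discrete"):
    """Variance estimate of the Nelson-Aalen estimator at time t."""
    if var_type not in ("aalen", "greenwood"):
        raise ValueError(f"unknown var_type: {var_type!r}")
    if tie_break not in ("discrete", "continuous"):
        raise ValueError(f"unknown tie_break: {tie_break!r}")
    total = 0.0
    for t_j, d_j, y_j in table:
        if t_j > t:
            break  # triples are assumed sorted by event time
        if var_type == "greenwood":
            # Greenwood-type term: (Y - dN) * dN / Y^3 (tie_break not used here).
            total += (y_j - d_j) * d_j / y_j**3
        elif tie_break == "discrete":
            # Aalen term with ties sharing one risk set: dN / Y^2.
            total += d_j / y_j**2
        else:
            # Aalen term with ties taken in succession: sum of 1 / (Y - k)^2.
            total += sum(1.0 / (y_j - k) ** 2 for k in range(d_j))
    return total


# Hypothetical risk table with small risk sets, where the estimators differ most.
table = [(1, 1, 5), (2, 2, 4), (4, 2, 3)]

v_aalen = na_variance(table, 4, var_type="aalen")
v_aalen_cont = na_variance(table, 4, var_type="aalen", tie_break="continuous")
v_greenwood = na_variance(table, 4, var_type="greenwood")
```

Term by term, the Greenwood-type summand equals the discrete Aalen summand multiplied by $(Y(T_j) - \Delta N(T_j)) / Y(T_j) \leq 1$, so the Greenwood-type estimate never exceeds the discrete Aalen estimate.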

The difference between these two variance estimators is only significant at times when the risk set is small. Klein [4] recommends the Aalen estimator over the Greenwood-type estimator.

The two types of confidence intervals (“log” and “linear”) provided here are presented in [5]. They are based on the asymptotic normality of the Nelson-Aalen estimator and are derived from the delta method by suitable transformations of the estimator. The “log” intervals are more accurate for smaller sample sizes, but both methods are equivalent for large samples [5].
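The interval formulas themselves are not written out here. The sketch below assumes the usual forms: the “linear” interval is $\widehat{A}(t) \pm z \hat\sigma(t)$ and the “log” interval is $\widehat{A}(t) \exp(\pm z \hat\sigma(t) / \widehat{A}(t))$, where $z$ is the standard normal quantile at $(1 + \text{conf\_level})/2$. The function name `na_confidence_interval` is hypothetical, and how the library treats edge cases such as a zero estimate is not covered.

```python
import math
from statistics import NormalDist


def na_confidence_interval(estimate, variance, conf_level=0.95, conf_type="log"):
    """Pointwise confidence interval for a cumulative hazard estimate."""
    z = NormalDist().inv_cdf((1 + conf_level) / 2)  # two-sided normal quantile
    std_err = math.sqrt(variance)
    if conf_type == "linear":
        # Symmetric interval; the lower bound can fall below zero.
        return estimate - z * std_err, estimate + z * std_err
    if conf_type == "log":
        if estimate <= 0:
            # Assumption of this sketch: the log interval is undefined at zero.
            raise ValueError("log interval requires a positive estimate")
        # Delta method on log(A): the interval is multiplicative and stays positive.
        factor = math.exp(z * std_err / estimate)
        return estimate / factor, estimate * factor
    raise ValueError(f"unknown conf_type: {conf_type!r}")


# Hypothetical estimate and variance at a time when the risk set is small.
log_lo, log_hi = na_confidence_interval(0.7, 0.165, conf_type="log")
lin_lo, lin_hi = na_confidence_interval(0.7, 0.165, conf_type="linear")
```

With these inputs the linear lower bound is negative, which is impossible for a cumulative hazard, while the log interval stays above zero.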

References

[1]	Wayne Nelson. “Theory and Applications of Hazard Plotting for Censored Failure Data”. Technometrics, Volume 14, Number 4 (1972), pp. 945–966. JSTOR.

[2]	Odd Aalen. “Nonparametric Inference for a Family of Counting Processes”. The Annals of Statistics, Volume 6, Number 4 (1978), pp. 701–726. JSTOR.

[3]	Odd O. Aalen, Ørnulf Borgan, and Håkon K. Gjessing. Survival and Event History Analysis. A Process Point of View. Springer-Verlag, New York (2008) pp. xviii+540. DOI.

[4]	John P. Klein. “Small sample moments of some estimators of the variance of the Kaplan-Meier and Nelson-Aalen estimators”. Scandinavian Journal of Statistics, Volume 18, Number 4 (1991), pp. 333–40. JSTOR.

[5]	Ole Bie, Ørnulf Borgan, and Knut Liestøl. “Confidence Intervals and Confidence Bands for the Cumulative Hazard Rate Function and Their Small Sample Properties”. Scandinavian Journal of Statistics, Volume 14, Number 3 (1987), pp. 221–33. JSTOR.

Attributes

`conf_level`	Confidence level of the confidence intervals.
`conf_type`	Type of confidence intervals to report.
`data_`	Survival data used to fit the estimator.
`random_state`	Seed for this model’s random number generator.
`summary`	Get a summary of this estimator.
`tie_break`	How to handle tied event times.
`var_type`	Type of variance estimate to compute.

Methods

`check_fitted`()	Check whether this model is fitted.
`fit`(time, **kwargs)	Fit the Nelson-Aalen estimator to survival data.
`plot`(*groups[, ci, ci_style, ci_kwargs, …])	Plot the estimates.
`predict`(time, *[, return_se, return_ci])	Compute estimates.
`to_string`([max_line_length])	String representation of this model.

check_fitted()

Check whether this model is fitted. If not, raise an exception.

conf_level

Confidence level of the confidence intervals.

Returns:	conf_level : float The confidence level.

conf_type

Type of confidence intervals to report.

Returns:	conf_type : str The type of confidence interval.

data_

Survival data used to fit the estimator.

This property is only available after fitting.

Returns:	data : SurvivalData The `survive.SurvivalData` instance used to fit the estimator.

fit(time, **kwargs)

Fit the Nelson-Aalen estimator to survival data.

Parameters:

time : one-dimensional array-like or str or SurvivalData: The observed times, or all the survival data. If this is a survive.SurvivalData instance, then it is used to fit the estimator and any other parameters are ignored. Otherwise, time and the keyword arguments in kwargs are used to initialize a survive.SurvivalData object on which this estimator is fitted.
**kwargs : keyword arguments: Any additional keyword arguments used to initialize a survive.SurvivalData instance.

Returns:

survive.nonparametric.NelsonAalen: This estimator.

See also

survive.SurvivalData: Structure used to store survival data.

plot(*groups, ci=True, ci_style='fill', ci_kwargs=None, mark_censor=True, mark_censor_kwargs=None, legend=True, legend_kwargs=None, colors=None, palette=None, ax=None, **kwargs)

Plot the estimates.

Parameters:

*groups : list of group labels: Specify the groups whose curves should be plotted. If none are given, the curves for all groups are plotted.
ci : bool, optional: If True, draw pointwise confidence intervals.
ci_style : {“fill”, “lines”}, optional: Specify how to draw the confidence intervals. If ci_style is “fill”, the region between the lower and upper confidence interval curves will be filled. If ci_style is “lines”, only the lower and upper curves will be drawn (this is inspired by the style of confidence intervals drawn by plot.survfit in the R package survival).
ci_kwargs : dict, optional: Additional keyword parameters to pass to fill_between() (if ci_style is “fill”) or step() (if ci_style is “lines”) when plotting the pointwise confidence intervals.
mark_censor : bool, optional: If True, indicate the censored times by markers on the plot.
mark_censor_kwargs : dict, optional: Additional keyword parameters to pass to scatter() when marking censored times.
legend : bool, optional: Indicates whether to display a legend for the plot.
legend_kwargs : dict, optional: Keyword parameters to pass to legend().
colors : list or tuple or dict or str, optional: Colors for each group. This is ignored if palette is provided. This can be a sequence of valid matplotlib colors to cycle through, or a dictionary mapping group labels to matplotlib colors, or the name of a matplotlib colormap.
palette : str, optional: Name of a seaborn color palette. Requires seaborn to be installed. Setting a color palette overrides the colors parameter.
ax : matplotlib.axes.Axes, optional: The axes on which to plot. If this is not specified, the current axes will be used.
**kwargs : keyword arguments: Additional keyword arguments to pass to step() when plotting the estimates.

Returns:

matplotlib.axes.Axes: The Axes on which the plot was drawn.

predict(time, *, return_se=False, return_ci=False)

Compute estimates.

Parameters:

time : array-like: One-dimensional array of times at which to make estimates.
return_se : bool, optional: If True, also return standard error estimates.
return_ci : bool, optional: If True, also return confidence intervals.

Returns:

estimate : pandas.DataFrame: DataFrame of estimates. Each column represents a group, and each row represents an entry of time.
std_err : pandas.DataFrame, optional: Standard errors of the estimates. Same shape as estimate. Returned only if return_se is True.
lower : pandas.DataFrame, optional: Lower confidence interval bounds. Same shape as estimate. Returned only if return_ci is True.
upper : pandas.DataFrame, optional: Upper confidence interval bounds. Same shape as estimate. Returned only if return_ci is True.
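Because the estimator is a sum over event times $T_j \leq t$, it is a right-continuous step function, and evaluating it at arbitrary times amounts to a sorted lookup. The sketch below illustrates that idea with the standard library. It is not the body of `predict`. The helper name `evaluate_step` and its inputs are hypothetical, and returning the last value for times beyond the final event time is an assumption.

```python
from bisect import bisect_right


def evaluate_step(event_times, cumulative, query_times):
    """Evaluate a right-continuous step function at each query time.

    event_times: sorted distinct event times T_1 < T_2 < ...
    cumulative:  value of the estimate at each T_j (same length)
    """
    values = []
    for t in query_times:
        # bisect_right counts the event times that are <= t.
        n_leq = bisect_right(event_times, t)
        # Before the first event time the cumulative hazard estimate is zero.
        values.append(cumulative[n_leq - 1] if n_leq > 0 else 0.0)
    return values


# Hypothetical fitted curve: jumps at times 1, 2, and 4.
event_times = [1, 2, 4]
cumulative = [0.2, 0.7, 0.7 + 2 / 3]

estimates = evaluate_step(event_times, cumulative, [0.5, 1, 3, 10])
```

A query exactly at an event time picks up the jump at that time, matching the $T_j \leq t$ condition in the sums.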

random_state

Seed for this model’s random number generator. This may not be a numpy.random.RandomState instance. The internal RNG is not a public attribute and should not be used directly.

Returns:	random_state : object The seed for this model’s RNG.

summary

Get a summary of this estimator.

Returns:	summary : NonparametricEstimatorSummary The summary of this estimator.

to_string([max_line_length])

String representation of this model.

Parameters:

max_line_length : int, optional: Specifies the maximum length of a line. If None, everything will be on one line.

Returns:

model_string : str: A string representation of this model that should be usable to instantiate a new identical model.
