survive.NelsonAalen

class survive.NelsonAalen(*, conf_type='log', conf_level=0.95, var_type='aalen', tie_break='discrete')
Nelson-Aalen nonparametric cumulative hazard estimator.
This estimator was suggested by Nelson in [1] in the context of reliability, and it was rediscovered and generalized by Aalen in [2].
Parameters:  conf_type : {‘log’, ‘linear’}
Type of confidence interval for the cumulative hazard estimate to report.
 conf_level : float
Confidence level of the confidence intervals.
 var_type : {‘aalen’, ‘greenwood’}
Type of variance estimate to compute.
 tie_break : {‘discrete’, ‘continuous’}
Specify how to handle tied event times.
Notes
Suppose we have observed right-censored and left-truncated event times. Let \(T_1 < T_2 < \cdots\) denote the ordered distinct event times. Let \(N(t)\) be the number of events observed up to time \(t\), let \(Y(t)\) denote the number of individuals at risk (under observation but not yet censored or “dead”) at time \(t\), and let
\[\begin{split}J(t) = \begin{cases} 1 & \text{if $Y(t) > 0$,} \\ 0 & \text{otherwise.} \end{cases}\end{split}\]
The Nelson-Aalen estimator estimates the cumulative hazard function of the time-to-event distribution by
\[\widehat{A}(t) = \int_0^t \frac{J(s)}{Y(s)} \, dN(s).\]
This formula, proposed in [2], is computed as a sum in one of two ways depending on how tied event times are handled (cf. Section 3.1.3 in [3]). This is governed by the tie_break parameter.
If tie_break is “discrete”, then it is assumed that tied events are possible, and we compute the integral defining the Nelson-Aalen estimator directly, leading to
\[\widehat{A}(t) = \sum_{j : T_j \leq t} \frac{\Delta N(T_j)}{Y(T_j)}.\]
Here \(\Delta N(T_j)\) is the number of events occurring at time \(T_j\).
If tie_break is “continuous”, then it is assumed that tied events only happen due to grouping or rounding, and the tied times are treated as if they happened in succession, each one immediately following the previous one. This leads to the estimator
\[\widehat{A}(t) = \sum_{j : T_j \leq t} \sum_{k=0}^{\Delta N(T_j) - 1} \frac{1}{Y(T_j) - k}.\]
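The two sums can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function name nelson_aalen and its arguments are hypothetical, and it assumes right-censored data without left truncation, so that \(Y(T_j)\) is simply the number of observed times that are at least \(T_j\).

```python
from collections import Counter


def nelson_aalen(times, events, tie_break="discrete"):
    """Return (distinct event times, cumulative hazard estimates).

    times  : observed times (event or censoring times)
    events : 1 if the observation is an event, 0 if it is right-censored
    Hypothetical helper; assumes no left truncation.
    """
    # Delta N(T_j): number of events at each distinct event time
    deaths = Counter(t for t, e in zip(times, events) if e)
    event_times = sorted(deaths)
    estimates, total = [], 0.0
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)  # Y(T_j)
        d = deaths[t]
        if tie_break == "discrete":
            total += d / at_risk
        elif tie_break == "continuous":
            # Treat the d tied events as happening one after another
            total += sum(1.0 / (at_risk - k) for k in range(d))
        else:
            raise ValueError("tie_break must be 'discrete' or 'continuous'")
        estimates.append(total)
    return event_times, estimates


times = [1, 2, 2, 3, 4]
events = [1, 1, 1, 0, 1]  # the observation at time 3 is censored
t_disc, a_disc = nelson_aalen(times, events, tie_break="discrete")
t_cont, a_cont = nelson_aalen(times, events, tie_break="continuous")
```

In this toy data set the two tied events at time 2 contribute 2/4 under the “discrete” rule and 1/4 + 1/3 under the “continuous” rule, so the “continuous” estimate is the larger of the two from that time onward.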
The variance of the Nelson-Aalen estimator is estimated by one of two estimators suggested by [4]. You can select the variance estimator by using the var_type parameter.
If var_type is “aalen”, then the variance estimator derived in [2] is used:
\[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \int_0^t \frac{J(s)}{Y(s)^2} \, dN(s).\]
This integral is computed in one of two ways depending on tie_break:
If tie_break is “discrete”, then the variance estimator is computed as
\[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \sum_{j : T_j \leq t} \frac{\Delta N(T_j)}{Y(T_j)^2}.\]
If tie_break is “continuous”, then the variance estimator is computed as
\[\widehat{\mathrm{Var}}(\widehat{A}(t)) = \sum_{j : T_j \leq t} \sum_{k=0}^{\Delta N(T_j) - 1} \frac{1}{\left(Y(T_j) - k\right)^2}.\]
This estimator of the variance was found to generally overestimate the true variance of the Nelson-Aalen estimator [4].
If var_type is “greenwood”, then the Greenwood-type estimator derived in [4] is used:
\[\begin{split}\widehat{\mathrm{Var}}(\widehat{A}(t)) &= \int_0^t \frac{J(s) (Y(s) - \Delta N(s))}{Y(s)^3} \, dN(s) \\ &= \sum_{j : T_j \leq t} \frac{(Y(T_j) - \Delta N(T_j)) \Delta N(T_j)}{Y(T_j)^3}.\end{split}\]
This estimator tends to have a uniformly lower mean squared error than the Aalen estimator, but it also tends to underestimate the true variance of the Nelson-Aalen estimator [4].
The difference between these two variance estimators is only significant at times when the risk set is small. Klein [4] recommends the Aalen estimator over the Greenwood-type estimator.
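As a rough illustration of how the variance sums compare, the following pure-Python sketch evaluates them on a small table of \((Y(T_j), \Delta N(T_j))\) pairs. The function variance_estimate and the risk_table layout are hypothetical names chosen for this example, not part of the library.

```python
def variance_estimate(risk_table, var_type="aalen", tie_break="discrete"):
    """Estimate Var(A_hat(t)) from (Y(T_j), Delta N(T_j)) pairs with T_j <= t.

    Hypothetical helper mirroring the sums in the Notes section.
    """
    total = 0.0
    for at_risk, d in risk_table:
        if var_type == "aalen" and tie_break == "discrete":
            total += d / at_risk ** 2
        elif var_type == "aalen" and tie_break == "continuous":
            total += sum(1.0 / (at_risk - k) ** 2 for k in range(d))
        elif var_type == "greenwood":
            total += (at_risk - d) * d / at_risk ** 3
        else:
            raise ValueError("unknown var_type or tie_break")
    return total


# Toy risk table: 5 at risk with 1 event, then 4 with 2 events, then 1 with 1
risk_table = [(5, 1), (4, 2), (1, 1)]
aalen = variance_estimate(risk_table, var_type="aalen")
aalen_continuous = variance_estimate(
    risk_table, var_type="aalen", tie_break="continuous"
)
greenwood = variance_estimate(risk_table, var_type="greenwood")
```

Each Greenwood-type term equals the corresponding “discrete” Aalen term multiplied by \((Y(T_j) - \Delta N(T_j)) / Y(T_j) \leq 1\), so the Greenwood-type estimate can never exceed the Aalen estimate. The last row of the toy table, where the risk set has a single member, contributes 1 to the Aalen sum and 0 to the Greenwood-type sum, which shows why the two differ most when the risk set is small.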
The two types of confidence intervals (“log” and “linear”) provided here are presented in [5]. They are based on the asymptotic normality of the Nelson-Aalen estimator and are derived from the delta method by suitable transformations of the estimator. The “log” intervals are more accurate for smaller sample sizes, but both methods are equivalent for large samples [5].
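The sketch below shows the usual textbook forms of these two pointwise intervals: the “linear” interval is \(\widehat{A}(t) \pm z \widehat{\sigma}(t)\) and the “log” interval is \(\widehat{A}(t) \exp(\pm z \widehat{\sigma}(t) / \widehat{A}(t))\), where \(z\) is the standard normal quantile for the chosen confidence level and \(\widehat{\sigma}(t)\) is the estimated standard error. The function cumhaz_ci is a hypothetical helper, and it is an assumption that the library computes its intervals in exactly this way (for instance, it may treat edge cases differently).

```python
import math
from statistics import NormalDist


def cumhaz_ci(estimate, std_err, conf_level=0.95, conf_type="log"):
    """Pointwise confidence interval for a cumulative hazard estimate.

    Hypothetical helper using the standard normal-approximation formulas.
    """
    # Two-sided normal quantile, about 1.96 for conf_level = 0.95
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    if conf_type == "linear":
        return estimate - z * std_err, estimate + z * std_err
    if conf_type == "log":
        if estimate <= 0:
            raise ValueError("the log interval needs a positive estimate")
        factor = math.exp(z * std_err / estimate)
        return estimate / factor, estimate * factor
    raise ValueError("conf_type must be 'log' or 'linear'")


lower_lin, upper_lin = cumhaz_ci(0.7, 0.4, conf_type="linear")
lower_log, upper_log = cumhaz_ci(0.7, 0.4, conf_type="log")
```

With an estimate of 0.7 and a standard error of 0.4, the “linear” lower bound is negative, which is impossible for a cumulative hazard, while the “log” interval stays positive by construction.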
References
[1] Wayne Nelson. “Theory and Applications of Hazard Plotting for Censored Failure Data”. Technometrics, Volume 14, Number 4 (1972), pp. 945–966.
[2] Odd Aalen. “Nonparametric Inference for a Family of Counting Processes”. The Annals of Statistics, Volume 6, Number 4 (1978), pp. 701–726.
[3] Odd O. Aalen, Ørnulf Borgan, and Håkon K. Gjessing. Survival and Event History Analysis. A Process Point of View. Springer-Verlag, New York (2008), pp. xviii+540.
[4] John P. Klein. “Small sample moments of some estimators of the variance of the Kaplan-Meier and Nelson-Aalen estimators”. Scandinavian Journal of Statistics, Volume 18, Number 4 (1991), pp. 333–40.
[5] Ole Bie, Ørnulf Borgan, and Knut Liestøl. “Confidence Intervals and Confidence Bands for the Cumulative Hazard Rate Function and Their Small Sample Properties”. Scandinavian Journal of Statistics, Volume 14, Number 3 (1987), pp. 221–33.
Attributes:
conf_level
Confidence level of the confidence intervals.
conf_type
Type of confidence intervals to report.
data_
Survival data used to fit the estimator.
random_state
Seed for this model’s random number generator.
summary
Get a summary of this estimator.
tie_break
How to handle tied event times.
var_type
Type of variance estimate to compute.
Methods
check_fitted()
Check whether this model is fitted.
fit(time, **kwargs)
Fit the Nelson-Aalen estimator to survival data.
plot(*groups[, ci, ci_style, ci_kwargs, …])
Plot the estimates.
predict(time, *[, return_se, return_ci])
Compute estimates.
to_string([max_line_length])
String representation of this model.

conf_level
Confidence level of the confidence intervals.
Returns:  conf_level : float
The confidence level.

conf_type
Type of confidence intervals to report.
Returns:  conf_type : str
The type of confidence interval.

data_
Survival data used to fit the estimator.
This property is only available after fitting.
Returns:  data : SurvivalData
The survive.SurvivalData instance used to fit the estimator.

fit(time, **kwargs)
Fit the Nelson-Aalen estimator to survival data.
Parameters:  time : one-dimensional array-like or str or SurvivalData
The observed times, or all the survival data. If this is a survive.SurvivalData instance, then it is used to fit the estimator and any other parameters are ignored. Otherwise, time and the keyword arguments in kwargs are used to initialize a survive.SurvivalData object on which this estimator is fitted.
 **kwargs : keyword arguments
Any additional keyword arguments used to initialize a survive.SurvivalData instance.
Returns:  survive.nonparametric.NelsonAalen
This estimator.
See also
survive.SurvivalData
 Structure used to store survival data.

plot(*groups, ci=True, ci_style='fill', ci_kwargs=None, mark_censor=True, mark_censor_kwargs=None, legend=True, legend_kwargs=None, colors=None, palette=None, ax=None, **kwargs)
Plot the estimates.
Parameters:  *groups : list of group labels
Specify the groups whose curves should be plotted. If none are given, the curves for all groups are plotted.
 ci : bool, optional
If True, draw pointwise confidence intervals.
 ci_style : {“fill”, “lines”}, optional
Specify how to draw the confidence intervals. If ci_style is “fill”, the region between the lower and upper confidence interval curves will be filled. If ci_style is “lines”, only the lower and upper curves will be drawn (this is inspired by the style of confidence intervals drawn by plot.survfit in the R package survival).
 ci_kwargs : dict, optional
Additional keyword parameters to pass to fill_between() (if ci_style is “fill”) or step() (if ci_style is “lines”) when plotting the pointwise confidence intervals.
 mark_censor : bool, optional
If True, indicate the censored times by markers on the plot.
 mark_censor_kwargs : dict, optional
Additional keyword parameters to pass to scatter() when marking censored times.
 legend : bool, optional
Indicates whether to display a legend for the plot.
 legend_kwargs : dict, optional
Keyword parameters to pass to legend().
 colors : list or tuple or dict or str, optional
Colors for each group. This is ignored if palette is provided. This can be a sequence of valid matplotlib colors to cycle through, or a dictionary mapping group labels to matplotlib colors, or the name of a matplotlib colormap.
 palette : str, optional
Name of a seaborn color palette. Requires seaborn to be installed. Setting a color palette overrides the colors parameter.
 ax : matplotlib.axes.Axes, optional
The axes on which to plot. If this is not specified, the current axes will be used.
 **kwargs : keyword arguments
Additional keyword arguments to pass to step() when plotting the estimates.
Returns:  matplotlib.axes.Axes
The Axes on which the plot was drawn.

predict(time, *, return_se=False, return_ci=False)
Compute estimates.
Parameters:  time : array-like
One-dimensional array of times at which to make estimates.
 return_se : bool, optional
If True, also return standard error estimates.
 return_ci : bool, optional
If True, also return confidence intervals.
Returns:  estimate : pandas.DataFrame
DataFrame of estimates. Each column represents a group, and each row represents an entry of time.
 std_err : pandas.DataFrame, optional
Standard errors of the estimates. Same shape as estimate. Returned only if return_se is True.
 lower : pandas.DataFrame, optional
Lower confidence interval bounds. Same shape as estimate. Returned only if return_ci is True.
 upper : pandas.DataFrame, optional
Upper confidence interval bounds. Same shape as estimate. Returned only if return_ci is True.
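Because the estimator is a sum over the event times \(T_j \leq t\), it is a right-continuous step function, and evaluating it at arbitrary times amounts to a lookup. The sketch below illustrates that idea with the standard bisect module. The helper evaluate_step is hypothetical, and it is an assumption that predict works along these lines internally.

```python
from bisect import bisect_right


def evaluate_step(event_times, cumhaz, query_times):
    """Evaluate a right-continuous step function at the given times.

    event_times : sorted distinct event times T_1 < T_2 < ...
    cumhaz      : cumulative hazard estimate at each event time
    Hypothetical helper; returns 0.0 before the first event time.
    """
    values = []
    for t in query_times:
        idx = bisect_right(event_times, t)  # number of T_j <= t
        values.append(cumhaz[idx - 1] if idx else 0.0)
    return values


values = evaluate_step([1, 2, 4], [0.2, 0.7, 1.7], [0.5, 1, 3, 10])
# values == [0.0, 0.2, 0.7, 1.7]
```

A query time that coincides with an event time picks up the jump at that time, and a query time past the last event time returns the final value of the estimate.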

random_state
Seed for this model’s random number generator. This is not necessarily a numpy.random.RandomState instance. The internal RNG is not a public attribute and should not be used directly.
Returns:  random_state : object
The seed for this model’s RNG.

summary
Get a summary of this estimator.
Returns:  summary : NonparametricEstimatorSummary
The summary of this estimator.

tie_break
How to handle tied event times.

to_string(max_line_length=75)
String representation of this model.
Parameters:  max_line_length : int, optional
Specifies the maximum length of a line. If None, everything will be on one line.
Returns:  model_string : str
A string representation of this model, which should be usable to instantiate a new, identical model.

var_type
Type of variance estimate to compute.