RANK-BASED INFERENCE FOR THE ACCELERATED FAILURE TIME MODEL IN THE PRESENCE OF INTERVAL CENSORED DATA

Semiparametric analysis and rank-based inference for the accelerated failure time model are complicated in the presence of interval censored data. The main difficulty with existing rank-based methods is that they involve estimating functions that may have multiple roots. In this paper, a class of asymptotically normal rank estimators that can be computed by linear programming is developed for estimating the parameters of the model, and a two-step iterative algorithm is introduced for solving the estimating equations. The proposed inference procedures are illustrated with a real example. Applying the proposed methodology to breast cancer data shows that the algorithm converges after three iterations, and that the estimates of the model parameter based on the log-rank and Gehan weight functions are fairly close, with small standard errors.


1. Introduction. In failure time data analysis, the accelerated failure time model is an attractive alternative to the popular Cox proportional hazards model [4] because of its direct physical interpretation. This model relates the covariates linearly to the logarithm of the failure time [9].
For the semiparametric analysis of the accelerated failure time model, [16] proposed a class of linear rank estimators based on weighted log-rank statistics. In the early 1990s, the asymptotic properties of the rank estimators were studied by [17], [18], [20], [10], [11], and [22], among others.
Semiparametric methods for the accelerated failure time model have seen significant theoretical advances, but they have rarely been used in applications. The main difficulty is that the existing computational methods are neither reliable nor efficient in practice. Specifically, the existing rank-based estimating functions are step functions of the model parameters and may therefore have multiple roots.
In this paper, simple and reliable methods are provided for estimating the parameters of the accelerated failure time model in the presence of interval censored data. The proposed methodology modifies a generalisation of the rank estimators previously studied by [20], [18], and [22]. For the rank estimator with the Gehan-type weight function, it is shown that the Gehan [7] estimator can be obtained by minimising a convex objective function. A class of general weighted log-rank estimating functions is introduced to approximate the log-rank estimating functions. The resulting estimating equations are solved by an iterative algorithm. The proposed two-step algorithm consists of an estimation step and an approximation step, and it can be executed through linear programming in each iteration.

2.1. Model. Let the random variable $T_i$ be the failure time for the $i$th subject, $i = 1, 2, \ldots, n$, and let $Z_i$ be the associated $p \times 1$ vector of covariates. Assuming that the failure times are independent, the accelerated failure time model specifies that

$$\log T_i = \beta_0^{\top} Z_i + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$

where $\beta_0$ is a $p \times 1$ vector of unknown regression parameters and the error terms $\varepsilon_i$ are independent and identically distributed with an unspecified distribution. Suppose that the failure time $T_i$ of subject $i$ cannot be observed directly but can only be examined at random examination times. For the $i$th subject, let $R_i = (R_{i1}, R_{i2}, \ldots, R_{in_i})$ denote the sequence of ordered examination times, where $n_i$ is the number of examinations. Assume that examination times and failure times are independent. For subject $i$, let $R_{iL}$ be the last examination time before $T_i$ and let $R_{iU}$ be the first examination time after $T_i$. It is assumed that the true unknown $T_i$ lies in the interval $(R_{iL}, R_{iU}]$.
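To make the observation scheme concrete, here is a minimal simulation sketch. Several choices are illustrative assumptions rather than part of the method:

- the error distribution is normal with mean 3, and its nonzero mean plays the role of an intercept;
- the covariates are binary;
- the gaps between examinations are uniform on 4 to 6 time units;
- follow-up stops at `tau`.

`simulate_interval_censored` is a hypothetical helper. A subject whose failure occurs after the last examination receives $R_{iU} = \infty$.

```python
import numpy as np

def simulate_interval_censored(n, beta0, gap=(4.0, 6.0), tau=60.0, seed=0):
    """Hypothetical generator: AFT failure times observed only through examinations."""
    rng = np.random.default_rng(seed)
    beta0 = np.asarray(beta0, dtype=float)
    Z = rng.binomial(1, 0.5, size=(n, beta0.size)).astype(float)  # binary covariates (assumption)
    eps = rng.normal(3.0, 1.0, size=n)   # errors N(3, 1) (assumption; mean acts as intercept)
    T = np.exp(Z @ beta0 + eps)          # model (1): log T_i = beta0'Z_i + eps_i
    left = np.zeros(n)                   # R_iL (0 if no examination precedes T_i)
    right = np.full(n, np.inf)           # R_iU (inf if no examination follows T_i)
    for i in range(n):
        t = 0.0
        while True:
            t += rng.uniform(*gap)       # next examination time
            if t > tau:                  # follow-up ends before T_i is bracketed
                break
            if t >= T[i]:                # first examination at or after T_i
                right[i] = t
                break
            left[i] = t                  # latest examination strictly before T_i
    return Z, T, left, right
```

By construction each returned triple satisfies $R_{iL} < T_i \le R_{iU}$, which matches the interval assumption stated above.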
Let $C_i$ denote the censoring time for the $i$th subject. It is assumed that $C_i$ is independent of $T_i$ conditionally on $Z_i$. The data then consist of $(R_{iL}, R_{iU}, \ldots)$, where $I\{\cdot\}$ is the indicator function.
Let the random variable $\tilde T_i$ denote an approximation of the unknown true failure time $T_i$ that satisfies $R_{iL} < \tilde T_i \le R_{iU}$. The proposed weighted log-rank estimating function for $\beta_0$ is denoted by $U_\phi(\beta)$, where $\phi$ is a specified weight function that is twice continuously differentiable on $[0, 1]$. Note that $U_\phi(\beta)$ is the log-rank estimating function [13] if $\phi(t; \beta) = 1$, and it is the Gehan estimating function [7] if $\phi(t; \beta) = S^{(0)}(t; \beta)$. A reasonable estimator of $\beta_0$ is a value of $\beta$, denoted by $\hat\beta_\phi$, that is a zero-crossing of $U_\phi(\beta)$ or a minimiser of $\|U_\phi(\beta)\|$. According to the general asymptotic theory for rank estimators ([18]; [10]; [22]), the random vector $n^{1/2}(\hat\beta_\phi - \beta_0)$ is asymptotically zero-mean normal.
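As a minimal sketch, assume that $U_\phi$ takes the usual weighted log-rank form

$$U_\phi(\beta) = \sum_i \Delta_i\, \phi\{e_i(\beta); \beta\} \left[ Z_i - \frac{S^{(1)}\{e_i(\beta); \beta\}}{S^{(0)}\{e_i(\beta); \beta\}} \right].$$

The quantities it uses are also assumptions for illustration, since the equation is not reproduced in the text:

- residuals $e_i(\beta) = \log\tilde T_i - \beta^\top Z_i$;
- $S^{(0)}(t; \beta) = n^{-1}\sum_j I\{e_j(\beta) \ge t\}$;
- $S^{(1)}(t; \beta) = n^{-1}\sum_j I\{e_j(\beta) \ge t\} Z_j$;
- an event indicator $\Delta_i$.

The name `weighted_logrank` is likewise hypothetical. The code evaluates both the log-rank version ($\phi = 1$) and the Gehan version ($\phi = S^{(0)}$).

```python
import numpy as np

def weighted_logrank(beta, logT, Z, delta, weight="logrank"):
    """Hypothetical helper: U_phi(beta) under the assumed weighted log-rank form."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    e = logT - Z @ beta                                   # residuals e_i(beta)
    n = len(e)
    at_risk = (e[None, :] >= e[:, None]).astype(float)    # at_risk[i, j] = I{e_j >= e_i}
    S0 = at_risk.sum(axis=1) / n                          # S^(0)(e_i; beta), always >= 1/n
    S1 = at_risk @ Z / n                                  # S^(1)(e_i; beta), shape (n, p)
    if weight == "logrank":
        phi = np.ones(n)                                  # phi = 1: log-rank
    elif weight == "gehan":
        phi = S0                                          # phi = S^(0): Gehan
    else:
        raise ValueError("weight must be 'logrank' or 'gehan'")
    terms = (delta * phi)[:, None] * (Z - S1 / S0[:, None])
    return terms.sum(axis=0)                              # p-vector U_phi(beta)
```

With $\phi = S^{(0)}$ the sum reduces to $n^{-1}\sum_i\sum_j \Delta_i (Z_i - Z_j) I\{e_j(\beta) \ge e_i(\beta)\}$. Evaluating `weighted_logrank` over a fine grid of $\beta$ values shows the piecewise-constant behaviour discussed in the next paragraph: the value changes only when two residuals swap order.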
However, it is not easy to solve the system of equations $U_\phi(\beta) = 0$, since $U_\phi(\beta)$ is in general a $p$-dimensional step function of $\beta$. In fact, $U_\phi(\beta)$ is neither monotone nor continuous in $\beta$, so it is difficult both to solve the equation and to locate a minimiser.
2.2. Proposed methodology. The Gehan estimating function $U_G(\beta)$ is monotone in each component of $\beta$ [6]. It is not difficult to see that $U_G(\beta)$ is the gradient of a convex loss function $L_G(\beta)$. The minimiser of $L_G(\beta)$, denoted by $\hat\beta_G$, is a root of the Gehan estimating function. As considered by [12], this minimisation can be carried out by minimising a linear objective function, that is, by linear programming. As mentioned before, the random vector $n^{1/2}(\hat\beta_G - \beta_0)$ is asymptotically zero-mean normal according to the general asymptotic theory for rank estimators.
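The Gehan loss and its linear-programming minimisation can be sketched as follows. The sketch assumes

$$L_G(\beta) = n^{-1}\sum_i\sum_j \Delta_i \{e_i(\beta) - e_j(\beta)\}^-, \qquad a^- = \max(0, -a),$$

with residuals $e_i(\beta) = \log\tilde T_i - \beta^\top Z_i$ and event indicators $\Delta_i$. Its gradient is $n^{-1}\sum_i\sum_j \Delta_i (Z_i - Z_j) I\{e_j(\beta) > e_i(\beta)\}$, the Gehan estimating function away from ties.

Introducing one slack variable $s_{ij} \ge \max\{0, e_j(\beta) - e_i(\beta)\}$ per pair turns the problem into a linear program in $(\beta, s)$. This is a generic slack-variable formulation solved with SciPy's `linprog` (HiGHS), not necessarily the formulation of [12]. The names `gehan_loss` and `gehan_lp`, and the optional per-subject weights `w`, are hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog

def gehan_loss(beta, logT, Z, delta):
    """L_G(beta) = n^{-1} sum_i sum_j delta_i * max(0, e_j - e_i) (assumed form)."""
    e = logT - Z @ np.atleast_1d(beta)              # residuals e_i(beta)
    diff = e[None, :] - e[:, None]                  # diff[i, j] = e_j - e_i
    return (delta[:, None] * np.maximum(diff, 0.0)).sum() / len(e)

def gehan_lp(logT, Z, delta, w=None):
    """Minimise sum_i w_i delta_i sum_j max(0, e_j - e_i) as a linear program."""
    n, p = Z.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    # one slack variable s_ij per pair with delta_i = 1 and i != j
    I, J = np.nonzero((delta[:, None] > 0) & ~np.eye(n, dtype=bool))
    m = len(I)
    D = Z[J] - Z[I]                                 # Z_j - Z_i
    d = logT[J] - logT[I]                           # log T_j - log T_i
    # s_ij >= e_j - e_i = d_ij - D_ij'beta   <=>   -D_ij'beta - s_ij <= -d_ij
    A_ub = sparse.hstack([sparse.csr_matrix(-D), -sparse.identity(m, format="csr")],
                         format="csr")
    c = np.concatenate([np.zeros(p), w[I]])         # beta has zero cost; slacks weighted
    bounds = [(None, None)] * p + [(0, None)] * m   # beta free, slacks nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=-d, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]
```

The program has at most $n(n-1)$ slack variables, which is 8,742 for $n = 94$. A sparse constraint matrix therefore keeps memory modest. Passing `w` equal to $\psi\{e_i(\hat\beta); \hat\beta\}$ gives the weighted loss described in the following paragraph.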
To construct weighted Gehan-type loss functions whose minimisers can be obtained by linear programming, an approach similar to that of [8] is developed. Consider a modification of the weighted log-rank estimating function, denoted by $\tilde U_\phi(\beta; \hat\beta)$, where $\psi(t; \beta) = \phi(t; \beta)/S^{(0)}(t; \beta)$ and $\hat\beta$ is a preliminary estimator of $\beta_0$, such as $\hat\beta_G$. Clearly, $\tilde U_\phi(\beta; \beta) = U_\phi(\beta)$. Note that $\tilde U_\phi(\beta; \hat\beta)$ is the same as $U_G(\beta)$ except for the weights $\psi\{e_i(\hat\beta); \hat\beta\}$, which are free of $\beta$. Thus $\tilde U_\phi(\beta; \hat\beta)$ is monotone in each component of $\beta$ and is the gradient of a loss function $\tilde L_\phi(\beta; \hat\beta)$. As with $L_G(\beta)$, the minimisation of $\tilde L_\phi(\beta; \hat\beta)$ is a linear programming problem.
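The modified function and its loss can be written out under stated assumptions, and the two properties claimed in the text can then be checked directly. The assumptions are those of the usual weighted log-rank form:

- residuals $e_i(\beta) = \log\tilde T_i - \beta^\top Z_i$ and event indicators $\Delta_i$;
- $S^{(0)}$ and $S^{(1)}$ are risk-set averages of $1$ and $Z_j$ over $\{j : e_j(\beta) \ge t\}$;
- $a^- = \max(0, -a)$.

These explicit forms are a sketch, not the paper's displayed equations.

```latex
\begin{aligned}
\tilde U_\phi(\beta;\hat\beta) &= n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \Delta_i\,\psi\{e_i(\hat\beta);\hat\beta\}\,(Z_i - Z_j)\,I\{e_j(\beta) \ge e_i(\beta)\},\\
\tilde L_\phi(\beta;\hat\beta) &= n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \Delta_i\,\psi\{e_i(\hat\beta);\hat\beta\}\,\{e_i(\beta) - e_j(\beta)\}^{-},\\
\frac{\partial}{\partial\beta}\{e_i(\beta) - e_j(\beta)\}^{-} &= (Z_i - Z_j)\,I\{e_j(\beta) > e_i(\beta)\} \quad (e_i(\beta) \neq e_j(\beta)),\\
\sum_{j=1}^{n}(Z_i - Z_j)\,I\{e_j(\beta) \ge e_i(\beta)\} &= n\bigl[Z_i\,S^{(0)}\{e_i(\beta);\beta\} - S^{(1)}\{e_i(\beta);\beta\}\bigr],\\
\tilde U_\phi(\beta;\beta) &= \sum_{i=1}^{n}\Delta_i\,\phi\{e_i(\beta);\beta\}\Bigl[Z_i - \frac{S^{(1)}\{e_i(\beta);\beta\}}{S^{(0)}\{e_i(\beta);\beta\}}\Bigr] = U_\phi(\beta).
\end{aligned}
```

Provided $\phi \ge 0$, the weights $\psi$ are nonnegative. $\tilde L_\phi$ is then a nonnegatively weighted sum of terms of the form $\max(0, \text{linear in } \beta)$, so it is convex. With one slack variable per pair, its minimisation is a linear program.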
For inference in the presence of interval censored data, a two-step iterative algorithm is proposed, which estimates $\beta_0$ in the first step and approximates the censored observations in the second step. Because the interval censored failure times are unknown, the estimating function can in general be written as $U_\phi(\beta; \{T_i; i = 1, \ldots, n\})$. The proposed estimation-approximation algorithm at the $k$th iteration can be described in two steps as follows. Estimation step: determine $\hat\beta^{(k)}$ …

3. Numerical results.
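The estimation-approximation iteration of Section 2.2 can be sketched as a generic loop. `estimation_approximation` and its two callables are hypothetical; they are not the authors' code.

- In the setting of this paper, `estimate(T)` would minimise $\tilde L_\phi(\beta; \hat\beta)$ by linear programming, with the current approximated failure times held fixed.
- `approximate(beta, T)` would choose each $\tilde T_i$ in $(R_{iL}, R_{iU}]$.

The default stopping tolerance of 0.0001 between successive estimates mirrors the one reported for the breast cancer analysis.

```python
import numpy as np

def estimation_approximation(estimate, approximate, T0, tol=1e-4, max_iter=50):
    """Hypothetical skeleton of the two-step iteration.

    estimate(T) -> beta:           estimation step, failure times T held fixed
    approximate(beta, T) -> T_new: approximation step, beta held fixed
    """
    T = np.asarray(T0, dtype=float)
    beta_prev = None
    for k in range(1, max_iter + 1):
        beta = np.atleast_1d(np.asarray(estimate(T), dtype=float))  # estimation step
        T = np.asarray(approximate(beta, T), dtype=float)           # approximation step
        if beta_prev is not None and np.max(np.abs(beta - beta_prev)) < tol:
            return beta, T, k                                       # successive estimates agree
        beta_prev = beta
    return beta, T, max_iter
```

For example, an approximation step that always returns the interval upper bounds, as the breast cancer analysis below ends up doing, stabilises the failure times after one pass. The loop then stops as soon as two consecutive estimates agree.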

3.1. Breast cancer data. The proposed method is illustrated with data from a retrospective study of 94 patients with early breast cancer, reported by [5]. Of these, 46 patients received radiation therapy alone (RT) and 48 received radiation therapy plus adjuvant chemotherapy (RTC). At clinic visits, which were scheduled every 4 to 6 months, physicians evaluated the patients for breast retraction. However, the actual visit times and the times between visits differ from patient to patient. Since the exact time of breast retraction was not observed, the data contain only the time intervals, in months, during which breast retraction occurred.
To determine the effect of the treatments on the time to breast retraction, the treatment indicator $Z_i$ was used as the covariate in the accelerated failure time model (model (1)), with $Z_i = 0$ for the RT group and $Z_i = 1$ for the RTC group. The estimation-approximation algorithm was used to obtain the estimates based on the log-rank and Gehan weight functions. The estimates were identical after five iterations, with a tolerance of 0.0001 between successive estimates. The confidence intervals were constructed using the Wald method. The main results for the regression parameter estimates are summarized in Table 1.

Table 1. Accelerated failure time analysis for the breast cancer data
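A Wald interval has the form $\hat\beta \pm z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat\beta)$. The sketch below computes it for a given estimate and standard error. `wald_ci` is a hypothetical helper, and no values from Table 1 are reproduced here.

```python
from statistics import NormalDist

def wald_ci(estimate, se, level=0.95):
    """Hypothetical helper: Wald interval estimate +/- z_{1 - alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # standard normal quantile
    return estimate - z * se, estimate + z * se
```

Calling `wald_ci(beta_hat, se_hat)` with the estimate and standard error from either weight function gives the corresponding 95% interval.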
To choose the failure times that maximise the estimating function in the approximation step of the algorithm, the behaviour of the log-rank and Gehan estimating functions was examined using the failure time data of the 94 patients. Figure 1 displays line plots of the log-rank and Gehan estimating functions against the failure times of the 10th and 30th patients, who belong to the RT group, and of the 60th and 90th patients, who belong to the RTC group. As illustrated in Figure 1, the estimating functions behaved very similarly for patients in the same treatment group: they increase as the failure times increase. Therefore, the maximisation in the approximation step can be carried out by taking the upper bound of the censoring interval as the approximation of the failure time for each subject.

4. Conclusion. The introduced iterative algorithm is based on a class of general weighted log-rank estimating functions and their corresponding loss functions, and it does not involve estimating functions that are step functions of the model parameters. The first step of the algorithm is computationally simple, since it requires minimising loss functions that are convex in the model parameters. Although the maximisation in the second step of the algorithm may create difficulties, the results of the numerical study show that examining the behaviour of the estimating function over the possible values of the approximated failure times helps to avoid complications in practical use.
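The check behind Figure 1 can be sketched as follows: examine how the estimating function behaves as one subject's approximated failure time moves through its censoring interval. The sketch rests on these assumptions:

- the Gehan form $U_G(\beta) = n^{-1}\sum_i\sum_j \Delta_i (Z_i - Z_j) I\{e_j(\beta) \ge e_i(\beta)\}$ with $e_i(\beta) = \log\tilde T_i - \beta^\top Z_i$;
- a finite upper bound for the interval;
- hypothetical helper names.

It holds all other failure times fixed and reports whether the first component of $U_G$ is nondecreasing along the grid.

```python
import numpy as np

def gehan_U(beta, logT, Z, delta):
    """Assumed Gehan form: n^{-1} sum_i sum_j delta_i (Z_i - Z_j) I{e_j >= e_i}."""
    e = logT - Z @ np.atleast_1d(beta)                     # residuals e_i(beta)
    ind = (e[None, :] >= e[:, None]).astype(float)         # ind[i, j] = I{e_j >= e_i}
    diff = Z[:, None, :] - Z[None, :, :]                   # diff[i, j] = Z_i - Z_j
    return np.einsum("i,ij,ijk->k", delta, ind, diff) / len(e)

def profile_in_interval(k, left, right, T, beta, Z, delta, num=50):
    """Hypothetical helper: scan U_G as subject k's failure time moves over (left, right]."""
    grid = np.linspace(left, right, num + 1)[1:]           # drop the open endpoint `left`
    values = []
    for t in grid:
        T_k = np.array(T, dtype=float)                     # copy; other subjects unchanged
        T_k[k] = t
        values.append(gehan_U(beta, np.log(T_k), Z, delta)[0])  # first component of U_G
    values = np.array(values)
    return grid, values, bool(np.all(np.diff(values) >= 0))
```

If the returned flag is `True`, the largest value on the grid is attained at the upper bound $R_{kU}$. That is the situation in which taking the upper bound as the approximation is justified.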