/ Citation

Sutradhar R and Cook RJ. **A conditional frailty model for bivariate interval-truncated failure time data: an application to a study on siblings diagnosed with schizophrenia**. *J Med Stat Inform*. 2016; **4**:1. http://dx.doi.org/10.7243/2052-6237-4-1

| **Cited-by**

Rinku Sutradhar^{1,2*} and Richard J. Cook^{3}

*Correspondence: Rinku Sutradhar rinku.sutradhar@ices.on.ca

1. Institute for Clinical Evaluative Sciences, Toronto, Ontario M4N 3M5, Canada.

2. Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada.

3. Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Background**: Paired failure time data often arise in medical studies involving familial information. When
each member of a pair is subject to interval-truncation, there is lack of literature on methodologies for
analyzing such data. The aim of this paper is to develop an approach for examining the association between
paired failure times under the presence of interval-truncation.

**Methods**: A conditional frailty model is described and an expectation-maximization algorithm is developed
for estimation of the model parameters. Simulation studies are conducted to examine the performance of
the proposed algorithm, and an application of the methods is illustrated using data from a familial study on
pairs of siblings diagnosed with schizophrenia.

**Results**: The results from the simulation studies show that all model parameters can be estimated with
negligible bias, even under finite sample size. In the motivating dataset, our proposed model and method of
estimation reveal a strong presence of dependence among times to diagnosis of schizophrenia within pairs
of siblings.

**Discussion**: The proposed conditional frailty model is easy to implement since the expectation step of the
algorithm is relatively straightforward. The maximization step also provides closed form expressions for the
parameter estimates of the hazard function.

**Keywords**: Frailty model, interval-truncation, bivariate failure time data, monte carlo EM algorithm

Interval truncated failure time data arise when only those individuals with a failure time that lies within a certain interval are observed. When a failure time falls outside this truncation interval, no information regarding the subject is available. One may view this as a screening procedure in which all observations outside the interval known as the truncation region are removed. It is the concept of "who is included" that distinguishes truncation from censoring. In the case of censoring, we have partial information on all observations in a sample in that subjects are known to have failure times greater (or less) than some value. However in the case of truncation, individuals with failure times outside the truncation interval are not considered for inclusion in the study. This important difference between truncation and censoring is reflected in the construction of the data likelihood.

Approaches for dealing with censoring in the presence of interval truncation were first discussed by Turnbull [1]. He proposed a method for obtaining the nonparametric maximum likelihood estimate of the distribution function under interval truncation that also accommodates left, right, and interval censoring. Turnbull described a self-consistent approach to estimation that turns out to be a version of the EM algorithm [2]. Frydman noted that the method proposed by Turnbull was not applicable in the case of both truncation and censoring, and suggested appropriate modifications [3]. Alioum and Commenges provided a further detailed correction of Turnbull's approach and offered an extension to regression analysis [4].

We are interested developing methodology for examining interval-truncated failure times in a bivariate setting. That is, when available information consists of paired failure times (e.g., we consider families of size two), where, due to the observation scheme, paired failure times are known only if they both fall within their respective truncation intervals. Such data arises in settings where families are observed through family disease registries, for example, or where data are grouped by family by design. When paired data are collected subject to truncation, it is frequently of interest to investigate associations among failure times within families. The existing literature on methods for analyzing interval-truncated failure times in a bivariate setting is very limited and mostly refers to the case when only one component of the bivariate vector is subject to truncation. Exceptions to this assumption include the studies by van der Laan and Gurler [5,6], in which nonparametric estimation of the bivariate survival function is discussed where both components of the bivariate vector are randomly left or right-truncated, but not interval-truncated. This paper proposes a model and estimation approach for examining interval-truncated bivariate failure time data. Straight forward extensions are possible to deal with the multivariate setting.

**Notation**

Suppose a pair of individuals is included in a study if both
members' event times are observed to fall in the calendar
time interval [*L,R*]. Let *Y _{ij}* represent the year of birth for the

For the* i ^{th}* family with two members or siblings

The likelihood for bivariate truncated failure times with
observed data (*t _{ij}, B_{ij}=[l_{ij},r_{ji}]; j=1,2, i=1,...n*) can be expressed
simply in terms of the bivariate survivor function:

where

and

is the probability of inclusion in the truncation rectangle
*Bi=B _{i1}XB_{i2}*.

**Model formulation: joint distributions using a frailty
model**

A conditional formulation is a common approach adopted
for modeling bivariate failure time data. Joint distributions
are routinely formulated by assuming responses are conditionally
independent given a common scalar random effect.
Integrating the conditional joint distribution with respect to
the unobserved random effect provides the joint distribution
of the failure times. Suppose (*T _{1},T_{2}*) are bivariate failure
times for a particular pair. The joint survivor function under
a conditional model can be written as

and the marginal survivor function for *T _{j}* is

Paired failure times *T _{1}* and

Various models have been proposed in the literature for
the distribution of the random effects: one-parameter gamma
distribution [8], the log-normal distribution [13], the positive
stable distribution [9], and the inverse gamma distribution [10].
We assume the random effects arise from a one-parameter
gamma distribution with mean 1 and variance φ^{-1}. Furthermore,
consider a frailty model in which random effects act
multiplicatively on the hazard rate for each member of a family.
Conditional on the random effect, the survivor function for *T _{j}* is

where ^*_{0j} (t) is the cumulative baseline
hazard function for *T _{j} (j=1,2)* under the conditional model.
These assumptions provide closed form expressions for the
joint and marginal survivor function, respectively.

and

Note that the marginal survivor function S_{j}*(t), under the
conditional formulation, depends on both the parameter φ
and the parameters of the conditional baseline hazard function
*Λ _{0j}*(t)*. Moreover, as ϕ→∞, S

Thus, as the variance
of the random effect distribution tends to zero, *S* (t _{1},t_{2} )*= S

It is important to interpret the parameters of the model appropriately. Parameter φ measures both the lack of fit of the conditional hazard function, as well as the association among paired failure times. Large values of φ could indicate that standard survival models may be inadequate by failing to sufficiently account for variation in the data; this can arise simply from poor fit of a model. That is, large values of φ may be an indication of unaccounted for dependence, overdispersion, or other forms of model misspecification including violations of the proportional hazards assumption. Moreover, as the variance approaches zero, failure times among individuals within a pair approach independence.

**Proposed estimation approach: an EM algorithm**

Methodological complications arise when both failure times
in a pair are observed subject to interval truncation. The inclusion
of shared random effects among individuals within a
pair, in addition to the presence of interval truncation for each
member in the pair, results in a non-trivial likelihood function.
Direct maximization of this likelihood function requires
integrating the conditional joint distribution with respect to
the unobserved random effect, which is not straightforward
particularly if strong distributional assumptions are to be
avoided. In addition, if a piecewise constant model is used,
direct maximization becomes more challenging as the number
of pieces (and hence parameters) increases. Instead of trying
to directly maximize over a high dimensional space, the EM
algorithm offers an alternative method of estimation that
is sometimes slow to converge but avoids occasional divergence
problems. We develop an EM algorithm for bivariate
interval-truncated failure times under a conditional (random
effect) formulation. We use the concept of "ghosts", which
was first discussed by Turnbull [1] with the aim of obtaining
a nonparametric estimate of a truncated distribution in the
univariate setting. Here the notion of "ghost pairs" is used
for the analysis of interval-truncated failure times in the
bivariate case.

Let *t _{i}=(t_{i1},t_{i2})* denote the vector of observed failure times
for the subjects of the ith pair and let

Figure 1 **:** **Bivariate truncation rectangle for a pair of failure
times.**

Due to truncation, the pair giving the observations *t _{i}=(t_{i1},t_{i2})* may be considered a remnant of an unknown number of pairs
from the same "birth cohort". Pairs are said to be in the same
birth cohort if they have the same dates of birth. We define
"ghost pairs" corresponding to pair

To define the complete data, let *J _{i}* be the number (unknown)
of ghost pairs for the

parameter φ. Here the complete data is given by Y=(Y

The complete data likelihood and corresponding loglikelihood for the ith pair of individuals are constructed as

and

respectively, where

We further assume a gamma distribution for the random effect distribution with p.d.f. given as

where *E*(α)=1 and Var (α)=φ^{-1}. Based on these assumptions,
the complete data log-likelihood can now be re-expressed
-the parameters of this function consist of the parameters
of the baseline hazard functions and the parameter of the
random effects distribution. These parameters will be estimated
via an Expectation-Maximization algorithm, where
a weakly parametric approach will be taken by assuming a
piecewise constant model for the baseline hazard function.
The mathematical details of this algorithm are provided in
the Appendix. We show in the maximization step that the
parameter estimates of the baseline hazard function have a
closed simple form and the parameter estimate of the random
effects distribution can be easily obtained using standard
Newton-Raphson procedures.

**Evaluating performance of proposed methodology via
simulations**

We conducted simulations for investigating the performance
of our proposed EM algorithm. We generated random effects
*α _{1},...,α_{n}* from a gamma distribution with p.d.f.

where *E*(α)=1 and Var(α)=φ^{-1}. Bivariate failure times (*T _{1},T_{2}*) were
simulated from the joint distribution

with

where the cumulative baseline hazard function Λ_{0}* (*t*)=(ψt)^{γ} had
a Weibull form with shape parameter γ and scale parameter
ψ^{-1}. We generated *n*=1000 pairs of failure times. The median
of the marginal distribution

was specified at
*n*=40. Note that *T _{1}* and

The value of ψ was expressed in terms of φ and γ. It was obtained from the marginal survival function as follows:

which gave

The truncation percentiles were calculated in a similar manner. They depended on both parameters φ and γ:

where *P*(*T≤R*)=*p* for p=0.95 and 0.90. Upon generating uniform
random variables U~ U(0,1) and V~ U(0,1), the bivariate failure
times were derived from

as

and

To accommodate for bivariate interval truncation, (*t _{1},t_{2}*) was
retained only if it belonged to rectangle

The generated data of paired failure times were fit assuming a gamma distribution for the random effects and a cumulative baseline hazard function of the following forms: (i) 1-piece or exponential model (true model when γ=1) (ii) 2-piece model, and a (iii) 4-piece model.

The cut points were determined based on the percentiles of the true marginal distribution. That is,

where *P*(*T*≤*q*)=*p*. For a 2-piece model, the cut point was
computed for p=0.50, and for a 4-piece model, the cut points
were computed for *p*=0.25,0.50 and 0.75. Parameter estimates
under the bivariate truncation conditions were obtained using
the proposed EM algorithm with ghost pairs discussed
above. Note that the the crude rates of events during each
of the piecewise constant intervals from our simulated data
were taken to be the initial values of the piecewise constant
rate parameters for the EM algorithm, and the tolerance level
for convergence was specified at 1e-04.

**Application of proposed methodology to data on siblings
with schizophrenia**

L.S Penrose (1945) discussed a survey on the cases of familial
mental illness, conducted in 1944 by the Division of Psychiatric
Research for the Ontario Department of Health. An investigation
was made on all families in which two or more members
had been admitted to a mental hospital in Ontario. The records
date back to 1926, and thus a period of 18 years was covered.
Of the families that were admitted to a mental hospital, only
those families in which two siblings were admitted for a diagnosis
of schizophrenia were included in our study. The age
at diagnosis of schizophrenia was taken to be the age at first
admission to a mental hospital for schizophrenia.

The primary objective was to determine whether there
was an association within pairs in diagnosis times among
schizophrenics. Estimating the cumulative hazard function
and the marginal survival function of the event times were
also of interest. Various forms for the baseline hazard function
were examined, including: (i) Weibull with shape parameter γ and scale parameter ψ^{-1} i.e.,

and (ii) piecewise constant models with two, three, and four pieces i.e.,

where *ρ _{r}* is the rth unknown rate parameter corresponding
to the

**Results from simulations**

Results from the simulation study under the true value γ=1
and γ=1.2 are provided in Tables 1 and 2, respectively. The
mean and simulation-based standard error are given for the
estimates of the model parameters. Interest lied in examining
how the average bias and standard error for φ and ρ vary with
respect to (i) the number of pieces (ii) the amount of truncation,
and (iii) the true value of the variance of the random effect
distribution. Recall that as the variance φ^{-1} approaches zero,
there was no association among failure times within a pair.

Table 1
**: Summary of analysis of interval-truncated bivariate failure time data under a conditional model with
marginal survivor function.**

Table 2
**: Summary of analysis of interval-truncated bivariate failure time data under a conditional model with
marginal survivor function.**

From Table 1, when increasing the number of pieces in the model of the baseline hazard function, there was no notable effect on the bias of the estimates, however the standard error increased. The same occurred as the degree of truncation increased. Furthermore, a larger value for the variance of the random effect distribution resulted in an increase in both the bias and the standard error of the estimate for φ. Table 2 showed a strong decreasing trend in the bias of the estimate for φ as the number of pieces increased. Furthermore, a rise in the degree of truncation resulted in a notable increase in the standard error of the estimate for φ. Note that the magnitude of the bias of the estimates for all parameters in Tables 1 and 2 were in fact very small, irrespective of any change in conditions, and the standardized bias also indicated that the bias was negligible throughout.

**Results from application to data on siblings with
schizophrenia**

The data was comprised of 173 pairs of siblings, in which
each member of a pair was hospitalized with schizophrenia
between the years 1926 and 1944. The age at first hospitalization
was considered as the age of diagnosis (the event time).
This observation was left and right truncated between the
age at 1926 and the age at 1944, respectively. The ages at first
hospitalization for schizophrenia for the first-born and secondborn
siblings are represented by *T _{1}* and

Table 3
**: Schizophrenia Data (first 20 sibling pairs).**

Figure 2 **:** **Truncation rectangles for sibling pairs with
schizophrenia.**

Table 4 provides the results from our analysis. The estimates and their standard errors are given for the parameters of the Weibull, 2-piece, 3-piece, and 4-piece models, along with the corresponding observed log-likelihood at the maximum. Once these estimates were obtained, the 25th, 50th, and 75th percentiles under each model were computed by solving

for *t _{q}*, where

which was simply a function of marginal and joint survival
functions. The estimate of the variance of the log(*OR(t))* estimator
was computed using a sandwich estimator. The results
of the odds ratio under all models led to similar conclusions.
Specifically, under a Weibull baseline hazard model, the 95%
confidence interval for the log*(OR(t=21))* was (0.080, 3.780),
indicating a strong presence of association among diagnosis
times for schizophrenia within pairs of siblings.

Table 4
**: Results of the schizophrenia data analysis under a conditional formulation.**

The results in Table 4 were based on the joint modeling
of paired failure times. For comparison purposes, however,
similar measures may be obtained based on independence
assumptions among event times within a pair. With a naive
matched-pairs analysis under a binary response (age of onset
≤21 or age of onset >21), the estimate of the log*(OR(t=21))* is 0.788 with standard error 0.381, and the 95% confidence
interval for the log*(OR(t=21))* was (0.035, 1.525). Furthermore,
McNemar's test of marginal association for such matched-pair
data gave a chi-square test statistic of 3.781 with a resulting
p-value of 0.052. These naive results agreed with our more
formal analysis, indicating a notable presence of association
among diagnosis times for schizophrenia within pairs
of siblings.

An illustration of the performance of the piecewise constant models can be seen from the plot of the estimated marginal survivor functions given in Figure 3. The 95% confidence bands are also provided under the Weibull fit at years 20, 40, and 60. The results from the piecewise models agree with that from the Weibull model. However, during the first two decades in which very few events are observed to occur, the four-piece model provides a more suitable fit than the two-piece or three-piece model.

Figure 3 **:** **Estimated marginal survivor functions under various
fits to the schizophrenia data based on a conditional model.**

Although the piecewise model fit could be carried out using direct maximization, the EM algorithm was used to avoid divergence problems often arising when there is a large number of pieces. In the case of a 2-piece model, the computing time using a standard maximization function was faster than implementing an EM approach. However, if one wanted to fit an 8-piece or 10-piece model, the EM algorithm was attractive since it did not require obtaining the derivative functions, and thus avoided possible problems with convergence.

This paper was concerned with inference for bivariate failure time data that were interval-truncated. A conditional (random effect) formulation was discussed for modeling the marginal and joint distribution of the failure times, where particular interest lied in measuring the association within pairs. Under the conditional model, the marginal distributions were determined by both the variance parameter of the random effect distribution and the parameters of the conditional hazard function. To minimize the need for strong distributional assumptions and facilitate a likelihood analysis based on parametric models, the baseline hazard functions were assumed to be of a piecewise constant form.

The literature offers limited approaches in estimation when both failure times in a pair are observed subject to interval truncation. This paper developed an EM approach for estimation based on a conditional formulation. The idea of "ghosts" was introduced in the bivariate setting so that the complete data log-likelihood could be easily maximized. The formulation was convenient to implement since the E-step was relatively straightforward, and it also provided closed form expressions for the parameter estimates of the piecewise constant conditional hazard function in the M-step. Results from the simulation studies indicated that all model parameters could be estimated with negligible bias in finite samples. The data simulated for Table 1 was based on an exponential baseline hazard function (as the shape parameter γ=1), which is why more pieces were not needed to improve model fit. On the other hand, as the baseline hazard function was not constant for the data simulated in Table 2, we see that increasing the number of pieces provided a better approximation to the true marginal survivor function. In the motivating schizophrenia data set, our proposed model and method of estimation revealed a strong presence of dependence among schizophrenia diagnosis times within pairs of siblings. In addition, since the rate of onset for schizophrenia was quite low in the early years, the four-piece model was more sensitive to this shift and was able to detect this shift, thus providing different results than the two-piece and three-piece models. Moreover, as the estimate of the shape parameter under the Weibull distribution was much greater than 1.0, it was reasonable that increasing the number of pieces provided a better fit, as seen by the plots of the estimated marginal survivor functions in Figure 3.

Although we considered the scenario where there was no censoring, our proposed methods could be naturally extended to handle the case where the event times were interval-censored within their respective truncation intervals. We also considered the scenario in which there were paired failure times; our conditional approach could be easily extended to handle a multivariate setting with more than 2 individuals within a group.

The authors declare that they have no competing interests.

Authors' contributions |
RS |
RJC |

Research concept and design | √ | √ |

Collection and/or assembly of data | √ | √ |

Data analysis and interpretation | √ | √ |

Writing the article | √ | √ |

Critical revision of the article | √ | √ |

Final approval of article | √ | √ |

Statistical analysis | √ | √ |

This work was supported by the Institute for Clinical Evaluative Sciences. Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. We would also like to thank Dr. Janice Husted from the University of Waterloo, Canada, and Dr. Anne Bassett from the University of Toronto, Canada, for their valuable input and guidance regarding the data.

EICs: Jimmy Efird, East Carolina University, USA.

Max K. Bulsara, University of Notre Dame, Australia.

Received: 17-Nov-2015 Final Revised: 31-Dec-2015

Accepted: 13-Jan-2016 Published: 29-Jan-2016

- Turnbull B.W.
**The empirical distribution function with arbitrarily grouped, censoredand truncated data**.*Journal of the Royal Statistical Society B*. 1976;**38**:290-295. | Article - Dempster A.P, Laird N and Rubin D.B.
**Maximum likelihood from incomplete data via the EM algorithm**.*Journal of the Royal Statistical Society B*. 1977;**39**:1-38. | Pdf - Frydman H.
**A note on nonparametric estimation of the distribution function frominterval-censored and truncated observations**.*Journal of the Royal Statistical Society B*. 1994;**56**:71-74. | Article - Alioum A and Commenges D.
**A proportional hazards model for arbitrarily censored and truncated data**.*Biometrics*. 1996;**52**:512-24. | Article | PubMed - Van der Laan M.J.
**Nonparametric estimation of the bivariate survival function with truncated data**.*Journal of Multivariate Analysis*. 1996;**58**:107-131. | Article - Gurler U.
**Nonparametric bivariate estimation with randomly truncated observations**.*Handbook of Statistics*. 2004;**23**:195-207. - Andersen P.K, Borgan O, Gill R.D and Keiding N.
**Statistical Models Based on Counting Processes**. New York: Springer-Verlag. 1993. - Clayton D.G.
**A model for association in bivariate lifetables and its application inepidemiological studies of familial tendency in chronic disease incidence**.*Biometrika*. 1978;**65**:141-151. | Article - Hougaard P.
**Survival models for heterogeneous populations derived from stable distributions**.*Biometrika*. 1986;**73**:387-396. | Article - Hougaard P.
**A class of multivariate failure time distributions**.*Biometrika*. 1986;**73**:671-678. | Article - Penrose LS.
**Survey of cases of familial mental illness**.*Eur Arch Psychiatry Clin Neurosci*. 1991;**240**:315-24. | Article | PubMed - Crow T.J.
**A note on “Survey of Cases of Familial Mental Illness” by L.S. Penrose**.*European Archives of Psychiatry and Clinical Neuroscience*. 1991; 507-517. | Article - Duchateau L and Janssen P.
**The Frailty Model**. New York: Springer-Verlag. 2008. - White F, Livesey D and Hayes B.
**Developmental Psychology: From Infancy to Development, 3rd edition**. Pearson Australia. 2013.

Volume 4

Sutradhar R and Cook RJ. **A conditional frailty model for bivariate interval-truncated failure time data: an application to a study on siblings diagnosed with schizophrenia**. *J Med Stat Inform*. 2016; **4**:1. http://dx.doi.org/10.7243/2052-6237-4-1

View Metrics

Copyright © 2015 Herbert Publications Limited. All rights reserved.

Post Comment|View Comments