
Stanley C, Molyneux E and Mukaka M. Comparison of performance of exponential, Cox proportional hazards, weibull and frailty survival models for analysis of small sample size data. J Med Stat Inform. 2016; 4:2. http://dx.doi.org/10.7243/2053-7662-4-2
Christopher Stanley1*, Elizabeth Molyneux3 and Mavuto Mukaka2,4
*Correspondence: Christopher Stanley stanleychristopher1@yahoo.com
1. UNC Project-Malawi, Lilongwe, Malawi.
2. The Malawi Liverpool Wellcome Trust Research Laboratories, College of Medicine, Blantyre, Malawi.
3. Department of Paediatrics and Child Health, College of Medicine, University of Malawi, Malawi.
4. Department of Public Health, College of Medicine, University of Malawi, Malawi.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: The Cox Proportional Hazards(PH) model is commonest survival data model used in clinical trials. However, little is known as to whether or not this approach is robust to the size of the study. We compared performance of Cox, Weibull, Exponential and Frailty models in a randomized study that had a small sample size in general.
Methods: Models were fitted to data from a three-arm randomized efficacy trial of Kaposi's sarcoma at Queen Elizabeth Central Hospital (QECH) of children aged <16 years. Patients were randomized to receive Vincristine monotherapy or Etoposide or Vincristine plus Bleomycin. The Cox, Weibull, Exponential and Frailty models were fitted to the data to obtain the adjusted effect of treatment on survival. The performance of models were compared using the Akaike's Information Criteria (AIC). Plots of log-log of survival against log of survival time, and the Schoenfeld's global tests were used to test suitability of the PH assumption.
Results: Ninety-two patients were available for analyses of which 64% were males, the mean age was 8 years (SD=2.8years) and 89% were HIV positive. Exponential model was the best fitting method with AIC=174.1. Children treated with Vincristine monotherapy survived evidently poorly compared with those on Etoposide (HR=5.8, p=0.04).
Conclusion: Exponential models can elicit more valid results than semi-parametric CoxPH model in a clinical trial with small sample size. This emphasizes the fact that when models are fitted to data, it is good practice to assess the goodness of fit and where appropriate alternative models should be fitted.
Keywords: Cox proportional hazards models, survival, parametric models, small sample size, Malawi
The Cox proportional hazards (PH) model is the most widely used approach in survival analysis of clinical trials because it requires few assumptions [1]. The model was proposed by Cox in 1972 for analysis of survival data with and without censoring; for identifying differences in survival due to treatment and prognostic factors in clinical trials. The main assumption of the Cox model is that the hazard ratios of any two people are independent of time, known as the PH assumption. However as observed by Ponnuraja and Venkatesan [2], this is not always appropriate as sometimes the assumptions do not hold. Unfortunately, in practice researchers often do not test models assumptions for fitted models and this practice does not spare the fitting of Cox PH model.
Furthermore, the Cox model is popular because it is considered to be robust to violation of model assumptions; the estimated hazards are always positive and the hazard ratio is often calculated and reported [2,3].
The CoxPH model is parameterized in terms of the hazard model as
(1)
Where h0(t) is the baseline hazard, exp(β1xi1+.....+βpxip) represents subject-specific component, xi1, xi2,...,xip are the explanatory variables, for ith individual where i=1,....,n. As can be observed from the expression (1), the Cox PH model is semi-parametric. Alternative models for survival analysis include the Exponential and Weibull, both of which are parametric. Unlike the CoxPH model, the Weibull model allows more flexibility because the associated hazard rate is not constant with respect to time. Also, we use a maximum likelihood process to estimate the unknown parameters. The Weibull model has two parameters, shape parameter (λ); and scale parameter . Basing on notation by Hosmer and Lemeshow [2] its survivorship function is
(2)
while the hazard function has the form
(3)
where λ=1/σ, for convenience. The hazard function can be expressed in proportional hazard form as
(4)
where the baseline hazard function h0(t)=λγtλ-1;
If expressed in the accelerated failure time form the hazard functionis given as
(5)
A value of λ<1 indicates that the failure rate decreases over time, λ=1 indicates that the failure rate is constant over time (reduces to an Exponential model) while λ>1 indicates that the failure rate increases with time. When the index (shape) parameter equals 1, Weibull reduces to Exponential model. The standard models including the Cox, Exponential or Weibull presented above assume a homogeneous population with independent and identical failure times. To account for heterogeneity of patients in clinical trials caused by unmeasured covariates, extensions of the Cox model called frailty models are introduced. The hazard function of frailty models is given as
(6)
where θ is the unobservable random variable. The survival function has the form
(7)
where S(t|θ,x) is the fraction of individuals surviving the time t after follow- up given the vector of observable covariates X and frailty θ [2]. In situations where the Cox PH model is deemed inappropriate, it is crucially important to investigate appropriate survival model(s) that can elicit more valid results from the clinical trial. This study emphasizes the importance of assessing model adequacy, where appropriate to consider alternative models, then compare the different models and chose the best fitting model. We compare the performance of the Cox PH, and other PH parametric survival models (Exponential, Weibull and Frailty) on a small sample size data from a Kaposi's sarcoma clinical trial at Queen Elizabeth Central Hospital, Blantyre Malawi. The exponential and the Weibull models were considered because they are the only ones that can be parameterized as either a proportional hazards model or an accelerated failure time (AFT) model. We also evaluated the Frailty model to assess effects of any unmeasured patients' characteristics.
Study setting and design
The data was from an open label, randomized three-arm trial
conducted in the Queen Elizabeth Central Hospital (QECH),
Blantyre Malawi, and it compared the efficacy of courses oral
Etoposide (reference groups), Vincristine monotherapy, and
a combined Vincristine plus Bleomycin therapy. Results for
the trial were published [4].
Study population and sample size
The QECH is a 1200 bedded government referral hospital. The
children department admits 25,000 children a year and over
90,000 attend the emergency department and outpatients.
The children's oncology unit has 25 beds and admits 340 new
cases every year. Children with KS are treated as outpatients
unless they are systemically unwell. Randomization was computer
generated in blocks of 12. Treatment schedules were
in sealed, numbered envelopes and opened sequentially at
enrolment. Children were enrolled aged <16 years with clinical
evidence of Kaposi's sarcoma. As an interim analysis of the
trial, this study used a sample of 92 patients, the number collected
at the onset of these analyses against 258, estimated
during trial design.
Statistical methods
Distributions of baseline characteristics are summarized
using numbers with percentages for categorical variables
and have been presented by treatment groups. Continuous
variables are summarized using medians and interquartile
range (IQR). Chi-squared test of independence or K-Wallis
test was used to compare distributions between groups for
categorical and continuous variables respectively. The Cox,
Weibull, Exponential and Frailty survival models were fitted
to the data. The survival time was defined as time from
randomization to death or censoring date. The outcome of interest was all-cause death. Patients were censored at
their last visit before being lost to follow up for those who
dropped out of the study or at the end of the study for those
who completed the study still alive. When the sample size is
small, Burnham & Anderson (2002) [5] has recommend using
second-order Akaike Information Criteria (AICc) over other
criteria. However, when all the models in the candidate set
have the same number of parameters, then AICc and Akaike
Information Criteria (AIC) will give relatively identical estimations.
In that situation therefore, AIC can always be used
and thus the AIC was used to compare the fit of the survival
models in this study. The model with the lowest AIC value is
considered to be the best data fitting model. The validity of
the PH assumption was assessed using the plots of log-log
of survival against log of survival time and the Schoenfeld'st
global tests were also used. The treatment on which an individual
was randomized to was the main exposure of interest.
In multivariable analyses, the estimates of the effect of
treatment were adjusted for age, gender and HIV status. All
analyses were performed using STATA SE version 12(College
Station, Texas). Statistical significance was declared at α-level
of 0.05. The 95% confidence intervals (CIs) have been provided
where appropriate.
Ethical approval
Ethical approval to conduct the randomized controlled clinical
trial (main study) was provided by College of Medicine
Research and Ethics Committee (COMREC), an independent
scientific and ethics committee of the University of Malawi. To
uphold confidentiality of the patients, any variables that would
be linked to the identity of the participants were removed from
the database that was available for this analysis.
Descriptive statistics
There were 92 patients with data available for analysis. Out
of the 92 patients, thirty-one were randomised to Etoposide,
31 to Vincristine and 30 to Vincristine plus Bleomycin. There
were more males (59) than females (33). However the gender
distribution was similar across the treatment groups [4]. Overall,
the prevalence of HIV was very high in this study (89%) which
was evenly distributed across treatment groups. The mean age
was 8 years (SD=2.8years) and was similar across the groups.
Across treatment groups, similar distributions were observed
in hemoglobin level, absolute neutrophil count, white blood
cell count, platelets count, CD4 count and Lansky score, as
proportions in oedema, vomiting, tingling, breathless, chest
pains, nausea and constipation. Thus randomization was fairly
uniform amongst the three treatment arms at least for these
known potential confounders.
Main analyses
Model estimation results
In multivariable analyses, the four models yielded different results regarding effects of treatment on the survival of
children. The lowest AIC was obtained from the Exponential
model, AIC=174.7 (Table 1) indicating that Exponential model
provides best fit to this data compared to the Cox, Weibull and
Gamma Frailty models. Overall, the three parametric models
had lower values of AIC than the Cox PH model. The value of
the AIC for the Cox PH model is substantially higher than the
parametric models indicating poor performance.
Table 1 : Akaike's Information Criteria (AIC) values obtained from each model.
In both univariate and multivariate analyses, the four models lead to different statistical inferences in both the crude and the adjusted analyses (Tables 2-5). After adjusting for age, gender and HIV status; children treated with Vincristine monotherapy had a significantly increased hazard of death compared with those on Etoposide (HR 5.8, p=0.04; Exponential model in Table 2); (HR 5.7, p=0.04; Weibull model in Table 4). The shape parameter in Weibull model did not significantly differ from 1 (p=0.75), suggesting insignificant improvement on the Exponential distribution. In contrast, the Cox and Gamma Frailty models showed that the difference insurvival of children treated with Vincristine monotherapy and those on Etoposide was not statistically significant at alpha=0.05(HR 5.0, p=0.06; Table 3) and (HR 5.3, p=0.08; Table 5) respectively. For the Gamma Frailty model, the random parameter θ=0.5, log (θ)=-0.7 and p-value=0.45 suggesting that the random effect is not significant in the model. There was no difference in survival of Etoposide treated children compared with those who received Vincristine combined with Bleomycin in multivariable analyses of all the four models. Age, gender and HIV status did not affect survival of children as depicted from multivariable analyses in all models (Tables 2-5).
Table 2 : Hazard Ratios for death in children with Kaposi's sarcoma from Exponential Model.
Table 3 : Hazard Ratios for death in children with Kaposi's sarcoma from Cox PH Model.
Table 4 : Hazard Ratios for death in children with Kaposi's sarcoma from Weibull Model.
Table 5 : Hazard Ratios for death in children with Kaposi's sarcoma from Shared Gamma Frailty Model.
Assessment of PH assumption
As can be seen in Table 6, none of covariates in the model as
well as the overall Schoenfeld's global test of the full model
violates the PH assumption. The plot log-log of survival against
log of survival time for study groups indicates roughly parallel
graphs for Etoposide and Vincristine monotherapy, thus PH
assumption is met in Figure 1. However, there was an interaction
between treatment group and time indicating possible violation
of PH assumption. The graphs of females and males are
roughly parallel as those of HIV positive and negative are, demonstrating
no violation of the PH assumption (Figures 2 and 3).
Table 6 : Test on proportional-hazards assumption on covariates and full model.
Figure 1 : Testing PH assumption on study drug adjusted for
age, gender & HIV status.
Figure 2 : Testing PH assumption on gender adjusted for
study drug, age & HIV status.
Figure 3 : Testing PH assumption on HIV status adjusted for
study drug, age & gender.
As shown in Figure 4, plots of children aged 5-8 years, 9-12 years and 13-16 years do not intersect, proving proportional hazards across these ages. However it is observed that the curve for 1-4 years intersects with 9-12 years and 13-16 years indicating possible violation of PH assumption.
Figure 4 : Testing PH assumption on age classes adjusted for
study drug, HIV status & gender.
The emphasis of these findings is that it is a good practice to assess the goodness of fit of the model and compare with other models. It should be appreciated that more general methods may not exist and that different methods may perform well under different settings. The methods that have performed well in our setting may prove inadequate in a different setting.
In our model comparisons, we found the Exponential model to have better performance than the other models in the study. Of course we note that the difference in AIC values is small for the Exponential, Shared Gamma Frailty and Weibull models. AIC recommends choosing the model with the lowest AIC value without stating the magnitude of difference. However, for the Cox model, the AIC value is way far from the rest of the models indicating that its performance was not good in this scenario. Bradburn et al., [6] evaluated the adequacy of some parametric models and the Cox PH model. They found that the generalized gamma model and parametric models achieved both a higher log-likelihood and a lower AIC than the Cox model.
In our study we investigated suitability of the Weibull model by studying the shape parameter. In our findings, Weibull did not improve estimation of the basic Exponential rather was inferior to Exponential model. Nardi et al., [7] also studied Weibull model based on the estimated variation of the parameter rate criteria, and concluded that the Weibull was the superior model.
We explored possibility of the random variation of the survival function using shared gamma frailty model. The standard models including the Cox, Exponential or Weibull assume that a homogeneous population is investigated and their failure times are independently and identically distributed [8]. Henderson and Oman [9] were able to demonstrate that the estimates of regression coefficients may be biased if frailty models are not used where there is frailty effect. Furthermore, Schumacher et al., [10] showed that models that ignore important factors under estimate the relative risk. In our study, hazard functions of the survival models could depend on other unobserved elements and this can result in biased hazard ratios [11,12]. The most common used distributions for modeling the frailty are gamma or inverse-Gaussian [13-15]. In this paper, we treat frailty as the effect of the unobserved factors on the children's event free survival and conclude that gamma frailty model is not suitable in this study's setting.
A major goal of this paper is to investigate the comparative performance of Cox and parametric survival models by means of real data sets. Results from this study are not reinforced by simulations, which we recommend for further studies. The simulation study would include data simulated under various distributions such as exponential, Weibull, log-normal, loglogistic and with different sample sizes and effect sizes, to evaluate power, type I error rate and precision of the estimated coefficient. Furthermore, in our study we noted some violations of PH assumption. When the PH assumption is violated, we reaffirm that researchers may consider exploring on the potential time-dependent models.
More often than not, models are not general in their applications. They are situation specific. In our case we noted that in a clinical trial with small sample size, the Exponential parametric model was the best fitting model compared to the semi-parametric Cox PH, Shared Gamma Frailty and the Weibull models. We recommend and emphasize the very maxim that no matter how routine a method may be used in practice, it is good practice to test the goodness of fit and where applicable it should be compared with alternative models to establish the best fitting model. In some cases the models are fitted by non-statisticians who either do not know how to assess model performance or may not even know that alternative approaches exist. We recommend to such researchers that they should make an effort to verify their analyses with statistician and to help them to explore alternative approaches.
The authors declare that they have no competing interests.
Authors' contributions | CS | EM | MM |
Research concept and design | √ | √ | √ |
Collection and/or assembly of data | √ | √ | √ |
Data analysis and interpretation | √ | √ | √ |
Writing the article | √ | √ | √ |
Critical revision of the article | √ | √ | √ |
Final approval of article | √ | √ | √ |
Statistical analysis | √ | √ | √ |
We thank the children and their families for taking part in this study, and all the nurses on Sobo ward for their care of the children.
Editor: Sorinel A. Oprisan, College of Charleston, USA.
EIC: Max K. Bulsara, University of Notre Dame, Australia.
Received: 02-Dec-2015 Final Revised: 11-Jan-2016
Accepted: 25-Jan-2016 Published: 05-Feb-2016
Stanley C, Molyneux E and Mukaka M. Comparison of performance of exponential, Cox proportional hazards, weibull and frailty survival models for analysis of small sample size data. J Med Stat Inform. 2016; 4:2. http://dx.doi.org/10.7243/2053-7662-4-2
Copyright © 2015 Herbert Publications Limited. All rights reserved.