Comparison of performance of exponential, Cox proportional hazards, weibull and frailty survival models for analysis of small sample size data

Background : The Cox Proportional Hazards(PH) model is commonest survival data model used in clinical trials. However, little is known as to whether or not this approach is robust to the size of the study. We compared performance of Cox, Weibull, Exponential and Frailty models in a randomized study that had a small sample size in general. Methods : Models were fitted to data from a three-arm randomized efficacy trial of Kaposi’s sarcoma at Queen Elizabeth Central Hospital (QECH) of children aged <16 years. Patients were randomized to receive Vincristine monotherapy or Etoposide or Vincristine plus Bleomycin. The Cox, Weibull, Exponential and Frailty models were fitted to the data to obtain the adjusted effect of treatment on survival. The performance of models were compared using the Akaike’s Information Criteria (AIC). Plots of log-log of survival against log of survival time, and the Schoenfeld’s global tests were used to test suitability of the PH assumption. Results : Ninety-two patients were available for analyses of which 64% were males, the mean age was 8 years (SD=2.8years) and 89% were HIV positive. Exponential model was the best fitting method with AIC=174.1. Children treated with Vincristine monotherapy survived evidently poorly compared with those on Etoposide (HR=5.8, p=0.04). Conclusion : Exponential models can elicit more valid results than semi-parametric CoxPH model in a clinical trial with small sample size. This emphasizes the fact that when models are fitted to data, it is good practice to assess the goodness of fit and where appropriate alternative models should be fitted.


Introduction
The Cox proportional hazards (PH) model is the most widely used approach in survival analysis of clinical trials because it requires few assumptions [1]. The model was proposed by Cox in 1972 for analysis of survival data with and without censoring; for identifying differences in survival due to treatment and prognostic factors in clinical trials. The main assumption of the Cox model is that the hazard ratios of any two people are independent of time, known as the PH assumption. However as observed by Ponnuraja and Venkatesan [2], this is not always appropriate as sometimes the assumptions do not hold. Unfortunately, in practice researchers often do not test models assumptions for fitted models and this practice does not spare the fitting of Cox PH model. Furthermore, the Cox model is popular because it is considered to be robust to violation of model assumptions; the estimated hazards are always positive and the hazard ratio is often calculated and reported [2,3].
The CoxPH model is parameterized in terms of the hazard model as h(t)=h 0 (t)×exp(β 1 x i1 +…..+β p x ip ) (1), Where h 0 (t) is the baseline hazard, exp(β 1 x i1 +…..+β p x ip ) represents subject-specific component, x i1 , x i2 ,…,x ip are the explanatory variables, for ith individual where i=1, ….,n. As can be observed from the expression (1), the Cox PH model is semi-parametric. Alternative models for survival analysis include the Exponential and Weibull, both of which are parametric. Unlike the CoxPH model, the Weibull model allows more flexibility because the associated hazard rate is not constant with respect to time. Also, we use a maximum likelihood process to estimate the unknown parameters. The Weibull model has two parameters, shape parameter (λ); and scale parameter . Basing on notation by Hosmer and Lemeshow [2] its survivorship function is
If expressed in the accelerated failure time form the hazard functionis given as h(t,x i ,β i ,λ)=λγ(t×exp{-(β 1 x i1 +……..+β p x ip )}) λ-1 ×exp{-(β 1 x i1 +… .+β p x ip )} (5) A value of λ<1 indicates that the failure rate decreases over time, λ=1 indicates that the failure rate is constant over time (reduces to an Exponential model) while λ>1 indicates that the failure rate increases with time. When the index (shape) parameter equals 1, Weibull reduces to Exponential model. The standard models including the Cox, Exponential or Weibull presented above assume a homogeneous population with independent and identical failure times. To account for heterogeneity of patients in clinical trials caused by unmeasured covariates, extensions of the Cox model called frailty models are introduced. The hazard function of frailty models is given as where θ is the unobservable random variable. The survival function has the form

Study setting and design
The data was from an open label, randomized three-arm trial conducted in the Queen Elizabeth Central Hospital (QECH), Blantyre Malawi, and it compared the efficacy of courses oral Etoposide (reference groups), Vincristine monotherapy, and a combined Vincristine plus Bleomycin therapy. Results for the trial were published [4].

Study population and sample size
The QECH is a 1200 bedded government referral hospital. The children department admits 25,000 children a year and over 90,000 attend the emergency department and outpatients. The children's oncology unit has 25 beds and admits 340 new cases every year. Children with KS are treated as outpatients unless they are systemically unwell. Randomization was computer generated in blocks of 12. Treatment schedules were in sealed, numbered envelopes and opened sequentially at enrolment. Children were enrolled aged <16 years with clinical evidence of Kaposi's sarcoma. As an interim analysis of the trial, this study used a sample of 92 patients, the number collected at the onset of these analyses against 258, estimated during trial design.

Statistical methods
Distributions of baseline characteristics are summarized using numbers with percentages for categorical variables and have been presented by treatment groups. Continuous variables are summarized using medians and interquartile range (IQR). Chi-squared test of independence or K-Wallis test was used to compare distributions between groups for categorical and continuous variables respectively. The Cox, Weibull, Exponential and Frailty survival models were fitted to the data. The survival time was defined as time from randomization to death or censoring date. The outcome doi: 10.7243/2053-7662-4-2 of interest was all-cause death. Patients were censored at their last visit before being lost to follow up for those who dropped out of the study or at the end of the study for those who completed the study still alive. When the sample size is small, Burnham & Anderson (2002) [5] has recommend using second-order Akaike Information Criteria (AICc) over other criteria. However, when all the models in the candidate set have the same number of parameters, then AICc and Akaike Information Criteria (AIC) will give relatively identical estimations. In that situation therefore, AIC can always be used and thus the AIC was used to compare the fit of the survival models in this study. The model with the lowest AIC value is considered to be the best data fitting model. The validity of the PH assumption was assessed using the plots of log-log of survival against log of survival time and the Schoenfeld'st global tests were also used. The treatment on which an individual was randomized to was the main exposure of interest. In multivariable analyses, the estimates of the effect of treatment were adjusted for age, gender and HIV status. All analyses were performed using STATA SE version 12(College Station, Texas). Statistical significance was declared at α-level of 0.05. The 95% confidence intervals (CIs) have been provided where appropriate.

Ethical approval
Ethical approval to conduct the randomized controlled clinical trial (main study) was provided by College of Medicine Research and Ethics Committee (COMREC), an independent scientific and ethics committee of the University of Malawi. To uphold confidentiality of the patients, any variables that would be linked to the identity of the participants were removed from the database that was available for this analysis.

Descriptive statistics
There were 92 patients with data available for analysis. Out of the 92 patients, thirty-one were randomised to Etoposide, 31 to Vincristine and 30 to Vincristine plus Bleomycin. There were more males (59) than females (33). However the gender distribution was similar across the treatment groups [4]. Overall, the prevalence of HIV was very high in this study (89%) which was evenly distributed across treatment groups. The mean age was 8 years (SD=2.8years) and was similar across the groups. Across treatment groups, similar distributions were observed in hemoglobin level, absolute neutrophil count, white blood cell count, platelets count, CD4 count and Lansky score, as proportions in oedema, vomiting, tingling, breathless, chest pains, nausea and constipation. Thus randomization was fairly uniform amongst the three treatment arms at least for these known potential confounders.

Model estimation results
In multivariable analyses, the four models yielded different results regarding effects of treatment on the survival of children. The lowest AIC was obtained from the Exponential model, AIC=174.7 ( In both univariate and multivariate analyses, the four models lead to different statistical inferences in both the crude and the adjusted analyses (Tables 2-5). After adjusting for age, gender and HIV status; children treated with Vincristine monotherapy had a significantly increased hazard of death compared with those on Etoposide (HR 5.8, p=0.04; Exponential model in Table 2); (HR 5.7, p=0.04; Weibull model in Table 4). The shape parameter in Weibull model did not significantly differ from 1 (p=0.75), suggesting insignificant improvement on the Exponential distribution. In contrast, the Cox and Gamma Frailty models showed that the difference insurvival of children treated with Vincristine monotherapy and those on Etoposide was not statistically significant at alpha=0.05(HR 5.0, p=0.06; Table 3) and (HR 5.3, p=0.08; Table 5) respectively. For the Gamma Frailty model, the random parameter θ=0.5, log (θ)=-0.7 and p-value=0.45 suggesting that the random effect is not significant in the model. There was no difference in survival of Etoposide treated children compared with those who received Vincristine combined with Bleomycin in multivariable analyses of all the four models. Age, gender and HIV status did not affect survival of children as depicted from multivariable analyses in all models (Tables 2-5).

Assessment of PH assumption
As can be seen in Table 6, none of covariates in the model as well as the overall Schoenfeld's global test of the full model violates the PH assumption. The plot log-log of survival against log of survival time for study groups indicates roughly parallel graphs for Etoposide and Vincristine monotherapy, thus PH assumption is met in Figure 1. However, there was an interaction between treatment group and time indicating possible violation of PH assumption. The graphs of females and males are roughly parallel as those of HIV positive and negative are, demonstrating no violation of the PH assumption (Figures 2 and 3 Table 2 shows that the Cox model suggests that survival of children treated with Vincristine and those on Etoposide was essentially not different. Figure 4, plots of children aged 5-8 years, 9-12 years and 13-16 years do not intersect, proving proportional hazards across these ages. However it is observed that the curve for 1-4 years intersects with 9-12 years and 13-16 years indicating possible violation of PH assumption.

Discussion
The emphasis of these findings is that it is a good practice to assess the goodness of fit of the model and compare with other models. It should be appreciated that more general methods may not exist and that different methods may perform well under different settings. The methods that have performed well in our setting may prove inadequate in a different setting. In our model comparisons, we found the Exponential model to have better performance than the other models in the study. Of course we note that the difference in AIC values is small for the Exponential, Shared Gamma Frailty and Weibull models. AIC recommends choosing the model with the lowest AIC value without stating the magnitude of difference. However, for the Cox model, the AIC value is way far from the rest of the models    Vinc=Vincristine: bleo=Bleomycin Table 4 shows that Weibull model suggests a significant increased hazard of death for children treated with Vincristine monotherapy compared with those on Etoposide. The shape parameter,σ is not significantly different from 1.  Vinc=Vincristine; bleo=Bleomycin Table 5 shows that Shared Gamma Frailty model suggests that survival of children treated with Vincristine and those on Etoposide was essentially not different.

Univariate model
indicating that its performance was not good in this scenario. Bradburn et al., [6] evaluated the adequacy of some parametric models and the Cox PH model. They found that the generalized gamma model and parametric models achieved both a higher log-likelihood and a lower AIC than the Cox model.
In our study we investigated suitability of the Weibull model by studying the shape parameter. In our findings, Weibull did not improve estimation of the basic Exponential rather was inferior to Exponential model. Nardi et al., [7] also studied Weibull model based on the estimated variation of the  Table 6. Test on proportional-hazards assumption on covariates and full model.
Vinc=Vincristine: bleo=Bleomycin Table 6 indicates that none of covariates in the model as well as the overall global test of the full model violates the PH assumption.   parameter rate criteria, and concluded that the Weibull was the superior model. We explored possibility of the random variation of the survival function using shared gamma frailty model. The standard models including the Cox, Exponential or Weibull assume that a homogeneous population is investigated and their failure times are independently and identically distributed [8]. Henderson and Oman [9] were able to demonstrate that the estimates of regression coefficients may be biased if frailty models are not used where there is frailty effect. Furthermore, Schumacher et al., [10] showed that models that ignore important factors under estimate the relative risk. In our study, hazard functions of the survival models could depend on other unobserved elements and this can result in biased hazard ratios [11,12]. The most common used distributions for modeling the frailty are gamma or inverse-Gaussian [13][14][15]. In this paper, we treat frailty as the effect of the unobserved factors on the children's event free survival and conclude that gamma frailty model is not suitable in this study's setting.
A major goal of this paper is to investigate the comparative performance of Cox and parametric survival models by means of real data sets. Results from this study are not reinforced by simulations, which we recommend for further studies. The simulation study would include data simulated under various distributions such as exponential, Weibull, log-normal, loglogistic and with different sample sizes and effect sizes, to evaluate power, type I error rate and precision of the estimated coefficient. Furthermore, in our study we noted some violations of PH assumption. When the PH assumption is violated, we reaffirm that researchers may consider exploring on the

Conclusions
More often than not, models are not general in their applications. They are situation specific. In our case we noted that in a clinical trial with small sample size, the Exponential parametric model was the best fitting model compared to the semi-parametric Cox PH, Shared Gamma Frailty and the Weibull models. We recommend and emphasize the very maxim that no matter how routine a method may be used in practice, it is good practice to test the goodness of fit and where applicable it should be compared with alternative models to establish the best fitting model. In some cases the models are fitted by non-statisticians who either do not know how to assess model performance or may not even know that alternative approaches exist. We recommend to such researchers that they should make an effort to verify their analyses with statistician and to help them to explore alternative approaches.