Joint modelling of time-to-clinical malaria and parasite count in a cohort in an endemic area

Background: In malaria endemic areas such as sub-Saharan Africa, repeated exposure to malaria results in acquired immunity to clinical disease but not infection. In prospective studies, time-to-clinical malaria and longitudinal parasite count trajectory are often analysed separately which may result in inefficient estimates since these two processes can be associated. Including parasite count as a time-dependent covariate in a model of time-to-clinical malaria episode may also be inaccurate because while clinical malaria disease frequently leads to treatment which may instantly affect the level of parasite count, standard time-to-event models require that time-dependent covariates be external to the event process. We investigated whether jointly modelling time-to-clinical malaria disease and longitudinal parasite count improves precision in risk factor estimates and assessed the strength of association between the hazard of clinical malaria and parasite count. Methods: Using a cohort data of participants enrolled with uncomplicated malaria in Malawi, a conventional Cox Proportional Hazards (PH) model of time-to-first clinical malaria episode with time-dependent parasite count was compared with three competing joint models. The joint models had different association structures linking a quasi-Poisson mixed-effects of parasite count and event-time Cox PH sub-models. Results: There were 120 participants of whom 115 (95.8%) had >1 follow-up visit and 100 (87.5%) experienced the episode. Adults >15 years being reference, log hazard ratio for children <5 years was 0.74 (95% CI: 0.17, 1.26) in the joint model with best fit vs. 0.62 (95% CI: 0.04, 1.18) from the conventional Cox PH model. The log hazard ratio for the 5–15 years was 0.72 (95% CI: 0.22, 1.22) in the joint model vs.0.63 (95% CI: 0.11, 1.17) in the Cox PH model. The area under parasite count trajectory was strongly associated with the risk of clinical malaria, with a unit increase corresponding to-0.0012 (95% CI: −0.0021, −0.0004) decrease in log hazard ratio. Conclusion: Jointly modelling longitudinal parasite count and time-to-clinical malaria disease improves precision in log hazard ratio estimates compared to conventional time-dependent Cox PH model. The improved precision of joint modelling may improve study efficiency and allow for design of clinical trials with relatively lower sample sizes with increased power.


Introduction
Malaria remains one of the most common parasitic infections globally with a disproportionately high burden in sub-Saharan Africa [1]. In malaria-endemic areas, repeated exposure results in acquired immunity to clinical malaria disease but not infection. Of interest in many malaria studies is to estimate time-to-clinical malaria but repeated exposure may give rise to a relationship between the disease and time-dependent covariates such as parasite count. However, time-to-clinical malaria and parasite count data are often analysed separately, mostly using Cox proportional hazards (PH) models and mixed-effects models or generalised estimating equations (GEE) respectively [2][3][4]. Separate analysis of time-to-clinical malaria disease and parasite count data may result in inefficient estimates when these two processes are strongly associated [5].
For accurate estimation of the risk of clinical malaria, analytical methods are required that account for historical exposure and the relationship between clinical malaria and infection parasite count. To investigate the strength of this association, one approach would be including the parasite countas a time-dependent covariate in the model of time-to-clinical malaria episode. However, this approach may be inaccurate because it does not account for the fact that parasite count in this case is an endogenous covariate whose existence and future path can be directly related to the occurrence of the episodes [6]. Standard time-toevent models require that time-dependent covariates be external to the event process [7] but clinical malaria disease frequently leads to treatment which may instantly affect the level of parasite count.
A second approach is to fit a joint model of time-to-clinical malaria and longitudinal parasite count profile. Previous applications of the joint modelling framework are many. These include, for example, analysis of CD4 count jointly with time-to-development of AIDS [8][9][10][11][12] and modelling quality of life performance scores jointly with time-to-death or disease progression among patients with cancer [11,13,14]. In order to fit joint models to data from malaria studies, there are certain aspects of these types of data that require consideration. In particular, following treatment it is possible for the parasite count to equal zero, so the joint model should allow for that.
We investigated whether modelling time-to-new clinical malaria disease jointly with parasite count trajectory data may improve precision in log hazard ratio estimates when compared to the conventional Cox PH model with time-dependent parasite count. We also assessed the strength of the association between the hazard of clinical malaria and a time-varying parasite count.

Data source
The study was motivated by data from the Mfera prospective cohort study conducted in Chikwawa district, southern Malawi, described previously by Buchwald and others [15]. Malaria disease is endemic in Malawi [16] and transmission of the Plasmodium falciparum parasite is high in Chikwawa [17,18]. The cohort enrolled 120 participants who presented with uncomplicated malaria at the Mfera health centre in the Chikwawa district between June 2014 and March 2015. Initial diagnosis was made by rapid diagnostic test (RDT) and confirmed by microscopy using thick blood smears. Exclusion criteria from the Mfera cohort included: acute illness requiring hospitalization, signs or symptoms of severe malaria or moderate to severe anemia, and chronic medication with any drug that has antimalarial activity e.g. HIV treatment. Participants underwent passive and active surveillance on a monthly basis and whenever sick for up to two years to assess re-infection, host response and parasite count.

Primary outcome
In the current analyses, the outcome of interest was time-to-first new clinical malaria disease which was defined by participants' self-reported fever and apositive RDT result.

Notation and specification of the models
The conventional Cox PH model with time-dependent parasite count is defined as in [9] with the hazard function λ i (t) for participant at given time expressed as.
where and λ 0 (t) is the unspecified baseline hazard function and the covariate vector X si (t) for participant includes participant's age and frequency of insecticide treated bed nets use in the previous month of the visit. Covariates were included in multivariable models if they were significant at alpha level of 0.1 in univariate Cox PH models.
For the joint models, we utilised the Bayesian joint modelling approach for longitudinal and survival data proposed by Chen et al [12] which fits a model with Markov Chain Monte Carlo (MCMC) methods as presented by Ibrahim et al [19]. The joint model is composed of longitudinal and survival sub-models. The longitudinal sub-model takes the form of a mixed-effects model as follows; supposing data is available from N participants with n i observations recorded for participant i, (i=1,……,N). The response y ij (j=1,…….,n i ), fixedeffect covariate vector X ij =(X 1ij ,…….,X pij )', and random-effect covariate vector Z ij =(Z 1ij , …….,Z qij )' are recorded at times t ij . The longitudinal sub-model is where β is the p×1 fixed-effect parameter vector, b i is the q×1 vector of random effects for participant which is assumed to be multivariate normal with mean zero, i.e., b i~Nq (0,∑ b ), and ∑ b is the variance-covariance matrix of the subject specific effects. The error vector ε i = (ε il , .., ε n j i )′ is assumed to be distributed ε i N n i 0, δ 2 l ni where δ 2 is variance and I ni is the n i × n i identity matrix.
The survival sub-model takes a Cox PH model form [9] where the hazard function for λi (t) participant i at time t as given in equation (1) is modelled as Once the two sub-models are specified, the likelihood for the joint model can be constructed as follows. Let T i and C i represent potential failure and censoring times respectively for participant i. Let S i =min {T i ,C i } be the minimum of the observed failure and censoring times for participant i, and let τ i be the failure indicator taking value 1 if T i < C i and 0 otherwise. Then the value of the longitudinal trajectory for participant i at time t can be defined as φ i (β,b i ,t) and at visit j as φ ij (β,b i ). Using the full longitudinal trajectory, then the likelihood of the joint distribution of the observed data and random effects for participant can be decomposed as and the joint likelihood for all participants can be written as L = ∏ i = 1 Denoting parasite count as PC, the likelihood of the joint distribution of observed data, random effects and PC can be expressed as and integrating out the random effects of the conditional likelihood yields the marginal likelihood. Under the Bayesian framework, the random effects are sampled in MCMC algorithm as extra parameters. The survival and longitudinal sub-models are linked by sharing common random effects structure. The MCMC computations of the model parameters proceed assuming that given the random effects, the longitudinal parasite countand time-to-clinical malaria process are independent as are the longitudinal responses of each participant.

Data analysis
Baseline data was summarised using frequencies and percentages if categorical and medians with ranges were presented if continuous and skewed. Failure functions for time to new clinical malaria disease were summarised using Kaplan Meier plots, and compared across age and bed net usage using logrank test.  [20] using programme bayesmh and R version 3.4.3 using packages JMbayes, survival and glmmPQL [21].

Results
There were 120 participants in the cohort, of which 69 (57.5%) were females.

Follow up and time-to-first new clinical malaria disease
Analyses of time-to-new clinical malaria disease included 115 participants who had a least one follow-up visit post enrolment and together contributed a total 894 observations. The median follow-up time to new clinical malaria disease episode was 3.5 months (IQR: 1.1-7.9). Out of the 115 participants, 100 (87.5%) experienced the episode while 15 (12.5%) were administratively right censored ( Table 2). Among 100 participants who experienced the episode, 58 (58%) were females, 48 (48%) were aged 5-15 years, 37 (37%) reported using bed net nightly in prior month to enrolment and their median parasite count was 13640 (IQR: 840 -52040).

Parameter estimation
Regression coefficient estimates are log hazard ratios, log (HRs). Overall, the joint models gave larger log (HR) estimates with consistently smaller standard errors and narrower credible intervals when compared to the conventional Cox PH model with time-dependent parasite count (Table 3). Comparing the Deviance Information Criteria (DIC) from the three joint models showed that joint model 4 with cumulative parasite count had the lowest value (Table 4), suggesting the best fit [22]. For the rest of the manuscript, the joint model is referencing to the joint model 4 with cumulative parasite count which is being compared to the conventional Cox PH model (model 1

MCMC convergence of estimated log (HR) parameters from final optimal joint model
The MCMC sampler trace plots from the final joint model4 seemed to mix well for parameters of children under 5 years, 5-15 years and infrequent bed net use and never moved beyond 1.5 achieving convergence (Figure 2A-2C). The sampler for cumulative parasite count parameter, showed noisy pattern before converging at about 1000 iterations ( Figure 2D). From the Kernel density estimator plots, all the four parameters were roughly normal suggesting that the M-H algorithm sampled within the target normal distribution ( Figure 3A-3D).

Factors associated with risk of new clinical malaria episode
In unadjusted analyses using Kaplan Meier failure estimator, the risk of experiencing new clinical malaria episode was higher in children under 5 years and the 5-15 years compared to adults above 15 years (log-rank p-value=0.016) ( Figure 1A). The risk of getting a new clinical malaria episode was also higher in participants who did not use bed net every night compared to those who used a bed net nightly in previous month (log-rank p-value=0.016) ( Figure 1B)

Discussion
This study has demonstrated that jointly modelling longitudinal parasite count and time-to clinical malaria episodes improves precision in risk factor estimates associated with clinical malaria disease. The joint model yielded larger parameter estimates with consistently smaller standard errors and narrower credible intervals when compared to a conventional Cox PH model with time-dependent parasite count. These results are consistent with findings from other areas including HIV [8,12] and cancer [13,14,23 ] where joint modelling out-performs separate analyses in terms of optimal use of the available information giving both more precise and less biased estimates. The improved precision provided by the joint model may improve study efficiency, for example, clinical trials may be designed with relatively lower sample sizes while still yielding high power. In general, the conventional time-dependent Cox PH model like any other standard time-to-event model assume that time-dependent covariates are external to the event process [7]. However, parasite count in this case is an endogenous covariate whose existence and future path can be directly related to the occurrence of the malaria episodes. By postulating a model for the joint distribution of the covariate parasite count and the time-to-malaria processes, the joint model explicitly accounts for possible inter-dependence between the two processes through shared random effects [24]. Moreover, the time-dependent covariate model assumes that the value of the parasite count does not change until a new measurement is taken which may not be correct. When we modelled parasite count using a quasi-Poisson mixed effects model, we were creating a model for the outcome at any time-point there by indirectly addressing the measurement error. The joint model also offered flexibility to investigate the appropriate link structure between longitudinal parasite count and time-to-new clinical malaria episode. Among the fitted joint models, only the model with cumulative parasite count was strongly associated with the hazard of new clinical malaria episode suggesting that the risk can better be explained by conditioning on the cumulative effect of the longitudinal parasite count trajectory from baseline up to the episode time. The underlying current value of the parasite count or change in parasite count at any time was not associated with the risk of experiencing new clinical malaria episode at the same.
Factors associated with a high risk of experiencing a new clinical malaria episode were young age and infrequent use of bed nets as have been reported in other studies [25][26][27][28].
Children aged up to 15 years had higher risk for clinical malaria disease when compared to adults over 15 years. The higher risk of experiencing a clinical malaria episode among children is possibly be due to the naturally under-developed humoral immune responses to different stage-specific antigens of P. Falciparum otherwise acquired with age [29]. In this study in an endemic area, high cumulative parasite count was associated with lower risk of getting a new clinical malaria episode suggesting that increased exposure to malaria parasites with time may result into protective effect to future clinical malaria episodes.
This study may be limited by focusing on time-to-first malaria episode only, thus estimates obtained here may not be applicable to analyses examining all clinical malaria episodes over a follow up period. This paper has established the optimal way of incorporating parasite count in estimating time-to-first new clinical malaria which may further be extended to study recurrent episodes over entire follow-up, but this may require different distributional assumptions. There were missing values for bed net use which may have affected the findings. Future studies should include multiple imputation of missing covariate data and also investigate the role of measurement error due to detection limit of parasite count in these joint models. Further work is required to consider joint modelling of parasite count with recurrent episodes and to predict risk of future clinical malaria episodes.

Conclusions
In conclusion, jointly modelling longitudinal parasite count and time-to-new clinical malaria improved precision in log hazard ratio estimates for clinical malaria when compared with the conventional Cox PH model with time-dependent parasite count. The improved precision of joint modelling may improve study efficiency and allow for design of clinical trials with relatively lower sample sizes with increased power.    Kernel density estimator plots for the parameters of the final joint model. The MCMC sampling process of the parameters portrays the target poster distribution. The plots suggest that algorithm sampled successfully within the assumed normal distribution for all parameters θ 5-15 years , θ <5 years , θ net use not every night and θ cumulative .