Isuru D. Dassanayake^{1,2} and Lakshika S. Nawarathna^{2*}

*Correspondence: Lakshika S. Nawarathna lakshikas@pdn.ac.lk

1. Department of Mathematics and Statistics, Texas Tech University, Broadway and Boston, Lubbock, TX 79409-1042, USA.

2. Department of Statistics and Computer Science, Faculty of Science, University of Peradeniya, Peradeniya 20400, Sri Lanka.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Background**: In clinical medicine, agreement evaluation plays a major role in determining how compatible
and accurate newly introduced methods are relative to pre-existing methods. These methods may be assays,
clinical observers, medical devices, etc. It is vital to assess the compatibility and accuracy of these newly
introduced techniques because they measure quantities of the human body, such as blood pressure,
cholesterol level and heart rate. In practice, agreement evaluation is usually carried out between two methods of
measurement and assumes that the data are homoscedastic. The main objective of this study is to extend
the standard mixed model so that the error variances may depend on the magnitude of measurement, and to
evaluate agreement between multiple methods under the new model, taking heteroscedasticity into
account.

**Methods**: Assessing agreement in method comparison studies typically involves two steps. The
first step is to model the data using the heteroscedastic mixed effects model. The model fitting is carried
out using two main approaches, namely the mean method and the best linear unbiased predictor method.
In the second step, after the model is fitted, the agreement evaluation is carried out using the concordance
correlation coefficient and the total deviation index.

**Results**: The illustrative example contained heteroscedastic measurements from five methods of
measurement. First, the model was fitted according to the two approaches, and the resulting
parameter estimates were almost identical. After the model fitting, the agreement evaluation was performed.
According to the agreement measures, the other four methods agree
sufficiently well with the reference method.

**Conclusions**: The proposed model can be used to model method comparison data with heteroscedastic
measurements from multiple methods of measurement, for both balanced and unbalanced data designs.
Under the proposed model, an agreement evaluation methodology for comparing multiple methods is also
developed, taking heteroscedasticity into account.

**Keywords**: Agreement evaluation, best linear unbiased predictor, concordance correlation coefficient,
heteroscedasticity, mixed effects model, total deviation index

In the field of clinical medicine, measurements of the human body play a major part in diagnostic, prognostic and therapeutic evaluations. Due to the rapid advancement of technology, new methods and instruments are continually introduced into this field. These newly introduced instruments or methods might be more advanced, cheaper and easier to use than the old standard instruments. Before they are put into use, the accuracy and precision of their measurements need to be verified. If these instruments agree sufficiently well with the existing methods, they can be used interchangeably. To verify this, they must be compared against a well-established technique, and the degree of agreement between the methods must be assessed [1-5].

Experts in this field have already introduced many techniques to test the degree of agreement between two methods. But most of these techniques can be applied under certain assumptions.

In order to assess the agreement between the assays, the first step is to model the data. The Linear mixed effects model is most commonly used in modeling the method comparison data [6-15]. Because of the flexibility in modeling of within subject dependence, linear mixed models are popular. In this model, normality is assumed for the error terms, but in practice sometimes the normality assumption is violated. In such cases, linear mixed models cannot be used to model the data [16].

To overcome this problem, [17] proposed a robust mixed model called the "General skew-t mixed model (GSTMM)", which assumes a multivariate skew-t distribution for the random effects and an independent multivariate t-distribution for the errors. However, the GSTMM and most of the other existing models for method comparison studies are based on the assumption that the variability of the continuous measurement remains constant throughout the range of measurement. This is not the case in some practical situations, where the variability of the measurements changes with their magnitude, i.e., the error terms exhibit heteroscedasticity [18].

A novel model for method comparison data with heteroscedastic error variances is proposed in [15] to evaluate the agreement between two methods of measurement for continuous data. However, this model cannot accommodate the comparison of multiple methods with heteroscedastic measurements. Therefore, the main objective of this study is to propose a heteroscedastic mixed effects model for analyzing heteroscedastic method comparison data with more than two methods of measurement, and to adapt the agreement evaluation methodology to multiple methods, taking heteroscedasticity into account.

The lung tumor size measurements data from [19], which motivated this work and are analyzed later in this article, provide a specific example of this phenomenon. The data were gathered from August 2000 to May 2001. The dataset contains 40 lung tumors from 33 patients (16 men and 17 women) aged 43 to 78 years. All the lung tumors are larger than 1.5 cm in maximal diameter. Computed Tomography (CT) images of these 40 lung tumors were distributed among five radiologists, all of whom have Thoracic Fellowship training and more than four years of post-training experience. The five radiologists measured the lung tumors on the CT images using a ruler or calipers. The measurements were performed independently, and each image was measured twice by each radiologist. Here we consider the measurements taken by the five radiologists as different methods of measurement. Therefore, the dataset contains 40 lung tumors (subjects), 5 readers (methods) and 2 replications for each measurement, giving 40*5*2=400 records in total.

The rest of this article is organized as follows. Section 2 presents the proposed methodology for heteroscedastic method comparison data measured by multiple methods or raters. Section 3 discusses agreement evaluation under the proposed model using two techniques of model fitting, and the last section discusses the results and conclusions of the study. The model fitting and the analysis were carried out using the R statistical software.

In this study, the methodology is divided into two main parts. The first step is to model the data set using an appropriate model, and the second step is to measure the agreement among the methods of measurement. The most popular statistical modeling technique for method comparison data is the "Linear mixed effects model". This model is an extension of linear regression models [12] and contains both fixed effects and random effects. In practice, mixed effects models are important when there are repeated measurements. The standard mixed model can be represented in matrix form as follows.

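$${Y}_{i}={X}_{i}\beta +{Z}_{i}{b}_{i}+{e}_{i};\quad i=1,\dots ,n,$$ (1)

where *Y_{i}* is the vector of observed responses on the *i*th subject, *β* is the vector of fixed effects, *b_{i}* is the vector of random effects of the *i*th subject, *X_{i}* and *Z_{i}* are the corresponding design matrices, and *e_{i}* is the vector of random errors.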

The main assumptions of this model are that the random effects and the random errors are normally distributed and mutually independent, and that the error variance is constant, depending only on the method.

$$\begin{array}{l}{b}_{i}\sim \text{independent } N\left(0,\Psi\right);\quad {e}_{i}\sim \text{independent } N\left(0,{R}_{i}\right);\\ {R}_{i}=\mathrm{diag}\left({\sigma}_{1}^{2},\dots ,{\sigma}_{1}^{2},{\sigma}_{2}^{2},\dots ,{\sigma}_{2}^{2},\dots ,{\sigma}_{n}^{2},\dots ,{\sigma}_{n}^{2}\right)\end{array}$$

where *Ψ* is an *l×l* positive definite matrix with diagonal elements *Ψ_{1}^{2}*, ...,

**Heteroscedastic Mixed Effects Model for multiple methods**

This model is used when the assumptions of the standard mixed
effects model (the homoscedastic model), i.e., that the random
effects and the random errors are normally distributed, independent
and have a constant variance, are violated because the error
variance changes with the magnitude of the measurements.
The heteroscedastic model is as follows.

Let *Y_{ijk}* be the *k*th replicate measurement of the *j*th method on the *i*th subject.

and

$${\Sigma}_{i}\left({v}_{i}\right)=\mathrm{diag}\left\{{\Sigma}_{i1}\left({v}_{i}\right),{\Sigma}_{i2}\left({v}_{i}\right),\dots ,{\Sigma}_{ij}\left({v}_{i}\right)\right\}$$

The variance covariate is a function of the magnitude of measurement
*μ_{i}* and is used to model the error variances. A model for the
conditional error variance is as follows,

where *g_{j}* is the variance function and

*g(ν,δ) = exp(δν)*. The heteroscedastic model in matrix form
is as follows.

Because the likelihood function has no closed form, exact model fitting is troublesome. Therefore, the model is fitted using two approximations: the first uses the mean vector of the reference method as the covariate, and the second uses the Best Linear Unbiased Predictor (BLUP) as the covariate. Under these approximations the likelihood functions are available in closed form. The score test and the likelihood ratio test were carried out to assess the validity of these models. After confirming the models' validity, agreement is evaluated using the concordance correlation coefficient (CCC) and the total deviation index (TDI).

**Model fitting**

The heteroscedastic mixed effects model
is fitted by selecting an appropriate quantity for the
variance covariate. In this study, there are two main options
for the variance covariate, namely the mean of the reference
method and the BLUP. The power
variance function *g(ν,δ) = |ν|^{δ}* was selected to fit the model [12].
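As a rough illustration of how the power variance function scales the error variability, the following Python sketch assumes the convention of [12], in which the conditional error standard deviation equals σ·g(ν,δ). The function names and the values of `sigma` and `delta` are hypothetical and are not estimates from this study.

```python
def power_variance_function(v, delta):
    """Power variance function g(v, delta) = |v| ** delta."""
    return abs(v) ** delta


def error_sd(sigma, v, delta):
    """Conditional error SD, assuming SD = sigma * g(v, delta) as in [12]."""
    return sigma * power_variance_function(v, delta)


if __name__ == "__main__":
    sigma, delta = 0.2, 0.5  # hypothetical values, for illustration only
    for diameter in (1.5, 4.0, 8.5):  # tumor diameters in cm
        print(diameter, round(error_sd(sigma, diameter, delta), 4))
```

With `delta = 0` the variance function equals 1 for every value of the covariate, so the sketch reduces to the homoscedastic case.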

*Using observable mean measurement as variance covariate*

As the first approach, we select the mean vector of the
reference method as the variance covariate. In this case *μ_{i}^{*}*,
an observable quantity, is fixed, and the model can be fitted by the
maximum likelihood method. Here the

*Using BLUP as the covariate*

As the second approach, we use the best linear unbiased predictors
(BLUP) as the variance covariate and fit the data using
the heteroscedastic model. Random effects and error terms
are independent in this heteroscedastic model and, according
to [12], the BLUP can be written as,

Any statistical software can be used to calculate *μ_{i,blup}*.
*ν_{i}^{*}* depends on the unknown parameter

Of the two approaches, using the mean as the covariate is much simpler than using the BLUP, because the BLUP is more complex to calculate than the mean. However, the BLUP-based method is more accurate [14].
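To make the difference between the two covariates concrete, the sketch below computes a BLUP in a deliberately simplified setting: a one-way random intercept model y_ik = μ + b_i + e_ik with b_i ~ N(0, ψ²) and e_ik ~ N(0, σ²). This is not the heteroscedastic model of this article; the function name, the parameter values and the readings are hypothetical, and the parameters are treated as known (in practice they would be replaced by estimates).

```python
from statistics import fmean


def blup_subject_mean(replicates, mu, psi2, sigma2):
    """BLUP of mu + b_i in the model y_ik = mu + b_i + e_ik.

    Assumes b_i ~ N(0, psi2) and e_ik ~ N(0, sigma2), parameters known.
    """
    m = len(replicates)
    # Shrinkage weight between 0 and 1: pulls the subject mean toward mu.
    shrinkage = psi2 / (psi2 + sigma2 / m)
    return mu + shrinkage * (fmean(replicates) - mu)


if __name__ == "__main__":
    readings = [5.0, 5.4]  # hypothetical replicate readings (cm)
    print(fmean(readings), blup_subject_mean(readings, 4.0, 2.0, 0.5))
```

The BLUP shrinks the observed subject mean toward the overall mean, and the two coincide when the error variance is negligible relative to the between-subject variance.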

**Agreement evaluation under the proposed**

*Methodology*

In health science research, agreement evaluation is a topic
of considerable interest. It assesses the agreement
between two or more methods measuring the same
response [20,21]. In this section we discuss the two measures
that are used to assess agreement in this study [22,23].

*Concordance correlation coefficient (CCC)*

The concordance correlation coefficient (CCC) is one of the most
common measures used to assess the agreement
between methods of measurement. It was introduced in
[22]. The CCC ranges from -1 to 1, and higher values
indicate better agreement. For the lung tumor size measurements, the concordance correlation
coefficient under the proposed heteroscedastic mixed model is
as follows,

This represents the concordance correlation coefficient between
reader1 and reader *j*. For greater accuracy,
the CCC was first estimated on the scale of Fisher's z-transformation
and then transformed back to the CCC scale [24].
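For orientation, the following Python sketch computes the classical moment-based CCC of [22] for paired readings, together with Fisher's z-transformation and its inverse. It is not the model-based CCC of the heteroscedastic model, which depends on the fitted variance components; the function names are hypothetical.

```python
import math
from statistics import fmean


def lin_ccc(x, y):
    """Sample concordance correlation coefficient of [22] for paired data."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Moment estimates with divisor n.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def fisher_z(ccc):
    """Fisher's z-transformation, z = atanh(ccc)."""
    return math.atanh(ccc)


def inverse_fisher_z(z):
    """Back-transformation from the z scale to the CCC scale."""
    return math.tanh(z)
```

A confidence bound computed on the z scale and mapped back with `inverse_fisher_z` always stays inside the interval (-1, 1), which is the usual motivation for the transformation.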

*Total Deviation Index (TDI)*

The total deviation index (TDI) is another common measure
for evaluating agreement between two methods of
measurement. The TDI is the *π_{0}*th percentile of the absolute value of
the differences between the methods.
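The definition above can be sketched numerically. The Python code below assumes that the difference D between two methods is normally distributed with mean `mu_d` and standard deviation `sigma_d`, and finds the quantile of |D| by bisection. The function name, the default probability `p0 = 0.90` and the tolerance are hypothetical choices, not values taken from this study.

```python
from statistics import NormalDist


def tdi_normal(mu_d, sigma_d, p0=0.90, tol=1e-10):
    """TDI: the p0 quantile of |D| when D ~ N(mu_d, sigma_d**2)."""
    dist = NormalDist(mu_d, sigma_d)

    def coverage(q):
        # P(|D| <= q) = F(q) - F(-q)
        return dist.cdf(q) - dist.cdf(-q)

    # coverage(hi) is essentially 1, so the root lies in [lo, hi].
    lo, hi = 0.0, abs(mu_d) + 10.0 * sigma_d
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if coverage(mid) < p0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Smaller TDI values indicate better agreement, since a proportion `p0` of the differences then falls within a narrower band around zero.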

The initial model fitting was carried out using the standard
mixed effects model. To check the normality of the residuals
for each method, quantile-quantile plots and the Shapiro-Wilk
normality test were used. Table 1 presents the results
of the Shapiro-Wilk normality test applied to the
residuals of each reader under the mixed effects model. According
to the results, the residuals do not follow a
normal distribution, as all p-values are small. Therefore, the
main assumption of the standard mixed effects model (i.e.,
normally distributed residuals with a constant variance) is violated. Moreover,
Figure 1 shows a separate quantile-quantile plot for each reader
when the standard mixed effects model is fitted to the data. The points cross
the reference line three times, which indicates that the distribution does not
have the shape expected of normal data. These data are therefore
not normally distributed. Hence, we model the error variability using
two approximations for *μ_{i}*, namely the mean of the reference
method and the BLUP of

Table 1 **:** **Shapiro-Wilk test for normality results.**

Figure 1 **:** **Separate quantile-quantile plot for each reader
when the homoscedastic model is fit to the lung tumor size
measurements data.**

We consider reader1 as the reference method. As the first approach, we take the mean vector of reader1 as the variance covariate and fit a heteroscedastic model to the data.

$${\mu}_{i1}=\overline{Y}_{1}$$

In the dataset there are 40 subjects, and each subject has 2
replicates. Since there are 5 readers in the dataset, we need
to compare 5 different methods of measurement. Therefore,
for the model stated in the methodology, *i=1,2,...,40;
k=1,2; j=1,2,3,4,5.*
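The mean-based covariate can be sketched as follows, reading the formula above as the per-subject mean of the reference reader's replicates (an interpretation, since the notation does not spell this out). The readings below are hypothetical and are not values from the lung tumor dataset.

```python
from statistics import fmean

# Hypothetical replicate readings (cm) by the reference reader:
# subject id -> [replicate 1, replicate 2]; not values from the dataset.
reader1 = {1: [2.1, 2.3], 2: [4.0, 4.4], 3: [7.9, 8.1]}


def reference_means(readings):
    """Per-subject mean of the reference method's replicates."""
    return {subject: fmean(values) for subject, values in readings.items()}


covariate = reference_means(reader1)
print(covariate)
```

Each subject receives one covariate value, which is then shared by all five readers' error variances for that subject.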

Table 2 presents the estimates and standard errors of the parameters of the heteroscedastic model, fitted using the mean of the reference method and the BLUP as the variance covariate. According to Table 2, the two approaches give approximately equal estimates. The estimated mean is largest for reader3 and smallest for reader2.

Table 2 **:** **Model parameter estimates and their standard errors
using both mean and BLUP based methods.**

The likelihood ratio test was carried out to choose between the standard mixed effects model and the proposed heteroscedastic model. The test rejected the null hypothesis, which implies that the reduced (homoscedastic) model is not adequate and confirms that the heteroscedastic model is the better of the two. Furthermore, the score test was also performed on these models. It led to the same conclusion, confirming that the heteroscedastic model is more suitable for the given dataset. Since the fitted heteroscedastic model is appropriate, the next step is to assess the agreement between the methods of measurement.
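The mechanics of a likelihood ratio test between nested models can be sketched in Python as below. The sketch assumes the usual chi-square reference distribution with an integer number of degrees of freedom equal to the number of extra parameters in the full model; the function names are hypothetical, and no log-likelihood values from this study are used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2  # exact tail probability for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1  # exact tail probability for df = 1
    while k < df:
        # Recurrence: Q(k + 2) = Q(k) + half**(k/2) * exp(-half) / Gamma(k/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """LRT statistic 2 * (loglik_full - loglik_reduced) and its p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)
```

A small p-value is evidence against the reduced model, which here would be the homoscedastic model.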

**Agreement evaluation under the Heteroscedastic model**

Figure 2 presents the estimates and confidence
intervals of the CCC and TDI obtained with the mean method. The solid line indicates the estimate and the shaded region indicates
the estimate±standard error. The dotted line represents the
lower bound of the concordance correlation coefficient and
the upper bound of the total deviation index. The upper
two graphs show the CCC on the scale of Fisher's z-transformation
and the estimated CCC, and the lower two graphs show the TDI
on the log scale and the estimated TDI. The CCC
gradually decreases from 0.82 to 0.79 as the diameter
increases from 1.45 to 8.5 cm. The TDI increases from 1.505 to 1.69
as the magnitude of the measurement increases.
From both the CCC and TDI values we can observe that the two
methods (readers) have satisfactory agreement for small diameters, while the agreement decreases as the
lung tumor diameter increases. Overall, it
is safe to conclude that reader2 agrees sufficiently well with
the reference method (reader1). Likewise, the readings from
readers 3, 4 and 5 gave results similar to those of
reader2, confirming that all the methods (readers) agree
well with the reference method, i.e., reader1.

Figure 2 **:** **Estimates and their 95% CI of CCC and TDI
between reader1 and reader2 using mean method.**

Figure 3 presents the CCC and TDI values for reader1 and reader2 obtained with the BLUP-based method. Figure 2 and Figure 3 are almost identical. Therefore, reader2 agrees well with reader1 under this approach as well.

Figure 3 **:** **Estimates and their 95% CI of CCC and TDI between
reader1 and reader2 using BLUP method.**

The proposed heteroscedastic mixed effects model was fitted using two methods. One uses the mean as the variance covariate and the other uses the Best Linear Unbiased Predictor (BLUP) as the variance covariate. With both methods, the heteroscedastic model was found to be better than the standard mixed effects model, as the results of the likelihood ratio test and the score test confirm. Considering the means in Table 2, we can observe that the estimates from the two approaches are identical when rounded to 2 decimal places. This means that the mean method and the BLUP method generate almost identical estimates.

The CCC and TDI were used to evaluate the agreement between the reference method (reader1) and the other 4 methods (readers). The mean-based method and the BLUP-based method produced almost identical results, with only slight differences, for both the CCC and the TDI. Figures 2 and 3 both show that reader2 agrees well with reader1. Reader3, reader4 and reader5 also agree well with reader1. Moreover, Figures 2 and 3 show that the CCC and TDI values vary with the magnitude of measurement. If the data set were modeled using a homoscedastic model, constant values of the CCC (0.8069) and TDI (1.6140) would be obtained. Therefore, if a dataset with heteroscedasticity is modeled with the standard mixed effects model, the resulting outcome will be misleading and inaccurate. To overcome this issue, the data must be modeled with a model that accounts for heteroscedasticity.

We have proposed a model that was successfully fitted to heteroscedastic clinical measurements from 5 methods, and the proposed model can easily be extended to any number of methods that have replicated data. Our approach can accommodate balanced or unbalanced data designs, and it works well with any scalar measure of agreement. Moreover, the two model fitting methods discussed give almost identical estimates of the model parameters. The limitation of this study is that the proposed method can be applied only to replicated measurements.

BLUP: Best Linear Unbiased Predictor

CCC: Concordance Correlation Coefficient

TDI: Total Deviation Index

The authors declare that they have no competing interests.

| Authors' contributions | IDD | LSN |
| --- | --- | --- |
| Research concept and design | √ | √ |
| Collection and/or assembly of data | √ | √ |
| Data analysis and interpretation | √ | √ |
| Writing the article | √ | √ |
| Critical revision of the article | √ | √ |
| Final approval of article | √ | √ |
| Statistical analysis | √ | √ |

The authors are grateful to Professor L. Broemeling for providing the lung tumor size measurements dataset.

Editor: Jimmy Efird, East Carolina University, USA.

Received: 18-Oct-2016 Final Revised: 26-Dec-2016

Accepted: 09-Jan-2017 Published: 27-Jan-2017

1. Altman DG and Bland JM. **Measurements in Medicine: The Analysis of Method Comparison studies**. *Journal of the Royal Statistical Society*. 1983;**32**:307-317.
2. Bland JM and Altman DG. **Statistical methods for assessing agreement between two methods of clinical measurement**. *Lancet*. 1986;**1**:307-10.
3. Bland JM and Altman DG. **Measuring agreement in method comparison studies**. *Stat Methods Med Res*. 1999;**8**:135-60.
4. Choudhary PK. **Interrater Agreement**. *In methods and applications of statistics in the life and health sciences*, John Wiley: New York. 2009; 461-480.
5. Haber M, Gao J and Barnhart HX. **Evaluation of Agreement between Measurement Methods from Data with Matched Repeated Measurements via the Coefficient of Individual Agreement**. *J Data Sci*. 2010;**8**:457-469.
6. Altman DG and Bland JM. **Agreement between methods of measurement with multiple observations per individual**. *Journal of biopharmaceutical statistics*. 2007;**17**:571-582.
7. Bland JM and Altman DG. **Agreement between methods of measurement with multiple observations per individual**. *J Biopharm Stat*. 2007;**17**:571-82.
8. Barnhart HX, Haber M and Song J. **Overall concordance correlation coefficient for evaluating agreement among multiple observers**. *Biometrics*. 2002;**58**:1020-7.
9. Carrasco JL and Jover L. **Estimating the generalized concordance correlation coefficient through variance components**. *Biometrics*. 2003;**59**:849-58.
10. Carrasco JL, King TS and Chinchilli VM. **The concordance correlation coefficient for repeated measures estimated by variance components**. *J Biopharm Stat*. 2009;**19**:90-105.
11. Carstensen B, Simpson J and Gurrin LC. **Statistical models for assessing agreement in method comparison studies with replicate measurements**. *Int J Biostat*. 2008;**4**.
12. Pinheiro JC and Bates DM. **Mixed-Effects Models in S and S-PLUS**. *Springer: New York*. 2000.
13. Roy A. **An application of linear mixed effects model to assess the agreement between two methods with replicated observations**. *J Biopharm Stat*. 2009;**19**:150-73.
14. Nawarathna LS and Choudhary PK. **A heteroscedastic measurement error model for method comparison data with replicate measurements**. *Stat Med*. 2015;**34**:1242-58.
15. Nawarathna LS. **Heteroscedastic Models for Method Comparison Data**. (Doctoral Dissertation). ProQuest Dissertation and Theses Database. 2014.
16. Nawarathna LS and Choudhary PK. **Measuring agreement in method comparison studies with heteroscedastic measurements**. *Stat Med*. 2013;**32**:5156-71.
17. Choudhary PK, Senguptha D and Cassey P. **A general skew-t mixed model that allows different degrees of freedom for random effects and error distributions**. *Journal of Statistical Planning and Inference*. 2014;**147**:235-247.
18. Hawkins DM. **Diagnostics for conformity of paired quantitative measurements**. *Stat Med*. 2002;**21**:1913-35.
19. Erasmus JJ, Gladish GW, Broemeling L, Sabloff BS, Truong MT, Herbst RS and Munden RF. **Interobserver and intraobserver variability in measurement of non-small-cell carcinoma lung lesions: implications for assessment of tumor response**. *J Clin Oncol*. 2003;**21**:2574-82.
20. Choudhary PK and Yin K. **Bayesian and Frequentist methodologies for analyzing method comparison studies with multiple methods**. *Statistics in Biopharmaceutical Research*. 2010;**2**:122-132.
21. Dunn G and Roberts C. **Modelling method comparison data**. *Stat Methods Med Res*. 1999;**8**:161-79.
22. Lin LI. **A concordance correlation coefficient to evaluate reproducibility**. *Biometrics*. 1989;**45**:255-68.
23. Lin LI. **Total deviation index for measuring individual agreement with applications in laboratory performance and bioequivalence**. *Stat Med*. 2000;**19**:255-70.
24. Lin LI, Hedayat AS, Sinha B and Yang M. **Statistical methods in assessing agreement: models, issues, and tools**. *Journal of the American Statistical Association*. 2002;**97**:257-270.

Volume 5

Dassanayake ID and Nawarathna LS. **Assessing inter-rater agreement between multiple medical instruments with heteroscedastic measurements**. *J Med Stat Inform*. 2017;** 5**:1. http://dx.doi.org/10.7243/2053-7662-5-1


Copyright © 2015 Herbert Publications Limited. All rights reserved.
