
Abo-Zaid G and Morrissey K. Examining the association between C-Reactive protein and obesity by using the fractional polynomial approach; applying on NHANES dataset from 2001 to 2010. J Med Stat Inform. 2017; 5:2. http://dx.doi.org/10.7243/2053-7662-5-2
Ghada Abo-Zaid1,2* and Karyn Morrissey1
*Correspondence: Ghada Abo-Zaid K.Morrissey@exeter.ac.uk (or) abozaidg@gmail.com
1. European Centre for Environment and Human Health, University of Exeter Medical School, Knowledge Spa, Royal Cornwall Hospital, Truro, Cornwall TR1 3HD United Kingdom.
2. Faculty of Commerce- Ain Shams University, Khalifa El-Maamon St, Abbaisiya Sq., Cairo 11566, Egypt.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Objective: This study uses a flexible nonlinear approach, fractional polynomial (FP) models, to examine the association between obesity and C-reactive protein (CRP) and to select the best-fitting model among 44 candidate FP models.
Methods: Data from five survey cycles (2001-2010) of the National Health and Nutrition Examination Survey (NHANES) were used. All respondents aged between 17 and 74 were included in the analysis. CRP was transformed to ln(CRP) to eliminate skewness, and observations with missing values were removed from the analysis. A fractional polynomial approach was applied to measure the relationship between elevated levels of CRP and obesity. A closed test was used to select the best model among the 44 models.
Results: The best-fitting fractional polynomial regression model contained the powers -2 and -2 for BMI. The association between ln(CRP) and BMI estimated using the FP approach exhibited a J-shaped pattern for both women and men. Women had a higher risk of an elevated CRP level than men. A deviance difference test showed a significant improvement in model fit for the powers -2 and -2 compared to the other BMI functions.
Conclusion: The fractional polynomial regression model provided the most robust estimate of the BMI effect compared to the other linear and nonlinear models.
Keywords: Categorization, C-reactive protein, fractional polynomial model, linear model, obesity
C-reactive protein (CRP) is an acute-phase protein of the pentraxin family and is widely used in clinical settings to monitor chronic and acute inflammatory conditions. Recent research has found that an increase in BMI is associated with elevated CRP concentrations regardless of sex, age, and ethnicity [1,2].
Various models are available to analyze the relationship between CRP and BMI. However, the choice between linear and non-linear analysis is controversial in applied fields such as medicine, clinical trials, and epidemiology. A few studies suggest that categorizing continuous data is preferable, especially if the association between two variables is nonlinear (Wang et al., 2016). However, Royston et al. (2006) [3] reported many pitfalls of this approach, such as the loss of information and a decrease in the power of the model [3-7], while debate over the appropriate cut-off points for normal weight, overweight and obesity further complicates the categorization of BMI.
A number of studies have used BMI as a continuous variable. However, estimating the association between BMI and a set of covariates using a continuous scale is challenging because the relationship may be nonlinear, and the BMI distribution is often right skewed. Furthermore, linear models require many assumptions, including normality and the absence of multicollinearity and heteroscedasticity in the data. In addition, linear analysis assumes a constant influence of the independent variable on the outcome [5].
Previous research has used quadratic or cubic polynomials (a non-linear approach), but the range of curve shapes afforded by low-order polynomials is restricted [8]. In light of these modeling limitations, the purpose of this study is to investigate a flexible approach to modeling the relationship between obesity and CRP. Combining the strengths of linear and nonlinear models, FP models use power transformations to measure the association between the independent and dependent variables [4]. Fractional polynomials are more powerful than regular polynomials and provide flexible transformations for continuous variables, with the closed test used to determine the best-fitting functional form for BMI [7,9,10].
The objectives of this paper are twofold. First, this paper examines the association between CRP and obesity, and whether an association remains after adjusting for variables such as age, cotinine level (smoking status), alcohol consumption, race and gender. Second, we compare model outcomes between the linear, nonlinear, categorization and multivariable fractional polynomial (MFP) approaches to select the best-fitting model for the association between ln(CRP) level and BMI.
Study population
This study used the National Health and Nutrition Examination Survey (NHANES), which is designed to examine the health and diet of the noninstitutionalized civilian population (children and adults) in the United States. The National Center for Health Statistics (NCHS) at the Centers for Disease Control and Prevention [11] was responsible for setting up the NHANES surveys and its program of studies. NCHS also approved the study protocol of the NHANES survey; further details of the ethics approval, the survey design and the study methodology are available elsewhere [12,13]. The focus of this study was participants aged between 17 and 74 years. Individuals were selected based on the availability of their CRP and BMI measurements. The data for this study were based on five independent cross-sectional studies from the NHANES dataset, covering 2001 to 2010. Later studies were not included because CRP was not measured after 2010. Pooling the data across years yielded a sample of 52,749 participants, with 25,976 (49%) men and 26,773 (51%) women.
Measurements, Laboratory variables, and other variables
Measurements
CRP: Latex-enhanced nephelometry with high sensitivity, using a Dade Behring Nephelometer II Analyzer System (Dade Behring Diagnostics, Inc., Somerville, New Jersey), was used to measure CRP levels [13]. CRP level was measured on a continuous scale. For the purpose of this study, it was transformed to ln(CRP) to eliminate skewness.
Laboratory variables
The medical examination centre was responsible for measuring the laboratory variables. Serum cotinine was measured by isotope dilution-high-performance liquid chromatography-atmospheric pressure chemical ionization tandem mass spectrometry. Cotinine concentrations were derived from the ratio of native to labeled cotinine in the sample, by comparison to a standard curve. Detailed descriptions of serum cotinine measurement in NHANES are available online [14]. The cotinine level was used as an indicator of the participants' smoking status. It was treated as a continuous variable in the statistical analyses for all models.
Other variables
Self-reported questionnaires were used for race, gender, and age at the baseline.
Race, in the NHANES dataset, was presented in five categories: Mexican American, non-Hispanic white, non-Hispanic black, other Hispanic, and other races. Gender was classified as male or female, while age was recorded on a continuous scale. BMI was measured on a continuous scale and calculated as weight (kg) / height (m)^2. A respondent was considered obese if their BMI was greater than 30. Individuals with BMI >50 or BMI <18 were excluded from the analyses to avoid outliers.
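The BMI definition, the obesity threshold and the exclusion rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of the study's Stata code.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def is_obese(bmi_value):
    """Obesity as defined in the text: BMI greater than 30."""
    return bmi_value > 30


def keep_for_analysis(bmi_value):
    """Exclusion rule from the text: discard BMI > 50 or BMI < 18."""
    return 18 <= bmi_value <= 50


# Illustrative (made-up) respondent: 81 kg and 1.80 m.
value = bmi(81.0, 1.80)
print(round(value, 1), is_obese(value), keep_for_analysis(value))
```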
Data analysis
Five independent cross-sectional studies from the NHANES dataset, covering 2001 to 2010, were used. Pooled together, these data yielded a sample of 52749 participants, with 25976 men and 26773 women. Weighting variables were used to account for the complex sample design of NHANES. Each statistical analysis was stratified by gender to account for the differing biological factors that lead to weight gain in men and women [12]. The association between CRP level and BMI was modeled using linear, non-linear, categorical, and FP statistical approaches. Each statistical model was adjusted for age, cotinine level (smoking history) and alcohol consumption.
Statistical models
Categorical model
In this study, we categorized BMI into three groups (obese: BMI>30, overweight: 25<BMI<30, and normal: 18.5<BMI<25) based on World Health Organization (WHO) criteria. The (unadjusted) categorization model can be written as follows:
$$\ln(\mathrm{CRP}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$
where X1 and X2 are indicator variables for overweight and obese participants, respectively, β0 is the constant, β1 and β2 are the effects for overweight and obese individuals compared to normal-weight participants, ln(CRP) is the natural logarithm of the CRP level, and ε is the error term.
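As a minimal sketch of how the categorical model treats BMI, the snippet below builds the two indicator variables (X1 for overweight, X2 for obese, with normal weight as the reference) and evaluates the model's prediction. The coefficient values are made up for illustration and are not estimates from this study. Assigning BMI values of exactly 25 and 30 to the higher category is an assumption, since the text leaves the boundaries open.

```python
def bmi_dummies(bmi_value):
    """Return (X1, X2): X1 = 1 if overweight, X2 = 1 if obese, (0, 0) if normal weight."""
    # Assumption: values exactly at 25 or 30 fall in the higher category.
    x1 = 1 if 25 <= bmi_value < 30 else 0
    x2 = 1 if bmi_value >= 30 else 0
    return x1, x2


def predict_ln_crp(bmi_value, b0, b1, b2):
    """Prediction from the categorical model: b0 + b1*X1 + b2*X2."""
    x1, x2 = bmi_dummies(bmi_value)
    return b0 + b1 * x1 + b2 * x2


# Hypothetical coefficients, chosen only to show the step shape.
B0, B1, B2 = 0.0, 0.5, 1.0
for value in (22, 27, 31, 49):
    print(value, bmi_dummies(value), predict_ln_crp(value, B0, B1, B2))
```

Because the prediction depends only on the category, BMI values of 31 and 49 receive the same predicted ln(CRP); this is the loss of information that the continuous approaches are meant to avoid.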
Linear model
The linear regression (unadjusted) model can be written as follows:
$$\ln(\mathrm{CRP}) = \beta_0 + \beta_3 X_{\mathrm{BMI}} + \varepsilon$$
where X_BMI is the BMI level on a continuous scale, and β3 is the effect of a one-unit increase in BMI.
Fractional polynomial approach
The fractional polynomial approach is a flexible model that combines aspects of the linear and nonlinear models. Essentially, it uses power transformations to estimate the relation between CRP level (outcome) and BMI level (covariate). The fractional polynomial approach generates 44 candidate models. The first-degree fractional polynomial (FP1) models are based on one power term and give 8 models, one for each power in the set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}; a power of zero means taking the logarithm of the BMI covariate (log(X)). If the power term for BMI is one, the model is linear. The FP1 model (unadjusted model) can be written as:
$$\ln(\mathrm{CRP}) = \beta_0 + \beta_1 X_{\mathrm{BMI}}^{(p_1)} + \varepsilon$$
where p1 is the power of the first-degree fractional polynomial model. It transforms the BMI level using a power from the set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, which produces 8 possible FP1 models.
The second-degree fractional polynomial (FP2) models are based on two power terms and give 36 models based on the same power set noted above [9]. The FP2 model (unadjusted model) can be written as
$$\ln(\mathrm{CRP}) = \beta_0 + \beta_1 X_{\mathrm{BMI}}^{(p_1)} + \beta_2 X_{\mathrm{BMI}}^{(p_2)} + \varepsilon$$
where p1 and p2 are the power terms of FP2. When p1 = p2, eight models with repeated powers are obtained, and the model takes the form
$$\ln(\mathrm{CRP}) = \beta_0 + \beta_1 X_{\mathrm{BMI}}^{(p_1)} + \beta_2 X_{\mathrm{BMI}}^{(p_1)} \ln(X_{\mathrm{BMI}}) + \varepsilon$$
only when the powers are the same. The closed test is used to select the best-fitted model among the 44 models [9].
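The count of 44 candidate models can be checked by enumerating the power set directly. The sketch below is illustrative and is not the Stata routine used in the study; the helper names fp_term and fp_terms are hypothetical. It follows the usual fractional polynomial conventions described above: a power of 0 means the natural logarithm, and a repeated power multiplies the second term by ln(x).

```python
import math
from itertools import combinations_with_replacement

# Power set given in the text.
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """Single FP transformation of x > 0: x**p, with p = 0 meaning ln(x)."""
    return math.log(x) if p == 0 else x ** p


def fp_terms(x, powers):
    """Transformed terms of an FP1 (one power) or FP2 (two powers) model at x."""
    if len(powers) == 1:
        return [fp_term(x, powers[0])]
    p1, p2 = powers
    first = fp_term(x, p1)
    if p1 == p2:
        # Repeated powers: the second term is the first multiplied by ln(x).
        return [first, first * math.log(x)]
    return [first, fp_term(x, p2)]


fp1_models = [(p,) for p in POWERS]
fp2_models = list(combinations_with_replacement(POWERS, 2))
all_models = fp1_models + fp2_models
print(len(fp1_models), len(fp2_models), len(all_models))  # 8 36 44

# Terms of the selected FP2 model (-2, -2) at a scaled BMI of 2.5 (illustrative value).
print(fp_terms(2.5, (-2, -2)))
```

The 36 FP2 models are the 28 pairs of distinct powers plus the 8 repeated-power pairs.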
Assessment criteria
To assess the strength of the fractional polynomial approach for BMI compared to the other three approaches (linear, polynomial and categorization models), the models were stratified by gender. The assessment was repeated after adjusting the models for age, cotinine level (smoking history) and alcohol consumption. The models were compared using three methods: (i) deviance difference, (ii) graphical comparison of the shape of the CRP-BMI curve, and (iii) root mean square error (Root MSE). Stata statistical software (Version 12) was used to undertake each of the statistical analyses.
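For readers who want to see how two of these criteria are computed, the sketch below gives one common definition of each for a normal-errors regression. It assumes that Root MSE is the square root of the residual sum of squares divided by the residual degrees of freedom, and that the deviance is minus twice the maximized Gaussian log-likelihood, n(1 + ln(2π RSS/n)); whether the study's software used exactly these definitions (including survey weights) is not stated in the text. The residuals are made-up numbers.

```python
import math


def root_mse(residuals, n_params):
    """Square root of RSS / (n - number of estimated parameters)."""
    rss = sum(r * r for r in residuals)
    return math.sqrt(rss / (len(residuals) - n_params))


def gaussian_deviance(residuals):
    """-2 * maximized log-likelihood of a normal-errors model: n * (1 + ln(2*pi*RSS/n))."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * (1 + math.log(2 * math.pi * rss / n))


# Toy residuals from two hypothetical models fitted to the same six observations.
res_simple = [0.9, -1.1, 0.8, -0.7, 1.2, -1.0]
res_flexible = [0.4, -0.5, 0.3, -0.2, 0.6, -0.5]

deviance_difference = gaussian_deviance(res_simple) - gaussian_deviance(res_flexible)
print(root_mse(res_simple, 2), root_mse(res_flexible, 3), deviance_difference)
```

Under these definitions the deviance difference between two models fitted to the same data reduces to n times the log of the ratio of their residual sums of squares, so a larger positive difference favors the more flexible model.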
Summary statistics of the NHANES dataset sample are shown in Table 1. In this study, 52749 participants were included by merging five continuous studies (2001 to 2010); this sample comprised 25976 men and 26773 women, with an average age of 44.89 and 44.25 years, respectively. Race and gender were measured on a categorical scale, while age and BMI were measured on a continuous scale (mean ± SD; Table 1). The majority of the respondents in the sample were in the overweight and obese classifications; the percentage of obese participants was greater among women than men (31.32% and 26.01%, respectively), while the percentage of overweight participants was greater among men than women (34.62% and 27.80%, respectively). The percentage of people in the normal weight range was almost the same for both genders. Heavy smokers made up 27.85% of the male sample and 26.22% of the female sample. The majority of the participants were non-smokers (59% of men and 65% of women). Ever smokers made up 54.93% of the male sample and 40.47% of the female sample. With regard to race, 42% of the sample was non-Hispanic white, followed by 25% Mexican American and 25% non-Hispanic black for both the male and female samples (see Table 1).
Table 1 : Descriptive Statistics for the NHANES Study From 2001 To 2010 Used to Estimate the Relationship Between the Elevated Level of CRP and Obesity.
Model fit and findings
The best-fitted model was obtained from the second-degree fractional polynomial model with powers (p1=-2 and p2=-2) for both the male and female samples, where BMI=BMI/100. After adjusting the model for age, cotinine level and alcohol consumption, the selected models remained unchanged. The FP2 model fitted better than the FP1 model for both men (deviance difference=61.72, P-value<0.0001) and women (deviance difference=128.89, P-value<0.0001). The BMI-ln(CRP) curves for the FP1 and FP2 models are illustrated in Figure 1. The FP2 model yielded a J-curve for the association between BMI and CRP level for both genders. For instance, if BMI is less than 18, the impact on the CRP level is barely visible, while if BMI is over 18, the CRP level is slightly raised, and it increases substantially if BMI is over 30. This shows a positive direct association between CRP level and BMI for men and women in the sample, and the FP2 model was more accurate than the FP1 model with power p1=0 (log(BMI)).
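The reported P-values can be sanity-checked by referring each deviance difference to a chi-square distribution. The sketch below assumes the convention of the fractional polynomial closed test, in which FP2 is compared with FP1 on 2 degrees of freedom; the text does not state the degrees of freedom, so this is an assumption. The helper chi2_sf is a hypothetical name and uses only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    t = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-t), 1.0               # closed form for df = 2
    else:
        q, a = math.erfc(math.sqrt(t)), 0.5    # closed form for df = 1
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1.0
    return q


# Deviance differences reported in the text for FP2 versus FP1 (df = 2 is assumed).
for label, diff in (("male", 61.72), ("female", 128.89)):
    print(label, chi2_sf(diff, 2))
```

Both tail probabilities are many orders of magnitude below 0.0001, which agrees with the P-values reported for the FP2 versus FP1 comparison.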
Figure 1 : The Prediction Findings for the Association Between
CRP Level and BMI for Male and Female Samples by Using
the Best-Fitted FP2 Model With Powers (p1=-2, p2=-2) And
FP1 Model With Power (p1=0).
In addition, the FP2 model produced more accurate results for both the male and female sub-samples compared to the linear regression model (male: deviance difference=231.31; female: deviance difference=378.58; P-value=0.0001), the linear-quadratic model (male: deviance difference=670.41; female: deviance difference=1074.65; P-value=0.0001) and the categorization model (male: deviance difference=2776.958; female: deviance difference=3011.603; P-value=0.0001). Figure 2 gives the predicted values of ln(CRP) against BMI for the male and female sub-samples separately using the FP1 and FP2 models. The BMI-ln(CRP) curve from the FP2 model was J-shaped and more precise than that from the FP1 model, and it also became apparent that the risk of an elevated CRP level is higher in women than in men.
Figure 2 : The Predicted Values of CRP Level for All Data
and After Stratified by Gender for FP1 and FP2 Models.
Using the FP2 approach, the male and female sub-samples were further stratified by race (Mexican American, other Hispanic, non-Hispanic white, non-Hispanic black, and other races), which produced five ln(CRP)-BMI curves (see Figure 3). In the male sub-sample, the best FP2 model was the same, with powers (p1=-2, p2=-2), for all race groups except the 'other Hispanic' and 'other races' groups, with powers (p1=-2, p2=-2) and (p1=2, p2=2). When BMI was greater than 30, the level of CRP was slightly higher for the other-Hispanic group, followed by the non-Hispanic black group, compared to the others. In the female sub-sample, all race groups produced the same FP2 model with powers (p1=-2, p2=-2) and yielded a J-shape. The Mexican American group had the highest increase in CRP level and the other races group the lowest (see Figure 3).
Figure 3 : The Ln(CRP)-BMI Curve After Classified by Race for
the Male and the Female Samples.
Figure 4 presents the fitted curves of the association between ln(CRP) level and BMI, with 95% confidence intervals, for the FP2 model. The curves are for men and women who never smoked, at ages 40, 50 and 60, with a BMI greater than 18.5. The estimated association between BMI and ln(CRP) level was almost J-shaped for both genders at ages 40, 50 and 60. However, the ln(CRP) level was slightly higher at age 60 for both men and women. The predicted values of ln(CRP) increased rapidly when BMI>30 and reached a maximum at BMI=50. The wide confidence interval at the right tail of the BMI distribution is probably due to the small number of individuals who are severely obese (BMI>40).
Figure 4 : Predicted Ln(CRP) Level and 95% Confidence Interval Based On The Best Fitted FP2 Model for Male and Female Never Smokers, Age 40, 50 And 60.
Figure 5 presents a comparison of the FP2 model with the linear-quadratic, linear and categorical models, stratified by gender. The ln(CRP)-BMI curves were based on individuals who never smoked, aged 50 and over. The top panel of Figure 5 compares the FP2 model with the linear-quadratic model. Both models generate a J-shape. However, the linear-quadratic model underestimated the predicted values of ln(CRP) when BMI was between 20 and 35 for men and between 20 and 38 for women. It also overestimated ln(CRP) when BMI was over 34 for men and over 38 for women, compared to the FP2 model.
Figure 5 : The Best-Fitted FP2 Model Compared With the
Linear-Quadratic Model for BMI (Top Row).
The middle panel of Figure 5 compares the FP2 model with the linear model. Both models produce similar predicted values of ln(CRP) when BMI<35. However, the linear regression model overestimated the predicted values of ln(CRP) when BMI was greater than 35 for the male sample and 36 for the female sample.
In the bottom panel of Figure 5, the FP2 model is compared with the categorical model for respondents aged 50 and over who never smoked. BMI was again categorized into the three groups recommended by the WHO. The categorical approach underestimated ln(CRP) when BMI was above approximately 35 for both genders. It also overestimated ln(CRP) when BMI was below approximately 20 for both men and women. All analyses were repeated for participants of all ages, and the same findings were obtained for the male and female samples.
Ln(CRP) level estimates for the male and female samples (unadjusted models) are presented in Table 2. For both samples, the FP2 model produced the lowest root mean square error (Root MSE) compared to the quadratic, linear and categorical models. The categorical approach yielded the highest Root MSE. All models produced a positive association between CRP level and BMI. Nevertheless, the estimated impact of BMI on CRP level differed depending on the selected model. The categorical approach provided the highest association for both samples. The effect of an increase in BMI on elevated CRP level was slightly higher in the female sample than in the male sample for all models. In the linear model, the effect is constant, which means that a one-unit increase in BMI increases ln(CRP) by β3.
Table 2 : The Comparison Between FP2, Quadratic, Linear and Categorical Unadjusted Models.
In the polynomial (quadratic) model, the effect of a one-unit increase in BMI on ln(CRP) depends on the BMI level through the squared term (BMI²). In the categorical model, ln(CRP) is higher by β1 if 25<BMI<30 and by β2 if BMI>30, relative to normal-weight participants. In the FP2 model, the effect of a one-unit increase in BMI on ln(CRP) depends on the BMI level through the two transformed terms, where BMI is scaled as BMI/10. For example, in the male sample in the unadjusted models, if BMI increases from 45 to 46, the estimated ln(CRP) level increases by 0.009, 0.01, 0.11 and 0.22 for the FP2, quadratic, linear and categorical models, respectively. The categorical and the linear models overestimated the results, while the quadratic model underestimated the findings compared to the FP2 model. In addition, the linear model produced a constant effect: for example, if BMI increases from 40 to 41, ln(CRP) is expected to increase by 0.11, and if BMI increases from 24 to 25, ln(CRP) is again expected to increase by 0.11. The FP2 model produced the most accurate findings, neither underestimating nor overestimating the elevated level of CRP. The findings were similar for the male and female sub-samples, with the female sample showing a slightly higher effect than the male sample.
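To make the contrast between the constant linear effect and the level-dependent FP2 effect explicit, the following is a sketch under two assumptions that are not spelled out in the text: the selected FP2 model has the repeated-powers form with p1 = p2 = -2, and BMI enters as X = BMI/10. The symbols β1 and β2 are notation chosen for this sketch, not the reported estimates.

$$f(X) = \beta_0 + \beta_1 X^{-2} + \beta_2 X^{-2}\ln X, \qquad X = \mathrm{BMI}/10$$

$$\Delta_{\mathrm{FP2}}(\mathrm{BMI}) = f\!\left(\frac{\mathrm{BMI}+1}{10}\right) - f\!\left(\frac{\mathrm{BMI}}{10}\right) \approx \frac{1}{10}\,f'(X), \qquad f'(X) = X^{-3}\left[-2\beta_1 + \beta_2\,(1 - 2\ln X)\right]$$

$$\Delta_{\mathrm{linear}}(\mathrm{BMI}) = \beta_3 \quad \text{for every BMI}$$

Under these assumptions the FP2 change depends on where on the BMI scale the one-unit increase occurs, whereas the linear change is the same at BMI 24 and at BMI 40.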
For the adjusted analysis, all models were adjusted for age, cotinine level, and alcohol consumption. The findings for each model are presented in Table 3. The same findings were obtained for both the male and the female sub-samples. The FP2 model remained the best-fitted model compared to the other models.
Table 3 : The comparison between FP2, Quadratic, Linear and Categorical adjusted models.
In this study, different statistical methods for measuring the association between elevated CRP level and obesity were compared. All models found a positive association between elevated levels of CRP and obesity, and this relationship varied across gender and race. The findings of this study with regard to the ln(CRP)-BMI association and the estimated curves are consistent with the findings of previous studies [15-21]. However, the FP method provided a better model fit than the other statistical models (linear, quadratic, and categorical). The FP approach has also been shown to provide a more robust and precise method for determining the functional form of the BMI covariate [22]. In the FP models, the BMI-ln(CRP) curves differed across gender and race. In particular, the BMI-ln(CRP) curves were J-shaped for both genders [7]; however, the curves were slightly higher for women than for men, and in the female sub-sample the curve for Mexican Americans was the highest of all race groups. In the male sub-sample, the other-Hispanic group displayed the highest curve when BMI>30. We also found differences in the shape of the BMI-ln(CRP) curve and in the ln(CRP) estimates compared to the other models for both the female and male sub-samples.
Using the WHO categorization for obesity, the categorical model provided the least precise predicted values of ln(CRP) [7,23]. By making full use of the continuous BMI scale, the linear model demonstrated advantages over the categorical model. However, to use linear models, the normality assumption must be met; if it is not, the linear model may overestimate or underestimate the findings. Linear models also impose a constant effect of BMI on ln(CRP) across the scale; this might be imprecise, as a one-unit increase in BMI from 20 to 21 might not have the same influence as an increase from 35 to 36 [7]. The quadratic model yielded a non-linear association between BMI and ln(CRP); however, this relation was restricted to a single transformation, which might not be the best choice of non-linear approach.
The FP method has a number of advantages over the other models [24]. FP is a flexible approach that allows the data to determine the functional form for BMI across the 44 available functions [8], and the closed test is used to select the best-fitting model, that is, the one producing the best predicted values of ln(CRP) based on BMI. The estimated ln(CRP)-BMI curves from the FP method show symmetry compared to the other models, which indicates less variation in the predicted values of ln(CRP). The findings of the FP approach showed that ln(CRP) level increased exponentially for extremely obese participants, and the same results were obtained for individuals aged 40, 50 and 60 for both genders.
A limitation of this study is the exclusion of missing data rather than addressing these data statistically. The focus of this study is only on the association between the elevated level of CRP and obesity for both genders, controlling for age, cotinine level, and alcohol consumption. However, including other important explanatory variables, such as socio-economic position, might produce a better model fit. Finally, the findings of this study are based on only five cross-sectional observational studies pooled from the NHANES dataset; as such, the findings may differ when other study designs (e.g., randomized clinical trials) are used.
Assessing the association between obesity and CRP level is vital, as obesity and CRP level (as an inflammation factor) have been linked to cardiovascular disease (CVD) in both childhood and adulthood. We found a positive association between CRP level and obesity for all selected statistical models (linear, non-linear, categorical and FP). Categorizing the continuous variable and using categorical models is the least preferable method for examining the relationship between obesity and CRP levels. While providing better estimates than the categorical models, the linear model requires a linearity assumption and also assumes a constant influence of the covariate on the outcome. Polynomial models (e.g., quadratic and cubic models) are somewhat flexible, as they are nonlinear models and there is no need to categorize the continuous variable; however, the powers of the polynomial models are limited. The FP model produced the best estimates of the ln(CRP)-BMI relationship, with the minimum deviance compared to the other models, and as such may be useful in examining other health outcomes.
The authors declare that they have no competing interests.
Authors' contributions | GAZ | KM |
Research concept and design | √ | -- |
Collection and/or assembly of data | √ | -- |
Data analysis and interpretation | √ | -- |
Writing the article | √ | √ |
Critical revision of the article | -- | √ |
Final approval of article | √ | √ |
Statistical analysis | √ | -- |
The authors would like to acknowledge the European Centre for Environment and Human Health at the University of Exeter Medical School, UK, and the School of Commerce, Faculty of Commerce, Ain Shams University, Cairo, Egypt.
Editor: Chi-Ming Wong, University of Hong Kong, China.
Received: 26-Mar-2017 Final Revised: 08-May-2017
Accepted: 16-Jul-2017 Published: 25-Jul-2017
Copyright © 2015 Herbert Publications Limited. All rights reserved.