Exploring the Use of Negative Binomial Regression Modeling for Pediatric Peripheral Intravenous Catheterization

A large study conducted at two southeastern US hospitals from October 2007 through October 2008 sought to identify variables predictive of successful peripheral intravenous catheter (IV) insertion, a crucial procedure that can be difficult and time-consuming in young children. Data were collected on a sample of 592 children who received a total of 1,195 attempts to start peripheral IV catheters in the inpatient setting. The outcome is the number of attempts required for successful IV placement, and the underlying data appear to have a negative binomial structure. The goal of this paper is to illustrate the appropriateness of a negative binomial assumption using visuals produced with PROC SGPLOT and to assess the goodness of fit of a negative binomial model. Negative binomial regression output from PROC GENMOD is contrasted with traditional ordinary least squares output. Akaike’s Information Criterion (AIC) indicates that the negative binomial model fits better, and the inferences about covariate effects under the two models are compared. Many applications of negative binomial regression arise from overdispersed Poisson data. This project, by contrast, presents a data set that fits well under the traditional motivation and purpose of a negative binomial model.


INTRODUCTION
The purpose of this paper is to illustrate the SAS® code that assesses whether the assumptions of the negative binomial distribution are violated for the data set under consideration. First, confidence intervals are used to explore the independence of successive attempts to insert an IV. Next, code that calculates expected counts under several candidate distributions shows whether the observed counts match those expectations. Finally, the paper compares a traditional ordinary least squares (OLS) regression analysis with the more specific negative binomial regression model fit in PROC GENMOD.
The data set contains many variables, not all of which were used in this analysis. The variables considered for the regression models are: shift (whether the procedure was performed during the night or day shift); diff1 (whether the medical professional performing the procedure assessed the patient as difficult before the first stick attempt); dehydrated (whether the patient was coded as dehydrated or as not dehydrated/unknown); coopch1 (whether the medical professional performing the procedure assessed the patient as cooperative before the first stick attempt); Nurse1Exp (the self-reported level of experience of the medical professional performing the first stick attempt); and osbdm (the patient's mean Observational Scale of Behavioral Distress score). The total number of insertion attempts ranged from 1 to 10.
A link to the data set has been included in the References section for those who wish to explore it further.

ASSESSING INDEPENDENCE
The assumption of independence is critical to confirming that the data can be treated as negative binomial. The main method for examining the independence of the attempts was to compare exact 95% confidence intervals for the binomial proportion of successes at each attempt. If successive attempts are independent with a common probability of success, the proportion of successes should be similar at every attempt, and the intervals should overlap. The code to output these values follows, along with a table summarizing the output. The DATA steps have been omitted for brevity.
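For readers working outside SAS, the following plain-Python sketch shows one way to compute such an interval. It assumes that "exact" refers to the Clopper-Pearson interval, which is the usual meaning for a binomial proportion. Each limit is found by bisection on the binomial tail probabilities. For a given attempt, n would be the number of children who reached that attempt and x the number whose IV was placed on it. The counts in the example are hypothetical, not the study's data, and the function names are illustrative.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed on the log scale to avoid overflow."""
    if p == 0.0:
        return 1.0 if i == 0 else 0.0
    if p == 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log1p(-p))
    return exp(log_pmf)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))


def _bisect(f, target, increasing, iters=100):
    """Find p in (0, 1) with f(p) = target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Keep the half of [lo, hi] that still contains the root.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) 100*(1 - alpha)% CI for x successes in n trials."""
    # Lower limit solves P(X >= x | p) = alpha/2; this tail grows with p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper limit solves P(X <= x | p) = alpha/2; this tail shrinks with p.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


# Hypothetical counts: 43 of 100 children reaching an attempt succeeded on it.
low, high = clopper_pearson(43, 100)
print(f"proportion = {43 / 100:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Bisection is used only to keep the sketch dependency-free. A statistics library's beta quantile function would give the same limits directly.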

GRAPH OF CONFIDENCE INTERVALS
To illustrate these confidence intervals, the proportions were graphed with PROC SGPLOT. The code and the resulting figure are shown below.
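proc sgplot data=sample;
   /* Proportion of successes at each stick attempt, with the 95% confidence
      limits drawn as error bars; markers are drawn as the value of SampleSize */
   scatter x=Stick y=Prop / yerrorlower=Lower yerrorupper=Upper markerchar=SampleSize;
run;

The figure displays the 95% confidence intervals for the proportion of successes at each IV attempt. Each marker shows the sample size for that attempt. A horizontal line drawn at 43% passes through every interval, showing that the intervals overlap. This is consistent with a common probability of success across attempts and gives no indication that the independence assumption required for the negative binomial setting is violated.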
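The reference-line argument can be stated as a simple check. A single horizontal line crosses every interval exactly when the largest lower limit is no greater than the smallest upper limit. The sketch below uses a hypothetical helper name and made-up intervals rather than the study's values. It returns the range of heights at which such a line can be drawn. This is an informal visual criterion, not a formal test of independence.

```python
def common_value_range(intervals):
    """Return (max lower, min upper) if some value lies in every interval, else None.

    A single horizontal line can pass through all intervals exactly when the
    largest lower limit does not exceed the smallest upper limit.
    """
    highest_lower = max(lower for lower, _ in intervals)
    lowest_upper = min(upper for _, upper in intervals)
    if highest_lower <= lowest_upper:
        return highest_lower, lowest_upper
    return None


# Hypothetical 95% CIs for attempts 1, 2, and 3 (not the study's values).
cis = [(0.39, 0.47), (0.33, 0.50), (0.25, 0.58)]
shared = common_value_range(cis)
if shared is None:
    print("No single reference line crosses every interval.")
else:
    print(f"Any line between {shared[0]:.2f} and {shared[1]:.2f} crosses every interval.")
```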

CONFIRMING EXPECTATIONS
Expected counts under the negative binomial, zero-inflated negative binomial, and Poisson distributions were calculated and output. The zero-inflated negative binomial distribution was included for thoroughness. The Poisson distribution was included because it is the most common distribution for count data.
Note that at least one attempt is always needed to insert an IV. Because all of the distributions under consideration start at a count of 0, the table lists values by the number of additional attempts before success, which can also be thought of as the number of failed attempts.
Intercept-only regression models were fit for each distribution to obtain these expected values. The SAS® code is detailed below.
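The functions below are a plain-Python sketch of what these intercept-only fits produce; they are not the SAS code used in the study. They turn fitted parameters into expected counts for 0, 1, 2, ... failed attempts. In an intercept-only model with a log link, exp(Intercept) is the fitted mean. The negative binomial is parameterized with variance mu + k*mu^2, which is assumed here to correspond to the dispersion k reported by PROC GENMOD. The zero-inflated version places an extra probability omega on a count of 0. All parameter values in the example are hypothetical.

```python
from math import exp, lgamma, log, log1p


def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu)."""
    return exp(y * log(mu) - mu - lgamma(y + 1))


def negbin_pmf(y, mu, k):
    """P(Y = y) for a negative binomial with mean mu and variance mu + k * mu**2."""
    r = 1.0 / k  # shape parameter
    return exp(lgamma(y + r) - lgamma(y + 1) - lgamma(r)
               + y * log(k * mu) - (y + r) * log1p(k * mu))


def zinb_pmf(y, mu, k, omega):
    """Zero-inflated NB: extra probability omega placed on a count of 0."""
    base = (1.0 - omega) * negbin_pmf(y, mu, k)
    return base + omega if y == 0 else base


def expected_counts(pmf, n_obs, max_count):
    """Expected number of observations at each count 0..max_count."""
    return [n_obs * pmf(y) for y in range(max_count + 1)]


# Hypothetical intercept-only estimates (not the study's values):
# mu = mean number of failed attempts, k = NB dispersion, omega = zero inflation.
n_obs, mu, k, omega = 500, 1.0, 0.5, 0.05
for name, pmf in [
    ("Poisson", lambda y: poisson_pmf(y, mu)),
    ("NB", lambda y: negbin_pmf(y, mu, k)),
    ("ZINB", lambda y: zinb_pmf(y, mu, k, omega)),
]:
    print(name, [round(e, 1) for e in expected_counts(pmf, n_obs, 9)])
```

A count of y failed attempts corresponds to y + 1 total attempts. Shifting the observed attempt counts down by one therefore lines them up with these expected counts.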

GRAPH OF EXPECTED VS. OBSERVED VALUES FOR NEGATIVE BINOMIAL DISTRIBUTION
To show how closely the observed values match the expected values, a graph was created using PROC SGPLOT. First, the outputs from the intercept-only regressions had to be combined into a single data set, called "counts" in the code below. The DATA step for this manipulation has been omitted for brevity. The code and figure are shown below. The chi-square goodness-of-fit statistic under the negative binomial assumption was 9.80 (p = 0.2), indicating no significant lack of fit and supporting the negative binomial assumption.
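The goodness-of-fit statistic can be sketched in plain Python as a Pearson chi-square, the sum of (O - E)^2 / E over the count categories. The upper-tail p-value comes from the regularized incomplete gamma function. The degrees of freedom are not stated in this section. The example uses the common rule of cells minus 1 minus the number of estimated parameters, with two parameters (the mean and the dispersion) assumed. The counts are hypothetical. In practice, sparse tail categories are often pooled so that no expected count is very small.

```python
from math import exp, lgamma, log


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(stat, df):
    """Upper-tail probability P(X > stat) for X ~ chi-square(df).

    Sums the series for the regularized lower incomplete gamma function
    P(a, x) with a = df / 2 and x = stat / 2, then returns 1 - P.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * exp(a * log(x) - x - lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical observed and expected counts for 0, 1, 2, 3, 4+ failed attempts.
observed = [300, 120, 50, 20, 10]
expected = [295.0, 125.0, 52.0, 19.0, 9.0]
stat = pearson_chi_square(observed, expected)
df = len(observed) - 1 - 2  # assumes two estimated parameters (mean and dispersion)
print(f"chi-square = {stat:.2f}, df = {df}, p = {chi2_sf(stat, df):.3f}")
```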

REGRESSION MODELS
To analyze the data, two regression models were fit. An ordinary least squares (OLS) model was fit because such an analysis is the most familiar choice for many clinicians. A negative binomial regression model was also fit, since the previous analysis indicated that the data fit a negative binomial distribution well.
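Caution must be used when comparing the two models. Because the negative binomial (NB) regression model uses a log link, its coefficients describe multiplicative changes in the expected count. The OLS coefficients, in contrast, describe additive changes, so the two sets of coefficients cannot be compared directly. Instead, the least-squares adjusted means (LS-means) from each model are reported and compared.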
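The following sketch uses hypothetical coefficients for two binary covariates to show why the coefficients are not comparable. Under a log link, a coefficient b multiplies the expected count by exp(b) regardless of the other covariates, so the difference in means on the count scale depends on the values of those covariates. Under OLS, the same covariate shifts the mean by a constant amount. The coefficient values and function names are illustrative only.

```python
from math import exp


def nb_mean(intercept, coefs, x):
    """Fitted mean under a log link: exp(b0 + sum of b_j * x_j)."""
    return exp(intercept + sum(b * xj for b, xj in zip(coefs, x)))


def ols_mean(intercept, coefs, x):
    """Fitted mean under OLS (identity link): b0 + sum of b_j * x_j."""
    return intercept + sum(b * xj for b, xj in zip(coefs, x))


# Hypothetical coefficients (not the study's estimates) for two 0/1 covariates.
nb_b0, nb_b = 0.2, [0.4, 0.1]
ols_b0, ols_b = 1.2, [0.7, 0.15]

for other in (0, 1):  # value of the second covariate
    nb_hi = nb_mean(nb_b0, nb_b, [1, other])
    nb_lo = nb_mean(nb_b0, nb_b, [0, other])
    ols_diff = ols_mean(ols_b0, ols_b, [1, other]) - ols_mean(ols_b0, ols_b, [0, other])
    print(f"second covariate = {other}: NB ratio = {nb_hi / nb_lo:.3f}, "
          f"NB difference = {nb_hi - nb_lo:.3f}, OLS difference = {ols_diff:.3f}")
```

For both values of the second covariate, the NB ratio prints as exp(0.4) ≈ 1.492 and the OLS difference as 0.700. The NB difference in means, however, changes from about 0.601 to 0.664. This is why the comparison is made through adjusted means rather than raw coefficients.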
The code and accompanying tables summarizing the output are shown below.
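Because the two models are compared through AIC, a brief sketch of that criterion may help. AIC = -2 log L + 2p, where p counts every estimated parameter, including the OLS error variance and the negative binomial dispersion k. The plain-Python functions below compute a maximized normal log-likelihood for an OLS fit and a negative binomial log-likelihood for given fitted means. The data, the fixed value of k, and the function names are hypothetical. Software packages may include different constant terms, so AIC values should only be compared when they are computed the same way.

```python
from math import lgamma, log, log1p, pi


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: -2 log L + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def normal_loglik(y, fitted):
    """Maximized normal log-likelihood for an OLS fit (sigma^2 = RSS / n)."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma2 = rss / n
    return -0.5 * n * (log(2 * pi * sigma2) + 1)


def negbin_loglik(y, fitted, k):
    """NB log-likelihood with means `fitted` and variance mu + k * mu**2."""
    r = 1.0 / k
    return sum(lgamma(yi + r) - lgamma(yi + 1) - lgamma(r)
               + yi * log(k * mu) - (yi + r) * log1p(k * mu)
               for yi, mu in zip(y, fitted))


# Hypothetical failed-attempt counts; intercept-only fits use the sample mean.
y = [0, 0, 1, 0, 3, 1, 0, 2, 0, 5]
fitted = [sum(y) / len(y)] * len(y)
ols_aic = aic(normal_loglik(y, fitted), n_params=2)  # intercept + error variance
# k is fixed here for illustration; in practice it is estimated by maximum likelihood.
nb_aic = aic(negbin_loglik(y, fitted, k=1.0), n_params=2)  # intercept + k
print(f"OLS AIC = {ols_aic:.2f}, NB AIC = {nb_aic:.2f}")
```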

CONCLUSION
Based on the confidence intervals for each stick attempt, and on the similarity between the observed counts and the counts expected under a negative binomial distribution, the negative binomial assumption fits the IV insertion data well. The smaller AIC of the negative binomial regression model indicates that it fits better than the OLS regression model. Additionally, the model-based adjusted means for the significant factors are generally smaller, with smaller standard errors, under the negative binomial regression model. Thus, the negative binomial regression model appears to give more precise estimates of the effect of each significant factor. Notably, both models lead to the same general clinical inferences: the variables that had a significant effect on the number of IV placement attempts under the negative binomial model were also identified by the OLS model.
Some limitations of the data should be noted. The analysis did not account for changes in nurses, if any, between stick attempts on the same patient. Furthermore, data on the actual site of each stick attempt were not collected. Thus, the difficulty of the stick site was not assessed as a possible factor in either the original analysis or this new one. Only the health provider's perceived difficulty of the patient was considered.
This study contributes a useful data set to the study of negative binomial regression analysis. The complete results and analysis are available online in an article of the same title as this paper, which is listed in the References section.