Accuracy of reporting maternal and infant perinatal service system coding and clinical utilization coding

Background: To determine the extent to which the accuracy of reporting maternal and fetal clinical diagnoses and procedural coding varies between clinical utilization and perinatal services records. Methods: Information on perinatal outcomes was extracted from Kaiser Permanente Southern California (KPSC) health plan perinatal service system (PSS) and clinical utilization records. A random sample of 400 charts was selected from eligible medical records. Clinical codes were abstracted for two time periods: 1/1/2003 through 12/31/2004 (paper medical records) and 1/1/2008 through 12/31/2008 (electronic medical records [EMRs]). Abstracted clinical codes were compared with corresponding diagnosis and procedural records, both maternal and fetal. Differences in coding accuracy between time periods were assessed through comparisons of sensitivity, specificity, positive and negative predictive value. Results: The accuracy of clinical diagnoses and procedural coding varies considerably by outcome. Sensitivities were generally higher with clinical utilization than PSS records for placental abruption (97%), placenta previa (100%), preeclampsia (94%), gestational anemia (91%), PROM (83%), chorioamnionitis (97%), intrauterine growth restriction (80%), fetal distress (91%), malpresentation of the fetus (92%), incompetent cervix (73%), vaginal birth after cesarean delivery (75%), chronic hypertension (98%), and respiratory conditions (51%). Specificities and predictive values were acceptable for the majority of conditions. Conclusion: Our findings suggest that many perinatal outcomes are not reliably coded in the PSS records. Accuracy of perinatal outcome identification can be improved by supplementing PSS records with electronic diagnosis and procedural codes from clinical utilization. Completeness of collected medical and obstetrical outcomes improved slightly after implementation of the EMR system at KPSC.


Introduction
While several studies have used information from hospitalization and birth certificate records to evaluate maternal and child health programs [1] and to study pregnancy outcomes such as placental abruption [2], preeclampsia [3], and uterine rupture [4], the accuracy of this information has been challenged by researchers [5][6][7]. It is argued that medical and obstetrical information gathered by health providers for billing purposes, which is a frequent source of data for many epidemiological studies and service evaluations, may lead to bias because the information is poorly recorded [8,9]. This problem is further complicated by inconsistency in the way this information is collected and reported across hospitals and states [10], creating challenges for researchers in the field. Therefore, rigorous assessment of the reliability and accuracy of this clinical information is critical.
Many epidemiologic studies use birth certificate and/or hospital discharge records as a source of data [2][3][4]. Birth certificate records provide researchers important maternal and child information, such as maternal demographic characteristics, parity, and child sex, which have been described by many authors to be fairly accurate. However, the accuracy can be poor for other important items, such as behavioral information (e.g., smoking and drinking alcohol in pregnancy), medical and obstetrical diagnoses, procedures, and birth defects [11][12][13][14].
Kaiser Permanente Southern California (KPSC) is a large integrated health care system with a patient population that is broadly representative of the racial/ethnic groups living in Southern California. It makes extensive use of its clinical record information for research, decision-making, and evaluation of the effectiveness of programs. The electronic medical record (EMR) system was fully implemented in all KPSC hospitals circa 2008. Among other reasons, it was intended to provide improved information on maternal and child health issues. However, the accuracy of coding of perinatal outcomes collected from EMRs in this large health plan has not been validated. In light of the common use of clinical coding at KPSC, it is important to evaluate the quality of perinatal outcome data collected from medical records and vital records.
This study has a twofold purpose: (1) to evaluate the completeness and accuracy of reporting perinatal outcomes in health plan medical records and (2) to compare the quality of clinical information collected before and after the implementation of the EMR system.

Study population
The study population includes 16,401 women who gave birth in KPSC-Los Angeles and KPSC-San Diego medical centers. These two medical centers were chosen because they are the two largest medical centers of KPSC and, in combination, they reflect the racial/ethnic and age distribution of the general KPSC membership. We selected a stratified sample of 100 deliveries from each of the two medical centers in each of the two time periods (1/1/2003-12/31/2004 and 1/1/2008-12/31/2008). The selection of the two time periods allowed us to study the accuracy of the health plan medical records before and after the implementation of the electronic medical record system. We refer to the two timeframes as the paper medical record (PMR) period and the electronic medical record (EMR) period. Within each medical center and period, study subjects were divided into groups based on gestational age and birthweight categories.
Since the rate of preterm birth in the KPSC setting is about 10%, we needed at least 10 cases per group in order to draw valid conclusions. Women with low birth weight (<2,500 grams) babies and/or with preterm birth (<37 weeks of gestation) were oversampled to ensure that enough adverse events were available for evaluation, especially for rare adverse outcomes. Chart abstraction was performed by trained Research Associates using a standardized abstraction instrument that contains information on medical and obstetrical diagnoses and procedures, ultrasounds, and laboratory reports.
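The sampling design described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the two-stratum split ("adverse" versus "other"), the synthetic deliveries, and the allocation of 50 records per stratum are hypothetical and are not taken from the study, which stratified on gestational age and birthweight categories that are not fully specified here.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, n_per_stratum, seed=0):
    """Draw up to n_per_stratum[s] records at random from each stratum s.

    Returns the sample and the sampling fraction achieved in each stratum.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[stratum_of(rec)].append(rec)

    sample, fractions = [], {}
    for stratum, members in by_stratum.items():
        k = min(n_per_stratum.get(stratum, 0), len(members))
        sample.extend(rng.sample(members, k))
        fractions[stratum] = k / len(members)
    return sample, fractions


def adverse_stratum(rec):
    """Hypothetical two-level stratum: preterm and/or low birth weight vs. other."""
    if rec["gest_weeks"] < 37 or rec["birthweight_g"] < 2500:
        return "adverse"
    return "other"


# Synthetic deliveries (assumption): every tenth delivery is preterm (about 10%).
deliveries = [
    {"id": i, "gest_weeks": 35 if i % 10 == 0 else 39, "birthweight_g": 3300}
    for i in range(1000)
]

# Oversample the rarer stratum by drawing the same number from each stratum.
sample, fractions = stratified_sample(
    deliveries, adverse_stratum, {"adverse": 50, "other": 50}
)
```

Because the "adverse" stratum is sampled at a much higher fraction than the "other" stratum, any estimate that depends on prevalence has to account for the sampling fractions returned here.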

Data sources
To compare the characteristics of study subjects with all pregnant women in KPSC and the state of California, abstractors were instructed to record each child's sex and the following

Definition of variables
Information on maternal and infant characteristics, including maternal age (<20, 20-29, 30-34, ≥35 years) and education (<12, 12, and ≥13 years of completed schooling), race/ethnicity (non-Hispanic white [White], non-Hispanic black [African American], Hispanic, Asian/Pacific Islander, and other racial/ethnic groups), prenatal care (early or first trimester and none or late initiation), smoking during pregnancy (yes/no), child's sex (male/female), birthweight (<2500, 2500-3499, 3500-3999, and ≥4000 grams) and gestational age (<37, 37-40, and ≥41 weeks) at the time of delivery were taken from the infants' PSS records. Gestational age was based on a combination of last menstrual period and clinical estimates of gestational age from medical records. The maternal and infant clinical utilization records include International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, from which we derived maternal medical and obstetrical history and procedures. The ICD-9-CM system is a widely used international coding system with standard classifications that are updated periodically [16]. Items examined from complete chart review in this validation study include: placental abruption, placenta previa, preeclampsia, premature rupture of membranes, chorioamnionitis, oligohydramnios, polyhydramnios, gestational fever, intrauterine growth restriction, fetal distress, fetal malpresentation, incompetent cervix, cephalopelvic disproportion, prolapsed cord, perineal laceration, cesarean delivery, chronic hypertension, pregestational hypertension, respiratory conditions, and group B streptococcal (GBS) infection during pregnancy. The definitions of variables as well as ICD-9-CM diagnostic and procedural codes are listed in Appendix 1.
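The categories for maternal age, birthweight, and gestational age given above can be expressed as a small lookup. The following Python sketch is an illustration only; the function name, the constant names, and the use of ">=" in labels are assumptions, and gestational age is assumed to be recorded in completed weeks.

```python
from bisect import bisect_right


def categorize(value, cut_points, labels):
    """Return the label of the interval containing value.

    cut_points holds the lower bound of every interval except the first,
    so len(labels) must equal len(cut_points) + 1.
    """
    return labels[bisect_right(cut_points, value)]


# Category boundaries as listed in the text (labels are illustrative).
MATERNAL_AGE_YEARS = ([20, 30, 35], ["<20", "20-29", "30-34", ">=35"])
BIRTHWEIGHT_GRAMS = ([2500, 3500, 4000], ["<2500", "2500-3499", "3500-3999", ">=4000"])
GESTATIONAL_WEEKS = ([37, 41], ["<37", "37-40", ">=41"])

age_group = categorize(34, *MATERNAL_AGE_YEARS)
weight_group = categorize(2499, *BIRTHWEIGHT_GRAMS)
gestation_group = categorize(40, *GESTATIONAL_WEEKS)
```

Keeping the boundaries in one place makes it easier to apply the same categories to the chart-review, PSS, and clinical utilization data.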

Statistical analysis
First, we assessed the distributions of maternal and child characteristics of the study population and compared these with distributions for all women in KPSC and the state of California who gave birth during the 2003-2004 study period. Using chart-reviewed medical records as the criterion standard, we estimated the true positive fraction (TPF) and false positive fraction (FPF) for medical and perinatal diagnoses and procedural codes in (i) the PSS records, (ii) the clinical utilization records, and (iii) either PSS or clinical utilization records. To examine the level of agreement with the criterion standard diagnosis, we used the kappa statistic (κ), which estimates the extent of observed agreement between two data sources after accounting for the role of chance. The kappa values were categorized and interpreted as slight (0.00-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), and almost perfect (>0.80) agreement [17]. Using the chart review as the criterion of truth, we calculated the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for clinical codes obtained from PSS and clinical utilization records, as well as 95% confidence intervals (CI) for each measure. Sensitivity was calculated as the percentage of true positives among those patients who had positive findings by chart review. Similarly, specificity was calculated as the percentage of true negatives among those patients who had negative findings by chart review. PPV was calculated as the percentage of true positives among those identified as positives by ICD-9 codes. NPV was calculated as the percentage of true negatives among those identified as negatives by ICD-9 codes. To account for our stratified sampling approach, PPV and NPV estimates were weighted to incorporate sampling fractions. This approach provides a more accurate estimate of predictive values for an unselected population.
doi: 10.7243/2053-7662-1-3
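The accuracy measures and the kappa statistic defined above can be computed from a 2x2 table of coded status against chart review. The Python sketch below is illustrative rather than a reproduction of the study's SAS analysis: the function names and the example counts are hypothetical, and the Wilson score interval is an assumption, since the text does not state which method was used for the 95% confidence intervals.

```python
from math import sqrt


def ratio(numerator, denominator):
    """Proportion, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def accuracy_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV with chart review as the reference."""
    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives / chart-positive
        "specificity": ratio(tn, tn + fp),  # true negatives / chart-negative
        "ppv": ratio(tp, tp + fp),          # true positives / code-positive
        "npv": ratio(tn, tn + fn),          # true negatives / code-negative
    }


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def cohen_kappa(tp, fp, fn, tn):
    """Observed agreement corrected for the agreement expected by chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Agreement bands as listed in the text; negative values are not banded there."""
    if kappa < 0:
        return "below chance"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts for one condition in a sample of 400 charts.
tp, fp, fn, tn = 40, 5, 10, 345
measures = accuracy_measures(tp, fp, fn, tn)
sens_ci = wilson_ci(tp, tp + fn)
kappa = cohen_kappa(tp, fp, fn, tn)
band = interpret_kappa(kappa)
```

With these hypothetical counts the sensitivity is 0.80 and kappa is about 0.82, which falls in the "almost perfect" band.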
We also examined the accuracy of clinical coding separately for both study periods. All statistical analyses were performed using SAS version 9.2 (SAS Institute Inc., Cary, NC, USA). The study was approved by the KPSC Institutional Review Board.
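The weighting of PPV and NPV by sampling fractions mentioned in this section can be sketched in Python as follows. This is one common way to do it (weighting each sampled record by the inverse of its stratum's sampling fraction) and is an assumption about the approach, not a description of the study's code; the function name, stratum names, fractions, and counts are all hypothetical.

```python
def weighted_predictive_values(records, sampling_fraction):
    """Weighted PPV and NPV.

    records: iterable of (stratum, chart_positive, code_positive) tuples.
    sampling_fraction: maps each stratum to the fraction of it that was sampled.
    Each record counts 1 / sampling_fraction[stratum] times, so oversampled
    strata are scaled back to their share of the source population.
    """
    w_tp = w_fp = w_fn = w_tn = 0.0
    for stratum, chart_positive, code_positive in records:
        weight = 1.0 / sampling_fraction[stratum]
        if code_positive and chart_positive:
            w_tp += weight
        elif code_positive:
            w_fp += weight
        elif chart_positive:
            w_fn += weight
        else:
            w_tn += weight
    ppv = w_tp / (w_tp + w_fp) if (w_tp + w_fp) else float("nan")
    npv = w_tn / (w_tn + w_fn) if (w_tn + w_fn) else float("nan")
    return ppv, npv


# Hypothetical sample: the "adverse" stratum is heavily oversampled.
records = (
    [("adverse", True, True)] * 8
    + [("adverse", False, True)] * 2
    + [("adverse", True, False)] * 1
    + [("adverse", False, False)] * 9
    + [("other", True, True)] * 2
    + [("other", False, True)] * 2
    + [("other", False, False)] * 16
)
sampling_fraction = {"adverse": 0.5, "other": 0.05}

ppv_w, npv_w = weighted_predictive_values(records, sampling_fraction)

# Unweighted PPV for comparison: true positives over all code-positive records.
code_positive = [r for r in records if r[2]]
ppv_unweighted = sum(1 for r in code_positive if r[1]) / len(code_positive)
```

In this made-up example the weighted PPV is 0.56, compared with about 0.71 unweighted, which shows how oversampling high-risk deliveries can inflate an unweighted predictive value.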

Results
Table 1 includes distributions of selected infant and maternal demographic, medical, and obstetrical characteristics of study subjects, for all of KPSC, and for the state of California. In 2003 and 2004 combined, there were 59,492 and more than 1 million births to state residents in KPSC hospitals and in all California hospitals, respectively. Study subjects were more likely to be of advanced maternal age (≥35 years) and African-American. Women in KPSC and in the sample were more educated and more likely to initiate prenatal care in the first trimester than women who delivered in other hospitals. As a result of our sampling approach, we also observed a higher prevalence of premature delivery with low birthweight. The very low rate of smoking in reviewed charts suggests that this exposure was not reliably captured in the birth hospital record.
Table 2 shows the frequencies of identified medical and obstetrical conditions in reviewed medical records, PSS, and clinical utilization records, and the degree of agreement between these data sources and the medical chart. As compared to PSS, perinatal risk factors and adverse outcomes were more frequently noted in chart review and in clinical utilization records.
The agreement between PSS records and medical charts ranged from slight for intrauterine growth restriction (κ = 0.11 [95% CI 0.05, 0.28]) to fair for gestational anemia, chorioamnionitis, and oligo-/polyhydramnios. Moderate agreement was observed for fetal distress, cephalopelvic disproportion, chronic hypertension, and group B streptococcal infection, and substantial or almost perfect agreement for the remaining medical and perinatal conditions. On the other hand, the agreement between the various medical and obstetrical diagnostic codes in clinical utilization records and medical charts ranged from moderate (κ = 0.46 [95% CI 0.23, 0.69]) to almost perfect (κ = 0.98 [95% CI 0.95, 1.00]). PSS and clinical utilization records were combined by counting a condition as present if it was coded in either source. The combination identified more conditions than either approach alone. The agreement between the combined PSS and clinical utilization records and the medical chart ranged from moderate (κ = 0.49 [95% CI 0.27, 0.70]) for respiratory conditions to perfect agreement for cord prolapse (κ = 1.00 [95% CI 1.00, 1.00]). In general, kappa values were very similar for the combined data and for clinical utilization records alone.
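The "either source" rule for combining PSS and clinical utilization records can be illustrated with a short Python sketch. The function names and the eight-record example are hypothetical. The example shows a general property of this rule: relative to each source alone, sensitivity can only stay the same or rise, while specificity can only stay the same or fall.

```python
def combine_either(pss, utilization):
    """A condition counts as coded if it appears in either source."""
    return [bool(a) or bool(b) for a, b in zip(pss, utilization)]


def sensitivity(reference, coded):
    """Share of chart-positive records that are also code-positive."""
    flagged = [c for r, c in zip(reference, coded) if r]
    return sum(1 for c in flagged if c) / len(flagged)


def specificity(reference, coded):
    """Share of chart-negative records that are also code-negative."""
    flagged = [c for r, c in zip(reference, coded) if not r]
    return sum(1 for c in flagged if not c) / len(flagged)


# Hypothetical data for one condition: chart review is the reference.
chart       = [1, 1, 1, 1, 0, 0, 0, 0]
pss         = [1, 0, 0, 0, 0, 0, 0, 1]
utilization = [0, 1, 1, 0, 0, 0, 1, 0]
combined = combine_either(pss, utilization)

sens = {name: sensitivity(chart, src)
        for name, src in [("pss", pss), ("utilization", utilization), ("combined", combined)]}
spec = {name: specificity(chart, src)
        for name, src in [("pss", pss), ("utilization", utilization), ("combined", combined)]}
```

In this made-up example the combined sensitivity is 0.75 (versus 0.25 and 0.50 for the individual sources), while the combined specificity drops to 0.50 (versus 0.75 for each source).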
Table 3 shows the sensitivity and specificity of PSS and clinical utilization coding for selected medical and obstetrical conditions. Careful review of the full medical record by one of the authors (DG) was used as the criterion of truth. We observed low sensitivity of PSS records in capturing the following medical and obstetrical conditions: placental abruption (58%), placenta previa (50%), preeclampsia (59%), gestational anemia (23%), premature rupture of membranes (PROM, 61%), chorioamnionitis (19%), gestational fever (54%), oligo-/polyhydramnios (22%), intrauterine growth restriction (8%), breech and other forms of malpresentation of the fetus (67%), fetal distress (42%), vaginal birth after cesarean delivery (VBAC, 63%), chronic hypertension (45%), respiratory conditions (0%), and Group B Streptococcus (GBS) infection (46%). For respiratory conditions, sensitivity was low in both sources (0% in PSS records and 51% in clinical utilization records). Observed specificities were mostly very high in both sources. When PSS and clinical utilization records were combined, sensitivity was substantially improved over that provided by the clinical records for only a few outcomes: PROM, gestational fever, incompetent cervix, and cephalopelvic disproportion (CPD). It is notable that there was very little loss of specificity associated with this sensitivity improvement. Sensitivity and specificity based on combined PSS and clinical utilization records were much higher than either source alone.
Table 4 shows the PPVs and NPVs for the studied medical and obstetrical conditions for PSS and clinical utilization records, again using reviewed medical records as the criterion standard. Predictive values have been adjusted to reflect the prevalence of preterm birth and low birth weight in the general population (as compared to the high prevalence in our sample). Despite the low sensitivity of the PSS for many diagnoses, its PPV was reasonable for most. However, we observed low PPV with PSS records for the following medical and obstetrical conditions: premature rupture of membranes (41%), intrauterine growth restriction (17%), and VBAC (64%). The PPV for cephalopelvic disproportion was equally poor with both PSS and clinical utilization records. In comparison to PSS, the PPVs in clinical utilization records were much lower for placenta previa (98% vs. 79%), preeclampsia (99% vs. 65%), fetal distress (100% vs. 69%), incompetent cervix (100% vs. 43%), and chronic hypertension (98% vs. 69%).
We also examined the impact of EMR implementation on the accuracy of PSS and clinical utilization records. Our analysis revealed that the overall sensitivity and specificity of these records improved slightly after EMR implementation (

Discussion
In this validation study of multiple obstetrical and medical conditions, we observed higher levels of sensitivity, specificity, PPV, and NPV for clinical diagnoses and procedural coding in clinical utilization records than in PSS records. However, the sensitivity for gestational fever was higher for PSS than for clinical utilization records. Relative to clinical utilization records, we found that PSS records are not a valid source for most studied perinatal outcomes. Sensitivity was low for PSS records in capturing nearly all medical and obstetrical conditions. These findings suggest that epidemiologists should not rely exclusively on PSS to investigate adverse perinatal outcomes. With the exception of respiratory conditions during pregnancy, the combination of PSS and clinical utilization records yields higher levels of sensitivity and specificity than either of the individual data sources. Furthermore, the combined data showed marginal improvement in PPV and NPV for most conditions. The findings of this study support continued skepticism regarding the accuracy of birth certificate records. The accuracy of these records varies considerably by medical and obstetrical condition, posing methodologic challenges for perinatal studies [13][14][15]. Our findings confirm that sensitivity can be improved by using clinical diagnoses and procedural coding from clinical utilization records to identify perinatal outcomes, extending the findings of Romano et al. [9].
The quality of birth certificate and hospital discharge records has been well studied. These data are typically created for nonresearch purposes [18], and methods of data collection vary greatly by institution [10]. Editorials published in Obstetrics & Gynecology [18] and the American Journal of Epidemiology [19] reflect the concerns of several researchers regarding data quality in perinatal epidemiological studies. Therefore, it is important to assess the accuracy of clinical utilization and PSS records in an integrated health maintenance organization such as KPSC. Additionally, it is important to examine the impact of EMR implementation on the quality of clinical coding in clinical utilization records.
Between 2004 and 2008, KPSC fully transitioned from paper to electronic medical records for both inpatient and outpatient services. The EMR system at KPSC is an integrated health information management and care delivery system designed to enhance the quality of patient care. It provides access to comprehensive patient information and to the latest research on relevant best medical practices, and it also helps to coordinate patient care. While switching from paper to electronic medical records confers many advantages, the impact of this transition on PSS and clinical diagnoses is not well understood. To assess the accuracy of clinical utilization records, we examined key medical and obstetrical conditions both before and after EMR implementation. For most conditions, we observed a slight improvement in the accuracy of clinical coding in clinical utilization records following implementation, suggesting that electronic medical records may positively impact data quality. This large, population-based study examined key medical and obstetrical conditions that have been shown to adversely affect pregnancy, including respiratory conditions, Group B streptococcal infection, and incompetent cervix. The socioeconomically diverse patient population at KPSC, which is broadly representative of Southern California, makes our findings widely generalizable. The validation of clinical utilization records from time periods both before and after EMR implementation further enhances the strength of this study.
Objective assessments of data quality require masking of medical record abstractors. One potential limitation of this study is that medical record abstractors were not blinded to the source of the data. We do not know if this may have influenced our findings. However, a previous study that examined agreement showed no difference in the level of agreement between masked and unmasked medical records abstractors [20].

Conclusions
The findings of this study suggest that PSS records have serious accuracy and validity problems in identifying perinatal outcomes. Therefore, researchers should be aware of their potential limitations. The findings further suggest that the accuracy of perinatal data can be improved by using a combination of both PSS and clinical utilization records. In general, we found that the overall accuracy of reporting maternal and fetal clinical diagnoses and procedural coding improved slightly after the implementation of electronic medical records.