Joseph Hagan^{1*} and Bin Li^{2}

*Correspondence: Joseph Hagan jlhagan@bcm.edu

1. Assistant Professor, Baylor College of Medicine, Department of Pediatrics, Texas Children's Hospital - Pavilion for Women, 6651 Main Street, Houston, TX, USA.

2. Associate Professor, Louisiana State University, Department of Experimental Statistics, Room 173 Martin D. Woodin Hall Louisiana State University, Baton Rouge, LA, USA.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The p-chart has traditionally been used to monitor processes that yield binary data. The p'-chart, which adjusts the p-chart control limits for between subgroup variation, was proposed in 2002 in an effort to reduce the p-chart's false alarm rate in the presence of large subgroup sizes. As illustrated with an example using real pharmacy data, the p-chart and p'-chart often yield very different results, and it is not clear how to decide which chart is appropriate for a given situation. A simulation study was undertaken to examine the phase II performance of the p-chart and p'-chart. With large subgroup sizes or when the between subgroup variation is high, p-charts have relatively high sensitivity to detect out-of-control shifts, but exhibit a high false alarm rate when the process is in-control, while p'-charts have low sensitivity to detect out-of-control shifts but have relatively low false alarm rates. For a specific situation, Youden's index can be used to decide whether to act on p-chart or p'-chart results while considering the relative costs of false positives and false negatives.

**Keywords**: Statistical process control charts, p-chart, p'-chart, Sensitivity, False alarm rate

The p-chart is a type of statistical process control (SPC) chart
designed to monitor data arising from a binomial distribution
where each individual unit is dichotomized according to
whether or not it is "nonconforming". In this situation, a unit is an
observed value from a Bernoulli random variable with p being
the probability of the event occurring and 1- *p* the probability
of the event not occurring. When the process is "in-control" and
each unit in a random sample of size *n* is independent having
probability p of the event occurring, the number of events, *Y*,
that occur in a sample follow a binomial distribution with the
probability of observing y events given by:

$$P(Y=y)=\binom{n}{y}{p}^{y}{(1-p)}^{n-y},\quad y=0,1,\ldots ,n.$$ (1)

For a given random sample of *n* units, the proportion having
the event is computed as:

$$\widehat{p}=\frac{y}{n}.$$ (2)

The population mean and variance of p̂ are represented,
respectively, by:

$${\mu}_{\widehat{p}}=p$$ (3)

and

$${\sigma}_{\widehat{p}}^{2}=\frac{p(1-p)}{n}.$$ (4)

For a sample of data collected over time, the center line of the p-chart is given by
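As a quick numerical check of the model above, the following Python sketch (with an arbitrary choice of n = 50 and p = 0.1) verifies that the binomial probabilities sum to one, that the mean of p̂ equals p, and that its variance equals p(1-p)/n as in equation (4):

```python
from math import comb

# Binomial pmf: P(Y = y) = C(n, y) * p^y * (1 - p)^(n - y)
def binom_pmf(y, n, p):
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

n, p = 50, 0.1
pmf = [binom_pmf(y, n, p) for y in range(n + 1)]

total = sum(pmf)                                         # should be 1
mean_phat = sum((y / n) * pmf[y] for y in range(n + 1))  # should equal p
var_phat = sum((y / n - mean_phat) ** 2 * pmf[y]
               for y in range(n + 1))                    # should equal p(1-p)/n
print(round(total, 6), round(mean_phat, 6), round(var_phat, 6))
```

With these inputs the three printed values are 1.0, 0.1, and 0.0018 (= 0.1 × 0.9 / 50).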

$$\overline{p}=\frac{{\displaystyle \sum _{i=1}^{k}{y}_{i}}}{{\displaystyle \sum _{i=1}^{k}{n}_{i}}},$$ (5)

where:

*y_{i}* = the number of events in the ith subgroup,

*n_{i}* = the sample size of the ith subgroup, and

*k* = the number of subgroups.

The upper and lower three-sigma control limits of the p-chart are computed as:

$$\overline{p}\pm 3{\widehat{\sigma}}_{{p}_{i}},$$ (6)

where

$${\widehat{\sigma}}_{{p}_{i}}=\sqrt{\frac{\overline{p}(1-\overline{p})}{{n}_{i}}}.$$ (7)

Thus, a p-chart displays the proportion of nonconforming units after units are aggregated by subgroups (e.g., lots from a manufacturing process, or some unit of time such as months).
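The center line and subgroup-specific limits in equations (5) through (7) can be sketched in a few lines of Python; the subgroup counts below are hypothetical:

```python
import math

# Hypothetical subgroup event counts y_i and sizes n_i
y = [12, 9, 15, 11, 8, 14]
n = [100, 90, 120, 110, 95, 105]

# Center line, equation (5): pooled proportion over all k subgroups
p_bar = sum(y) / sum(n)

# Three-sigma limits per subgroup, equations (6) and (7); the limits
# are truncated to stay inside [0, 1].
limits = []
for n_i in n:
    sigma_i = math.sqrt(p_bar * (1 - p_bar) / n_i)  # equation (7)
    limits.append((max(0.0, p_bar - 3 * sigma_i),
                   min(1.0, p_bar + 3 * sigma_i)))

# A subgroup signals when its proportion falls outside its limits
signals = [not (lo <= y_i / n_i <= hi)
           for y_i, n_i, (lo, hi) in zip(y, n, limits)]
print(round(p_bar, 4), signals)
```

Because the limits depend on n_i, each subgroup is compared against its own pair of limits, which is why p-chart limits appear "stepped" when subgroup sizes vary.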

Use of SPC charts is traditionally divided into phase I and phase II applications. Phase I typically involves analysis of retrospective data to construct control limits once the process is deemed to be in statistical control. Then phase II is used to monitor the process so as to quickly detect any aberrations. Subgroup proportions outside of the control limits are considered to be the result of "assignable causes, usually represented by changes in the parameter(s) of a probability distribution representing the common cause variation in the process" [1]. The source of assignable cause variation should be investigated using a predetermined out-of-control action plan in order to learn about factors that exert a favorable or unfavorable influence, depending on the direction of the deviation [2]. But it has long been noticed that a large percentage of subgroups' proportions will fall outside the p-chart control limits when the subgroup sizes are large. Examination of equations (4) and (6) reveals that the width of the p-chart's control limits is inversely proportional to the square root of the number of observations in the subgroup. In practice, extensive time and effort might be needed to investigate a subgroup proportion falling outside of the control limits. Thus it is desirable to only take action when true assignable causes occur and to minimize the number of "false alarms". A false alarm in this context means to interpret a subgroup's observed proportion as being the result of assignable cause variation when, in truth, it was the result of common cause variation inherent to the system.

Traditionally, X-charts (also called individuals charts) were
used for samples with large subgroup sizes in order to provide
wider control limits by adjusting for between subgroup
random variation [3]. Formally, X-charts are appropriate for
continuous data arising from a normal distribution. For a
continuous random variable x, the center line of the X-chart
is simply the mean of all observations in the sample of size *n*,

$$\overline{x}=\frac{{\displaystyle \sum _{i=1}^{n}{x}_{i}}}{n},$$ (8)
and the usual three-sigma control limits are computed as

$$\overline{x}\pm 3\widehat{\sigma},$$ (9)

where

$$\widehat{\sigma}=\sqrt{\frac{{\displaystyle \sum _{i=1}^{n}{({x}_{i}-\overline{x})}^{2}}}{(n-1)}}.$$ (10)

When using an X-chart for binary data, the center line is p̄, just as described for the p-chart. However, the individual observations are now the subgroup proportions p̂_{i} = *y_{i} / n_{i}*, where p̂_{i} represents the proportion of units having the event in the ith subgroup. So for binary data, σ̂ is computed by substituting p̂_{i} for x_{i} and p̄ for x̄ in equation (10).

Recognizing the aforementioned deficiencies of using the p-chart and X-chart for binary data, Laney proposed a modified p-chart to accommodate random between subgroup variation inherent in the system [4]. Laney's p'-chart also yields control limits that vary across subgroups as dictated by the subgroup sizes. The center line for Laney's p'-chart is the same p̄ as for the traditional p-chart, but the modified three-sigma control limits are computed as

$$\overline{p}\pm 3{\widehat{\sigma}}_{{p}_{i}}{\widehat{\sigma}}_{z}.$$ (11)

The σ̂_{z} term adjusts for variation across subgroups. In order
to compute σ̂_{z}, first the subgroup proportions' z-scores are
computed as

$${z}_{i}=\frac{{\widehat{p}}_{i}-\overline{p}}{{\widehat{\sigma}}_{{p}_{i}}},$$ (12)

where p̂_{i} denotes the proportion of units having the event
in the ith subgroup. Then the average moving range of the
z-scores can be obtained by

$$\overline{MR}=\frac{{\displaystyle \sum _{i=2}^{m}\left|{z}_{i}-{z}_{i-1}\right|}}{m-1},$$ (13)

where m is the number of subgroups. Now σ̂_{z} is calculated as

$${\widehat{\sigma}}_{z}=\frac{\overline{MR}}{1.128},$$ (14)

where 1.128 is the expected value of the relative range for a sample size of 2, which is the number of values used to compute the moving range [2].
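Laney's adjustment in equation (11), with the σ̂_z computation described above, can be sketched as follows; the subgroup counts are hypothetical and deliberately overdispersed, so σ̂_z comes out above 1 and the p'-chart limits are noticeably wider than the p-chart's:

```python
import math

# Hypothetical subgroup counts with large n_i; the proportions vary
# more than the binomial model alone would predict (overdispersion)
y = [885, 910, 870, 930, 905, 890]
n = [1000] * 6

p_bar = sum(y) / sum(n)  # same center line as the p-chart

# Within-subgroup sigma from equation (7), then z-scores of the
# subgroup proportions
sigmas = [math.sqrt(p_bar * (1 - p_bar) / n_i) for n_i in n]
z = [(y_i / n_i - p_bar) / s for y_i, n_i, s in zip(y, n, sigmas)]

# Average moving range of consecutive z-scores
m = len(z)
mr_bar = sum(abs(z[i] - z[i - 1]) for i in range(1, m)) / (m - 1)

# sigma_z: 1.128 is the expected relative range for samples of size 2
sigma_z = mr_bar / 1.128

# Laney limits, equation (11): p-chart limits widened by sigma_z
limits = [(p_bar - 3 * s * sigma_z, p_bar + 3 * s * sigma_z)
          for s in sigmas]
print(round(sigma_z, 2))
```

When the data are consistent with a plain binomial model, σ̂_z is close to 1 and the p'-chart essentially reduces to the p-chart; values well above 1 indicate between subgroup variation that the p-chart ignores.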

For a given situation, the analyst is left to decide whether
to use the p-chart or p'-chart. There are rules of thumb and
general guidelines to help the analyst decide. For example,
Provost and Murray (p. 272) state in a book widely used to
teach quality improvement concepts at healthcare organizations
that "Only when subgroup sizes are above 1,000 should
the [p'-chart] adjustment be even considered" [5]. Alternatively,
some analysts implement a Variance Ratio Test (VRT)
adapted from Jones and Govindaraju (2000) that compares
the amount of observed variation in a sample of data to the
amount of variation expected from a binomial distribution [6].
(The VRT is the basis for the "P Chart Diagnostic" in Minitab^{®}
Statistical Software that is used to advise the analyst about
whether to use a p-chart or p'-chart for a particular sample of
data.) Jones and Govindaraju pointed out that overdispersion will occur if the probability p varies "in a smooth way" across
subgroups, which will give rise to an elevated false alarm rate
if the standard p-chart is used [6].

To apply the VRT, normalized event counts are computed for each subgroup as

$${\tilde{y}}_{i}={\mathrm{sin}}^{-1}\sqrt{\frac{{\widehat{p}}_{i}\overline{n}+3/8}{\overline{n}+0.75}},$$ (15)

where n̄ denotes the average subgroup size. Then normal scores of the ỹ_{i} are computed as

$${z}_{{\tilde{y}}_{i}}={\Phi}^{-1}\left(\frac{{r}_{i}-a}{m-2a+1}\right),$$ (16)

for *i* = 1, 2, ..., *m*, where Φ^{−1} is the standard normal quantile function, r_{i} is the rank of ỹ_{i} among the m values, a = 3/8 if m ≤ 10, and a = 0.5 if m > 10. Next, the simple linear regression of the normal scores z_{ỹ_i} on the ỹ_{i} is fit using only the middle observations (the exact cutoff is specified by Jones and Govindaraju [6]).
The estimated slope coefficient, β̂_{1}, from this fitted regression equation is used to estimate two standard deviations of the observed variation as 2/β̂_{1}. Under the binomial assumption, the arcsine-transformed counts have variance approximately 1/(4n̄), so the estimate of the expected two standard deviations is $$\frac{1}{\sqrt{\overline{n}}}$$. Therefore the ratio of the observed to expected estimates of two standard deviations is computed by

$$\text{Observed Variation : Expected Variation}=\frac{2/{\widehat{\beta}}_{1}}{1/\sqrt{\overline{n}}}.$$ (17)

This ratio is compared to the 95% upper limit computed using all of the data as

$$\text{95\% Upper Limit}={e}^{\left(0.185+5.62/m+0.274/(\overline{n}+\overline{p})\right)}.$$ (18)

If the Observed Variation : Expected Variation from equation (17) exceeds the 95% Upper Limit from equation (18) then the VRT indicates overdispersion, in which case the p'-chart would be considered more appropriate than the p-chart.
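A sketch of the VRT computation up to the variance ratio of equation (17) is below. The data are hypothetical, the middle-observation cutoff is assumed here to be the middle half of the sample (Jones and Govindaraju's exact rule may differ), and the comparison against the equation (18) upper limit is omitted:

```python
import math
from statistics import NormalDist

# Hypothetical subgroup data: y_i events out of n_i units
y = [88, 95, 80, 102, 91, 85, 99, 78, 105, 90, 84, 97]
n = [1000] * 12

m = len(y)
n_bar = sum(n) / m

# Arcsine-transformed normalized counts, equation (15)
yt = [math.asin(math.sqrt(((y_i / n_i) * n_bar + 3 / 8) / (n_bar + 0.75)))
      for y_i, n_i in zip(y, n)]

# Normal scores: Phi^{-1}((rank - a) / (m - 2a + 1))
a = 3 / 8 if m <= 10 else 0.5
rank = {v: r + 1 for r, v in enumerate(sorted(yt))}
z = [NormalDist().inv_cdf((rank[v] - a) / (m - 2 * a + 1)) for v in yt]

# Slope of the normal scores on the transformed counts, fit on the
# middle half of the observations (an assumed cutoff; see [6])
pairs = sorted(zip(yt, z))[m // 4: m - m // 4]
xs = [x for x, _ in pairs]
xm = sum(xs) / len(xs)
zm = sum(v for _, v in pairs) / len(pairs)
b1 = (sum((x - xm) * (v - zm) for x, v in pairs) /
      sum((x - xm) ** 2 for x in xs))

# Equation (17): observed vs binomial-expected two standard deviations
ratio = (2 / b1) / (1 / math.sqrt(n_bar))
print(round(ratio, 2))
```

For binomially distributed data the ratio should be near 1; large values signal overdispersion.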

**Illustrative Example**

Consider the real pharmacy data in Table 1 that displays
the proportion of inpatient narcotic orders that were for
hydrocodone at a large pediatric hospital [7]. On October 6,
2014 hydrocodone-containing products were "up-scheduled"
by the United States Food and Drug Administration from C-III
to C-II status, thereby severely restricting electronic or phoned-in prescriptions. It would be important to quickly detect a
reduction in orders for hydrocodone-containing products in
this situation so that healthcare providers could be educated
about patient risks associated with hydrocodone alternatives
in a timely manner.

**Table 1: Hydrocodone medication orders by week at a large pediatric hospital.**

Should a p-chart or p'-chart be used to monitor the hydrocodone medication orders summarized in Table 1? Following the recommendation of Provost and Murray, one would use a p-chart for the hydrocodone medication order data since all of the subgroup sizes are less than 1,000 [5]. On the other hand, as shown in Figure 1, Minitab's P Chart Diagnostic recommends using a p'-chart instead of a standard p-chart for the hydrocodone medication order data so as to avoid an elevated false alarm rate.

**Figure 1: Results of Minitab's P Chart Diagnostic for the hydrocodone orders data.**

If one uses the p'-chart as recommended by Minitab's P Chart Diagnostic interpretation of the VRT, the decrease in hydrocodone orders that occurred as a result of the October 6 up-scheduling is not detected (Figure 2). On the other hand, a p-chart of the same data detects the decrease in hydrocodone orders due to up-scheduling (Figure 3). However, this p-chart also signals increased hydrocodone orders the week of June 23, when no known source of assignable cause variation occurred; and the difference between the 94.7% proportion for that week and the 91.0% average for the time under observation would not seem to be of great clinical importance.

Comparing the p-chart and p'-chart from the example above illustrates the analyst's dilemma in balancing the desire to detect a true signal of special cause variation against expending unnecessary resources investigating and taking action due to a 'false alarm'. In the context of phase II SPC chart applications, the false alarm rate and the probability of failing to detect true assignable cause variation are analogous to hypothesis testing type I and type II error rates, respectively, and the operating-characteristic (OC) curve can be used to decide which kind of control chart should be used for a given situation [2,8]. Additionally, the Average Run Length (ARL) can be used to evaluate the performance of control charts. For p-charts and p'-charts, the ARL represents the average number of subgroups plotted before an out-of-control signal is observed. When the subgroups are independent (uncorrelated), the ARL is given by

$$\text{ARL}=\frac{1}{1-\beta},$$ (19)

**Figure 2: P'-chart of the hydrocodone orders data.**

**Figure 3: P-chart of the hydrocodone orders data.**

with *β* denoting the proportion of subgroups falling within
the control limits. The ideal control chart will have a very low
false alarm rate (false positive rate) and high ARL when the
process is in-control, yet quickly generate a signal of assignable
cause variation (i.e., have a very low ARL) when the
process is out-of-control.
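Equation (19) is just the mean of a geometric distribution, as the following sketch illustrates (the β values are illustrative):

```python
# Equation (19): with independent subgroups the run length is geometric,
# so the expected number of subgroups until a signal is 1 / (1 - beta),
# where beta is the probability a subgroup plots inside the limits.

def arl(beta):
    return 1.0 / (1.0 - beta)

# In-control: a three-sigma chart signals falsely with probability about
# 0.0027 per subgroup, so false alarms are roughly 370 subgroups apart.
print(round(arl(0.9973)))      # 370

# Out-of-control: if a shift is caught with probability 0.66 per
# subgroup, it is detected after about 1.5 subgroups on average.
print(round(arl(0.34), 2))     # 1.52
```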

Comparison of p-chart and p'-chart performance is analogous to evaluation of diagnostic tests, with a process being out-of-control the equivalent of the "disease" that is screened for. In such situations, Youden's Index (sensitivity + specificity -1) is commonly used to compare performances. More precisely, Youden's J statistic is computed as

$$J=\frac{TP}{TP+FN}+\frac{TN}{TN+FP}-1,$$ (20)

with *TP, TN, FP* and *FN* representing true positives, true negatives,
false positives and false negatives, respectively. From a
purely statistical perspective, one could judge the method
that gives the higher value of *J* to be superior. The highest
value of *J* corresponds to the point on the Receiver Operating
Characteristic (ROC) curve that is farthest from the diagonal
"line of chance".

It should be noted that using J as the performance metric
treats FP and FN as being equally costly, which is generally
not true in reality. If the analyst wishes to specify a weight
for the relative importance of sensitivity and specificity, a
weighted statistic, *J_{w}*, can be computed as

$${J}_{w}=2\left(w\cdot \mathrm{sensitivity}+(1-w)\cdot \mathrm{specificity}\right)-1,$$ (21)

with w denoting the user-defined weight, where *0 ≤ w ≤ 1* [9]. When *w*=0.5, *J_{w}* is equal to the usual unweighted *J*, for which sensitivity and specificity are given the same importance. When *w*>0.5, *J_{w}* weights sensitivity more heavily than specificity, and when *w*<0.5 specificity receives the greater weight.
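Both the unweighted J of equation (20) and the weighted variant from [9] can be computed with one small function; the 2×2 counts below are hypothetical:

```python
# Youden's J (equation 20) and the weighted variant J_w from [9],
# computed from sensitivity and specificity.

def youden(tp, fn, tn, fp, w=0.5):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # w = 0.5 reduces to the usual J = sensitivity + specificity - 1
    return 2 * (w * sensitivity + (1 - w) * specificity) - 1

# Hypothetical counts: 80 of 100 true shifts detected (sensitivity 0.80),
# 95 of 100 in-control subgroups left alone (specificity 0.95)
j = youden(tp=80, fn=20, tn=95, fp=5)
print(round(j, 2))            # 0.75

# w = 0.8 weights sensitivity more heavily than specificity
j_w = youden(tp=80, fn=20, tn=95, fp=5, w=0.8)
print(round(j_w, 2))          # 0.66
```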

This simulation study will compare the phase II performance of the p-chart and p'-chart and examine the utility of the VRT as a diagnostic tool for deciding which of these two SPC charts to use. Since no previous published study has addressed the question, the results of this simulation study will help data analysts make informed decisions when choosing between a p-chart and p'-chart for a given situation.

For the simulation study, 30 subgroups of size n were generated,
with each observation in the subgroup representing the
outcome of a Bernoulli trial with the simulated probability p
of the event occurring. These 30 simulated subgroups were
intended to mimic phase I of SPC chart applications with an
in-control process. For the 31^{st} subgroup, data from all 31 simulated
subgroups were used to compute the centerline of the
p-chart and p'-chart, but the probability of the event in the
31^{st} subgroup was simulated as p*, where p* = p + δ. For each
simulation, the values of δ were varied in increments of 0.01.

A simulation study was undertaken with a mean in-control
proportion of p=0.1, with p* ranging from 0.01 to 0.3 in steps
of 0.01. Another simulation study was undertaken with a mean
in-control proportion of p=0.5, with p* ranging from 0.3 to
0.7 in steps of 0.01. Thus, the simulated proportion in the 31st
subgroup was varied around the in-control proportion to examine
the sensitivity to detect shifts and the false alarm rate
when the simulated proportion was equal to the in-control
proportion. A separate simulation was performed for each
value of n ranging from 10 to 2000 in steps of 10. The values
of n and p were allowed to vary from subgroup to subgroup
by letting these parameters be generated from a truncated
normal distribution with mean μ and variance *σ^{2}*, under the
constraint that 0 < p < 1. When n was allowed to vary across subgroups, its simulated mean μ took the values of n described above, from 10 to 2000 in steps of 10. When p was allowed to vary across subgroups, its simulated value of μ was either 0.1 or 0.5 as described above, with a simulated standard deviation of μ/5 (i.e., 0.1/5 or 0.5/5; see Tables 2 and 3). In order to investigate the effect of greater between subgroup variation, an additional simulation study was performed for both in-control proportions (p=0.1 and p=0.5) using the same variability in n described previously, but with the between subgroup standard deviation of p increased to μ/3.

The 30 subgroups were simulated for 10,000 iterations for each combination of n and p values. At the conclusion
of each iteration of the 30 subgroups' simulated values, the
31st subgroup was simulated 10,000 times for each value of p*.
For each value of p*, the proportion of the 10,000 iterations
with the 31^{st} subgroup's proportion falling within the control
limits was computed separately for the p-chart, p'-chart and
use of the VRT to decide between these two charts. The 31^{st} subgroup mimicked phase II application of SPC charts. The
ability of the three methods to detect a shift in the process in
the 31^{st} subgroup was assessed separately for each value of n
and p via OC curves. The ARL was calculated using equation (19), with the observed proportion of the 10,000 simulated iterations falling within the control limits used to estimate *β*. All simulations were performed using R
3.2.3 for Windows [10].
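The simulation design can be sketched as follows, in Python rather than the authors' R and at a far smaller scale than the study's 10,000 iterations; the subgroup size, proportion, shift, and iteration count here are illustrative only:

```python
import math
import random

random.seed(1)

def pchart_signal(y_hist, n_hist, y_new, n_new):
    """Check one new subgroup against p-chart limits built from all 31
    subgroups (history plus the new one), as in the study design."""
    p_bar = (sum(y_hist) + y_new) / (sum(n_hist) + n_new)
    sigma = math.sqrt(p_bar * (1 - p_bar) / n_new)  # equation (7)
    return abs(y_new / n_new - p_bar) > 3 * sigma

def simulate(n, p, delta, iters=200):
    """Fraction of iterations in which the 31st subgroup, generated with
    probability p + delta, falls outside the p-chart control limits."""
    hits = 0
    for _ in range(iters):
        # Phase I: 30 in-control subgroups of Bernoulli(p) trials
        y_hist = [sum(random.random() < p for _ in range(n))
                  for _ in range(30)]
        # Phase II: the 31st subgroup with a possible shift delta
        y_new = sum(random.random() < p + delta for _ in range(n))
        hits += pchart_signal(y_hist, [n] * 30, y_new, n)
    return hits / iters

false_alarm = simulate(n=400, p=0.1, delta=0.0)   # in-control
power = simulate(n=400, p=0.1, delta=0.05)        # shifted process
print(false_alarm, power)
```

Even this miniature version reproduces the qualitative pattern reported below: a small false alarm rate when δ = 0 and substantially higher detection probability for a 0.05 shift at this subgroup size.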

When the subgroup sizes were relatively small, for example
*n = 20*, there was little difference between the methods for
the in-control proportion of 0.1 but as the subgroup sizes
increased p-charts exhibited a higher false alarm rate with a
corresponding increase in sensitivity to detect out-of-control
proportion shifts (Figure 4 and Table 2). When the in-control
proportion was 0.5, the false alarm rate and sensitivity to
detect out-of-control proportion shifts was markedly higher
for p-charts than the other two methods for all subgroup
sizes (Figure 5 and Table 3). For smaller subgroup sizes the
VRT yielded results between the p-chart and the p'-chart,
but as the subgroup sizes increased, the VRT more closely
followed the p'-chart. For both in-control proportions for all
subgroup sizes, differences between the p-chart and p'-chart
were greater when the variation of the in-control proportion
was higher. The proportion of the 31st subgroup's 10,000 iterations
falling outside of the control limits for all simulation scenarios
is provided in the Appendix.

**Table 2: The estimated ARL for p-charts, p'-charts and the VRT for various simulated values of p* and the mean subgroup size (n) when the in-control proportion was 0.1 with a standard deviation of p of 0.1/5 and 0.1/3.**

**Table 3: The estimated ARL for p-charts, p'-charts and the VRT for various simulated values of p* and the mean subgroup size (n) when the in-control proportion was 0.5 with a standard deviation of p of 0.5/5 and 0.5/3.**

The results of this simulation study show that neither the
p-chart nor the *p'-chart* performs well when subgroup sizes
are large or when between subgroup variation in *p* is high.
When n is large or when the between subgroup variation
in p is high, *p*-charts have relatively high sensitivity to detect
shifts in p, but exhibit a high false alarm rate when the
process is in-control. On the other hand, p'-charts have low
sensitivity to detect out-of-control shifts but have relatively
low false alarm rates when subgroup sizes are large or when
the between subgroup variation in *p* is high. Although the
VRT provides a compromise between the two types of charts
for small subgroup sizes, it does not meaningfully improve
performance when *n* is large or between subgroup variation
in *p* is high because in these situations the VRT results follow
the p'-chart results very closely.

Performance of the two charts diverges with increases in the subgroup size and higher between subgroup variation of the in-control proportion. So it appears that the p'-chart accomplishes its intended purpose by decreasing the false alarm rate in the presence of random common cause variation inherent to the system, but with the tradeoff that smaller signals of assignable cause variation will often not be detected. In theory, and in a systematic simulation study such as this, the distinction between common cause random variation and non-random assignable cause variation (i.e., a change in the true proportion) is clear. But in real-world applications, the source of variability will generally not be readily apparent to the data analyst, whose objective is simply to act when it is practical and economically beneficial to do so based on a signal from the available data [8].

The results of the simulation study that are provided in the
Appendix allow the analyst to know, for a given subgroup size
and standard deviation, the expected proportion of subgroups
falling outside of the control limits for both types of control
chart for an in-control process and for out-of-control shifts
of varying magnitude. The results for *p*=0.1 can be used for
*p*=0.9 since the binomial distribution is symmetrical around
0.5, and linear interpolation can be used to estimate out-of-control
proportions for values of p between 0.1 and 0.5 (and
from 0.5 to 0.9, applying the symmetrical property of the
binomial distribution).

Returning to the pharmacy example, considering the results
from May 19 through Sept 29 to be the phase I data, the
average proportion of inpatient narcotic orders that were for
hydrocodone-containing products was 0.913 with a standard
deviation of 0.016 and an average subgroup size of 716.4. If we
round these estimates to a mean proportion of 0.9, standard
deviation of 0.02 and subgroup size of 720, exploiting the
symmetrical property of the binomial distribution we can
use the simulation results in the Appendix for the in-control
proportion of *p*=0.1 with SD=0.02 and *n*=720. Suppose an a
priori determination that process shifts of 5% are important to
detect, which would correspond to p* = p + δ = 0.1 + 0.05 = 0.15
in the simulation study. For this scenario, from the Appendix
we see that the p-chart sensitivity is 0.6598, versus 0.2245
for the p'-chart, and looking at the proportion of iterations
giving false signals for the in-control process reveals specificities
of 1-0.1359=0.8641 and 1-0.0044=0.9956, for p-charts
and p'-charts, respectively. From equation (20), these results
yield *J*=0.5239 for the p-chart and *J*=0.2201 for the p'-chart.
Thus, for this situation, if FN and FP are given equal weight,
the p-chart would be considered superior when comparing
values of J, so the analyst should interpret the p-chart results.
Using the p-chart, the October 6 signal of assignable cause
variation would be detected to alert pharmacy staff that
there is a need for timely action (i.e., physician education about the pros and cons of using hydrocodone-containing
products versus alternative narcotics) in response to the recent
significant decrease in orders for hydrocodone-containing
products. This situation exemplifies the reason control charts
are used to monitor processes in the first place: to quickly
detect process changes so that timely action can be taken.
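The arithmetic in this worked example is easy to verify from the quoted sensitivities and false alarm rates:

```python
# Sensitivities and false alarm rates quoted from the Appendix for
# n = 720, p = 0.1, SD = 0.02, and a shift to p* = 0.15
sens_p, fa_p = 0.6598, 0.1359          # p-chart
sens_pp, fa_pp = 0.2245, 0.0044        # p'-chart

def youden_j(sensitivity, specificity):
    return sensitivity + specificity - 1   # equation (20)

j_p = youden_j(sens_p, 1 - fa_p)
j_pp = youden_j(sens_pp, 1 - fa_pp)
print(round(j_p, 4), round(j_pp, 4))   # 0.5239 0.2201

# With FP and FN weighted equally, the p-chart's larger J wins
assert j_p > j_pp
```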

Given the tradeoff between sensitivity and specificity, we believe that a universal recommendation regarding the use of p-charts vs. p'-charts would not be prudent without considering the situation-specific costs of FP and FN. After considering the context from which the data arise and weighing the detrimental effects of failing to detect assignable cause variation against the costs of acting on false alarms, one could use Youden's index as the criterion for deciding whether to use a p-chart or p'-chart, while weighting the relative importance of sensitivity and specificity.

**Additional files**

Appendix 1

Appendix 2

The authors declare that they have no competing interests.

| Authors' contributions | JH | BL |
| --- | --- | --- |
| Research concept and design | ✓ | ✓ |
| Collection and/or assembly of data | ✓ | ✓ |
| Data analysis and interpretation | ✓ | ✓ |
| Writing the article | ✓ | ✓ |
| Critical revision of the article | ✓ | ✓ |
| Final approval of article | ✓ | ✓ |
| Statistical analysis | ✓ | ✓ |

Thanks to Michael Chance for introducing the first author to statistical process control charts.

EIC: Jimmy Efird, East Carolina University, USA.

Received: 18-Feb-2018 Final Revised: 25-April-2018

Accepted: 02-May-2018 Published: 18-May-2018

1. Woodall WH and Montgomery DC. **Some Current Directions in the Theory and Application of Statistical Process Monitoring**. *Journal of Quality Technology*. 2014;**46**:78-94.
2. Montgomery DC. **Introduction to Statistical Quality Control, 7th ed**. John Wiley & Sons, Hoboken, NJ. 2009.
3. Wheeler DJ. **Advanced Topics in Statistical Process Control**. SPC Press: Knoxville, TN. 1995.
4. Laney DB. **Improved control charts for attributes**. *Quality Engineering*. 2002;**14**:531-537.
5. Provost LP and Murray SK. **The Healthcare Data Guide: Learning from Data for Improvement**. Jossey-Bass: San Francisco, CA. 2011.
6. Jones G and Govindaraju K. **A Graphical Method for Checking Attribute Control Chart Assumptions**. *Quality Engineering*. 2000;**13**:19-26.
7. Bernhardt MB, Taylor RS, Hagan JL, Patel N, Chumpitazi CE, Fox KA and Glover C. **Changes in opioid prescribing habits following the rescheduling of hydrocodone containing products**. *American Journal of Health-System Pharmacy*. 2017;**74**:2046-2053.
8. Woodall WH. **Controversies and Contradictions in Statistical Process Control**. *Journal of Quality Technology*. 2000;**32**:341-350.
9. Li DL, Shen F, Yin Y, Peng JX and Chen PY. **Weighted Youden index and its two-independent-sample comparison based on weighted sensitivity and specificity**. *Chin Med J (Engl)*. 2013;**126**:1150-4.
10. R Core Team. **R: A Language and Environment for Statistical Computing**. Vienna, Austria: R Foundation for Statistical Computing. 2016.


Hagan J, and Li B. **Phase II Performance of P-Charts and P’-Charts**. *J Med Stat Inform*. 2018; **6**:3. http://dx.doi.org/10.7243/2053-7662-6-3
