Cancer occurrence among women resident in the Gorizia province (North-Eastern Italy). May ecological data be used for adjusted epidemiological measures? Deterministic and probabilistic sensitivity analysis

Background: The evaluation of confounders is crucial to accurately estimate the association between environmental factors and diseases. Deterministic sensitivity analysis permits an external adjustment of the observed measures of effect. Probabilistic sensitivity analysis allows several probability density functions to be defined for the bias parameters and these prior distributions to be used to calculate limits for the bias-adjusted exposure-disease relative risks. The confounding effect of known risk factors for lung, breast and bladder cancer in women residing in the Gorizia Province (GP) was estimated.
Methods: The deterministic sensitivity analysis was performed by calculating confounding risk ratios (CRRs) under different scenarios of confounding. Several parameters were defined: the population being studied; the prevalence of women residing in the GP and in the southern and northern Gorizia province (SGP and NGP respectively); the prevalence of the different risk factors in the selected areas; and the risk factor-disease risk ratios per specific type of cancer. The CRRs were finally used to adjust the age-standardised rate ratios. Probabilistic sensitivity analysis allowed prior probability distributions to be specified for the lognormal risk factor-disease risk ratio, with 95% confidence limits ln(RR1) and ln(RR2) on the log scale and with mean and standard deviation equal to {ln(RR1)+ln(RR2)}/2 and {ln(RR2)-ln(RR1)}/(2*1.96) respectively.
Results: In women resident in the GP, the age-standardised IRR for bladder cancer is equal to 1.18 but, adjusting for smoking through deterministic analysis, the new IRR ranges from 1.13 to 1.25. In the SGP the same age-standardised IRR is equal to 1.45, while after adjustment for smoking it ranges from 1.31 to 1.67. Adjusting for the deprivation index in the areas of interest, the IRRs for bladder cancer range from 1.17 to 1.18 in GP residents and from 1.45 to 1.47 in SGP residents. Results from the probabilistic analysis show that the RR for bladder cancer in GP women is equal to 1.20 (2.5th – 97.5th: 0.99 – 1.44) when adjusted for smoking and to 1.19 (2.5th – 97.5th: 0.99 – 1.43) when adjusted for deprivation. In the SGP the IRR for bladder cancer is equal to 1.58 (2.5th – 97.5th: 1.09 – 2.25) when adjusted for smoking and to 1.57 (2.5th – 97.5th: 1.10 – 2.23) when adjusted for deprivation. Regarding breast and lung cancers, no risk excess arises in the studied areas, even after adjustment in both deterministic and probabilistic analysis.
Conclusions: Data on bladder cancer would seem to suggest a higher incidence of this cancer among the women residing in GP and, in particular, in the SGP. This excess risk is confirmed after adjustment for smoking and deprivation. No risk excesses were found in the studied areas for lung and breast cancers.


Introduction
When studying the possible association between exposure to a risk factor, such as an environmental factor, and a certain disease, evaluating the effect of potential confounders is crucial to estimate the risk accurately. Even in the case of a descriptive study, information on confounders contributes to the correct definition of the hypotheses on which subsequent analytical studies are based. Obtaining individual data on confounders is laborious and, if the research investigates large population groups (as is usually the case with descriptive studies), individual information may not be available. The challenge, therefore, lies in using ecological data, which are more easily available, to determine the extent of confounding within the estimated measures.
Sensitivity analysis (SA) provides a way to account for systematic errors and to assess the magnitude of biases; several different SA techniques have been proposed and the literature is therefore large. This field was initiated by Cornfield and colleagues when, in 1959, they published an SA for the association between cigarette smoking and lung cancer [1]. For their analysis, the authors assumed a single binary confounder, no interaction between the effect of the exposure and the confounder on the outcome, and the null hypothesis of no causal effect. Under these three simplifying assumptions, also known as the Cornfield conditions, the authors could determine how strong the potential confounder would have had to be to completely explain the effect of the exposure (cigarette smoking) on the outcome (lung cancer). Although computationally easy, quantitative bias analysis under the Cornfield conditions does not permit the effect to be estimated under alternative (weaker) confounding scenarios.
As noted, the literature is large and bias analytic strategies range from easy-to-compute approaches to computationally intensive ones. In the simple deterministic SA, the estimate of the association between exposure and outcome is adjusted a single time for a fixed value of the bias parameter. An extension of this technique can be achieved by repeating the analysis for different values of the same confounder. This multidimensional SA therefore permits an external adjustment of the effect measures under different scenarios of confounding, even though it provides little or no information on the most likely adjusted estimate. Moreover, both simple and multidimensional SA address only one confounder at a time and, because no hypothesis is made about the prior distributions of the bias parameters, do not explicitly account for uncertainty [2,3].
Probabilistic bias analysis (PBA) can be viewed as a generalization of the simple SA in which the bias parameters are defined by probability density functions (joint prior distributions) rather than by one or more fixed values within a range. The so-called Monte Carlo sensitivity analysis (MCSA) represents the easiest technique to implement PBA: through this approach, bias parameters, drawn from their joint prior distributions, are used in the deterministic SA formula [3]. However, unlike the deterministic approach, the joint prior distributions capture the uncertainty about the bias parameters, allowing the definition of limits for the bias-adjusted effect measure that can be interpreted like a frequentist confidence interval. Like the deterministic SA, PBA accounts for only one source of confounding at a time [3,5].
When multiple sources of systematic errors (selection bias, misclassification of the exposure, unmeasured confounding) are concurrently present, multiple bias modeling (MDM) can be performed with adjustment in the reverse order of bias occurrence. Typically, for each bias source, a different joint prior distribution is defined and all these distributions are assumed to be independent. When this assumption does not hold, prior correlations or hierarchical models can be used to address dependency [3].
Another strategy consists in using frequency distributions of the bias parameters as prior distributions in Bayesian bias analysis (BBA). This approach starts from the same assumptions as MCSA; if prior distributions are defined for all parameters in the model, the analysis is known as fully Bayesian, otherwise as semi-Bayesian. Performing BBA is computationally demanding; nevertheless, it gives more reliable posterior probability distributions [3]. This occurs because in BBA the prior distributions are combined with the likelihood functions for the parameters whereas in MCSA, as previously explained, the bias parameters are repeatedly sampled from the prior distributions and then used to calculate bias-adjusted estimates [6]. However, according to Greenland [4], BBA and MCSA may give very similar results if, in BBA, prior and posterior distributions of the bias parameters do not differ.
Friuli Venezia Giulia (FVG) region (North-Eastern Italy) has a cancer registry (RT-FVG) which is part of the national system of cancer registration and is a member of the Italian Association of Cancer Registries (AIRTUM). RT-FVG is internationally accredited by the International Association of Cancer Registries at the International Agency for Research on Cancer in Lyon [7].
Since 1995 the RT-FVG has been monitoring all new incident cancer cases (diagnosed in the time period of interest) among the population residing in the region. The starting information system is the regional epidemiological data warehouse, which stores all the information gathered by the regional health information system (SISR). The main databases used by the RT-FVG are those referring to hospital discharge forms (SDO), pathology reports (AP), general death forms (MG) and the health data repository of the region's inhabitants (A). Each disease is encoded according to the International Classification of Diseases, 10th Revision (ICD-10). were provided by the National Institute of Statistic (ISTAT). The monograph shows that, for that period, women of the Gorizia province (GP) had higher mortality, higher incidence and reduced survival for all types of cancer and, in particular, for malignant lung, breast and bladder cancer. Some of these excesses seem especially to affect the municipalities of the area covered by the Southern Gorizia province (SGP) [8].
This preliminary evidence led public health officers and groups of citizens to hypothesize that exposure to environmental factors may explain the cited cancer excesses. As the first step, we performed a descriptive study to calculate the 1995-2009 age-adjusted cancer incidence and incidence rate ratios (IRRs) for lung, breast, and bladder cancer in women residing in GP. GP rates were compared with the rest of the FVG region, and SGP rates were compared with the Northern Gorizia province (NGP). Afterwards, we tried to determine the magnitude of confounding in the observed findings by using aggregate data, which are easily available, and easy-to-compute analytic approaches for sensitivity analysis.

Materials and methods
New incident cancer cases in FVG are recorded on the basis of the international methods defined by the International Association of Cancer Registries (IACR) and the Italian Association of Cancer Registries (AIRTUM). After excluding autopsy cases, a data set was obtained for the 1995-2009 period comprising 8,852 incident cases of malignant neoplasms, 7,000 after excluding non-melanoma skin cancer. Age-standardised rates (ASR) were calculated with SEER*Stat software by the direct method using the conventional European population (EU 1960) as the standard population. For the 1995-2009 period, the population at risk used as denominator was the yearly average resident female population (calculated as the average between the resident population on January 1st of the year under consideration and the resident population on January 1st of the following year). All rates were expressed per 100,000 inhabitants. Ratios between age-standardised rates (IRRs) were also calculated.
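As an illustration of the direct method described above, the following minimal Python sketch computes a yearly average population, two age-standardised rates and their ratio. It is not the SEER*Stat procedure itself: the three age bands, the case counts, the person-years and the standard weights are all hypothetical values chosen only to show the arithmetic, and the function names are ours.

```python
def average_population(pop_jan1, pop_jan1_next_year):
    """Yearly average population: mean of two consecutive 1 January counts."""
    return (pop_jan1 + pop_jan1_next_year) / 2


def direct_asr(cases, person_years, std_weights, per=100_000):
    """Directly age-standardised rate.

    Each age-specific rate (cases / person-years) is weighted by the share
    of that age band in the standard population.
    """
    total_weight = sum(std_weights)
    weighted = sum(
        w * (c / py) for c, py, w in zip(cases, person_years, std_weights)
    )
    return weighted / total_weight * per


# Hypothetical standard population for three age bands (0-44, 45-64, 65+).
std_weights = [60_000, 25_000, 15_000]

# Hypothetical cases and person-years for two areas.
cases_a, py_a = [5, 40, 120], [400_000, 250_000, 200_000]
cases_b, py_b = [30, 300, 800], [3_500_000, 2_200_000, 1_600_000]

asr_a = direct_asr(cases_a, py_a, std_weights)
asr_b = direct_asr(cases_b, py_b, std_weights)
irr = asr_a / asr_b  # ratio between the two age-standardised rates

print(f"ASR A = {asr_a:.2f}, ASR B = {asr_b:.2f}, IRR = {irr:.2f}")
```

Because both areas are weighted by the same standard population, the resulting ratio is not driven by differences in age structure between them.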
To select the anatomical sites of cancer and the potential risk factors to be studied, we considered: frequency of cancer, availability of ecological data, and evidence in the literature of an association between cancer and risk factors that were both environmental and related to smoking, alcohol consumption, occupational exposure and socioeconomic status. As mentioned, to evaluate the extent of confounding within these measures we implemented both deterministic and probabilistic sensitivity analysis, based on available ecological data.

Deterministic sensitivity analysis
The deterministic sensitivity analysis was performed by calculating confounding risk ratios (CRRs) under different scenarios of confounding. CRR is defined as the ratio between the crude risk ratio and the risk ratio after removing the effect of the risk factor and measures the degree of confounding [9][10][11][12].
To calculate the CRR, it was necessary to define several parameters: the population being studied (women residing in FVG on 31.12.2009); the prevalence of women residing in the GP (exposed to the unknown environmental factor), and in SGP or NGP on 31.12.2009; the prevalence of women residing in the remaining territory of FVG on 31.12.2009; the prevalence of the different risk factors (smoking, alcohol and deprivation index) in the selected areas; and the risk factor-disease risk ratios per specific type of cancer. The CRRs were finally used to adjust the age-standardised IRRs.
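The text defines the CRR as the ratio between the crude and the adjusted risk ratio but does not write out a formula. A minimal sketch, assuming the widely used external-adjustment expression for a single binary confounder (an assumption on our part, not a statement of the exact computation used here), is given below. Only the observed IRR of 1.18 comes from the paper; the prevalence values and the smoking-bladder cancer risk ratio are hypothetical.

```python
def confounding_risk_ratio(p_exposed, p_reference, rr_confounder):
    """CRR for one binary confounder (assumed external-adjustment form).

    p_exposed     : prevalence of the risk factor in the area of interest
    p_reference   : prevalence of the risk factor in the comparison area
    rr_confounder : risk ratio linking the risk factor to the disease
    """
    numerator = p_exposed * (rr_confounder - 1.0) + 1.0
    denominator = p_reference * (rr_confounder - 1.0) + 1.0
    return numerator / denominator


def adjust_irr(irr_observed, crr):
    """Externally adjusted IRR: observed IRR divided by the CRR."""
    return irr_observed / crr


irr_observed = 1.18   # age-standardised IRR, bladder cancer, GP vs rest of FVG
p_gp = 0.22           # hypothetical smoking prevalence in GP
p_rest = 0.20         # hypothetical smoking prevalence in the rest of FVG
rr_smoking = 3.0      # hypothetical smoking-bladder cancer risk ratio

crr = confounding_risk_ratio(p_gp, p_rest, rr_smoking)
irr_adjusted = adjust_irr(irr_observed, crr)
print(f"CRR = {crr:.3f}, adjusted IRR = {irr_adjusted:.3f}")
```

When the risk factor is equally prevalent in both areas the CRR is 1 and the IRR is unchanged, which matches the intuition that a factor cannot confound if it is not associated with area of residence.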
Discrete scenarios of confounding were hypothesized by varying the strength of association between risk factors and the selected types of cancer and by using both the point measure and the confidence limits of the risk factors' prevalence values, to simulate both the most likely and the extreme scenarios.
The 95% confidence intervals of the prevalence were calculated when data were based on samples (i.e. the PASSI survey on smoking habits and alcohol intake) [13], while for data based on the national census (i.e. employees by economic activity and deprivation index [14,15]) only the point measure was obtained.
Therefore, when available, the confidence limits were used to force the differences between prevalence values in the compared areas, alternating the lower limit in one area with the upper limit in the compared one and vice versa. Consequently, CRR 1 derives from using the lower limit of the confidence intervals for the GP and the SGP and the upper limit for the rest of the FVG region and the NGP, while CRR 2 derives from the upper limit for the GP and the SGP and the lower limit for the rest of the FVG region and the NGP.
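The scenario construction described above can be sketched as follows. This is a minimal illustration, again assuming the single-binary-confounder CRR expression; the prevalence limits and the candidate risk ratios are hypothetical, and only the observed IRR of 1.45 (bladder cancer, SGP versus NGP) is taken from the paper.

```python
def crr(p_area, p_reference, rr):
    """Assumed CRR form for a single binary confounder."""
    return (p_area * (rr - 1.0) + 1.0) / (p_reference * (rr - 1.0) + 1.0)


# Hypothetical prevalence of the risk factor: point estimate and 95% limits.
prev_area = {"point": 0.22, "low": 0.18, "high": 0.26}   # e.g. SGP
prev_ref = {"point": 0.20, "low": 0.18, "high": 0.22}    # e.g. NGP

rr_values = [2.0, 3.0, 4.0]   # hypothetical risk factor-disease risk ratios
irr_observed = 1.45           # age-standardised IRR reported for SGP vs NGP

scenarios = []
for rr in rr_values:
    candidates = {
        "point": crr(prev_area["point"], prev_ref["point"], rr),
        # CRR 1: lower limit in the area of interest, upper limit in the reference
        "CRR 1": crr(prev_area["low"], prev_ref["high"], rr),
        # CRR 2: upper limit in the area of interest, lower limit in the reference
        "CRR 2": crr(prev_area["high"], prev_ref["low"], rr),
    }
    for label, value in candidates.items():
        scenarios.append((rr, label, value, irr_observed / value))

for rr, label, value, adjusted in scenarios:
    print(f"RR={rr:.1f} {label:>5}: CRR={value:.3f} adjusted IRR={adjusted:.2f}")
```

Alternating the limits in this way brackets the point scenario: CRR 1 pushes the adjusted IRR up and CRR 2 pushes it down, which gives the range of adjusted IRRs reported for each risk factor.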

Probabilistic sensitivity analysis
Probabilistic sensitivity analysis, through Monte Carlo simulations, allowed us to define several probability density functions for the risk factors and to use these prior distributions to calculate limits for the bias-adjusted exposure-disease relative risks (RRs) [2,4,5]. The term RR is used here as a generic term for the risk ratio, the rate ratio or the odds ratio. The simulations proceeded by first drawing a random sample from the specified probability density functions of the bias parameters, and then back-calculating the bias-adjusted RR. These two steps were iterated to obtain the distribution of the bias-adjusted RRs [5]. The prior distributions of the bias parameters were derived from background information (prevalence values and 95% confidence intervals). The probabilistic sensitivity analysis resulted in probability density functions (distributions) of adjusted measures that took into account the uncertainty about the parameters [4]. We therefore specified prior probability distributions for the lognormal risk factor-disease risk ratio with 95% confidence limits ln(RR1) and ln(RR2) on the log scale, mean equal to {ln(RR1)+ln(RR2)}/2 and standard deviation equal to {ln(RR2)-ln(RR1)}/(2*1.96).
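The two-step Monte Carlo procedure can be sketched in Python using only the standard library. The lognormal prior follows the parameterisation above (log-scale mean at the midpoint of the log limits, standard deviation equal to their difference divided by 2*1.96). Two choices in the sketch are assumptions of ours: the truncated normal prior for the prevalences, since the paper only says that they derive from prevalence values and their 95% confidence intervals, and the single-binary-confounder CRR expression. All numeric inputs except the observed IRR of 1.18 are hypothetical.

```python
import math
import random


def lognormal_params(rr_low, rr_high):
    """Log-scale mean and SD from the 95% limits of a risk ratio."""
    mu = (math.log(rr_low) + math.log(rr_high)) / 2
    sigma = (math.log(rr_high) - math.log(rr_low)) / (2 * 1.96)
    return mu, sigma


def draw_prevalence(rng, low, high):
    """Assumed prior: normal built from the 95% CI, truncated to [0, 1]."""
    mean = (low + high) / 2
    sd = (high - low) / (2 * 1.96)
    return min(1.0, max(0.0, rng.gauss(mean, sd)))


def simulate_adjusted_rr(irr_observed, rr_limits, prev_area_ci, prev_ref_ci,
                         n_draws=20_000, seed=1):
    """Distribution of bias-adjusted RRs under one unmeasured confounder."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(*rr_limits)
    draws = []
    for _ in range(n_draws):
        # Step 1: draw the bias parameters from their priors.
        rr_cd = rng.lognormvariate(mu, sigma)
        p1 = draw_prevalence(rng, *prev_area_ci)
        p0 = draw_prevalence(rng, *prev_ref_ci)
        # Step 2: back-calculate the bias-adjusted RR.
        crr = (p1 * (rr_cd - 1.0) + 1.0) / (p0 * (rr_cd - 1.0) + 1.0)
        draws.append(irr_observed / crr)
    return draws


draws = simulate_adjusted_rr(
    irr_observed=1.18,             # reported IRR, bladder cancer, GP vs FVG
    rr_limits=(2.0, 4.0),          # hypothetical 95% limits of the risk ratio
    prev_area_ci=(0.18, 0.26),     # hypothetical prevalence CI, area of interest
    prev_ref_ci=(0.18, 0.22),      # hypothetical prevalence CI, reference area
)
draws.sort()
median = draws[len(draws) // 2]
print(f"median bias-adjusted RR = {median:.2f}")
```

This sketch propagates only the uncertainty in the bias parameters; random error in the observed IRR would need an additional sampling step.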

Risk factors and data sources
As previously mentioned, data on risk factors were derived from ecological studies and databases. For the selected areas, the data used in our analysis were therefore the prevalence of consumers at risk from alcohol drinking [13], the prevalence of smokers [13], the prevalence of employees by economic activity from the 2011 national census [14] and, lastly, the distribution of the deprivation index (developed on the basis of the information gathered through the 2001 national census) [15].

Risk factors definitions and related diseases
We defined consumers at risk from alcohol drinking as women younger than 18 years who have drunk an alcoholic beverage, adult women exceeding 1 standard alcoholic unit, and all women, regardless of age, who have binged (binge drinking being defined as the consumption of 6 glasses or more of alcoholic beverages in a single session) at least once a year. (In the PASSI study, the survey on alcohol consumption refers to the last 30 days before the interview) [13]. In our analysis, alcohol was considered a risk factor for breast cancer [16,17].
According to the World Health Organization definition, we considered a woman to be a smoker if she had smoked at least 100 cigarettes in her lifetime (5 packets of 20 cigarettes) and was still smoking at the time of the PASSI survey or had stopped smoking less than 6 months before [12]. Smoking was considered a risk factor for lung, breast and bladder cancer [18][19][20]. Prevalence of employees by economic activity (agriculture, industry, commerce, transport, financial activity, and other activity) was calculated on the basis of the 2011 Census data [14]. In our analysis, in order to adjust the IRRs, we used the prevalence of women employed in industrial sites as a risk factor for lung cancer [21].
As mentioned above, for each area under analysis, the deprivation index used was developed by Caranci et al. on the basis of the information gathered through the 2001 national census. The deprivation index was subdivided into quintiles, ranging from very affluent to very disadvantaged. For our analysis, we calculated the prevalence of the most disadvantaged class with reference to lung and bladder cancer, and the prevalence of the 'very affluent' category in the case of breast cancer [15,22-26].
For the purpose of this paper, women resident in the FVG region were stratified by area, and comparisons were made between the GP and the rest of the region and between the SGP and the NGP. The

Results
The reference population was defined by all women resident in FVG on January 1st, 2010; according to the Regional Healthcare Data Warehouse, FVG region had 637,251 inhabitants, while the province of Gorizia had 73,026 inhabitants.
The median age of women residing in FVG is 47 years (in Gorizia 48 years); women over 65 years of age account for 26.7% of the regional population (in Gorizia 28.8%).
Age standardised incidence rates in females by cancer site, study period and area, are reported in Table 1. An increasing trend of cancer incidence, particularly for breast cancer in both the entire Region as well as the Gorizia Province, can be noticed.
Compared with the regional data, the confidence intervals of the estimates referring to the GP, the SGP and the NGP are wider (due to the lower numbers of cases). Nevertheless, the bladder cancer ASR appears to be higher in the GP (ASR 11.2; 95%CI 9.7-13) and particularly in the SGP (ASR 13.3; 95%CI 10.8-15.8) than in FVG (ASR 9.5; 95%CI 9-10.1). No relevant differences among lung cancer ASRs arise in the studied areas. Table 2 shows both incident (1995-2009) and prevalent cancer cases on 01.01.2010 in the GP and in FVG; prevalence values are also indicated for the NGP and SGP. Table 3 shows prevalence and 95% confidence intervals of smoking and alcohol drinking habits by area. There is no evidence of heterogeneity among the different areas of interest.
Concerning the socioeconomic conditions, Table 4 shows the deprivation index distribution in the study areas. While data belonging to the rest of the Region and the GP seem to be comparable, within the GP, the NGP shows a higher prevalence of the very affluent category, compared to the SGP.

Deterministic analysis results
As already mentioned, a deterministic sensitivity analysis was performed first: confounding risk ratios (CRRs) were calculated and used to adjust the age-standardised IRRs. Tables 5 and 6 show the age-standardised IRRs and the age-standardised IRRs adjusted for the CRRs calculated under different scenarios. The IRRs for bladder cancer in women resident in the GP and particularly in the SGP seem to be significant even after adjustment for different hypothetical scenarios of confounding. In women resident in the GP, the age-standardised IRR is equal to 1.18 but, adjusting for smoking, the new IRR ranges from 1.13 to 1.25. In SGP residents, the same age-standardised IRR is equal to 1.45 while, after adjustment for smoking, it ranges from 1.31 to 1.67. Adjusting for the different distributions of the deprivation index in the areas of interest, the IRRs for bladder cancer range from 1.17 to 1.18 in GP residents and from 1.45 to 1.47 in SGP residents. As could be predicted from the age-adjusted incidence rates and the prevalences of the risk factors, the deterministic analysis shows no relevant results for lung and breast cancers (data not shown).

Probabilistic analysis results
Tables 7 and 8 show the results of the probabilistic analysis in terms of posterior probability distributions of RRs by cancer type, in women resident in GP compared to FVG and in women resident in the SGP compared to the NGP, both adjusted by risk factors. From the 20,000 draws for each cancer type and bias parameter, the median risk factor-adjusted residence-cancer RRs and the 2.5th and 97.5th percentiles are presented, together with the 97.5th/2.5th ratio. As expected, the ratios deriving from systematic and random errors are equal to or higher than the ratios of the conventional limits. Figures 1 and 2
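To illustrate how the median, the 2.5th and 97.5th percentiles and their ratio can be obtained from simulation draws, and why that ratio is at least as large as the ratio of the conventional limits, a minimal Python sketch follows. It is not the analysis behind Tables 7 and 8: the conventional confidence limits and the spread assigned to the confounding risk ratio are hypothetical, and the way random error is added (a normal draw on the log scale) is an assumption of ours.

```python
import math
import random


def percentile(sorted_values, q):
    """Percentile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarise(draws):
    """Median, 2.5th and 97.5th percentiles and the 97.5th/2.5th ratio."""
    s = sorted(draws)
    p025, p50, p975 = (percentile(s, q) for q in (0.025, 0.5, 0.975))
    return {"median": p50, "p2.5": p025, "p97.5": p975, "ratio": p975 / p025}


# Hypothetical conventional estimate and 95% limits for an IRR.
irr, ci_low, ci_high = 1.45, 1.10, 1.91
se_log = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
conventional_ratio = ci_high / ci_low

rng = random.Random(7)
combined = []
for _ in range(20_000):
    random_error = math.exp(rng.gauss(0.0, se_log))   # random (sampling) error
    crr = math.exp(rng.gauss(0.0, 0.05))              # hypothetical CRR prior
    combined.append(irr / crr * random_error)         # systematic + random error

summary = summarise(combined)
print(f"median={summary['median']:.2f} "
      f"limits={summary['p2.5']:.2f}-{summary['p97.5']:.2f} "
      f"ratio={summary['ratio']:.2f} (conventional {conventional_ratio:.2f})")
```

Because the uncertainty about the bias parameter is added on top of the random error, the simulation interval is wider than the conventional one, so its upper-to-lower ratio exceeds the conventional ratio.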

Discussion
As mentioned, the age-standardised incidence rates seem to show an increasing trend over the years for all types of cancer.
In particular, this increase appears to be more marked for breast cancer (Table 1); this trend affects both the province of Gorizia and the rest of FVG, even though caution is necessary when drawing conclusions because of the overlapping confidence intervals. This trend, if real, could be explained in different ways: it may be due to the regional screening program for breast cancer (started in 2005), to an increase in the prevalence of the risk factors over the last decades, and to improvements in diagnostic capabilities. The comparison of cancer incidence rates in the province of Gorizia with the rest of FVG did not show any relevant difference for lung and breast cancer. On the contrary, data on bladder cancer, particularly those referring to the entire period, would seem to suggest a higher incidence of this cancer type among women residing in the province of Gorizia. This is especially true in the Southern Gorizia Province, where the ASR reaches the highest value (1995-2009, ASR: 13.3; 95%CI: 10.8-15.8).
In contrast, rates in the Northern Gorizia Province seem comparable with those of the rest of the region. Although this difference seems worthy of further investigation, it should be noted that the bladder cancer classification includes both in situ forms and neoplasms of uncertain behavior. Therefore, it cannot be excluded that a different diagnostic assessment explains this result. The use of aggregate data from available ecological sources to assess the effect of possible confounders (i.e. smoking, alcohol, deprivation, occupational exposure) on the association between exposure to environmental risk factors and outcome may be relevant and efficient. In fact, it allows the magnitude of the confounding to be estimated in a way that is advantageous in terms of time and costs. However, limitations are equally evident: firstly, ecological data are available only for a limited number of all known risk factors; secondly, the use of aggregate data does not provide a measurement of individual exposure but only an indirect calculation. Therefore, despite the adjustment of the effect measures for the above-mentioned confounders, a different prevalence of those risk factors that we did not consider and the use of proxies for individual exposures may have left residual confounding in the estimates.
In particular, there are two principal limitations bearing upon the potential use of aggregate data and the reliability of the estimates that derive from these data: a) the limited availability of historical data; b) the level of geographical detail of the information. Especially in the case of cancer, where the induction-latency period can last decades, it is important to have historical information available on the diffusion of behaviors such as smoking and drinking. Moreover, the greater the detail (region, province, municipality, census area), the lower the ecological bias related to its use and the closer the measured estimates will be to individual data. A wider availability of data (for longer periods and with more detailed information) would allow more reliable CRRs to be obtained. In the light of the systematic and accurate work carried out in particular by cancer registries and of the current availability of frequency indicators for the disease, the evaluation of the weight of confounders is therefore crucial to correctly identify areas with possible risk excesses and to hypothesize the existence of causal relationships with the exposure to pollution sources.
doi: 10.7243/2053-7662-6-1
The above-mentioned limitations have considerably influenced our estimates, since historical data are only available at regional level and, even in the case of recent data, the only study that could be used to calculate smoking and drinking prevalence among women residing in the province of Gorizia, in the Southern and Northern districts and in the rest of FVG was the PASSI survey. Therefore, we assumed that the prevalence differences by area were constant over time. For the deterministic analysis, to simulate different scenarios regarding prevalence variations, the upper and lower limits of the 95% confidence intervals of prevalence were used. The results reported in this paper show how the CRR varies with changes in both confounder prevalence and strength of association between confounder and disease (RR CE+).
The estimation of CRRs made it possible to adjust cancer IRRs for different confounders among women residing in the studied areas and showed how these risks changed according to different scenarios based on ecological data.
Although the advantages of the probabilistic sensitivity analysis over the conventional approach have already been highlighted [2,4-6], it is worth noting that, unlike the deterministic sensitivity analysis, the probabilistic analysis requires the investigator to explicitly define the uncertainty around the bias parameters through a prior distribution. This prior distribution may arise from available background information as well as from the personal judgment of the researcher. The posterior probability distribution of bias-adjusted estimates reflects this uncertainty about bias.
As another possible limitation, we ran our probabilistic sensitivity analysis only under the hypothesis of uncontrolled confounders assuming no exposure misclassification or selection bias.
Despite all the limitations considered, our results seem to suggest that, even across several confounding scenarios and probability density functions of the bias parameters, the risk of bladder cancer among women resident in GP and particularly in SGP is higher than in the rest of the Region, and this may prompt further investigations.

Conclusions
Incidence rates showed an increasing trend over the years for all types of cancers and, in particular, for malignant breast cancer; these variations were seen both in the province of Gorizia and in the rest of the region.
Data on bladder cancer, for the whole period and the 5-year periods, and the respective confidence intervals suggest a higher incidence of this cancer among the women residing in the province of Gorizia and, in particular, in the SGP. This excess risk arises even after adjustment for different bias parameters.
The use of aggregate data to evaluate the effect of possible bias parameters, through deterministic and probabilistic sensitivity analysis, appears to be promising.