Confidence Intervals In Healthcare Administration Essay.
● A confidence interval calculated for a measure of treatment effect
shows the range within which the true treatment effect is likely to lie
(subject to a number of assumptions).
● A p-value is calculated to assess whether trial results are likely to have
occurred simply through chance (assuming that there is no real
difference between new treatment and old, and assuming, of course,
that the study was well conducted).
● Confidence intervals are preferable to p-values, as they tell us the range
of possible effect sizes compatible with the data.
● p-values simply provide a cut-off beyond which we assert that the
findings are ‘statistically significant’ (by convention, this is p<0.05).
● A confidence interval that embraces the value of no difference between treatments indicates that the treatment under investigation is not significantly different from the control.
● Confidence intervals aid interpretation of clinical trial data by putting upper and lower bounds on the likely size of any true effect.
● Bias must be assessed before confidence intervals can be interpreted. Even very large samples and very narrow confidence intervals can mislead if they come from biased studies.
● Non-significance does not mean 'no effect'. Small studies will often report non-significance even when there are important, real effects which a large study would have detected.
● Statistical significance does not necessarily mean that the effect is real: when there is truly no effect, about one comparison in 20 will still reach significance at the 5% level by chance alone.
● Statistically significant does not necessarily mean clinically important. It is the size of the effect that determines the importance, not the presence of statistical significance.

What is...? series
Second edition
Statistics
For further titles in the series, visit: www.whatisseries.co.uk

What are confidence intervals and p-values?
Huw TO Davies PhD, Professor of Health Care Policy and Management, University of St Andrews
Iain K Crombie PhD FFPHM, Professor of Public Health, University of Dundee
Supported by sanofi-aventis
Date of preparation: April 2009 NPR09/1106

Measuring effect size
Clinical trials aim to generate new knowledge on the effectiveness (or otherwise) of healthcare interventions. Like all clinical research, this involves estimating a key parameter of interest, in this case the effect size. The effect size can be measured in a variety of ways, such as the relative risk reduction, the absolute risk reduction or the number needed to treat (NNT; Table 1).
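The effect measures named above can be computed directly from the event risks in the two groups. The following is a minimal sketch based on the definitions in Table 1; the function name `effect_measures`, the `EffectMeasures` container and the example risks (20% in the control group, 15% in the treated group) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
import math


@dataclass
class EffectMeasures:
    arr: float         # absolute risk reduction (decimal fraction)
    rrr: float         # relative risk reduction (decimal fraction)
    rr: float          # relative risk
    odds_ratio: float  # odds in treated group / odds in control group
    nnt: float         # number needed to treat (math.inf when ARR is 0)


def effect_measures(control_risk: float, treated_risk: float) -> EffectMeasures:
    """Compute the Table 1 measures from two risks given as decimal fractions."""
    if not (0 < control_risk < 1 and 0 <= treated_risk < 1):
        raise ValueError("risks must be fractions, with 0 < control_risk < 1")
    arr = control_risk - treated_risk   # risk in control minus risk in treated
    rrr = arr / control_risk            # share of the initial risk removed
    rr = treated_risk / control_risk
    odds_control = control_risk / (1 - control_risk)
    odds_treated = treated_risk / (1 - treated_risk)
    odds_ratio = odds_treated / odds_control
    # NNT is the reciprocal of the ARR; a negative value would indicate harm.
    nnt = math.inf if arr == 0 else 1 / arr
    return EffectMeasures(arr, rrr, rr, odds_ratio, nnt)


# Hypothetical trial: 20% of controls and 15% of treated patients have the event.
m = effect_measures(control_risk=0.20, treated_risk=0.15)
print(f"ARR = {m.arr:.1%}, RRR = {m.rrr:.0%}, RR = {m.rr:.2f}, "
      f"OR = {m.odds_ratio:.2f}, NNT = {round(m.nnt)}")
```

In this hypothetical case the same result can be described as a 25% relative risk reduction, a 5 percentage point absolute risk reduction, or an NNT of 20, which is why the choice of measure affects how large a benefit appears.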
Relative measures tend to emphasise potential benefits, whereas absolute measures provide an across-the-board summary.[1] Either may be appropriate, subject to correct interpretation.

Whatever the measure used, some assessment must be made of the trustworthiness or robustness of the findings. The findings of the study provide a point estimate of effect, and this raises a dilemma: are the findings from this sample also likely to be true about other similar groups of patients? Before we can answer such a question, two issues need to be addressed. Does any apparent treatment benefit arise because of the way the study has been conducted (bias), or could it arise simply because of chance? The short note below briefly covers the importance of assessing bias but focuses more on assessing the role of chance.

Box 1. Hypothesis testing and the generation of p-values
The logic of hypothesis testing and p-values is convoluted. Suppose a new treatment appears to outperform the standard therapy in a research study. We are interested in assessing whether this apparent effect is likely to be real or could just be a chance finding: p-values help us to do this.

In calculating the p-value, we first assume that there really is no true difference between the two treatments (this is called the null hypothesis). We then calculate how likely we are to see the difference that we have observed just by chance if our supposition is true (that is, if there is really no true difference). This is the p-value.

So the p-value is the probability that we would observe effects at least as big as those seen in the study if there was really no difference between the treatments. If p is small, the findings are unlikely to have arisen by chance and we reject the idea that there is no difference between the two treatments (we reject the null hypothesis). If p is large, the observed difference is plausibly a chance finding and we do not reject the idea that there is no difference between the treatments. Note that we do not reject the idea, but we do not accept it either: we are simply unable to say one way or another until other factors have been considered.

But what do we mean by a 'small' p-value (one small enough to cause us to reject the idea that there was really no difference)? By convention, p-values of less than 0.05 are considered 'small'. That is, if p is less than 0.05 there is a less than one in 20 chance that a difference as big as that seen in the study could have arisen by chance if there was really no true difference. With p-values this small (or smaller) we say that the results from the trial are statistically significant (unlikely to have arisen by chance). Smaller p-values (say p<0.01) are sometimes called 'highly significant' because they indicate that the observed difference would happen less than once in a hundred times if there was really no true difference.

Bias
Bias is a term that covers any systematic errors that result from the way the study was designed, executed or interpreted. Common flaws in treatment trials are:
● Lack of (or failure in) randomisation, leading to unbalanced groups
● Poor blinding, leading to unfair treatment and biased assessments
● Large numbers of patients lost to follow-up.
Assessment in these areas is crucial before the results from any trial can be assessed, and many useful guides exist to assist this process, such as an article by Guyatt et al and books by Sackett et al and by Crombie.[2–5] Interpretation of the effects of chance is only meaningful once bias has been excluded as an explanation for any observed differences.[6,7]

Chance variability
The results from any particular study will vary just by chance. Studies differ in terms of the people who are included, and the ways in which these specific individuals react to therapeutic interventions. Even when everything possible is held constant, there will still be some random variations. Hence we need some tools to help us to assess whether differences that we see between new treatment and old in any particular study are real and important, or just manifestations of chance variability. Confidence intervals and p-values help us to do this.

What are p-values?
Until comparatively recently, assessments of the role of chance were routinely made using hypothesis testing, which produces a 'p-value' (Box 1). The p-value allows assessment of whether or not the findings are 'significantly different' or 'not significantly different' from some reference value (in trials, this is usually the value reflecting 'no effect'; Table 1). A different and potentially more useful approach to assessing the role of chance has come to the fore: confidence intervals.[8] Although these might appear rather dissimilar to p-values, the theory and calculations underlying these two approaches are largely the same.
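The logic of a p-value described in Box 1 can be sketched in code. The example below is only an illustration under stated assumptions: it uses a pooled two-proportion z-test with a normal approximation, which is one common choice and is not specified by the text, and the function name `two_proportion_p_value` and the trial counts are hypothetical.

```python
import math


def two_proportion_p_value(events_new: int, n_new: int,
                           events_old: int, n_old: int) -> float:
    """Two-sided p-value for a difference in event proportions.

    Assumes the null hypothesis of no true difference and uses a pooled
    z-test with a normal approximation (an illustrative choice).
    """
    p_new = events_new / n_new
    p_old = events_old / n_old
    # Under the null hypothesis both groups share one underlying risk.
    pooled = (events_new + events_old) / (n_new + n_old)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    if se == 0:
        return 1.0  # no events (or only events) in both groups: no evidence of a difference
    z = (p_new - p_old) / se
    # Probability of a difference at least this large, in either direction,
    # if there were really no difference between the treatments.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical trial: 30 of 100 patients have the event on the new treatment,
# 50 of 100 on the old one.
p = two_proportion_p_value(30, 100, 50, 100)
print(f"p = {p:.4f}", "(statistically significant)" if p < 0.05 else "(not significant)")
```

With the same observed proportions but only 10 patients per group, the p-value from this sketch is well above 0.05, which illustrates how small studies can fail to reach significance even when the observed difference is large.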
Table 1. Summary of effect measures
● Absolute risk reduction (ARR). Absolute change in risk: the risk of an event in the control group minus the risk of an event in the treated group; usually expressed as a percentage. No effect: ARR=0%. Total success: ARR=initial risk.
● Relative risk reduction (RRR). Proportion of the risk removed by treatment: the absolute risk reduction divided by the initial risk in the control group; usually expressed as a percentage. No effect: RRR=0%. Total success: RRR=100%.
● Relative risk (RR). The risk of an event in the treated group divided by the risk of an event in the control group; usually expressed as a decimal proportion, sometimes as a percentage. No effect: RR=1 or RR=100%. Total success: RR=0.
● Odds ratio (OR). Odds of an event in the treated group divided by the odds of an event in the control group; usually expressed as a decimal proportion. No effect: OR=1. Total success: OR=0.
● Number needed to treat (NNT). Number of patients who need to be treated to prevent one event; this is the reciprocal of the absolute risk reduction (when expressed as a decimal fraction); it is usually rounded to a whole number. No effect: NNT=∞. Total success: NNT=1/initial risk.

What are confidence intervals?
Confidence intervals provide different information from that arising from hypothesis tests. Hypothesis testing produces a decision about any observed difference: either that the difference is 'statistically significant' or that it is 'statistically non-significant'. In contrast, confidence intervals provide a range about the observed effect size. This range is constructed in such a way that we know how likely it is to capture the true, but unknown, effect size.

Thus, the formal definition of a confidence interval is: 'a range of values for a variable of interest [in our case, the measure of treatment effect] constructed so that this range has a specified probability of including the true value of the variable. The specified probability is called the confidence level, and the end points of the confidence interval are called the confidence limits'.[9]

It is conventional to create confidence intervals at the 95% level, so this means that 95% of the time properly constructed confidence intervals should contain the true value of the variable of interest. This corresponds to hypothesis testing with p-values, with a conventional cut-off for p of less than 0.05.

More colloquially, the confidence interval provides a range for our best guess of the size of the true treatment effect that is plausible given the size of the difference actually observed.

Assessing significance from a confidence interval
One useful feature of confidence intervals is that one can easily tell whether or not statistical significance has been reached, just as in a hypothesis test.
● If the confidence interval captures the value reflecting 'no effect', this represents a difference that is statistically non-significant (for a 95% confidence interval, this is non-significance at the 5% level).
● If the confidence interval does not enclose the value reflecting 'no effect', this represents a difference that is statistically significant (again, for a 95% confidence interval, this is significance at the 5% level).

Thus, 'statistical significance' (corresponding to p<0.05) can be inferred from confidence intervals but, in addition, these intervals show the largest and smallest effects that are likely, given the observed data. This is useful extra information. An example of the use of confidence intervals is shown in Box 2.[10]

Examining the width of a confidence interval
One of the advantages of confidence intervals over traditional hypothesis testing is the additional information that they convey.
The upper and lower bounds of the interval give us information on how big or small the true effect might plausibly be, and the width of the confidence interval also conveys some useful information.

If the confidence interval is narrow, capturing only a small range of effect sizes, we can be quite confident that any effects far from this range have been ruled out by the study. This situation usually arises when the size of the study is quite large and, hence, the estimate of the true effect is quite precise. Another way of saying this is to note that the study has reasonable 'power' to detect an effect.

However, if the confidence interval is quite wide, capturing a diverse range of effect sizes, we can infer that the study was probably quite small. Thus, any estimates of effect size will be quite imprecise. Such a study is 'low-powered' and provides us with less information.

Errors in interpretation
Confidence intervals, like p-values, provide us with a guide to help with the interpretation of research findings in the light of the effects of chance. There are, however, three important pitfalls in interpretation.

Getting it wrong: seeing effects that are not real
First of all, we may examine the confidence interval and/or the p-value and observe that the difference is 'statistically significant'. From this we will usually conclude that there is a difference between the two treatments. However, just because we are unlikely to observe such a large difference simply by chance, this does not mean that it will not happen. By definition, when there is truly no difference, about one comparison in 20 will still reach significance at the 5% level, arising simply from chance. Thus, we may be misled by chance into believing in something that is not real; technically, this is called a 'type I error'.
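A type I error can be illustrated by simulation. In the sketch below both groups are given the same true risk, so every 'significant' result is spurious by construction. The test used (a pooled two-proportion z-test), the function names and all parameter values are assumptions made for illustration only.

```python
import math
import random


def p_value(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_trials: int = 2000, n_per_group: int = 200,
                        true_risk: float = 0.3, alpha: float = 0.05,
                        seed: int = 1) -> float:
    """Share of simulated trials that reach significance although no difference exists."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        # Both arms are drawn from the same underlying risk (the null hypothesis is true).
        events_a = sum(rng.random() < true_risk for _ in range(n_per_group))
        events_b = sum(rng.random() < true_risk for _ in range(n_per_group))
        if p_value(events_a, n_per_group, events_b, n_per_group) < alpha:
            significant += 1
    return significant / n_trials


rate = false_positive_rate()
print(f"Share of null trials with p < 0.05: {rate:.3f}")
```

The printed share should be close to 0.05, in line with the one-in-20 convention, although the exact figure depends on the random seed and on the approximation used.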
It is a frustrating but unavoidable feature of statistical significance (whether assessed using confidence intervals or p-values) that around one in 20 comparisons of treatments with no real difference will mislead. Yet we cannot know which of any given set of comparisons is doing the misleading. This observation cautions against generating too many statistical comparisons: the more comparisons made in any given study, the greater the chance that at least some of them will be spurious findings. Thus, clinical trials which show significance in only one or two subgroups are unconvincing; such significance may be deceptive. Unless particular subgroup analyses have been specified in advance, differences other than for the primary endpoint for the whole group should be viewed with suspicion.

Box 2. An example of the use of confidence intervals[10]
Ramipril is an angiotensin-converting enzyme (ACE) inhibitor which has been tested for use in patients at high risk of cardiovascular events. In one study published in the New England Journal of Medicine,[10] a total of 9,297 patients were recruited into a randomised, double-blind, controlled trial. The key findings presented on the primary outcome and deaths are shown below.

Incidence of primary outcome and deaths from any cause

Outcome                                  Ramipril group   Placebo group   Relative risk
                                         (n=4,645)        (n=4,652)       (95% CI)
                                         number (%)       number (%)
Cardiovascular event (including death)   651 (14.0)       826 (17.8)      0.78 (0.70–0.86)
Death from non-cardiovascular cause      200 (4.3)        192 (4.1)       1.03 (0.85–1.26)
Death from any cause                     482 (10.4)       569 (12.2)      0.84 (0.75–0.95)

These data indicate that fewer people treated with ramipril suffered a cardiovascular event (14.0%) compared with those in the placebo group (17.8%). This gives a relative risk of 0.78, or a reduction in (relative) risk of 22%. The 95% confidence interval for this estimate of the relative risk runs from 0.70 to 0.86. Two observations can then be made from this confidence interval.
● First, the observed difference is statistically significant at the 5% level, because the interval does not embrace a relative risk of one.
● Second, the observed data are consistent with as much as a 30% reduction in relative risk or as little as a 14% reduction in risk.

Similarly, the last row of the table shows that statistically significant reductions in the overall death rate were recorded: a relative risk of 0.84 with a confidence interval running from 0.75 to 0.95. Thus, the true reduction in deaths may be as much as a quarter or it could be only as little as 5%; however, we are 95% certain that the overall death rate is reduced in the ramipril group.

Finally, exploring the data presented in the middle row shows an example of how a confidence interval can demonstrate non-significance. There were a few more deaths from non-cardiovascular causes in the ramipril group (200) compared with the placebo group (192). Because of this, the relative risk is calculated to be 1.03, showing a slight increase in risk in the ramipril group. However, the confidence interval is seen to capture the value of no effect (relative risk = 1), running as it does from 0.85 to 1.26. The observed difference is thus non-significant; the true value could be anything from a 15% reduction in non-cardiovascular deaths for ramipril to a 26% increase in these deaths. Not only do we know that the result is not significant, but we can also see how large or small a true difference might plausibly be, given these data.

Statistical significance and clinical significance
Statistical significance is also sometimes misinterpreted as signifying an important result: this is a second important pitfall in interpretation.
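Judging importance means looking at the size of an effect and its interval, so it can help to see how figures like those in Box 2 arise from raw counts. The sketch below is a minimal illustration, assuming the common log-transformation method with a normal approximation; the function name `relative_risk_ci` is hypothetical, and the trial report may have used a different (for example, time-to-event) analysis, so the crude figures can differ slightly from the published ones.

```python
import math


def relative_risk_ci(events_treated: int, n_treated: int,
                     events_control: int, n_control: int,
                     z: float = 1.96) -> tuple[float, float, float]:
    """Crude relative risk with an approximate 95% CI (log method)."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(1 / events_treated - 1 / n_treated
                          + 1 / events_control - 1 / n_control)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts taken from the table in Box 2: (ramipril events, n, placebo events, n).
box2 = {
    "Cardiovascular event (including death)": (651, 4645, 826, 4652),
    "Death from non-cardiovascular cause": (200, 4645, 192, 4652),
    "Death from any cause": (482, 4645, 569, 4652),
}

for outcome, counts in box2.items():
    rr, lower, upper = relative_risk_ci(*counts)
    # An interval that includes RR = 1 (no effect) is non-significant at the 5% level.
    verdict = "non-significant" if lower <= 1 <= upper else "significant at the 5% level"
    print(f"{outcome}: RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), {verdict}")
```

For the cardiovascular outcome the crude calculation gives a relative risk of about 0.79, close to the published 0.78, and for each row the significance verdict agrees with the reading given in Box 2.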
Significance testing simply asks whether the data produced in a study are compatible with the notion of no difference between the new and control interventions. Rejecting equivalence of the two interventions does not necessarily mean that we accept that there is an important difference between them. A large study may identify as statistically significant a fairly small difference. It is then quite a separate judgement to assess the clinical significance of this difference. In assessing the importance of significant results, it is the size of the effect, not just the size of the significance, that matters.

Getting it wrong again: failing to find real effects
A further error that we may make is to conclude from a non-significant finding that there is no effect, when in fact there is a real effect; this is called a 'type II error'. Equating non-significance with 'no effect' is a common misconception. A non-significant confidence interval simply tells us that the observed difference is consistent with there being no true difference between the two groups. Thus, we are unable to reject this possibility. This is where confidence intervals are much more helpful than simple p-values: the observed difference will also be compatible with a range of other effect sizes as described by the confidence interval.[8] We are unable to reject these possibilities and must then assess whether some of them (usually the upper and lower limits of the confidence interval) might be important. Just because we have not found a significant treatment effect, it does not mean that there is no treatment effect to be found.[11] The crucial question is: how carefully have we interpreted the findings?

Extrapolating beyond the trial
For all the complexity of understanding bias and chance in the interpretation of the findings from clinical trials, another important consideration should not be forgotten. The findings from any given study relate to the patients included in that study.
Even if an effect is assessed as probably real and large enough to be clinically important, a further question remains: how well are the findings applicable to other groups of patients, and do they particularise to a given individual?[12] Neither confidence intervals nor p-values are much help with this judgement. Assessment of this external validity is made based on the patients' characteristics and on the setting and the conduct of the trial.

Summary
Confidence intervals and p-values take as their starting point the results observed in a study. Crucially, we must check first that this is an unbiased study. The question that confidence intervals then answer is: what is the range of real effects that is compatible with these data?

The confidence interval is just such a range, which 95% of the time will contain the true value of the main measure of effect (relative risk reduction, absolute risk reduction, NNT or whatever; Table 1). This allows us to do two things. First, if the confidence interval embraces the value of no effect (for example, no difference between two treatments as shown by a relative risk equal to one or an absolute difference equal to zero), then the findings are non-significant. If the confidence interval does not embrace the value of no difference, then the findings are statistically significant. Thus, confidence intervals provide the same information as a p-value. But more than this: the upper and lower extremities of the confidence interval also tell us how large or small the real effect might be and yet still give us the observed findings by chance. This additional information is very helpful in allowing us to interpret both borderline significance and non-significance.

Confidence intervals from large studies tend to be quite narrow in width, showing the precision with which the study is able to estimate the size of any real effect.
In contrast, confidence intervals from smaller studies are usually wide, showing that the findings are compatible with a wide range of effect sizes.

Health education researchers have called for research articles in health education to adhere to the recommendations of the American Psychological Association and the American Medical Association regarding the reporting and use of effect sizes and confidence intervals (CIs). This article expands on the recommendations by (a) providing an overview of CIs, (b) evaluating the use and interpretation of CIs in selected journals in health education, (c) presenting how to calculate CIs using statistical software, and (d) suggesting how to interpret and use CIs. Thirty-three articles in the American Journal of Health Behavior and Health Education & Behavior were evaluated. The evaluation showed that although CIs were reported in approximately half of the evaluated quantitative studies, they were not interpreted in any of the studies. The lack of interpretation of CIs indicates that health educators might not fully understand the meaning of CIs and consequently could not make use of CIs except for presenting the numbers. This article intends to increase health researchers' understanding of CIs, encourage the practice of thinking meta-analytically, and facilitate the use of CIs in the future.

The call for health educators to adhere to the American Psychological Association's (APA, 2001) and the American Medical Association's (AMA, 1998) requests regarding the reporting of effect sizes and confidence intervals (CIs) in research reports and articles is becoming more apparent in the health education literature. The latest Publication Manual of the APA highly recommended the use of CIs in research articles (APA, 2001). The Publication Manual regarded CIs as "in general, the best reporting strategy" (APA, 2001, p. 22).
Similarly, the AMA Manual of Style (1998) indicates that reportage of CIs is preferred over p values, because they "convey information about precision as well as statistical significance" (p. 539). Additionally, studies conducted by Watkins, Rivers, Rowell, Green, and Rivers (2006), Rivers and Rowell (in press), and Buhi (2005) strongly encourage increased use and reporting of effect sizes and CIs for effect size calculations. Assuming these recommendations made by the APA Publication Manual, AMA Manual of Style, and the cited health education researchers will lead to better accumulation and application of scientific knowledge, the field of health education could benefit from having its journals follow these recommendations.

The purpose of this article is to expand on these recommendations by (a) providing an overview of CIs, (b) evaluating the use and interpretation of CIs in selected journals in health education, (c) presenting how to calculate CIs using statistical software, and (d) suggesting how to interpret and use CIs. The intended results of this article are to increase health researchers' understanding of CIs, provide a snapshot of the frequency and quality of CIs' use in health research, and facilitate the use of CIs by health researchers in the future.

* Jing Zhang, MS, Doctoral Student; Department of Health and Kinesiology, Texas A&M University, College Station, TX 77843-4243; Telephone: 979-847-9587; Fax: 979-862-2672; E-mail: jingostarI980@hlko.tamu.edo
Bruce W. Hanik, MS, Doctoral Student; Department of Health and Kinesiology, Texas A&M University, College Station, TX 77843-4243; Chapter: Alpha Pi.
Beth H. Chaney, PhD, CHES; Assistant Professor; Department of Health Education and Promotion, East Carolina University, 3205 Carol G. Belk Building, Greenville, NC 27858; Telephone: 252-328-1611; E-mail: email@example.com; Chapter: Beta Theta
* Corresponding author
The Health Educator, Spring 2008, Vol. 40, No. 1

An Overview of Confidence Intervals

Defining Confidence Intervals
A CI is an interval estimate of a population parameter (population characteristic). Computed from the sample statistic, a CI is a range of numbers that possibly includes the population parameter. A CI has four noteworthy characteristics. First, for a given sample size, at a given level of confidence, and using probability sampling, there can be infinitely many CIs for a particular population parameter. The point estimates and endpoints of these CIs vary due to sampling errors that occur each time a different sample is drawn (Thompson, 2002). Second, the CI reported by a certain study is just one of these infinitely many CIs. Third, the percentage of these CIs that contain the population parameter equals the level of confidence. Fourth, whether a certain CI reported by a study contains the population parameter is unknown. In other words, the level of confidence applies to the infinitely many CIs, rather than to a single CI reported by a single study (Thompson, 2006).

The following example helps to illustrate the characteristics mentioned above. In a study investigating the predictors of current smoking among Vietnamese American men, Wiecha, Lee, and Hodgkins (1998) reported that higher educational level is negatively associated with current smoking (OR = 0.8; 95% confidence interval 0.7 to 0.9). The "95%" refers to the level of confidence (1 − α), which is the complement of the level of significance α = 0.05 (Hinkle, Wiersma, & Jurs, 2003). With a sample size of 774 and a level of confidence of 95%, Wiecha et al. drew a probability sample and got an interval of 0.7 to 0.9. With the same sample size, level of confidence, and sampling method, another researcher might get a different OR and interval, for example OR = 0.6, 95% confidence interval 0.3 to 0.9.
The difference in point estimates and endpoints of the two CIs results from sampling error. If researchers keep drawing samples using Wiecha et al.'s procedures, they will have infinitely many intervals. Ninety-five percent of these intervals will contain the population parameter. However, whether Wiecha et al.'s or any other researcher's interval contains the population parameter is unknown.

Hinkle et al. (2003) explained the meaning of a 95% confidence interval of 2.20-2.70 as follows:

Theoretically, suppose we compute the sample means of all possible samples of size 20 and constructed the 95-percent confidence intervals for the population mean using all these sample means. Then 95 percent of these intervals would contain μ [population parameter] and 5 percent would not. Note that we cannot say that the probability is .95 that the interval from 2.20 to 2.70 contains μ. Either the interval contains μ or it does not. (p. 205)

Computing Confidence Intervals
The CI for non-effect size statistics and the CI for effect sizes are computed differently. For non-effect size statistics, such as the mean, a formula is used to calculate the CI. Hinkle et al. (2003) provided a general formula (p. 203):

CI = Statistic ± (Critical value) × (Standard error of the statistic)

This formula shows that, for a given critical value, the standard error of the statistic determines the width of the CI. The standard error of the statistic refers to the standard deviation of the sampling distribution of the sample statistic. The larger the standard error, the wider the CI, and the less precise the interval estimate.

CIs for effect sizes generally cannot be computed with such a formula. Instead, a statistical procedure called iteration (available in computer software such as SPSS) must be performed to compute CIs for effect sizes (Thompson, 2006).
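The general formula from Hinkle et al. and the repeated-sampling interpretation quoted above can be illustrated together. The sketch below applies the formula to a sample mean and then simulates many samples of size 20 to count how often the resulting interval contains the true mean. The population values (mean 2.45, standard deviation 0.5), the normal population and the function names are hypothetical assumptions for illustration; the critical value 2.093 is the two-sided 95% value of Student's t with 19 degrees of freedom.

```python
import math
import random
import statistics

# Two-sided 95% critical value of Student's t with 19 degrees of freedom (n = 20).
T_CRIT_95_DF19 = 2.093


def mean_ci(sample: list[float], critical_value: float) -> tuple[float, float]:
    """CI = statistic ± (critical value) × (standard error of the statistic)."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = critical_value * standard_error
    return mean - margin, mean + margin


def coverage(true_mean: float = 2.45, true_sd: float = 0.5, n: int = 20,
             n_samples: int = 4000, seed: int = 7) -> float:
    """Share of simulated 95% CIs that contain the (normally unknown) true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = mean_ci(sample, T_CRIT_95_DF19)
        hits += lower <= true_mean <= upper
    return hits / n_samples


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The share should be close to 0.95. Any single interval either contains the true mean or does not, which is the point of the quoted explanation: the 95% describes the procedure across repeated samples rather than one reported interval.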
Thompson (2006) noted, "As conventionally performed, iteration involves a process of initially guessing a solution, and then repetitively tweaking the guess until some statistical criterion is reached" (p. 207). Cumming and Finch (2001) and Kline (2004) have more detail on computation of CIs for effect sizes using iteration (Thompson, 2006).

The Importance of Confidence Intervals: Indicating Precision and Facilitating Meta-analytic Thinking
A CI displays the full range of hypothetical values of a parameter that cannot be rejected, and is thus more informative than a statistical significance test (which only focuses on one null hypothesis value), although most of the information provided by a CI is not about statistical significance (Smithson, 2003). A CI also reveals the precision of the interval estimate: the narrower the width, the more precise the estimate.

However, a CI tells nothing about whether it contains the parameter. Researchers might get excited about a 95% CI that does not subsume the null hypothesis parameter value, indicating that the statistic around which the CI is constructed is statistically significant. They might get even more excited when this CI is narrow, indicating that the CI is precise. Nevertheless, this narrow and "not subsuming null hypothesis parameter value" CI can still be among the 5% of CIs that do not contain the parameter (Thompson, 2006).

With this uncertainty, researchers may ask: Why are CIs important? CIs are important, not as isolated CIs reported by single studies, but as an addition to the collective body of all relevant CIs from previous studies.
The most thoughtful use of CIs involves comparing CIs across studies to reveal the true parameter, regardless of whether the CIs subsume the null hypothesis parameter value, or whether the statistics around which CIs are constructed are statistically significant (Thompson, 2006). Academic journals' focus on statistical significance, rather than on documenting and integrating CIs, contributes to a publication bias where only statistically significant results are published, but non-significant results are not, creating an incomplete and biased picture in the literature (Thompson, 2002). The broader picture containing all relevant CIs reveals the replicability and stability of the intervals and helps researchers identify the region where the parameter may lie (Wilkinson & APA Task Force on Statistical Inference, 1999). Thompson (in press) wrote, "if we interpret the confidence intervals in our study in the context of the intervals in all related previous studies, the true population parameters will eventually be estimated across studies, even if our prior expectations regarding the parameters are wildly wrong" (p. 21).

CIs, particularly CIs for effect sizes, also facilitate meta-analytic thinking. Thompson (2002) defined meta-analytic thinking as both the "prospective formulation of study expectations and design by explicitly invoking prior effect sizes" and "the retrospective interpretation of new results, once they are in hand, via explicit, direct comparison with the prior effect sizes in the related literature" (p. 28). Thinking meta-analytically itself, even absent other improvements in research practice, Thompson argued, can lead to improved science of discovery (Thompson, 2002).
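Comparing a new interval with intervals from earlier studies can be made concrete with a small calculation. The sketch below combines several reported ratio estimates using inverse-variance (fixed-effect) pooling on the log scale. This is one common approach, chosen here purely for illustration and not described in the article; the function names and all study values are hypothetical.

```python
import math


def log_se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error on the log scale implied by a 95% CI for a ratio measure."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def pool_fixed_effect(studies: list[tuple[float, float, float]],
                      z: float = 1.96) -> tuple[float, float, float]:
    """Inverse-variance pooling of (estimate, ci_lower, ci_upper) ratio results."""
    weights = []
    weighted_logs = []
    for estimate, lower, upper in studies:
        weight = 1 / log_se_from_ci(lower, upper, z) ** 2  # precise studies count more
        weights.append(weight)
        weighted_logs.append(weight * math.log(estimate))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))


# Hypothetical odds ratios with 95% CIs: three earlier studies and one new study.
studies = [(0.80, 0.70, 0.91), (0.60, 0.40, 0.90),
           (0.90, 0.65, 1.25), (0.75, 0.55, 1.02)]

for estimate, lower, upper in studies:
    print(f"study:  OR {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
pooled, pooled_lower, pooled_upper = pool_fixed_effect(studies)
print(f"pooled: OR {pooled:.2f} (95% CI {pooled_lower:.2f} to {pooled_upper:.2f})")
```

Note that two of the hypothetical studies are individually non-significant (their intervals include 1), yet they still contribute information to the combined picture, which is the practical argument for reporting and comparing CIs regardless of significance.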
An Evaluation of How Selected Health Education Journals Used Confidence Intervals

To assess how well journals in health education reported and used CIs, an evaluation of articles in two health education journals was conducted. The evaluation aimed to answer two questions: (a) What percentage of articles reported CIs, and (b) what percentage of articles interpreted CIs?

Methods

Two journals of prominent organizations in health education were selected for examination of the use of confidence intervals. The journals are the American Journal of Health Behavior (AJHB) and Health Education & Behavior (HEB). The AJHB is the official publication of the American Academy of Health Behavior, a research-oriented organization. The mission of the Academy is "to serve as the 'research home' for health behavior scholars whose primary commitment is to excellence in research and the application of research to practice" (American Academy of Health Behavior, 2006). HEB is the official publication of the Society for Public Health Education (SOPHE). Established in 1950, SOPHE is "the only professional organization devoted exclusively to public health education and health promotion" (Society for Public Health Education, 2005). The authors assume that these two journals of prominent organizations in health education reflect some of the highest-quality research in health education.

Since this article evolved from a paper intended for a graduate-level statistics class in April 2006, April 2006 was chosen as the time point to collect articles for evaluation. A total of four journal issues was considered by the authors to be adequate, considering the fact that this paper served the purpose of a tutorial, rather than a full-blown review.
Research articles in the two most recent issues of the AJHB and the most recent and the third-most recent issues of HEB were included (as of April 2006) in the evaluation. The second-most recent issue of HEB was excluded from the evaluation because it was not representative of a typical issue of the journal. This issue was devoted exclusively to a research project, the Trial of Activity for Adolescent Girls, focusing on descriptive statistics (e.g., frequencies; none of the articles included statistical significance testing) and qualitative research (including description of the project, e.g., data collection methods and transferring results to practice).

Thirty-three research articles were included in the evaluation. Articles were categorized by methodological design as qualitative research (using focus groups and content analysis as the main method of data collection and analysis) or quantitative research (non-qualitative research). Only quantitative research articles were examined for the reporting and interpretation of CIs. If one or more CIs appeared in an article, the article was recorded as reporting CIs. If an article explained what a CI meant and/or compared its CIs with CIs reported in previous studies, the article was recorded as interpreting CIs. References of the evaluated articles are in an appendix available from the first author. Also available from the first author are four tables documenting the methodological design of each article and whether each quantitative article reported and interpreted CIs. Two of the authors independently coded the articles and were in complete agreement with each other.

Results

Regarding methodological design, the majority of the 33 articles were quantitative.
Ninety percent (n = 18) of the evaluated AJHB articles were quantitative, whereas 84.6% (n = 11) of the evaluated HEB articles were quantitative. The remaining articles employed qualitative methods. CIs were reported in approximately half of the evaluated quantitative studies in both journals. However, none of the studies interpreted CIs. Among studies that did not report CIs, one article in AJHB (5.6%) and four articles in HEB (36.4%) reported standard error intervals, which could be converted to CIs. Thirty-three percent (n = 6) of AJHB articles and 18.2% (n = 2) of HEB articles reported neither CIs nor standard error intervals. Of the twelve articles that reported odds ratios (ORs) using logistic regression, eleven reported CIs for the ORs. Of the four articles reporting the development of a scale or instrument, none reported CIs.

Evaluation Discussion

Although CIs were reported in approximately half of the evaluated quantitative studies, they were not interpreted in any of the studies. The reporting of CIs showed that health education researchers were aware of the importance of CIs. The reporting of CIs could facilitate meta-analyses for future researchers. Nevertheless, the lack of interpretation of CIs indicated that health education researchers might not fully understand the meaning of CIs and consequently could not make use of CIs except for presenting the numbers. Additionally, it was observed that researchers might have reported CIs only when the statistical packages readily provided CIs in certain analyses, such as logistic regression. This could be a possible explanation for why 11 of the 12 studies involving ORs reported CIs for ORs. Factor loadings, chi-square, Cronbach's α, and Pearson's r were the major statistics of the four reviewed articles regarding the development of a scale or instrument.
It was suspected that authors of these four studies did not report CIs because the statistical packages they used did not readily provide calculations for CIs when the studies' major statistics were computed.

How to Calculate Confidence Intervals Using Statistical Software

One prominent barrier to reporting and interpreting CIs is the fact that widely used statistical software, such as Statistical Package for the Social Sciences (SPSS) and Statistical Analysis Software (SAS), limits CIs to mainly "normal or 'central' t-test statistic distributions" (Smithson, 2001, p. 606), which assume normal distributions of data. For example, output provided by the user-friendly "point and click" options in SPSS does not always give the CIs of the statistics. Therefore, when "noncentral" distributions are needed for computations of CIs for specific statistics, such as Cohen's d, η², and R², specific syntax must be used in order for popular statistical software, such as SPSS and SAS, to provide the CIs. Additionally, according to the University of California Academic Technology Services (2007): "In many instances, [users] may find that using syntax is simpler and more convenient than using point-and-click. The use of syntax is also helpful in documenting [the] analysis. It is difficult to take adequate notes on modifications made to the data and the procedures used to do the analyses when using point-and-click. However, documenting what [users] are doing in a syntax file is simple and makes reviewing and/or reconstructing the analysis much easier" (p. 1). Therefore, this section of the article provides the point-and-click steps, along with the syntax, needed to calculate CIs for several statistical analyses.

Smithson (2001) provides SPSS script for computing CIs using the "noncentrality parameter for the noncentral F distribution [which] converts that into a confidence interval for multiple (or partial) R²" (p. 627).
Additionally, Duhachek and Iacobucci (2004) and Iacobucci and Duhachek (2003) offer SAS and SPSS syntax for measuring reliability, standard error, and CIs. These are only two examples of using syntax to compute CIs for specific statistics. Therefore, in addition to Smithson's (2001), Duhachek and Iacobucci's (2004), and Iacobucci and Duhachek's (2003) scripts, Table 1 provides SPSS (Version 14.0) commands and syntax for calculation of CIs for various univariate and multivariate statistical analyses.

Another program used to calculate and explore CIs is a graphical software package called ESCI (Exploratory Software for Confidence Intervals). ESCI was developed by Geoff Cumming and runs through Microsoft Excel (Cumming & Finch, 2001). This software allows users to (a) explore many CI concepts, (b) calculate and display CIs for personal datasets, (c) "calculate CIs for Cohen's standardized effect size d," (d) "explore noncentral t distributions and their role in statistical power," (e) "use CIs for simple meta-analysis, using original or [standardized] units," and (f) explore all of the previously mentioned concepts "via vivid interactive graphical simulations" (Exploratory Software for Confidence Intervals, 2006). Many different ESCI modules are available for free download and non-commercial use at http://www.latrobe.edu.au/psy/esci/. These modules were developed with Microsoft Excel 2003.

ZumaStat Statistical Programs provide an additional type of software that is compatible with both Microsoft Excel and SPSS versions 7.0 and higher.
These programs report CIs for "percentages, correlations, means, standard deviations, variance ratios, differences between correlations, squared correlations, partial correlations, squared partial correlations, squared multiple correlations, group differences in squared multiple correlations, averages of correlations, percent of variance accounted for statistics in ANOVA, single degree of freedom contrasts, odds ratios, relative risks and a wide range of additional statistics" (ZumaStat, 2006, Emphasis on Confidence Intervals section). To read more on ZumaStat programs, please refer to http://www.zumastat.com/Home.htm. Lastly, the SPSS Tools internet site (Levesque, 2006) provides good information on SPSS syntax for calculating CIs for specific statistics. The syntax can be found at http://www.spsstools.net/SampleSyntax.htm#Distributions. These programs, software, and websites provide researchers and practitioners with the appropriate means for calculating CIs, and thus should help to improve the reporting of CIs in future research articles.

Table 1. Statistical Package for the Social Sciences (SPSS) Commands for Statistical Analyses to Calculate Confidence Intervals (CI) (SPSS, 2006)

Statistical analysis: Possible strategy in SPSS to calculate CIs

GLM Multivariate: Run the GLM Multivariate procedure, under the "analyze" menu in SPSS. Click on Options to provide the 95% CI based on Student's t distribution for the differences between the dependent variables.

GLM Univariate: Utilize the PRINT subcommand; the PARAMETER keyword with the PRINT subcommand provides CIs. For the POSTHOC subcommand in the GLM Univariate analysis, the following keywords provide CIs for the post hoc tests: LSD, SIDAK, BONFERRONI, GH, T2, T3, C, DUNNETT, DUNNETTL, DUNNETTR, TUKEY, SCHEFFE, GT2, GABRIEL. Lastly, when using the CRITERIA subcommand in a GLM Univariate analysis, the keyword ALPHA(n) has two functions. It (a) provides the alpha level under which the power is to be calculated, and (b) identifies the CI level. The value of n should be between 0 and 1 to work properly.

Independent-Samples T Test: Run the Independent-Samples T Test, under the "analyze" menu, then click on Options, which provides the 95% CI by default.

Linear Regression: Under the "analyze" menu in SPSS, click on the Linear Regression procedure; the Save option gives the 95% CI for prediction intervals. Additionally, the Estimates option provides the 95% CI for each regression coefficient or covariance matrix.

Logistic Regression: Under the "analyze" menu in SPSS, click on Logistic Regression; Options gives the 95% CIs for exp(B). Also, the PRINT subcommand with the CI(level) keyword provides CIs for exp(B). The value identified by (level) must be between 1 and 99.

MANOVA (Multivariate Command): Use the MANOVA: Multivariate command, and specify a type of analysis in parentheses after the MULTIVARIATE keyword: ROY, PILLAI, WILKS, HOTELLING, BONFER. These keywords provide CIs. Additionally, the MULTIVARIATE keyword on CINTERVAL gives CIs similar to the univariate analysis at the 0.95 level.

Mixed Linear Model: Use the MIXED command in SPSS syntax; CIN(value) provides CIs, and the default value is 95%.

Nonlinear Regression: Utilize the NLR command in SPSS syntax; the BOOTSTRAP subcommand provides CIs.

One-Sample T Test: Use the "analyze" menu in SPSS, and under the Compare Means option, click on One-Sample T Test. The Options button provides the 95% CI by default.

One-Way ANOVA: Use the "analyze" menu in SPSS, and under the Compare Means option, click on One-Way ANOVA. The Post-Hoc option gives the 95% CI for the mean. Additionally, the STATISTICS command, using SPSS syntax, along with the DESCRIPTIVES subcommand, gives the 95% CI for each dependent variable for each group.

Paired-Samples T Test: Use the "analyze" menu in SPSS, and under the Compare Means option, click on Paired-Samples T Test. The 95% CI for the difference in means is displayed by default.

Regression: Utilize the REGRESSION command; the CI subcommand provides 95% CIs for the unstandardized regression coefficients. To reset the percent for the CI, use CIN[(value)], in which (value) sets the specified percentage interval utilized with the temporary variable types MCIN (lower and upper bounds for prediction intervals of the mean predicted response) and ICIN (lower and upper bounds of prediction intervals for a single observation).

Reliability: Utilize the RELIABILITY command; the ICC subcommand, along with the CIN keyword, gives the percent for the CI and significance levels of the hypothesis testing. Additionally, the Statistics option gives the 95% CI for the intraclass correlation coefficient (SPSS 14.0 Help Database, 2006).

How Reporting and Interpretation of CIs Would Enable Research Studies to Yield More Insights

One of the reviewed studies, Vittes and Sorenson (2005), offers an opportunity to show how the reporting and interpretation of CIs would enable studies to yield more insights on the quality of point estimates and the estimation of the parameter. Vittes and Sorenson reported CIs, but did not interpret the CIs in their own context or in the context of all previous studies. The discussion in the next two sections is based on an actual odds ratio and its CI reported by Vittes and Sorenson.

Reporting CIs Makes a Difference

Vittes and Sorenson (2005) reported CIs, but let us take a moment to see what would happen if we removed one of their CIs, leaving only the point estimate, the adjusted odds ratio of 7.52.
This particular adjusted odds ratio indicates that adolescents who own handguns have 7.52 times the odds of recreational gun use compared with adolescents who do not own a handgun, while adjusting for all the other variables included in the model. The point estimate may lead readers to think that handgun ownership is an important predictor of recreational gun use. However, without a CI for this odds ratio, we would not know its precision. By providing the 95% CI of 1.01-55.83, Vittes and Sorenson (2005) enable readers to estimate for themselves the precision of the odds ratio (although such estimates may be wrong; explanations are provided later in the article).

How to Interpret a CI without Comparing It to Previous Studies

Had Vittes and Sorenson (2005) interpreted this CI within its own context (i.e., in the context of this one study, but not in the context of all previous studies), the interpretation could have included the following four points:

1. Ninety-five percent of the CIs constructed with the same method as this study will contain the true odds ratio for the population.
2. This 95% CI of 1.01-55.83 may or may not contain the true odds ratio for the population.
3. This 95% CI of 1.01-55.83 indicates that the odds of recreational gun use among adolescents who own handguns exceed the odds among those who do not own a handgun by a factor that can be as low as 1.01 or as high as 55.83, while adjusting for all the other variables included in the model.
4. Without comparing this CI to CIs in previous studies, the CI shows that the 7.52 odds ratio (point estimate) could be imprecise, since the interval appears to be wide. In addition, the lower bound is close to the null hypothesis value of 1.00, indicating that handgun ownership may not be an important predictor of recreational gun use.
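The width noted in point 4 can be made concrete on the log-odds scale. The sketch below uses the odds ratio and CI bounds reported above; the function name is hypothetical, and it assumes the interval was built as a Wald interval symmetric around the log odds ratio, which the article does not state.

```python
import math
from statistics import NormalDist


def log_scale_summary(lower, upper, level=0.95):
    """Back out log-scale quantities from a reported odds-ratio CI.

    Assumes (not confirmed by the source) a Wald interval of the form
    exp(log(OR) +/- z * SE), i.e. symmetric on the log-odds scale.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_lower, log_upper = math.log(lower), math.log(upper)
    return {
        # Standard error of log(OR) implied by the interval width.
        "se_log_or": (log_upper - log_lower) / (2 * z),
        # Geometric midpoint; should sit near the reported point estimate.
        "midpoint_or": math.exp((log_lower + log_upper) / 2),
        # How many times larger the upper bound is than the lower bound.
        "upper_to_lower_ratio": upper / lower,
    }


if __name__ == "__main__":
    summary = log_scale_summary(lower=1.01, upper=55.83)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Under that assumption, the geometric midpoint of the interval falls close to the reported odds ratio of 7.52, and the upper bound is roughly 55 times the lower bound, which is what makes the interval look wide when viewed on its own.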
Nevertheless, the precision and replicability of the CI cannot be determined until the CI is compared to all CIs from previous studies.

How to Interpret a CI in the Context of All Previous Studies

Although interpreting a CI in its own context reveals more meaning than not interpreting it at all, the most thoughtful interpretation of a CI involves the comparison of the current CI with CIs from all related studies (Thompson, 2006). All relevant CIs, whether or not they subsume the null hypothesis parameter value, need to be included in the comparison. A better estimate of the parameter can be gained from the comparison. To interpret a CI in the context of all related previous studies, the researcher could (a) construct a graph comprising all CIs for the statistic of interest reported so far, and (b) with the visual assistance of the graph, compare the current CI with all related CIs from previous research regarding their width and location.

The following discussion illustrates the interpretation of Vittes and Sorenson's (2005) 95% CI of 1.01-55.83 in the context of all related previous research. Since Vittes and Sorenson did not present any CIs from previous research, the CIs used in this discussion are hypothetical and for illustrative purposes only. Suppose seven studies examined the odds ratio for recreational gun use by gun ownership (v. no gun) in adolescents. All seven studies reported CIs for the odds ratios. The CIs for the odds ratios are compiled in Figure 1. The true parameter value will eventually be discovered as researchers continue to compare CIs across studies (Thompson, 2006). Vittes and Sorenson (2005) could have made the following interpretation of the 95% CI of 1.01-55.83, depending on which interval in the graph represents this CI. If their 95% CI of 1.01-55.83 is interval E, the interval is indeed the widest and not precise.
However, since the CI covers a frequently reported area, the researcher might interpret the CI as generally consistent with previous research and as possibly having captured the parameter. If their 95% CI of 1.01-55.83 is interval B, the interval is narrower than most of the CIs from previous studies, and can be interpreted as an improvement in the interval estimate. If their 95% CI of 1.01-55.83 is interval G, the interval estimate is the narrowest of all the CIs, and may be hastily and happily seen as precise. However, interval G does not cover a frequently reported area. The researcher needs to ponder whether the current CI is accurate and has caught the parameter, or whether most of the previous CIs are accurate and have contained the parameter. If in fact all the previous CIs contain the parameter, this narrowest CI is inaccurate. The interpretation of a narrow CI as precise demonstrates that simply looking at a CI's width without comparing its location with previous related studies can lead to inaccurate interpretation of the CI. By asking why the current CI is inconsistent with previous CIs, researchers engage in a critical evaluation of all related CIs in their estimation of the parameter.

Figure 1. Visual representation of 95% CIs of odds ratio for recreational gun use by gun ownership (v. no gun) in adolescents, reported by all 7 studies (intervals labeled A through G, plotted along the x axis).

Limitations

This article has several limitations. First, the sample size of the evaluated studies (N = 33) was too small to generalize to the field of health education. This small study could serve as a pilot study for a full-blown study examining all issues in three to five journals of selected years.
Second, causal statements cannot be made about the relationship between the characteristics and point estimates of studies and whether the studies reported CIs.

Conclusion

Making inferences about population characteristics (parameters) based on knowledge of sample characteristics (statistics) is the goal of inferential statistics (Hinkle et al., 2003). The true parameter value eventually emerges from comparison of CIs for the statistics (Thompson, 2006). Illustrations like Figure 1 assist the comparison of CIs across studies and demonstrate meta-analytic thinking. Schmidt (1996) argues, "Unlike traditional methods based on significance tests, meta-analysis leads to correct conclusions and hence leads to cumulative knowledge" (p. 119).

CIs are the building blocks of meta-analytic thinking. When CIs for point estimates are not reported, the building blocks for meta-analytic thinking are missing. Without the building blocks, a figure revealing the location of the true parameter cannot be built. When CIs for point estimates are interpreted in the context of a single isolated study, a building block is created and the quality of the building block can be somewhat assessed. We will be able to tell, in some sense, whether a building block is sturdy and usable (narrow) or flimsy and unusable (wide). However, we cannot know whether a CI is narrow or wide or if it captures the parameter until we compare it with all previous CIs. Without comparing the single CI with all previous CIs, the building block simply lies on the ground and does not contribute to the figure. The full use of the building block is realized only when the CI in the current study is compared to CIs for the same point estimate in all previous related studies.
By doing so, the researcher is actively engaged in assessing the quality of his or her building block, upgrading the quality assessment of previous building blocks, and actually building the figure of meta-analytic thinking. The more building blocks researchers add to the figure, the more the parameter will reveal its location and the more accurate the estimate of the parameter will be. The 33 reviewed studies show that health education researchers are beginning to create the building blocks, but are not actively building the figure of meta-analytic thinking. Health education researchers have not fully employed the practice of thinking meta-analytically. However, by utilizing meta-analytic thinking with the assistance of CIs, health education researchers will be able to better estimate the population parameters and use more accurate results to improve people's health.

References

American Academy of Health Behavior. (2006). Mission statement. Retrieved May 8, 2006, from http://www.aahb.org/
American Medical Association. (1998). Manual of style, a guide for authors and editors (9th ed.). Fairfield, NJ: Author.
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Buhi, E. (2005). The insignificance of "significance" tests: Three recommendations for health education researchers. American Journal of Health Education, 35, 109-112.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530-572.
Duhachek, A., & Iacobucci, D. (2004). Alpha's standard error (ASE): An accurate and precise confidence interval estimate. Journal of Applied Psychology, 89(5), 792-808.
Exploratory Software for Confidence Intervals. (2006). ESCI software. Retrieved June 28, 2006, from http://www.latrobe.edu.au/psy/esci/index.html
Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied statistics for the behavioral sciences (4th ed.). Boston: Houghton Mifflin.
Iacobucci, D., & Duhachek, A. (2003). Advancing alpha: Measuring reliability with confidence. Journal of Consumer Psychology, 13(4), 478-487.
Kline, R. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
Levesque, R. (2006). Raynald's SPSS tools: Syntax. Retrieved June 28, 2006, from http://www.spsstools.net/SampleSyntax.htm#Distributions
Rivers, D., & Rowell, K. (in press). Encouraging confidence intervals for effect size reporting in health education research. Eta Sigma Gamma Student Monograph.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115-129.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61(4), 605-632.
Smithson, M. (2003). Confidence intervals. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-140. Thousand Oaks, CA: Sage.
Society for Public Health Education. (2005). Mission statement. Retrieved May 8, 2006, from http://www.sophe.org/content/mission_statement.asp
Statistical Package for the Social Sciences 14.0. (2006). Help database. Chicago, IL: SPSS Inc.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25-32.
Thompson, B. (2006). Foundations of behavioral statistics. New York: Guilford.
Thompson, B. (in press). Research synthesis: Effect sizes. In J. Green, G. Camilli, & P. B. Elmore (Eds.), Complementary methods for research in education. Washington, DC: American Educational Research Association.
University of California Academic Technology Services. (2007). Statistical computing seminars: Beyond point and click: SPSS syntax. Retrieved April 24, 2007, from http://www.ats.ucla.edu/stat/spss/seminars/spss_syntax/do...
Vittes, K. A., & Sorenson, S. B. (2005). Recreational gun use by California adolescents. Health Education