Coming Attractions: Where Are We Going?
Our goal is to get to the point where we can read, understand, and write statements like
Does the mean vitamin C blood level of smokers differ from that of nonsmokers? Let's suppose for a moment they do, with smokers tending to have lower levels. Nevertheless, we wouldn't expect every smoker to have levels lower than those of every nonsmoker. There would be some overlap in the two distributions. This is one reason why questions like this are usually answered in terms of population means, namely, how the mean level of all smokers compares to that of all nonsmokers.
The statistical tool used to answer such questions is the confidence interval (CI) for the difference between the two population means. But let's forget the formal study of statistics for the moment. What might you do to answer the question if you were on your own? You might get a random sample of smokers and nonsmokers, measure their vitamin C levels, and see how they compare. Suppose we've done it. In a sample of 40 Boston male smokers, vitamin C levels had a mean of 0.60 mg/dl and an SD of 0.32 mg/dl, while in a sample of 40 Boston male nonsmokers, the levels had a mean of 0.90 mg/dl and an SD of 0.35 mg/dl. (Strictly speaking, we can only talk about Boston area males rather than all smokers and nonsmokers. No one ever said research was easy.) The difference in means between nonsmokers and smokers is 0.30 mg/dl!
The difference of 0.30 looks impressive compared to means of 0.60 and 0.90, but we know that if we were to take another random sample, the difference wouldn't be exactly the same. It might be greater, it might be less. What kind of population difference is consistent with this observed value of 0.30 mg/dl? How much larger or smaller might the difference in population means be if we could measure all smokers and nonsmokers? In particular, is 0.30 mg/dl the sort of sample difference that might be observed if there were no difference in the population mean vitamin C levels? We estimate the difference in mean vitamin C levels at 0.30 mg/dl, but 0.30 mg/dl "give-or-take what"? This is where statistical theory comes in.
One way to answer these questions is by reporting a 95% confidence interval. A 95% confidence interval is an interval generated by a process that's right 95% of the time. Similarly, a 90% confidence interval is an interval generated by a process that's right 90% of the time and a 99% confidence interval is an interval generated by a process that's right 99% of the time. If we were to replicate our study many times, each time reporting a 95% confidence interval, then 95% of the intervals would contain the population mean difference. In practice, we perform our study only once. We have no way of knowing whether our particular interval is correct, but we behave as though it is. Here, the 95% confidence interval for the difference in mean vitamin C levels between nonsmokers and smokers is 0.15 to 0.45 mg/dl. Thus, not only do we estimate the difference to be 0.30 mg/dl, but we are 95% confident it is no less than 0.15 mg/dl and no greater than 0.45 mg/dl.
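The idea of "a process that's right 95% of the time" can be checked by simulation. The sketch below is only an illustration under stated assumptions: it pretends the two populations are normal with means 0.90 and 0.60 mg/dl and SDs 0.35 and 0.32 mg/dl (the sample values from the text, treated here as if they were the truth), draws 40 subjects per group over and over, and counts how often the interval contains the true difference. The function names `ci_95` and `coverage`, the seed, and the number of replications are made up for this example.

```python
import math
import random
import statistics

def ci_95(sample_a, sample_b):
    """Large-sample 95% CI for mean(sample_a) - mean(sample_b)."""
    diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    # Standard error of the difference between two independent sample means
    se = math.sqrt(statistics.variance(sample_a) / len(sample_a)
                   + statistics.variance(sample_b) / len(sample_b))
    return diff - 1.96 * se, diff + 1.96 * se

def coverage(true_diff=0.30, n=40, replications=2000, seed=1):
    """Fraction of replicated studies whose CI contains the true difference."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(replications):
        # Assumed populations: normal, with the text's sample values as truth
        nonsmokers = [rng.gauss(0.90, 0.35) for _ in range(n)]
        smokers = [rng.gauss(0.90 - true_diff, 0.32) for _ in range(n)]
        low, high = ci_95(nonsmokers, smokers)
        if low <= true_diff <= high:
            hits += 1
    return hits / replications

print(f"Share of intervals containing the true difference: {coverage():.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the true difference or it doesn't; the 95% describes the procedure, not one particular interval.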
In theory, we can construct intervals of any level of confidence from 0 to 100%. There is a tradeoff between the amount of confidence we have in an interval and its length. A 95% confidence interval for a population mean difference is constructed by taking the sample mean difference and adding and subtracting 1.96 standard errors of the mean difference. A 90% CI adds and subtracts 1.645 standard errors of the mean difference, while a 99% CI adds and subtracts 2.576 standard errors of the mean difference. The shorter the confidence interval, the less likely it is to contain the quantity being estimated. The longer the interval, the more likely it is to contain the quantity being estimated. Ninety-five percent has been found to be a convenient level for conducting scientific research, so it is used almost universally. Intervals of lesser confidence would lead to too many misstatements. Greater confidence would require more data to generate intervals of usable lengths.
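Here is a small sketch of that recipe applied to the vitamin C summary statistics. The text doesn't say how the standard error of the mean difference is computed, so the usual large-sample formula for two independent samples, sqrt(SD1^2/n1 + SD2^2/n2), is assumed here. The function name `mean_diff_ci` is hypothetical, and 2.576 is the usual three-decimal multiplier for 99% confidence.

```python
import math

def mean_diff_ci(mean1, sd1, n1, mean2, sd2, n2, multiplier):
    """CI for (mean1 - mean2): difference +/- multiplier * standard error."""
    diff = mean1 - mean2
    # Assumed formula: SE of a difference between two independent sample means
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return diff - multiplier * se, diff + multiplier * se

# Nonsmokers: mean 0.90, SD 0.35, n = 40; smokers: mean 0.60, SD 0.32, n = 40
for level, multiplier in [(90, 1.645), (95, 1.96), (99, 2.576)]:
    low, high = mean_diff_ci(0.90, 0.35, 40, 0.60, 0.32, 40, multiplier)
    print(f"{level}% CI: {low:.2f} to {high:.2f} mg/dl (length {high - low:.2f})")
```

The 95% line reproduces the 0.15 to 0.45 mg/dl interval reported for these data, and the lengths grow with the confidence level, which is the tradeoff in action.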
[Zero is a special value. If a difference between two means is 0, then the two means are equal!]
Confidence intervals contain population values found to be consistent with the data. If a confidence interval for a mean difference includes 0, the data are consistent with a population mean difference of 0. If the difference is 0, the population means are equal. If the confidence interval for a difference excludes 0, the data are not consistent with equal population means. Therefore, one of the first things to look at is whether a confidence interval for a difference contains 0. If 0 is not in the interval, a difference has been established. If a CI contains 0, then a difference has not been established. When we start talking about significance tests, we'll refer to differences that exclude 0 as a possibility as statistically significant. For the moment, we'll use the term sparingly.
A statistically significant difference may or may not be of practical importance. Statistical significance and practical importance are separate concepts. Some authors confuse the issues by talking about statistical significance and practical significance or by talking about, simply, significance. In these notes, there will be no mixing and matching. It's either statistically significant or practically important; any other combination should be consciously avoided.
Serum cholesterol values (mg/dl) in a free-living population tend to be between the mid 100s and the high 200s. It is recommended that individuals have serum cholesterols of 200 or less. A change of 1 or 2 mg/dl is of no importance. Changes of 10-20 mg/dl and more can be expected to have a clinical impact on the individual subject. Consider an investigation to compare mean serum cholesterol levels produced by two diets by looking at confidence intervals for $\mu_1 - \mu_2$ based on $\bar{x}_1 - \bar{x}_2$. High cholesterol levels are bad. If $\bar{x}_1 - \bar{x}_2$ is positive, the mean from diet 1 is greater than the mean from diet 2, and diet 2 is favored. If $\bar{x}_1 - \bar{x}_2$ is negative, the mean from diet 1 is less than the mean from diet 2, and diet 1 is favored. Here are six possible outcomes of the experiment.
| | Observed difference $\bar{x}_1 - \bar{x}_2$, mg/dl (what was observed) | 95% CI for $\mu_1 - \mu_2$, mg/dl (what the truth might be) |
| Case 1 | 2 | (1,3) |
| Case 2 | 30 | (20,40) |
| Case 3 | 30 | (2,58) |
| Case 4 | 1 | (-1,3) |
| Case 5 | 2 | (-58,62) |
| Case 6 | 30 | (-2,62) |
For each case, let's consider, first, whether a difference between population means has been demonstrated and then what the clinical implications might be.
In cases 1-3, the data are judged inconsistent with a population mean difference of 0. In cases 4-6, the data are consistent with a population mean difference of 0.
Cases 5 and 6 require careful handling. While neither interval formally demonstrates a difference between diets, Case 6 is certainly more suggestive of something than Case 5. Both cases are consistent with differences of practical importance and differences of no importance at all. However, Case 6, unlike Case 5, seems to rule out any advantage of practical importance for Diet 1, so it might be argued that Case 6 is like Case 3 in that both are consistent with important and unimportant advantages to Diet 2 while neither suggests any advantage to Diet 1.
It is common to find reports stating that there was no difference between two treatments. As Douglas Altman and Martin Bland emphasize, absence of evidence is not evidence of absence, that is, failure to show a difference is not the same thing as showing two treatments are the same. Only Case 4 allows the investigators to say there is no difference between the diets. The observed difference is not statistically significant and, if it should turn out there really is a difference (no two population means are exactly equal to an infinite number of decimal places), it would not be of any practical importance.
Many writers make the mistake of interpreting cases 5 and 6 to say there is no difference between the treatments or that the treatments are the same. This is an error. It is not supported by the data. All we can say in cases 5 and 6 is that we have been unable to demonstrate a difference between the diets. We cannot say they are the same. The data say they may be the same, but they may be quite different. Studies like this--that cannot distinguish between situations that have very different implications--are said to be underpowered, that is, they lack the power to answer the question definitively one way or the other.
In some situations, it's important to know if there is an effect no matter how small, but in most cases it's hard to rationalize saying whether or not a confidence interval contains 0 without reporting the CI, and saying something about the magnitude of the values it contains and their practical importance. If a CI does not include 0, are all of the values in the interval of practical importance? If the CI includes 0, have effects of practical importance been ruled out? If the CI includes 0 AND values of practical importance, YOU HAVEN'T LEARNED ANYTHING!
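The two questions above can be asked mechanically of the six cholesterol cases. This is only a sketch: the cutoff of 10 mg/dl for "practical importance" is an assumption taken from the low end of the 10-20 mg/dl range mentioned earlier (the text leaves the zone between 2 and 10 mg/dl undefined), and the names `IMPORTANT`, `CASES`, and `interpret` are invented for the example.

```python
IMPORTANT = 10  # mg/dl; assumed cutoff for a difference of practical importance

# 95% confidence intervals (mg/dl) for the six cases in the table
CASES = {
    "Case 1": (1, 3),
    "Case 2": (20, 40),
    "Case 3": (2, 58),
    "Case 4": (-1, 3),
    "Case 5": (-58, 62),
    "Case 6": (-2, 62),
}

def interpret(low, high, important=IMPORTANT):
    """Summarize a CI for a mean difference on both counts."""
    excludes_zero = low > 0 or high < 0
    largest = max(abs(low), abs(high))  # biggest effect in the interval
    if excludes_zero:
        smallest = min(abs(low), abs(high))  # smallest effect in the interval
        if smallest >= important:
            return "difference shown; every value is of practical importance"
        if largest < important:
            return "difference shown; no value is of practical importance"
        return "difference shown; practical importance uncertain"
    if largest < important:
        return "no difference shown; important effects ruled out"
    return "no difference shown; important effects NOT ruled out"

for name, (low, high) in CASES.items():
    print(f"{name}: ({low},{high}) -> {interpret(low, high)}")
```

Under this cutoff, only Case 4 comes out as "no difference shown; important effects ruled out", while Cases 5 and 6 fall in the category where nothing has been learned. A summary this coarse treats Cases 5 and 6 alike, so it is no substitute for reporting the interval itself.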