Through their exceptionally thorough follow up, Pennefather et al (this issue, p 643) have presented us with a fine example of the impact that bias, or systematic error, can have on the results of an epidemiological study. They found a higher rate of ocular abnormalities in children who were hard to locate or whose parents were reluctant for them to attend for follow up, suggesting that a less comprehensive survey would have underestimated the extent of disease.
Epidemiological studies are subject to two types of error, systematic and random, and of the two, systematic errors are by far the more problematic. Statistical theory offers an abundance of methods for quantifying and allowing for the impact of random error through standard errors, confidence intervals, or p values. From this theory we know that as the sample size increases, the random or sampling error decreases, often in proportion to the square root of the sample size. Bias, however, is much more difficult to handle because it is generally unmeasured and, being systematic, it does not decrease as the sample size increases. It is important to remember that a confidence interval captures only the uncertainty due to sampling error, and so can be taken to represent a likely range of values for the feature of interest only if we believe that the study is free from bias.
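The contrast between the two types of error can be sketched in a short simulation. For illustration it borrows the figures from Pennefather et al's cohort (true prevalence 13.4%, biased estimate 11.3%), though the simulation itself is hypothetical: repeated surveys scatter less and less as the sample grows, but a systematic undercount shifts every survey by the same amount regardless of size.

```python
import random
import statistics

random.seed(42)
TRUE_PREVALENCE = 0.134  # true rate of ocular abnormality in the cohort
BIAS = -0.021            # hypothetical systematic undercount (13.4% observed as 11.3%)

def survey(n):
    """One simulated survey of n children: each child is detected as abnormal
    with probability TRUE_PREVALENCE + BIAS, the same shift for every survey."""
    p = TRUE_PREVALENCE + BIAS
    return sum(random.random() < p for _ in range(n)) / n

for n in (100, 10_000):
    estimates = [survey(n) for _ in range(300)]
    spread = statistics.stdev(estimates)                       # random error: shrinks ~1/sqrt(n)
    systematic = statistics.mean(estimates) - TRUE_PREVALENCE  # bias: does not shrink
    print(n, round(spread, 4), round(systematic, 4))
```

At n = 10 000 the random scatter is roughly a tenth of what it was at n = 100, while the average shortfall stays near 2.1 percentage points in both cases.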
Sackett’s influential article attempted to catalogue the different forms of bias that can arise.1 Although it lists over 50 sources of bias in medical research, these are all essentially variations on three key themes—measurement error, selection bias, and confounding. Measurement error is the most obvious type of bias and includes not only straightforward examples, such as a faulty tonometer that underestimates intraocular pressure, but also problems caused by forgotten exposures or misdiagnosis. Selection bias arises through the inclusion or exclusion of subjects in a way that leaves the sample unrepresentative; Pennefather et al’s paper illustrates this. It is a common experience that studies which set out to recruit a representative sample fail because of selective non-response. The third form of bias, confounding, is important in comparative studies. If a study is correctly randomised then the comparison groups will tend to be similar in every respect except the allocated treatment or exposure. Observational studies cannot randomise and so cannot ensure that the comparison groups do not differ in other important respects. Thus, a comparison between a new surgical technique and previous experience of a standard procedure is open to bias if other changes in medical practice or patient selection have coincided with the introduction of the new technique. It is possible to adjust the comparison statistically for known confounders, provided that these can be measured. One of the key advantages of randomisation is that it automatically tends to balance out potential confounders, even when they are neither suspected nor measured.
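How a confounder distorts a crude comparison, and how stratifying on it removes the distortion, can be shown with invented numbers. Suppose, hypothetically, that a new technique was preferentially offered to mild cases; pooling the strata then makes the new technique look better even though within each severity group the two procedures perform identically.

```python
# Hypothetical outcome counts (successes, patients) for a new v standard
# procedure, where the new technique was used mainly on mild cases, so
# severity confounds the comparison. All figures are invented for illustration.
data = {
    "mild":   {"new": (90, 100), "standard": (9, 10)},
    "severe": {"new": (5, 10),   "standard": (50, 100)},
}

def pooled_rate(treatment):
    """Crude success rate, pooling the severity strata (confounded)."""
    successes = sum(data[s][treatment][0] for s in data)
    totals = sum(data[s][treatment][1] for s in data)
    return successes / totals

# Crude comparison: 95/110 v 59/110, so the new technique looks far better.
print(pooled_rate("new"), pooled_rate("standard"))

# Stratified comparison: within each stratum the two techniques are identical
# (0.9 and 0.9 in mild cases, 0.5 and 0.5 in severe cases).
for severity, arms in data.items():
    print(severity, {t: s / n for t, (s, n) in arms.items()})
```

The crude difference here is entirely an artefact of case mix; stratification is the simplest form of the statistical adjustment for known, measured confounders mentioned above.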
Sometimes it is claimed that non-differential bias—that is, bias that affects everyone equally—will only diminish the size of associations or differences. It follows that if a study finds an effect, the removal of any non-differential bias would only increase that effect. Theoretical studies and simulations have supported this idea, but in practice the situation is usually more complex. Even a small amount of differential bias can exaggerate the size of an effect, and where more than two factors are at play, bias in one can affect the apparent relation between two others. It is therefore very difficult to be sure that a result in an observational study is not due to some bias.
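A small simulation makes the distinction concrete (all rates here are hypothetical). Non-differential misclassification of exposure pulls the observed odds ratio towards 1, while differential misclassification (for example, cases over-reporting a past exposure) can inflate it well beyond its true value.

```python
import random

random.seed(1)
N = 200_000  # subjects per simulated study

def observed_odds_ratio(p_false_case, p_false_control):
    """Simulated case-control study. True exposure prevalence is 0.3 and
    exposure doubles the risk (0.2 v 0.1); p_false_* is the chance that an
    unexposed subject is wrongly recorded as exposed. Hypothetical figures."""
    a = b = c = d = 0  # exposed/unexposed cases, exposed/unexposed controls
    for _ in range(N):
        exposed = random.random() < 0.3
        case = random.random() < (0.2 if exposed else 0.1)
        p_false = p_false_case if case else p_false_control
        recorded = exposed or random.random() < p_false
        if case:
            a, b = a + recorded, b + (not recorded)
        else:
            c, d = c + recorded, d + (not recorded)
    return (a * d) / (b * c)

print(observed_odds_ratio(0.0, 0.0))  # no misclassification: near the true odds ratio of 2.25
print(observed_odds_ratio(0.2, 0.2))  # non-differential: attenuated towards 1
print(observed_odds_ratio(0.2, 0.0))  # differential (recall bias): exaggerated
```

The same 20% recording error diminishes the association when it applies to everyone, but exaggerates it when it applies only to cases.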
Even with the security of a randomised trial methodology in place, an overall selection bias may still exist because of excessively stringent inclusion/exclusion criteria. Thus, a trial result may be perfectly valid in itself, but if the design is not pragmatic the result may lack generalisability, since a significant proportion of “real world” patients may fall outside the trial entry criteria. A quest for purity, with an overemphasis on trial population homogeneity, may ultimately be counterproductive.
Given the potential for misleading results due to bias in epidemiological studies, it is almost surprising that observational studies make any contribution to medical research. Indeed, no epidemiological study should be treated as convincing evidence in isolation. It is only when a number of epidemiological studies using different methodologies in different populations agree on a finding that one should be persuaded of its truth. Despite this reservation, some epidemiological studies are better than others, and one of the main features to assess in a critical appraisal of the evidence is the way the study attempts to minimise bias or to quantify its potential impact.
Pennefather et al conclude their article by reiterating that the prevalence of ocular abnormalities in the cooperative subjects was not significantly different from that in the whole cohort. This appears to contradict the (valid) tests presented in their Table 1 (p 644), which show that significant differences do exist between the three groups. The main point, however, is that when dealing with bias it is more relevant to emphasise the overall impact of the bias than to focus on statistically significant subgroup differences. Thus, the primary question has been appropriately addressed—that is, whether the difference between the easily measured prevalence of ocular abnormalities of 11.3% and the true value for this cohort of 13.4% is of practical importance.