

Evidence about evidence
B C Reeves
Correspondence to: Dr Barney C Reeves, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK


The quality of evaluations of diagnostic test performance

In this issue of the BJO (p 261), Siddiqui et al review the compliance of researchers with quality standards for evaluations of diagnostic test performance (DTP). “Standards” were originally set by the McMaster evidence based medicine (EBM) group1,2 and have continued to evolve in recent years. Unfortunately, the standards appear to have had little impact: reviews have shown that recent evaluations tend to be of poor quality, both in medicine generally and in ophthalmology and other specialties.3–6 The review by Siddiqui et al confirms this gloomy picture.

In contrast, during the same period, there has been substantial improvement in the quality and reporting of evaluations of the effectiveness of treatments. Why has research to evaluate DTP not benefited in a similar way from the EBM “movement”? Perhaps improving the quality of research about effectiveness was seen as a priority because it was perceived to be important to patients (the “bit” of health care that makes them better), or because the resources wasted by using treatments that don’t work (and not using ones that do) were much easier for the public and media to appreciate. Perhaps the principles of high quality research to evaluate DTP are more difficult to grasp than those of controlled experiments to assess effectiveness. Or perhaps we can just blame Archie Cochrane!

Whatever the reason, prioritising research about effectiveness might be seen as paradoxical since it is difficult to optimise treatment without first knowing the diagnosis. It is also not clearly justified on an efficiency basis, since substantial (and increasing) amounts of healthcare resources are spent on diagnosis, with new and expensive diagnostic technologies emerging. And the diversity of evidence about DTP is often not appreciated—for example, patients’ responses to standard questions when taking a history and standardised observations of clinical signs all constitute diagnostic “test” information, the value of which can be quantified.7
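As an illustration of how the value of a history item or clinical sign can be quantified, the sketch below converts sensitivity and specificity into likelihood ratios and applies them with Bayes’ theorem in odds form. The function names and the example figures (sensitivity 0.8, specificity 0.9, pre-test probability 0.1) are hypothetical and are not taken from any study discussed here.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test or clinical sign.

    Assumes 0 < specificity < 1 so that neither ratio divides by zero.
    """
    lr_pos = sensitivity / (1.0 - specificity)   # P(T+ | D+) / P(T+ | D-)
    lr_neg = (1.0 - sensitivity) / specificity   # P(T- | D+) / P(T- | D-)
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Hypothetical sign with sensitivity 0.8 and specificity 0.9.
    lr_pos, lr_neg = likelihood_ratios(0.8, 0.9)
    print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
    print(f"post-test probability after a positive finding: "
          f"{post_test_probability(0.1, lr_pos):.2f}")
```

With these assumed figures, a positive finding raises the probability of disease from 10% to about 47%.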

The relative neglect of evidence about DTP may, at last, be about to change. The Cochrane Collaboration has long appreciated the importance of such evidence (a methods group on the topic was registered in 1995) and, in 2003, the collaboration took the decision to develop a new database of systematic reviews of diagnostic test accuracy. This will be developed in parallel with the existing database of systematic reviews of the effectiveness of healthcare interventions.

The new review of ophthalmic tests might appear to suggest that things have improved since the 1990s.5 All evaluations scored some points, with scores ranging from 8–19/25 STARD (Standards for Reporting of Diagnostic Accuracy) items, compared with 0–5/7 in the earlier review. However, although all STARD items are important, they are not all equally important. Failure to report some items may mislead a reader but does not necessarily invalidate the evidence. In contrast, poor compliance with particular items leads (on average) to biased, optimistic estimates of DTP.4 Unfortunately, compliance with these items, masking/blinding (item 11) and workup bias (item 16), was poorer than for others: only 6/16 and 4/15 papers, respectively, were judged to comply with the standard.
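Workup (verification) bias arises when the result of the index test influences whether the reference standard is applied. A well known correction due to Begg and Greenes can be sketched as follows; it is valid only under the assumption that verification depends on the index test result alone. The function names and the counts in the example (all 200 test positives verified, but only 80 of 800 test negatives) are hypothetical.

```python
def naive_estimates(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Sensitivity and specificity computed from verified participants only."""
    return tp / (tp + fn), tn / (tn + fp)


def begg_greenes(tp: int, fp: int, fn: int, tn: int,
                 n_test_pos: int, n_test_neg: int) -> tuple[float, float]:
    """Correct sensitivity and specificity for partial verification.

    tp, fp, fn, tn: counts among participants who received the reference standard.
    n_test_pos, n_test_neg: all participants testing positive/negative,
    whether verified or not.
    Assumes verification depends only on the index test result.
    """
    n = n_test_pos + n_test_neg
    p_t_pos = n_test_pos / n
    p_t_neg = n_test_neg / n
    # P(disease | test result), estimated within each verified test-result group.
    p_d_given_pos = tp / (tp + fp)
    p_d_given_neg = fn / (fn + tn)
    # Bayes' theorem reweights these by the test-result distribution in everyone tested.
    sens = (p_d_given_pos * p_t_pos) / (
        p_d_given_pos * p_t_pos + p_d_given_neg * p_t_neg)
    spec = ((1 - p_d_given_neg) * p_t_neg) / (
        (1 - p_d_given_neg) * p_t_neg + (1 - p_d_given_pos) * p_t_pos)
    return sens, spec


if __name__ == "__main__":
    print("naive:    ", naive_estimates(100, 100, 8, 72))
    print("corrected:", begg_greenes(100, 100, 8, 72, 200, 800))
```

In this hypothetical example the naive sensitivity is about 0.93, whereas the corrected value is about 0.56; the naive specificity (about 0.42) is correspondingly understated relative to the corrected value (about 0.88).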

Reporting indeterminate results (and analysing them correctly) is also crucial, since decisions still need to be made about patients who give such results. Failure to comply will almost always cause researchers to overestimate DTP. This item was poorly reported as well (item 22: 5/16).
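A small worked example shows why. The sketch compares excluding indeterminate results with a conservative approach that counts an indeterminate result in a diseased participant as a false negative and one in a non-diseased participant as a false positive. The function name and counts are hypothetical, and the conservative rule is only one of several defensible ways to handle indeterminate results.

```python
def accuracy_with_indeterminates(tp: int, fn: int, ind_diseased: int,
                                 tn: int, fp: int, ind_healthy: int,
                                 exclude: bool) -> tuple[float, float]:
    """Return (sensitivity, specificity) under two handling rules.

    exclude=True drops indeterminate results from the denominators.
    exclude=False treats them conservatively: indeterminate in a diseased
    participant counts as a false negative, in a non-diseased one as a
    false positive.
    """
    if exclude:
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
    else:
        sens = tp / (tp + fn + ind_diseased)
        spec = tn / (tn + fp + ind_healthy)
    return sens, spec


if __name__ == "__main__":
    # Hypothetical: 100 diseased (10 indeterminate), 200 non-diseased (20 indeterminate).
    print("excluded:    ", accuracy_with_indeterminates(80, 10, 10, 160, 20, 20, True))
    print("conservative:", accuracy_with_indeterminates(80, 10, 10, 160, 20, 20, False))
```

With these counts, excluding the indeterminate results raises both sensitivity and specificity from 0.80 to about 0.89.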


Reviews of evidence about DTP suggest that researchers, and journal editors, compartmentalise their knowledge. At last, the message about confidence intervals seems to have been learnt with respect to estimates of effect. Why, then, are estimates of DTP perceived to be immune? Compliance with the item on confidence intervals was again poor (item 21: 4/16) (Siddiqui et al).5,8
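Sensitivity and specificity are proportions, so their precision can be reported just like that of any other estimate. A minimal sketch using the Wilson score interval follows; this is one of several reasonable choices (the exact Clopper-Pearson interval is another), and the example of 18 detections among 20 diseased participants is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a proportion such as sensitivity.

    successes: e.g. true positives; n: e.g. all participants with the condition.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    lo, hi = wilson_interval(18, 20)
    print(f"sensitivity 18/20 = 0.90, 95% CI {lo:.2f} to {hi:.2f}")
```

For 18/20 this gives roughly 0.70 to 0.97, a reminder of how imprecise a sensitivity of 90% based on only 20 cases is.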

The STARD items illustrate the distinction between the quality of reporting and the quality of the research itself. This distinction is also true for randomised controlled trials (RCTs) (cf CONSORT quality standards9) but is less important, perhaps, because the design principles of RCTs and measures to protect against bias are now well known, relatively simple and, hence, straightforward for readers to appraise. This is not yet the case for evidence about DTP. Note the STARD item that requires researchers to describe how the study population was selected. This leaves the reader to judge the appropriateness of the population for the research question/context of interest, which is the key issue in determining the relevance of the evidence.10 The STARD initiative is a very important step forward but users of evidence of DTP need to remain vigilant and hone their appraisal skills.

Although the requirements for a good evaluation (study design features to protect against bias, and appropriate analysis) are not widely appreciated, in other respects such evaluations are often relatively easy to conduct. Evaluations are typically based on cross sectional studies, often without any need for prolonged follow up. Studies often investigate tests for diagnosing rare conditions, which can make it difficult to recruit a representative population that includes enough individuals with the condition(s) of interest (this is also true for evaluations of screening accuracy). However, high quality evidence for common conditions, and for very simple “tests” (see above), is often lacking. The lack of evidence about DTP represents an opportunity for medical researchers to make a significant contribution.
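The recruitment problem for rare conditions can be made concrete. A commonly used normal-approximation formula (often attributed to Buderer) gives the number of participants with the condition needed to estimate sensitivity within a chosen margin; dividing by the prevalence gives the total number to recruit in a consecutive series. The function name and the inputs in the example are illustrative assumptions.

```python
import math
from statistics import NormalDist


def participants_needed(expected_sens: float, margin: float, prevalence: float,
                        confidence: float = 0.95) -> tuple[int, int]:
    """Return (participants with the condition, total participants) needed.

    expected_sens: anticipated sensitivity.
    margin: desired half-width of the confidence interval (e.g. 0.05).
    prevalence: proportion of recruited participants expected to have the condition.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n_diseased = math.ceil(z**2 * expected_sens * (1 - expected_sens) / margin**2)
    # A consecutive series must be large enough to yield n_diseased cases.
    n_total = math.ceil(n_diseased / prevalence)
    return n_diseased, n_total


if __name__ == "__main__":
    for prev in (0.5, 0.05, 0.01):
        print(prev, participants_needed(0.9, 0.05, prev))
```

With an expected sensitivity of 0.9, a margin of ±0.05 and a prevalence of 5%, about 139 participants with the condition, and so roughly 2,780 participants in total, would be needed; the total grows in inverse proportion to prevalence.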

Methodology for evaluating DTP is an evolving area. In a recent critique,11 the limitations of the current framework were laid bare and challenges for the future set out. The UK National Health Service recently prioritised the commissioning of a review of evidence about methods for evaluating DTP when there is no gold standard, a problem that is not uncommon. This decision highlights the importance of DTP evidence for healthcare services.

Note in Proof
