The quality of reporting of diagnostic accuracy studies published in ophthalmic journals
M A R Siddiqui (1), A Azuara-Blanco (1), J Burr (2)
(1) Department of Ophthalmology, Grampian University Hospitals NHS Trust, UK
(2) Health Services Research Unit, University of Aberdeen, UK
Correspondence to: Augusto Azuara-Blanco PhD FRCS(Ed), The Eye Clinic, Aberdeen Royal Infirmary, Aberdeen AB25 2ZN, UK; aazblanco@aol.com

Abstract

Aim: To evaluate the quality of reporting of all diagnostic studies published in five major ophthalmic journals in the year 2002 using the Standards for Reporting of Diagnostic Accuracy (STARD) initiative parameters.

Methods: Manual searching was used to identify diagnostic studies published in 2002 in five leading ophthalmic journals, the American Journal of Ophthalmology (AJO), Archives of Ophthalmology (Archives), British Journal of Ophthalmology (BJO), Investigative Ophthalmology and Visual Science (IOVS), and Ophthalmology. The STARD checklist of 25 items and flow chart was used to evaluate the quality of each publication.

Results: A total of 16 publications were included (AJO = 5, Archives = 1, BJO = 2, IOVS = 2, and Ophthalmology = 6). More than half of the studies (n = 9) were related to glaucoma diagnosis. Other specialties included retina (n = 4), cornea (n = 2), and neuro-ophthalmology (n = 1). The most common description of diagnostic accuracy was sensitivity and specificity values, published in 13 articles. The number of fully reported items in evaluated studies ranged from eight to 19. Seven studies reported more than 50% of the STARD items.

Conclusions: The current standards of reporting of diagnostic accuracy tests are highly variable. The STARD initiative may be a useful tool for appraising the strengths and weaknesses of diagnostic accuracy studies.

  • AJO, American Journal of Ophthalmology
  • Archives, Archives of Ophthalmology
  • BJO, British Journal of Ophthalmology
  • IOVS, Investigative Ophthalmology and Visual Science
  • MRI, magnetic resonance imaging
  • ROC, receiver operating characteristic
  • STARD, Standards for Reporting of Diagnostic Accuracy
  • diagnostic accuracy studies
  • ophthalmic journals
  • quality of reporting

Current ophthalmological practice relies on diagnostic tests using sophisticated technologies that are constantly evolving. Diagnostic accuracy studies determine the performance of a test in diagnosing the target condition. Improperly conducted and incompletely reported studies are prone to bias that, in turn, may lead to an overly optimistic appraisal of the evaluated tests.1 The performance of a diagnostic test can be estimated in several ways, including sensitivity, specificity, receiver operating characteristic (ROC) curves, positive and negative predictive values, likelihood ratios, and diagnostic odds ratios.2,3
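As an illustration of how the measures listed above relate to one another, the following minimal sketch derives them from a single 2 x 2 table of index test results against the reference standard. The class name, function name, and counts are hypothetical and chosen only for this example; they are not data from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying an index test against a reference standard."""
    tp: int  # index test positive, target condition present
    fp: int  # index test positive, target condition absent
    fn: int  # index test negative, target condition present
    tn: int  # index test negative, target condition absent


def accuracy_measures(t: TwoByTwo) -> dict[str, float]:
    """Return common accuracy measures; assumes no cell or margin is zero."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
        "diagnostic_odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),
    }


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=40, fp=10, fn=10, tn=90)
measures = accuracy_measures(example)
```

Unlike sensitivity and specificity, the predictive values computed this way depend on the prevalence of the target condition in the sample, which is one reason why the description of the study population matters.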

To improve the quality of reporting of diagnostic accuracy studies the Standards for Reporting of Diagnostic Accuracy (STARD) initiative was published.4 During a consensus conference in the year 2000, the STARD project group developed a checklist of 25 items and a prototypical flow chart.2,5

The aim of this study was to examine the current standard of reporting of diagnostic accuracy studies using the STARD parameters. Current standards may provide a useful baseline to measure the impact of the introduction of the STARD statement in the future.

METHODS

The five leading ophthalmic journals (that is, according to the impact factor) with clinical research sections or articles were selected. Basic science research and subspecialty journals were excluded. The journals evaluated were AJO, Archives, BJO, IOVS, and Ophthalmology. Since search strategies for diagnostic accuracy tests are suboptimal,6 a hand search of all issues of 2002 was done. In these journals all manuscripts related to a diagnostic procedure were identified. Manuscripts were selected for inclusion if the diagnostic test was used in human subjects, the test was intended for clinical use, and measures of diagnostic accuracy were provided. Review articles, case reports, and longitudinal studies were excluded. The full paper was assessed for inclusion by one author; if inclusion was uncertain, the study was selected as potentially suitable. The selected papers were then independently assessed for inclusion by two investigators; if there was a disagreement, a consensus was reached.

The STARD checklist (table 1) was used to score the studies. Each item could be considered to be fully, partially, or not reported. If the item was “not applicable” it was marked as such. For example, item 21 required reporting of estimates of diagnostic accuracy and measure of statistical uncertainty. If a study reported estimates of accuracy but no measure of precision it was considered partially fulfilled. Similarly, item 20 (reporting of adverse events associated with the test) was scored non-applicable for non-invasive studies (for example, visual field tests, fundus photography).
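The scoring rules described above can be sketched as a small data structure. This is only an illustration of the two worked examples (items 20 and 21); the names `Rating`, `rate_item_20`, `rate_item_21`, and `count_fully_reported` are hypothetical, and the handling of cases the text does not describe is marked as an assumption in the comments.

```python
from enum import Enum


class Rating(Enum):
    FULL = "fully reported"
    PARTIAL = "partially reported"
    NOT_REPORTED = "not reported"
    NOT_APPLICABLE = "not applicable"


def rate_item_21(has_estimates: bool, has_uncertainty: bool) -> Rating:
    """Item 21: estimates of accuracy plus a measure of statistical uncertainty."""
    if has_estimates and has_uncertainty:
        return Rating.FULL
    if has_estimates:
        return Rating.PARTIAL
    # Assumption: a report with no accuracy estimates counts as not reported.
    return Rating.NOT_REPORTED


def rate_item_20(test_is_invasive: bool, adverse_events_reported: bool) -> Rating:
    """Item 20: adverse events, scored not applicable for non-invasive tests."""
    if not test_is_invasive:
        return Rating.NOT_APPLICABLE
    # Assumption: the text describes no partial credit for this item.
    return Rating.FULL if adverse_events_reported else Rating.NOT_REPORTED


def count_fully_reported(ratings: dict[int, Rating]) -> int:
    """Number of checklist items rated as fully reported."""
    return sum(1 for rating in ratings.values() if rating is Rating.FULL)


# Hypothetical ratings for three items of one study (illustration only).
study = {
    20: rate_item_20(test_is_invasive=False, adverse_events_reported=False),
    21: rate_item_21(has_estimates=True, has_uncertainty=False),
    25: Rating.FULL,
}
fully_reported = count_fully_reported(study)
```

Keeping "not applicable" as a separate rating, rather than folding it into "not reported", leaves open whether such items are counted in the denominator when a proportion of reported items is calculated.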

Table 1

 STARD checklist4,5

One investigator assessed all the included studies. To evaluate the interobserver variability in the rating of the STARD criteria, a second investigator examined four randomly selected publications, masked to the results of the first investigator.
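Interobserver agreement of this kind can be summarised as the percentage of items on which the two raters gave the same rating. The sketch below assumes simple percentage agreement, which does not correct for agreement expected by chance; the function name and the ratings are hypothetical.

```python
def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Percentage of items on which two raters gave an identical rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical ratings of five checklist items by two investigators.
first = ["full", "partial", "full", "none", "full"]
second = ["full", "partial", "none", "none", "full"]
agreement = percent_agreement(first, second)
```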

RESULTS

Twenty manuscripts were identified as potentially suitable for inclusion. After review of the full paper, four reports were excluded as they did not meet the inclusion criteria. One longitudinal study evaluated the value of short wavelength automated perimetry to predict the development of glaucoma.7 Another study discussed the use of magnetic resonance imaging (MRI) to differentiate between optic neuritis and non-arteritic anterior ischaemic neuropathy.8 Another study evaluated longitudinally changes in the wavefront aberration of patients with keratoconus.9 The fourth excluded paper described videokeratography findings in children with vernal keratoconjunctivitis and compared them with those of healthy children, without attempting to use these differences as a diagnostic test.10 A total of 16 studies (table 2) were included in this review (AJO = 5, Archives = 1, BJO = 2, IOVS = 2, and Ophthalmology = 6).

Table 2

 Included studies

Glaucoma was the specialty with the highest number of studies (n = 9). Other specialties included retina (n = 4), cornea (n = 2), and neuro-ophthalmology (n = 1) (table 3). Interobserver rating agreement was observed in 92% of items. Among the 16 articles the number of fully reported STARD items ranged from eight to 19. Less than half the studies (n = 7) explicitly reported more than 50% of STARD items. Reporting of an individual STARD item ranged from 1/16 (6%) (item 24) to 16/16 (100%) (items 2 and 25). The commonest description of diagnostic accuracy was sensitivity and specificity values (n = 13), followed by area under the ROC curve (n = 4). The reporting of each of the items is described in table 4.

Table 3

 Details of included studies

Table 4

 Summary score of STARD items

DISCUSSION

In 1978 Ransohoff and Feinstein11 first reported a detailed analysis of diagnostic accuracy studies and identified the major sources of bias. Since then there have been numerous articles identifying a variety of biases as a potential source of inaccuracies in the indices of diagnostic accuracy.12–17 Reid et al12 evaluated diagnostic accuracy studies published in four prominent medical journals between 1978 and 1993. They evaluated the quality of 20 diagnostic test studies published during this period against seven methodological standards. Their study showed that the quality of reporting was moderate or low, and that the essential elements of data required to evaluate a study were missing in the majority of the reports. Although there had been some improvement over time, most of the diagnostic accuracy studies were inadequately reported.

Harper and Reeves15 evaluated the quality of reporting of ophthalmic diagnostic tests published in the early and mid-1990s. They showed limited compliance with accepted methodological standards. Compliance in ophthalmic journals was no worse than in other evaluations published in general medical journals, but only 25% of articles complied with more than 50% of the methodological standards.

In this appraisal of recent ophthalmic publications using the STARD checklist, similar flaws were found. Less than 50% of articles (n = 7) reported more than half of the STARD items. Information on key elements of design, conduct, analysis, and interpretation of diagnostic studies was frequently missing. To our knowledge, STARD has not been used to appraise the quality of reporting of diagnostic accuracy studies in other medical specialties.

The importance of describing the selection of the study population in appraising a diagnostic test cannot be overemphasised (item 3). For example, Harper et al showed how indices of diagnostic accuracy of tonometry for glaucoma greatly varied depending on the characteristics of the study population.16 Most publications reported this issue properly (n = 13).

Review bias, including test review bias (inflation of diagnostic accuracy indices by knowing the results of the gold standard while reviewing the index test), diagnostic review bias (knowledge of the outcome of the index test while reviewing gold standard), and clinical review bias (additional clinical information available to the reader, which would not normally be available when interpreting the index test results) can lead to inflation of the measures of diagnostic accuracy. Reader masking (item 11) was reported in less than half of the studies (n = 6).

Methods for calculating test reproducibility, or citation of reproducibility studies (item 13), were among the least commonly reported items from the STARD checklist (n = 2). There may be a lack of understanding of the effects of poor reproducibility on the final outcome of a diagnostic accuracy study.

Verification or workup bias (item 16) occurs when the gold standard test is performed only on people who have already tested positive with the index test.3 It is important to describe how many patients satisfying the inclusion criteria failed to undergo the index or reference tests and the reasons for failing to do so. A flow diagram is highly recommended to explain this issue clearly.2,4 This item was reported in four studies.
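The effect of incomplete verification can be shown with simple arithmetic. The sketch below is a hypothetical illustration, not an analysis from any reviewed study: it assumes that every index test positive is verified by the gold standard but only a fraction of index test negatives is, and that accuracy is then calculated from verified participants alone. The function name and counts are invented for the example.

```python
def verified_estimates(
    tp: float, fp: float, fn: float, tn: float, frac_negatives_verified: float
) -> tuple[float, float]:
    """Sensitivity and specificity computed from verified participants only.

    All index test positives (tp, fp) are assumed to be verified; only the
    given fraction of index test negatives (fn, tn) reaches the gold standard.
    """
    verified_fn = fn * frac_negatives_verified
    verified_tn = tn * frac_negatives_verified
    sensitivity = tp / (tp + verified_fn)
    specificity = verified_tn / (verified_tn + fp)
    return sensitivity, specificity


# Hypothetical population: true sensitivity 0.80, true specificity 0.90.
counts = dict(tp=80, fp=90, fn=20, tn=810)

complete = verified_estimates(**counts, frac_negatives_verified=1.0)
partial = verified_estimates(**counts, frac_negatives_verified=0.1)
```

With complete verification the true values are recovered. When only one in ten test negatives is verified, the apparent sensitivity rises and the apparent specificity falls, which is why the number of participants who did not receive the reference test needs to be reported.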

Since the technology for existing tests is rapidly improving, it is important to report the actual dates when the study was performed. This will allow the reader to consider any technological advancement since the study was done. This information was provided in less than half of articles (n = 6).

Spectrum bias results from differences in the severity of target condition and co-morbidity. Incomplete reporting of clinical spectrum (item 18) may result in inaccurate diagnostic accuracy estimates—for example, advanced disease status would lead to increased sensitivity of a diagnostic test. This item was fully reported in 10 studies.

Confidence intervals (CIs) were reported in only a quarter (n = 4) of studies. A recent review by Harper and Reeves17 revealed that CIs were reported in only 50% of diagnostic evaluation reports published in the BMJ during the two year period 1996 to 1997. Since the absolute values of diagnostic accuracy are only estimates, the precision of the sensitivity and specificity or likelihood ratios should be reported whenever evaluations of diagnostic accuracy are published. Reporting of confidence intervals is essential to allow a physician to know the range within which the true values of the indices are likely to lie.17
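As an illustration of what such an interval adds, the sketch below computes a 95% CI for a sensitivity estimate using the Wilson score method. The choice of method is an assumption made for this example (the STARD item does not prescribe one), and the function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a proportion such as sensitivity or specificity."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 40 of 50 diseased eyes test positive (sensitivity 0.80).
lower, upper = wilson_interval(successes=40, n=50)
```

For these hypothetical counts the interval runs from roughly 0.67 to 0.89, so a point estimate of 80% on its own would conceal considerable uncertainty in a sample of this size.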

Intermediate, indeterminate, and uninterpretable results may not always be included in final assessment of the diagnostic accuracy of a test.18 The frequency of these results, by itself, is an important pointer of the overall usefulness of the test.2 Approximately one third of studies (n = 5) reported this item (item 22). Diagnostic accuracy in subgroups was reported in only a quarter of studies (n = 4) (item 23).

The STARD group strongly recommends use of a flow diagram to clearly communicate the design of the study and provide the exact number of participants at each stage of the study.2 A flow diagram has been a valuable addition to the report of randomised clinical trials. It has been reported that flow diagrams are associated with improved quality of reporting of randomised controlled trials.19 A flow diagram was present in only one of the evaluated studies.

In a similar and previous effort to improve the quality of reporting and to prevent shortcomings and biases in randomised controlled trials (RCTs), the CONSORT statement was introduced in 1995.20 Use of CONSORT has been shown to improve the quality of reporting of RCTs.21 Sanchez-Thorin et al22 compared RCTs published in Ophthalmology during 1999 with those published in 1991–4, before the adoption of the CONSORT statement, and found an improvement in the quality of reporting. Future research will be able to evaluate the impact of the STARD initiative on the accuracy and completeness of reporting of studies on diagnostic accuracy.

Acknowledgments

The Health Services Research Unit is funded by the Chief Scientist Office of the Scottish Executive Health Department; the views expressed here are those of the authors.

REFERENCES

Footnotes

  • Competing interests: none declared.
