Aim: To assess the sample sizes used in studies on diagnostic accuracy in ophthalmology.
Design and sources: A survey of the literature published in 2005.
Methods: The frequency with which sample size calculations were reported, and the sample sizes themselves, were extracted from the published literature. Two independent investigators manually searched the five leading clinical journals in ophthalmology with the highest impact (Investigative Ophthalmology and Visual Science, Ophthalmology, Archives of Ophthalmology, American Journal of Ophthalmology and British Journal of Ophthalmology).
Results: A total of 1698 articles were identified, of which 40 studies were on diagnostic accuracy. One study reported that sample size was calculated before initiating the study. Another study reported consideration of sample size without calculation. The mean (SD) sample size of all diagnostic studies was 172.6 (218.9). The median prevalence of the target condition was 50.5%.
Conclusion: Very few studies consider sample size in their methods. Inadequate sample sizes in diagnostic accuracy studies may result in misleading estimates of test accuracy. Improvement of the current standards for the design and reporting of diagnostic studies is warranted.
- CONSORT, Consolidated Standards of Reporting Trials
Diagnostic tests help clinicians to make a diagnosis and to evaluate the severity of a disease. When information from diagnostic tests is used in clinical practice, the performance of those tests must be known. Therefore, the design and reporting of studies on diagnostic accuracy should comply with methodological standards. Calculation of sample size plays an important part in the design of a diagnostic accuracy study, as it determines how precise the estimates of test accuracy will be for a particular diagnostic situation. This becomes even more important when a subgroup analysis is planned, as the sample size in each subgroup has to be considered as well. If too few patients with and without the target condition have been evaluated, the indexes of accuracy (sensitivity and specificity) may be unstable. This quantitative instability can be appraised, for example, from CIs, which narrow progressively as sample size increases. In addition, investigators performing meta-analyses would benefit greatly from the reporting of relevant data such as sample size.
A recent publication assessing the studies on test accuracy published in 2002 in leading medical journals showed that only 4.7% of the studies reported a calculation of sample size.1 It is possible that the literature on diagnostic tests in ophthalmology has similar limitations, and that the calculation of the sample size is not routinely reported. In this study, we investigated how often calculation of sample sizes was reported in leading ophthalmology journals.
Two reviewers independently and manually screened all issues of the five leading clinical journals in ophthalmology (Investigative Ophthalmology and Visual Science, Ophthalmology, Archives of Ophthalmology, American Journal of Ophthalmology and British Journal of Ophthalmology) published in 2005. Leading journals were defined according to their current impact factor, excluding subspecialty journals. Diagnostic accuracy studies were identified (fig 1), and disagreement between reviewers was settled by consensus. From each report, data were extracted on the target condition, type of test, number of participants, prevalence of the target condition and whether a prior sample size calculation was described.
Of the 1698 articles published in 2005, 43 were studies on diagnostic accuracy. Three articles focused on a screening test, and were excluded. Figure 1 shows a flow chart highlighting the process and the results of this literature survey. Table 1 shows the data extracted from each report.
The median sample size reported was 122.5, whereas the mean (SD) was 172.6 (218.9). The median prevalence of the target condition was 50.5%.
One study (2.5%) reported a prior calculation of sample size for a planned sensitivity and specificity of 80% with a 95% CI. In another study, the authors stated that sample size had been considered on the basis of the estimated prevalence of visual impairment; however, the sample size itself was not calculated.
Of the 40 articles appraised, 29 (72.5%) evaluated the diagnostic performance of imaging technology. Four articles reported on clinical examination, seven on functional tests, two on laboratory tests and one on the diagnostic performance of the patient's history. About half of the studies on diagnostic accuracy (52.5%) evaluated imaging technologies in patients with glaucoma.
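The report does not say how that calculation was carried out. One widely used approach for a precision-based calculation, often credited to Buderer, is sketched below in Python as an illustration only: the number of subjects with the target condition is z²·Se(1−Se)/d², the number without it is z²·Sp(1−Sp)/d², and each is divided by the prevalence (or 1 − prevalence) to give the total to recruit. The CI half-width d, the prevalence and the dropout rate below are hypothetical planning values, not figures from the study, and the function names are our own.

```python
import math
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided standard normal quantile (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def subjects_needed(expected: float, half_width: float, confidence: float = 0.95) -> int:
    """Subjects needed in ONE group (with the condition for sensitivity,
    without it for specificity) so that a normal-approximation CI around
    `expected` has the requested half-width."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * expected * (1 - expected) / half_width ** 2)


def total_to_recruit(sens: float, spec: float, prevalence: float,
                     half_width: float, confidence: float = 0.95,
                     dropout: float = 0.0) -> int:
    """Total patients to recruit so that sensitivity AND specificity both
    reach the target precision, inflated for an expected dropout rate."""
    n_with = subjects_needed(sens, half_width, confidence)
    n_without = subjects_needed(spec, half_width, confidence)
    # Recruitment must yield enough patients in both groups at this prevalence.
    n_total = max(n_with / prevalence, n_without / (1 - prevalence))
    # Allowance for patients who may drop out before the reference standard.
    return math.ceil(n_total / (1 - dropout))


if __name__ == "__main__":
    # Hypothetical values: Se = Sp = 80%, 95% CI of +/-10 percentage points,
    # prevalence 50%, 10% expected dropout.
    print(subjects_needed(0.80, 0.10))
    print(total_to_recruit(0.80, 0.80, prevalence=0.5, half_width=0.10, dropout=0.10))
```

Halving the half-width to 5 percentage points roughly quadruples the requirement per group, which is why precise estimates call for considerably larger studies. Published tables based on exact binomial confidence limits may give somewhat different numbers from this normal approximation.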
In studies on diagnostic accuracy, the ability of a test to identify a target condition is determined. A misleading estimate of test performance may have unwanted consequences in clinical practice, because it then becomes difficult to judge how accurate the test really is. In diagnostic studies, the sample size plays a central role, as it directly influences the width of the CIs. In studies with small samples, estimates of sensitivity and specificity may be imprecise because the CIs can be wide. If, for example, a new test correctly detects disease in 1140 of 1770 patients, the sensitivity is 64.4% with a narrow CI of 0.632 to 0.667. If the same test is used to assess 177 patients, with the same sensitivity of 64.4%, 114 patients would be diagnosed correctly, but the CI would be much wider (0.565 to 0.719). When subgroups are analysed separately, this effect may become even more important.2 At the planning stage, investigators can address this issue by calculating the sample size needed to obtain narrow CIs, as is common practice in randomised trials.3,4 An allowance for patients who may drop out of the study can also be made at this stage. Tables have recently been published to ease the calculation of sample sizes in diagnostic test studies.5 To use these tables to determine the number of cases needed to assess a new test, the investigator need only specify the expected sensitivity of the test and the maximal distance of the lower confidence limit from this sensitivity; the required number of cases can then be read directly from the tables.
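The effect of sample size on precision in this example can be checked with a few lines of Python. The sketch below uses the Wilson score interval, which is an assumption here: the text does not state which method produced the quoted bounds, so the computed limits differ somewhat from them. What the sketch does show is that a tenfold smaller sample widens the 95% CI roughly threefold (about √10), even though the point estimate of sensitivity (1140/1770 = 114/177 ≈ 0.644) is identical.

```python
import math
from statistics import NormalDist


def wilson_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion such as
    sensitivity (true positives / patients with the target condition)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Same sensitivity, tenfold difference in the number of diseased patients.
    for tp, n in [(1140, 1770), (114, 177)]:
        lo, hi = wilson_ci(tp, n)
        print(f"n={n}: sensitivity={tp / n:.3f}, 95% CI {lo:.3f} to {hi:.3f}, width {hi - lo:.3f}")
```

Swapping in another interval method (Wald, or the exact Clopper–Pearson interval) changes the limits slightly but not the conclusion: the width of the CI shrinks roughly in proportion to 1/√n.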
Sample size calculation is only one of several aspects relevant to planning a study of a diagnostic test. Others include an independent, masked comparison with a reference standard; an appropriate spectrum of included patients, representative of those to whom the test will be applied; and the absence of any influence of the test results on the decision to perform the reference standard. Recently, tools have been designed to improve the standards and reporting of studies on diagnostic accuracy. The Standards for Reporting of Diagnostic Accuracy (STARD) checklist provides a framework for improving the reporting of studies on diagnostic accuracy.6 QUADAS (Quality Assessment of Diagnostic Accuracy Studies) was recently designed to assess the methodological quality of studies included in systematic reviews of diagnostic accuracy; one of its items evaluates possible sources of bias when patients withdraw from a study.7 If a sample size has not been calculated or reported, it is unlikely that the dropout rate and its influence on the power of the study have been considered. In this literature review of all diagnostic performance studies in ophthalmology published in 2005 in five leading journals, covering a variety of tests, a sample size calculation was reported in only one publication.
The Consolidated Standards of Reporting Trials (CONSORT) statement was introduced in an attempt to improve the quality of reporting of randomised controlled trials.8 Sample size calculation is one of its key methodological items. An evaluation of randomised controlled trials published in the journal Ophthalmology before and after the editors adopted the CONSORT statement (all new trials published during 1999 compared with those published in the early 1990s) found an overall improvement in the quality of publications: the proportion of studies reporting a sample size calculation rose from 20% before the publication and use of CONSORT to 35% afterwards.9 An evaluation of the quality of controlled clinical trials in glaucoma found that only 15% of them (34 of 226) reported an a priori estimate of sample size.10
The beneficial effect of CONSORT has also been observed in general medical journals,11,12 but further improvement would still be desirable. In a recent study assessing the quality of reports of randomised trials, only about one fifth of 162 trials (35, 21.6%) did not describe a sample size calculation.13 In an investigation comparing the quality of clinical trials in journals that endorse CONSORT, a sample size calculation was reported in 85% of reports in general medical journals, compared with 55% in specialist journals. Other key aspects of clinical trials, such as methods of randomisation, allocation concealment and implementation, masking status, and use of intention-to-treat analysis, were also better reported in general medical journals than in specialist ones.14
It would seem that diagnostic test studies are of poorer quality than trials evaluating interventions. In this survey of studies on diagnostic accuracy in ophthalmology, the number of studies reporting a sample size calculation was minimal. Sample size was taken into consideration at the planning stage in only two (5%) studies, and only one (2.5%) study reported an exact calculation of sample size considering the targeted sensitivity and specificity as well as a possible dropout rate. Reporting of sample size calculation appears to be even poorer in our survey than in leading general medical journals, where 4.7% of studies on test accuracy reported one.1 Other important methodological aspects of diagnostic performance studies are often missing.15–19 Harper and Reeves19 highlighted that only two of 16 articles reported complete measures of precision, including CIs, for their estimates of diagnostic accuracy.
Better reporting of the methodological aspects of studies would facilitate systematic reviews and meta-analyses of the growing number of publications on diagnostic tests. Assessing data from studies on diagnostic tests to quantify bias and other sample size-related effects relies on information about the methods. Only if the study design is fully transparent, including a power calculation for any hypotheses tested, is it possible to investigate whether there is a sample size-related effect, which is especially important for reviews of test accuracy.20 Possible contributing factors to the under-reporting of methodological characteristics are limited space in journals and the undervaluing of the Methods section by reviewers and editors.
In conclusion, sample size calculations should be part of the methods and of the published report of diagnostic performance studies. Currently, they are rarely reported in the ophthalmology literature.
Competing interests: None.
Published Online First 14 February 2007