Introduction

The Bailey–Lovie logMAR visual acuity chart,1 along with the version produced by Ferris et al2 (known as the ‘ETDRS’ chart), has become the gold standard tool for the measurement of visual acuity in prospective clinical research.3,4 Nevertheless, even in the absence of any clinical change, consecutive visual acuity measurements on a given subject using such charts are subject to a degree of variability. This variability can be thought of as a form of measurement noise, and will be referred to throughout as test–retest variability (TRV). The ability to detect true visual change decreases as the TRV of the acuity data increases. Increased TRV may therefore compromise the management of ophthalmic patients over time. The published data suggest that changes in acuity of around two lines of letters may be required to achieve significance at the 5% level.5,6,7,8 Increased TRV also has implications for the design of any clinical trial which uses acuity as a primary outcome measure.9 For such a trial, any increase in the TRV of acuity will necessitate an increased sample size to demonstrate a given clinical change with a given degree of statistical power.

The repetition and averaging of measurements to reduce measurement noise is a commonly used experimental technique, which can be readily applied to visual acuity measurement via the use of a personal computer. Use of a computer allows the presentation of stimuli in a random order (to avoid memorisation effects) as well as automated processing and statistical analysis of subject responses.

Our aim was to assess whether a computerised acuity test can produce lower levels of TRV (and hence allow earlier detection of change) than the Gold Standard ETDRS logMAR chart through repeating and averaging a series of acuity threshold measurements. This computerised form of test will be referred to throughout as a ‘PC-test’. TRV was defined after Bland and Altman10 as the 95% confidence limits of agreement (±1.96 SDs of the distribution of differences between paired acuity measurements).

Materials and methods

Overview

A series of clinically stable subjects had their acuity measured using each of three acuity tests:

  1. The ETDRS logMAR chart (reference standard),
  2. PC10-test (computerised test using 10 averaged measurements), and
  3. PC5-test (computerised test using five averaged measurements).

The measurements were repeated not less than 2 weeks later such that the TRV of each test could be assessed, and any systematic bias evaluated.

Subjects

Subjects who fulfilled the following inclusion criteria were recruited from the outpatient clinics of Moorfields Eye Hospital:

  a) Able to understand and comply with the testing protocol.
  b) Stable visual acuity determined via assessment of current clinical diagnosis.

One eye of each subject was assessed. Where both eyes met the criteria, the eye with the poorer acuity was used as the study eye. All subjects wore their habitual spectacle correction and viewed the acuity tests from a distance of 4 m.

Equipment

ETDRS logMAR chart (Lighthouse International)

Display: The printed panel charts were back-lit in the standard Lighthouse box achieving a luminance of 111 cd/m².

Acuity stimuli: The ETDRS chart has five letters per row ranging in size from +1.0 to –0.30 logMAR in 0.1 logMAR steps (see Figure 1). The chart has been described in detail by Ferris et al.2 Versions 1 and 2 were used.

Figure 1. Arrangement of letters on PC-test and ETDRS chart. (a) PC-test captured display. (b) ETDRS chart.

Testing paradigm: Subjects were required to attempt each letter on the chart until they responded to all the letters on a single row incorrectly,11 at which point the test was terminated. In accordance with usual practice, the ETDRS test consisted of a single reading of the chart.

PC-tests (PC10-test and PC5-test)

Display: Stimuli were displayed using a 20-in CRT monitor (Ergovision 2040, Taxan Europe Ltd) using a resolution of 1024 × 768 and a noninterlaced refresh rate of 85 Hz. The monitor was driven by a standard IBM compatible 150 MHz PC. After a 30 min warm-up period, luminance was measured at 31 cd/m².

Acuity stimuli: Three horizontal rows of three letters were displayed, the centre row of the three containing the three test stimuli. The upper and lower rows of letters were included to simulate the contour interaction provided by the surrounding letters on the ETDRS chart (see Figure 1). The display format followed that of the ETDRS convention in terms of interletter spacing, interline spacing, use of the Sloan set of letters, and a 0.1 logMAR unit increment between lines. The letters were generated at random, the only restriction being no repeats on a given line. All letters were generated to ETDRS specifications.2

Testing paradigm: Each computerised test contains a series of individual acuity measurements. As used in this study, the test commences at a stimulus size of +0.8 logMAR (although poorer acuities could be measured by reducing the viewing distance). The examiner enters the subject's responses to the three stimuli at this size level. Providing at least one of the three responses is correct, the stimulus size decreases progressively by 0.1 logMAR. This represents a ‘scrolling’ of the chart down to the next line such that the dimensions of stimuli, surrounding optotypes and the spaces between them all decrease by 0.1 log units. Once the subject has made a full line of incorrect responses, the computer calculates and stores the acuity score before initiating the next in the series of measurements by increasing the stimulus size by 0.5 logMAR and repeating the process. Once the specified number of measurements within the series (in this case either 10 or 5) has been completed, the computer averages the individual measurement values to produce the final PC-test result. The 0.5 logMAR increase in stimulus size between consecutive measurements was incorporated to save time by not requiring subjects to read letters that are so far above threshold that there is a negligible chance of them being misnamed. This magnitude of 0.5 logMAR was chosen based on unpublished data from this group.
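The testing paradigm described above can be sketched as follows. This is an illustration under stated assumptions, not the study software: the names `run_pc_test`, `respond`, `floor` and `steady_subject` are hypothetical, the per-letter credit of 0.1/3 logMAR is inferred from the three-letter line, letters larger than the first line shown in a measurement are assumed to have been read correctly, and the lower size limit is a guard added only so that the loop always terminates.

```python
def run_pc_test(respond, n_measurements=5, start=0.8, step=0.1,
                letters_per_line=3, jump=0.5, floor=-0.3):
    """Average a series of descending threshold measurements (logMAR).

    respond(size) returns how many of the test letters (0..letters_per_line)
    were named correctly at that size; it stands in for the examiner's input.
    """
    scores = []
    size = start
    for _ in range(n_measurements):
        first_size = size
        letters_correct = 0
        while True:
            n_correct = respond(size)
            letters_correct += n_correct
            # A full line of errors ends the measurement (floor is a guard).
            if n_correct == 0 or size <= floor:
                break
            size = round(size - step, 2)  # scroll down one line
        # Letter-by-letter credit, assuming larger lines would be read correctly.
        scores.append(first_size + step - letters_correct * step / letters_per_line)
        size = round(size + jump, 2)  # restart 0.5 logMAR above the failed line
    return sum(scores) / len(scores)

def steady_subject(size):
    """Hypothetical noise-free subject with a threshold near +0.2 logMAR."""
    if size > 0.25:
        return 3
    if size > 0.15:
        return 2
    if size > 0.05:
        return 1
    return 0

result = run_pc_test(steady_subject, n_measurements=5)
print(f"PC5-test result: {result:+.2f} logMAR")
```

With a noise-free responder every measurement returns the same score, so averaging changes nothing; the benefit of averaging arises only when individual measurements vary.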

Scoring

All acuity scores were calculated using the interpolated method described by Ferris et al2 such that credit is given for each letter correctly named. This method is known to produce less TRV than the line-assignment method commonly encountered in clinical practice.12,13 For the ETDRS test, the score is derived from the total number of correctly named letters, whereas producing the final score for the PC-tests requires the additional step of averaging the series of individual measurements that make up each PC-test result.
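Letter-by-letter scoring for the ETDRS chart can be written as a one-line formula. The sketch below assumes the chart layout given earlier (top line +1.0 logMAR, five letters per line, 0.1 logMAR between lines), so that each letter is worth 0.02 logMAR; the function name is hypothetical.

```python
def etdrs_logmar(letters_correct, top_line=1.0, step=0.1, letters_per_line=5):
    """Interpolated logMAR score from the total number of letters read."""
    # Each correctly named letter improves the score by step / letters_per_line.
    return top_line + step - letters_correct * step / letters_per_line

# Reading the first seven lines in full (35 letters) gives +0.40 logMAR.
print(f"{etdrs_logmar(35):+.2f}")
# Two further letters on the next line improve the score by 0.04 logMAR.
print(f"{etdrs_logmar(37):+.2f}")
```

On a three-letter line the same rule gives a coarser increment of about 0.033 logMAR per letter.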

Investigations

Four acuity measurements were taken on one eye of each subject using ETDRS charts 1 and 2, as well as the PC10-test and PC5-test in random order. Following an interval of not less than 2 weeks, the subjects attended for a second visit during which they underwent repeat testing using the same four tests, again in random order. The tests are summarised in Table 1.

Table 1 Summary of acuity tests

Outcome measures

The objectives were:

  a) To determine in terms of 95% confidence limits, the TRV of acuity data for the ETDRS chart, the PC10-test, and the PC5-test.
  b) To determine in terms of mean difference and 95% confidence intervals for the mean, the extent to which measurements taken using the PC10-test and the PC5-test agreed with those of the Gold Standard ETDRS chart.

The methods of Bland and Altman10 were used to determine (a) and (b) above.
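Objective (b) amounts to a mean difference between two tests with a confidence interval for that mean. A minimal sketch follows; the function name and the data are hypothetical, and the critical value of the t distribution is supplied by the caller (2.776 is the two-sided 5% value for 4 degrees of freedom, matching the five illustrative pairs).

```python
from math import sqrt
from statistics import mean, stdev

def agreement(test_scores, reference_scores, t_crit):
    """Mean difference (test minus reference) and its 95% confidence interval."""
    diffs = [t - r for t, r in zip(test_scores, reference_scores)]
    bias = mean(diffs)
    # Standard error of the mean difference.
    half_width = t_crit * stdev(diffs) / sqrt(len(diffs))
    return bias, bias - half_width, bias + half_width

# Hypothetical logMAR scores for five subjects (illustration only).
pc_scores = [0.12, 0.30, 0.02, 0.48, 0.20]
etdrs_scores = [0.10, 0.32, 0.00, 0.44, 0.22]
bias, lower, upper = agreement(pc_scores, etdrs_scores, t_crit=2.776)
print(f"mean difference {bias:+.3f}, 95% CI {lower:+.3f} to {upper:+.3f}")
```

An interval that includes zero gives no indication of systematic bias between the two tests.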

Results

A total of 19 subjects were recruited. The range of acuities was from +0.64 to −0.20 (median +0.12) logMAR (as measured with the ETDRS chart). Table 2 shows TRV in terms of 95% confidence limits (±1.96 SDs of the differences between paired measurements). TRV as measured for the PC10-test and PC5-test was ±0.11 and ±0.10 logMAR, respectively (see Table 2). This represents reductions of 39 and 44% respectively on the level of measurement variability compared with the value of ±0.18 logMAR achieved by the ETDRS chart.

Table 2 Test–retest variability (TRV)

Table 3 shows the mean of the differences between paired measurements with the 95% confidence intervals. For both comparisons (PC10-test vs ETDRS, and PC5-test vs ETDRS) the mean difference is greater than zero; however, in both cases the 95% confidence interval includes zero. Hence, there is no indication of notable systematic bias. The similarity between the mean differences suggests that increasing the number of repeats from five to 10 does not affect the absolute acuity value.

Table 3 Agreement with ETDRS

Figure 2a and b show Bland–Altman plots for the ETDRS and PC10-test and PC5-test data, respectively. These show the difference between paired measurements plotted against their mean. The greater spread of points in Figure 2a as compared with Figure 2b indicates the greater variability between paired measurements on the same individual subject for ETDRS as compared with the PC-tests.

Figure 2. Bland–Altman plot for (a) ETDRS chart and (b) PC-test.

Discussion

A previous study conducted by this group demonstrated levels of TRV for the ETDRS chart of ±0.18 logMAR.8 Based on this finding, a subject's visual acuity measurement is required to deteriorate (or improve) by at least 0.20 log units (equivalent to a 58% increase in letter size) before the change exceeds the test's measurement error and can therefore be deemed ‘real’. In this method-comparison study, the average of a series of five acuity measurements as measured using the PC5-test produced a TRV of ±0.10 logMAR. TRV for the ETDRS chart was again measured at ±0.18 logMAR. Hence, repeating and averaging appears to produce a considerable reduction in TRV with a commensurate improvement in the ability of the test to detect change. However, differences exist between the displays of the two tests (eg with respect to luminance), which may confound the effects of repeating and averaging. Accordingly, additional evidence for the main determinants of TRV in this study was gained by recalculating TRV for the PC5 and PC10-tests using data from only the first repeat of each test. This results in TRV increasing by more than twofold to ±0.28 logMAR and ±0.25 logMAR for the PC5 and PC10-tests, respectively. While not ruling out any influence of display type upon TRV, this increase suggests that, for small numbers of repeats, the level of TRV is strongly influenced by the number of repeats.
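The step from a TRV of ±0.18 logMAR to a required change of 0.20 log units, and from there to a 58% change in letter size, can be checked with a short calculation. The sketch assumes, as one reading of the text, that the required change is the smallest whole number of ETDRS letters (0.02 logMAR each) that strictly exceeds the TRV; the function names are hypothetical.

```python
import math

def smallest_detectable_change(trv, letter_step=0.02):
    """Smallest whole-letter change (letters, logMAR) strictly above the TRV."""
    # Rounding guards against floating-point error in the division.
    letters = math.floor(round(trv / letter_step, 6)) + 1
    return letters, letters * letter_step

def size_ratio(delta_logmar):
    """Ratio of letter sizes corresponding to a logMAR difference."""
    return 10 ** delta_logmar

letters, change = smallest_detectable_change(0.18)
print(f"{letters} letters = {change:.2f} logMAR")
print(f"letter size increase: {(size_ratio(change) - 1) * 100:.0f}%")
```

Under the same assumption, a TRV of ±0.10 logMAR corresponds to a six-letter (0.12 logMAR) change.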

It should be noted that there is considerable variation in the published levels of TRV for the ETDRS chart. The level of TRV for ETDRS acuities in this study (±0.18 logMAR) is consistent with a number of published research papers.5,6,7,8 However, some workers including Elliott and Sheridan14 and Arditi and Cagenello15 have achieved levels of TRV using the ETDRS chart, which are similar to those achieved using the PC5-test in this study.

Reeves et al16 suggested ‘ceiling effects’ as a potential reason for the discrepancy between the level of TRV measured in his study (±0.19 logMAR), and that published by Elliott and Sheridan14 (±0.07 logMAR). He proposed that repeated acuity measurements on subjects with very good vision may produce an artificially low level of TRV because the measurements are truncated by the end of the scale. Although feasible, this effect appears not to explain the difference between Reeves' and Elliott's data as Elliott's own results for subjects with cataract produced TRV levels of ±0.09 logMAR, despite these subjects having acuities that did not approach the bottom of the scale (maximum acuity +0.40 logMAR). Reeves et al16 also suggested uncorrected refractive error as a potential confounder for TRV, while admitting that this alone could not explain the difference in TRV between his study and that of Elliott and Sheridan. Indeed, the more recent data of Siderov and Tiu5 suggest that correcting refractive error has no effect on TRV. Having raised ocular pathology as another potential confounder, Reeves proceeded to discount this based on his own study,16 in which he found no difference in TRV between subgroups of normal and abnormal eyes. This finding also appears to be supported by Elliott's data in which TRV levels for normals and subjects with cataract were within ±0.02 logMAR (one ETDRS letter) of one another. A final potential confounder suggested by Reeves et al was that of the time interval between test and retest. This also appears to be discounted by the available data, as this study, along with those of Elliott, Arditi, Reeves, and Siderov, shows varying levels of TRV despite employing similar time intervals between test and retest.

In spite of the fact that the discrepancies between published levels of TRV are difficult to account for, the fact that the levels of TRV produced by the PC5-test and the ETDRS test in this study (±0.10 and ±0.18 logMAR, respectively) were measured on a single group of subjects under identical conditions remains compelling evidence for the effect of repeating and averaging on TRV.

The benefit of the approach of repeating and averaging appears to be limited to not more than five repeated measurements, as we can see from the TRV figure of ±0.11 logMAR produced by the PC10-test. This test differs from the PC5-test only in the number of individual measurements that are taken and averaged to produce the final acuity result, and yet the PC10-test produced a slightly higher (equivalent to half an ETDRS letter) level of TRV than the PC5-test. If we consider the series of measurements within each PC-test as a sample from an infinite distribution of repeated measurements, we would expect diminishing returns with respect to improved (lower) TRV as the sample size increases. This is because the variability of the mean of the sample is inversely proportional to the square root of the sample size. The absence of any improvement in TRV with a larger number of averaged measurements might suggest that beyond five repeated and averaged measurements, a factor other than short-term measurement noise is the limiting factor preventing further reduction in TRV. An example of such a factor might be a type of long-term fluctuation in visual acuity analogous to that seen in perimetric defects in glaucoma. This suggestion is, however, speculative and would require further study. In fact a small increase in TRV was observed with the PC10-test as compared with the PC5-test. This is contrary to the expected finding and may be explained by increased fatigue associated with a longer test duration.
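The square-root relationship can be compared with the figures reported in this study. The sketch below assumes that the individual measurements within a test are independent and that the single-repeat TRV values quoted earlier (±0.28 and ±0.25 logMAR) reflect short-term noise only; neither assumption is established by the data, and the function name is hypothetical.

```python
import math

def predicted_trv(single_measurement_trv, n_repeats):
    """TRV expected after averaging n independent repeats."""
    return single_measurement_trv / math.sqrt(n_repeats)

# (test, single-repeat TRV, number of repeats, observed TRV) from the text.
reported = [("PC5-test", 0.28, 5, 0.10), ("PC10-test", 0.25, 10, 0.11)]
for name, single, n, observed in reported:
    print(f"{name}: predicted ±{predicted_trv(single, n):.2f}, "
          f"observed ±{observed:.2f} logMAR")
```

Under these assumptions the predicted TRV for ten repeats is lower than the value observed, which is in keeping with the suggestion that something other than short-term measurement noise limits further improvement.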

As is common in method-comparison studies, the prototype PC-tests were designed to produce acuities which agree well with those of the Gold Standard ETDRS test. The results suggest that there is no systematic bias between acuities measured with either PC-test and the Gold Standard ETDRS test, when used in this population. It is interesting that none of the departures from the ETDRS chart design have introduced measurement bias. Such departures include:

  1. The use of three letters per line rather than five (because of finite display screen size),
  2. the use of letters chosen at random (although without adjacent repeats) rather than being grouped such that the total difficulty of all lines of letters was as similar as possible, and
  3. lower background illumination.

Factors 1 and 2 might be expected to compromise the level of TRV; factor 1 because of the increased scale increment resulting from a smaller number of letters per line, and factor 2 because variation in average letter difficulty between lines is a potential confounder. It appears that any such factors tending to increase TRV were more than compensated for by the reduction in TRV produced by repeating and averaging. Factor 3, along with factor 1, might be expected to introduce systematic bias; factor 1 because the letters on a three-letter-per-line chart are less crowded, and factor 3 because visual acuity is known to be proportional to chart luminance. As no significant bias was encountered, either these factors were insufficient to result in bias, or they produced opposing biases of approximately equal magnitude. A factor that might produce acuity-dependent bias between the PC-tests and the ETDRS chart is the pixelated nature of the PC-test display. Pixelisation is likely, if present, to have a more detrimental effect upon good acuities (for which stimulus sizes are small) than poor acuities, where the pixel size is small relative to letter size. Examination of the data showed no variation in the level of agreement between the PC5 and ETDRS tests with the underlying acuity, suggesting that pixelisation did not influence legibility of the smaller letters.

One benefit of reduced TRV not yet discussed relates to the ability to detect differences between groups in clinical research. The effect of reduced TRV upon a clinical trial, which uses visual acuity as a main outcome measure, is best illustrated by an example. Using the above results, a study using ETDRS acuity data to compare the outcome of two treatments would need to recruit a total of 136 subjects to show a difference in mean acuity outcome between the groups of 0.10 logMAR, at the 5% significance level and with a power of 90%. Using the PC5-test, the required number of subjects could be reduced to 42.
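The text does not state how these sample sizes were calculated. The sketch below shows one set of assumptions that happens to reproduce the quoted totals: the standard normal-approximation formula for comparing two group means, with the TRV value used directly as the standard deviation and the per-group size rounded to the nearest whole subject. These are assumptions made for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist

def total_subjects(sd, difference, alpha=0.05, power=0.90):
    """Total subjects (two equal groups) to detect a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    per_group = 2 * (z_alpha + z_beta) ** 2 * (sd / difference) ** 2
    # Rounded to nearest (an assumption); rounding up is the usual convention.
    return 2 * round(per_group)

print("ETDRS chart:", total_subjects(sd=0.18, difference=0.10))
print("PC5-test:   ", total_subjects(sd=0.10, difference=0.10))
```

With these inputs the function returns 136 for the ETDRS chart and 42 for the PC5-test. Because the required number scales with the square of the standard deviation, the ratio between the two totals is close to (0.18/0.10)².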

In summary, this study suggests that repeating and averaging acuity measurements using a computerised visual acuity test may produce lower levels of TRV than the ETDRS logMAR chart. Reduced TRV allows earlier detection of true visual change in individuals, and for clinical trials using visual acuity as a primary outcome measure enables differences between groups to be demonstrated with a smaller number of subjects. A larger study is required to confirm this finding as well as to further investigate whether the benefits of repeating and averaging are dependent upon the underlying level of acuity.