Measuring the effectiveness of cataract surgery: the reliability and validity of a visual function outcomes instrument


AIMS To assess test-retest reliability and validity of the “TyPE” patient self assessed visual function questionnaire, as part of a study in two hospitals measuring the effectiveness of cataract surgery. The American TyPE questionnaire had minor adaptations made for use in Britain.

METHODS Test-retest reliability was assessed on 63 out of 378 adult cataract surgery patients in the study, using Spearman correlation coefficients and kappa coefficients of agreement. “Construct” validity was evaluated by assessing the association between changes in visual function questionnaire scores after surgery and patients’ perception of change in visual function obtained by independent interview of 24 patients.

RESULTS The TyPE questionnaire items showed very good test-retest reliability. Average Spearman and kappa coefficients for 39 patients from hospital 1 were 0.93 and 0.84 respectively. Spearman and kappa coefficients of 0.9 and 0.81 were obtained for those nine patients in hospital 2 where both the test and retest questionnaires were filled in by the same people. However, for the 15 patients from hospital 2, where the questionnaire was filled in by different people in the retest, reliability was less good: the Spearman coefficients were still high, average 0.72, but the kappa coefficients were poor, 0.27. Good construct validity was exhibited, with a correlation of 0.79 between change in distance vision score from the questionnaires and the independent interview.

CONCLUSIONS The adapted TyPE questionnaire is very reliable and has good construct validity. The kappa coefficient should be used wherever possible to evaluate reliability. The test-retest reliability, validity, and practicability of other visual function questionnaires have not been assessed adequately, and all such questionnaires should be developed further so that they may be introduced into routine clinical care.

  • reliability
  • validity
  • cataract outcomes
  • visual function
  • questionnaire

Though surgery for cataract is one of the most common elective operations in the NHS, there are still important unresolved issues about the procedure.

There are difficulties in estimating prevalence of cataract,1 2 and assessing population unmet need for surgery.1 There are still large variations between districts in surgery rates which have not been explained in terms of variations of need, and which indicate possible differences in severity thresholds for surgery.1 3-5 Achieving equitable and optimal surgical rates depends on establishing criteria for which individuals should be operated on.6-8

Though there is much evidence from clinical observation that cataract surgery improves vision, only visual acuity has been systematically used as a quantitative assessment and outcome measure in routine clinical care.9 However, there is much evidence that Snellen visual acuity measurement does not validly measure impairment in everyday visual function caused by cataract.6 7 10-12 For instance, it does not adequately measure the effects of glare in everyday situations.

Because cataract surgery produces changes in quality of life rather than reducing mortality, patients’ own assessments of visual function and health related quality of life are most appropriate for outcomes measurement.1 13 However, until now, there has rarely been systematic measurement of patients’ visual function in cataract patients.12 14 15

There is no definitive ophthalmological test to determine severity thresholds for operating.6 American and British ophthalmological authorities’ guidelines do not consider it useful to define specific levels of visual acuity as sole cut off points for surgery in preoperative assessment.6 7 According to these guidelines, surgery should be undertaken when the benefits to the patient’s quality of life outweigh the risks, and this depends on the extent to which a cataract is interfering with the patient’s visual function.6 7

Some patients benefit less from cataract surgery than others, and the patient characteristics and other factors which are associated with this are only beginning to be researched.16 Also, although second eye surgery is known to benefit some patients,17 18 still too little is known about the relative visual function benefits of second eye compared with first eye surgery.14 17


To begin to resolve the above issues, it is essential to measure outcomes by asking patients with cataract about their visual function, using reliable and valid questionnaires.19

Reliability is the extent to which a vision test item, such as a Snellen visual acuity measurement, or a patient assessed visual function questionnaire item, gives the same measurement or score when two or more measurements are made, while the patient’s true visual function has not changed.

Validity is the extent to which a measurement instrument or test really measures what it is supposed to. It is difficult to evaluate the validity of visual function measurement instruments because visual function is a complex concept and there is no obvious “gold standard” (technically called “criterion” validity) for measuring it. However, it is possible to estimate “construct” validity by measuring the degree of agreement between two or more different ways of measuring visual function.

While research to develop patient assessed visual function and quality of life measurement in cataracts has been carried out in the USA,12 13 15 these instruments are still not widely used there.13 In Britain there has only been a little research on the development of such instruments,5 17 19 and they are not yet being used in routine care.

Our study was designed to test the reliability and validity of an established patient assessed visual function questionnaire, and to use it to measure the long term effectiveness of cataract surgery. In this article we report on the test-retest reliability, and validity of the questionnaire. The results of using this questionnaire to measure visual function and quality of life for up to 2 years after surgery are being reported in a separate paper, and further analyses of subsets of the data are in progress.


During the period March 1993 to August 1994, 378 patients being assessed for cataract surgery in two hospitals were entered into the study, and completed post operation questionnaires. Clinical data were collected using a protocol developed by a group of ophthalmologists in Buckinghamshire.19

Patients’ visual function was measured using the American “TyPE” cataract visual function measurement questionnaire Version 1,20 adapted by us for use in Britain. This questionnaire was developed in America after a comprehensive literature review, and used expert opinion of ophthalmologists, and a consensus process involving patient care committees.20 There are questions on overall ability to see; how much vision hinders or limits usual daily activities, recognising people or objects across the street, reading, knitting, watching TV, and driving; and how much glare hinders day to day activities, reading, and driving under dazzling conditions. It uses five point scales of disability, from “not at all” to “totally disabled.” We used especially large and clear fonts (a reduced example is shown in the ).

Our adaptations of the TyPE Version 1 changed “store” to “shop,” removed “sewing” from the “reading” question, and created a separate question about “knitting/sewing.” We also added a question on watching television. (Version 3 of the American TyPE, issued after this study was completed, has the same changes to the “reading/sewing” questions which we have, and has also added a “watching TV” question.)

Following Mangione et al’s methods,15 we empirically constructed four visual ability dimensions by combining related visual function questions. The distance vision function dimension combined the items “recognising people” and “watching TV”; the near vision dimension combined “reading price labels” and “reading magazines”; the glare vision dimension combined “usual activities glare,” “glare when walking,” and “glare when reading shiny paper”; and the driving vision dimension combined “day driving,” “night driving,” and “glare when driving.”


In our cataract outcomes study, very nearly all patients in hospital 1 filled in the questionnaire twice preoperatively. The first time was at outpatients, as part of the routine preoperation assessment, and the second time was in hospital just before surgery. Test-retest reliability was assessed using as a sample the 39 patient visual function questionnaires where the interval between filling in the first and second preoperation questionnaires was less than 30 days. This 30 day limit between test and retest was chosen so that visual ability would not have changed significantly in the interval. Patients in hospital 2 filled in the questionnaire only once preoperatively. Therefore, for the reliability test-retest in this hospital, a random sample of 24 patients was asked to fill in the questionnaire again before operation, at home 4 days after filling in the first.

In both hospitals, the questionnaires were filled in either by the patients themselves, or with help from a friend or a health professional. We recorded which occurred. The data from both hospitals were analysed in two groups, according to whether the respondent (whether the patient responded alone or with help) remained the same from test to retest, or whether the respondent was different in the retest compared with the test.

Virtually all the test and retest questionnaires at hospital 1 were filled in at the hospital outpatient clinic. At hospital 2, all the test questionnaires were filled in at the outpatient clinic, whereas all the retest questionnaires were filled in at home.

The data were entered into a database (Microsoft Access), and statistical analyses were carried out using the Statistical Package for the Social Sciences (SPSS), Windows version 6.1. Spearman non-parametric correlation and Cohen’s kappa coefficient of agreement for categorical data21 were used to measure the degree of agreement between test and retest questionnaire scores for all the visual ability variables.
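The two agreement statistics named above can be sketched in a few lines of Python. This is an illustration only, not the SPSS procedure used in the study: the function names (`average_ranks`, `spearman_rho`, `cohen_kappa`) and the ten pairs of scores are hypothetical, and the kappa shown is the unweighted form.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def cohen_kappa(test, retest):
    """Unweighted Cohen's kappa: (p_observed - p_chance) / (1 - p_chance)."""
    n = len(test)
    p_observed = sum(a == b for a, b in zip(test, retest)) / n
    test_counts, retest_counts = Counter(test), Counter(retest)
    # Chance agreement: product of the marginal proportions, summed over categories.
    p_chance = sum(test_counts[c] * retest_counts[c] for c in test_counts) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores on a five point scale (1 = "not at all" ... 5 = "totally disabled").
test_scores = [1, 2, 2, 3, 3, 4, 5, 1, 2, 3]
retest_scores = [1, 2, 2, 3, 4, 4, 5, 1, 2, 3]

rho = spearman_rho(test_scores, retest_scores)
kappa = cohen_kappa(test_scores, retest_scores)
print(f"Spearman rho = {rho:.2f}, kappa = {kappa:.2f}")
```

The sketch does not guard against degenerate input, such as every response falling in a single category, where both statistics are undefined.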


Construct validity of the questionnaire for measuring change in visual function as a result of surgery was evaluated by comparing the change as measured by the visual function questionnaire with patients’ perception of their change in visual function obtained by independent interview.

A random sample of 24 patients entering the study during the first 3 months in hospital 1 were interviewed by DP using “open ended” questions. Each was asked how much their visual function had changed as a result of surgery. The narrative responses for each interviewed patient were then assessed by DL as falling into one of four categories of change in visual function: “worse,” “no change,” “somewhat better,” and “much better.” DL made this assessment without knowing what the changes in the visual function questionnaire scores were (that is, the assessment was “masked”). The extent of agreement between this categorised variable and the change in visual function scores from the questionnaires was tested using Spearman non-parametric correlation (kappa coefficients could not be calculated). The data were entered into Microsoft Access, and statistical analyses carried out using SPSS.



The visual function questionnaire items overall showed very good test-retest reliability. For hospital 1, visual function test-retest Spearman coefficients averaged 0.93, with all above 0.85 and all but two above 0.90. Their kappa coefficients averaged 0.84, and only two items—“vision hinders usual activities,” 0.77, and “glare hinders reading shiny paper,” 0.78—had coefficients below 0.80 (Table 1).

Table 1

Retest reliability of visual function questionnaire items, hospital 1

For hospital 2, where the retest questionnaire responses were filled in by the same person (or people) who filled in the test questionnaire (group 1 patients), the results were of similar reliability to those from hospital 1. Visual function test-retest Spearman coefficients averaged 0.90, with all above 0.70 and all but two above 0.80 (Table 2). Their kappa coefficients could only be calculated for three items, but for these they averaged 0.81, of similar order to those from hospital 1 (Table 2).

Table 2

Retest reliability of visual function questionnaire items, hospital 2

By contrast, where the retest questionnaires were filled in at home by a different person or people from the test questionnaire filled in at the hospital (group 2 patients), the reliability results were poorer. The average Spearman correlation coefficient was still quite high, 0.72, but the kappa coefficients, where they could be calculated, were low, average 0.27 (Table 2).

To calculate the kappa coefficient, the same categories have to be present in both test and retest. Because of small numbers, in hospital 2 for groups 1 and 2 separately, there were many variables in which some categories of response did not occur and the kappa coefficient could not be calculated.
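The condition described above can be checked before attempting the calculation. The following is a minimal sketch with hypothetical scores for one questionnaire item in a small group; it mirrors the condition as stated here and is not a description of how any particular statistics package decides whether kappa can be computed.

```python
def categories_match(test, retest):
    """True when test and retest used exactly the same response categories."""
    return set(test) == set(retest)


# Hypothetical responses from nine patients to one item.
test_scores = [1, 2, 2, 3, 1, 2, 3, 2, 1]
retest_scores = [1, 2, 2, 2, 1, 2, 4, 2, 1]

# Categories that appear in one round of the questionnaire but not the other.
missing_from_retest = sorted(set(test_scores) - set(retest_scores))
missing_from_test = sorted(set(retest_scores) - set(test_scores))

if categories_match(test_scores, retest_scores):
    print("Same categories in both rounds: kappa can be calculated.")
else:
    print("Categories differ between rounds.")
    print("Used in test only:", missing_from_retest)
    print("Used in retest only:", missing_from_test)
```

With small groups and five response categories, a mismatch of this kind becomes likely, which is consistent with the many items for which no kappa could be reported in hospital 2.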


Patients’ perception from the interviews of how their visual function had changed overall, correlated very well, 0.79, with the changes in the “distance” visual function dimension scores calculated from the questionnaires, and quite well, 0.62, with a composite “overall vision” score calculated from the original questionnaire data items (Table 3).

Table 3

Correlations of patients’ perception of overall change in visual function (from the interviews), with changes in visual function dimension scores (from the questionnaires)

The correlations with “near,” 0.48, and “glare,” 0.43, visual function dimension scores, were only moderate. This might be expected if overall vision perception is more connected with distance vision than near vision or glare disability. There were insufficient car drivers in the interview sample to consider the “driving visual function” dimension.



These results show that the visual function questionnaire we used has very good test-retest reliability, with very high Spearman and kappa coefficients. The kappa coefficient of agreement is a much better measure of test-retest reliability than the Spearman correlation coefficient, because the kappa coefficient allows for agreement which would be expected by chance between the patients’ test and retest scores.21 Perfect agreement is shown by kappa = 1, whereas if there is no agreement, other than that expected by chance, kappa = 0. Values of 0.75 and above are considered excellent agreement, with those between 0.4 and 0.74 moderate agreement.22

The results for hospital 2 (Table 2) illustrate this. The Spearman coefficients are high for both groups 1 and 2, 0.90 and 0.72 respectively, but the average value of the kappa coefficients for group 2, 0.27, is much lower than for group 1, 0.81.
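A deliberately extreme, hypothetical example shows how the two statistics can diverge. Suppose a different respondent rates every patient one category more severe at retest. The rank order of patients is unchanged, so the Spearman coefficient is 1, yet no pair of responses agrees exactly and kappa falls below zero. The data and variable names below are assumptions made for illustration, not study data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical item scores for 12 patients; retest is shifted up by one category.
test_scores = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
retest_scores = [score + 1 for score in test_scores]


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


# If every pair of patients is ordered (or tied) the same way in both rounds,
# the ranks are identical and the Spearman coefficient equals 1.
same_ordering = all(
    sign(test_scores[i] - test_scores[j]) == sign(retest_scores[i] - retest_scores[j])
    for i, j in combinations(range(len(test_scores)), 2)
)

# Unweighted Cohen's kappa from observed and chance agreement.
n = len(test_scores)
p_observed = sum(a == b for a, b in zip(test_scores, retest_scores)) / n
test_counts, retest_counts = Counter(test_scores), Counter(retest_scores)
p_chance = sum(test_counts[c] * retest_counts[c] for c in test_counts) / n ** 2
kappa = (p_observed - p_chance) / (1 - p_chance)

print("Rank order preserved:", same_ordering)
print(f"Observed agreement = {p_observed:.2f}, chance agreement = {p_chance:.4f}")
print(f"kappa = {kappa:.2f}")
```

Real data will rarely be this clean, but the same mechanism, a systematic shift in scoring between respondents, would produce the pattern of high correlation with poor agreement.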

The test-retest reliability results reported here are as good as those reported from other work on patient assessed visual function questionnaires by Sanderson5 and Mangione et al.15 However, those two studies only reported Spearman correlation coefficients, and we have shown that these can give too optimistic results compared with use of the kappa coefficient. This is the first time that the kappa coefficient has been used to test reliability in these types of outcome instruments. The VF-14, another important visual function questionnaire, has been reported as being reliable by American13 and English17 studies, but no test-retest reliability results have so far been published by them. The American study . . . “elected not to assess the test-retest reliability of the VF-14 before surgery, but . . . to administer it at 4 and 12 months after surgery.”23 Because changes in vision are possible during this post operation period, and one purpose of these questionnaires is to detect such changes, we suggest that only preoperation test-retest reliability, with a short interval between tests, is valid for reliability testing. The English study did not assess test-retest reliability.

As far as we know this is the first study to use self administered questionnaires; other studies have used telephone or interviewer responses.12 15 17 In this study, patients who could not fill in the questionnaire unaided were helped by an accompanying person or health professional, and we monitored this. We have shown that the conditions under which questionnaires are administered can affect reliability for self administered questionnaires. The very good results for group 1 patients from hospital 2 indicate that, provided questionnaires are filled in by the same person or people, reliability is very good even when the milieu changes.


The results provide evidence of good construct validity of the questionnaire items as a measure of patients’ perception of change in visual function as a result of cataract surgery (Spearman correlation of 0.79 with distance vision function, 0.62 with a composite measure). These results also show that the questionnaire items are sensitive to change in visual function. That is, where patients report in interview that there is change in visual function due to operation, this is indicated by changes in questionnaire item scores.

Other studies have reported similar results for construct validity, though they have called it criterion validity (see below). These studies correlated questionnaire scores with other measures of patient assessed visual function, and not with change in scores before and after operation as we have done, but obtained similar levels of Spearman correlations (around 0.6).

No reported study, including this one, has evaluated the criterion validity of vision tests in measuring visual impairment from cataracts, using direct comparisons with patients’ real ability to read, recognise people, or perform other everyday tasks which are vision dependent. Some studies have reported measuring criterion validity13 15 17 but did not in fact do so. These studies variously tested the visual function questionnaires against visual acuity,13 17 global self rating,13 15 patient satisfaction with vision,13 percentage binocular vision loss,15 and vision related quality of life.17 None of these is a direct measure of specific visual function dimensions.

In conclusion, this study provides important evidence that patient assessed visual function questionnaires have good test-retest reliability and construct validity for preoperative assessment and for measuring the impact of cataract surgery. Their potential for use in routine ophthalmic practice should be explored. Self assessed instruments are potentially more cost effective than questionnaires which require an interviewer.

Further research is required on the relative test-retest reliability of the various visual function questionnaires now available against each other and against visual acuity, using the kappa coefficient as a measure of agreement.

The criterion validity of visual acuity and visual function questionnaires needs to be established by testing them against patients’ real ability to carry out visual function tasks at near and distance vision ranges. Further research is also necessary to examine how these visual function questionnaires compare with questionnaires which directly ask about changes in visual function rather than just ask about visual function status at different times.


We would like to thank all the patients and hospital staff who took part in the study, our research assistant Penny Drewett, the Bucks Association for the Blind volunteers who helped in outpatients, Angela Coulter, and Crispin Jenkinson for guidance, and Helen Doll for statistical advice. We would also like to thank Dr Larry Chambers, Dr JA Muir Gray, and Dr Alison Hill for encouragement and advice. Buckinghamshire Health Authority gave much support, and the former Oxford Regional Health Authority provided funding.



Please answer these questions based on your best vision with both eyes open and wearing glasses or contact lenses if you usually do.

