Development of a disease specific quality of life questionnaire for patients with Graves’ ophthalmopathy: the GO-QOL
  1. Caroline B Terwee (a),
  2. Martin N Gerding (b),
  3. Friedo W Dekker (a),
  4. Mark F Prummel (b),
  5. Wilmar M Wiersinga (b)
  (a) Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, University of Amsterdam, Netherlands
  (b) Department of Endocrinology, Academic Medical Center, University of Amsterdam, Netherlands
  Correspondence: C B Terwee, Department of Clinical Epidemiology and Biostatistics J2-218, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, Netherlands.


AIM To develop a reliable and valid disease specific quality of life questionnaire (the GO-QOL) for patients with Graves’ ophthalmopathy (GO), that can be used to describe the health related quality of life and changes in health related quality of life over time as a consequence of disease and treatment.

METHODS 70 consecutive GO patients (age >18 years) who were referred for the first time to the combined outpatient clinic of the orbital centre and the department of endocrinology completed the 16 questions of the GO-QOL. Additional information on general quality of life and disease characteristics was obtained. Construct validity and internal consistency of the disease specific questionnaire was determined, based on principal component analysis, Cronbach alphas and correlations with MOS-24, three subscales of the SIP, demographic, and clinical measures.

RESULTS The a priori expected subdivision of the questionnaire in two subscales, one measuring the consequences of double vision and decreased visual acuity on visual functioning, and one measuring the psychosocial consequences of a changed appearance, was confirmed in the principal component analysis. Both scales had a good reliability and high face validity. Correlations with other measures supported construct validity. Mean scores (range 0–100) were 54.7 (SD 22.8) for visual functioning and 60.1 (24.8) for appearance (higher score = better health).

CONCLUSION The GO-QOL is a promising tool to measure disease specific aspects of quality of life in patients with GO and provides additional information to traditional physiological or biological measures of health status.

  • Graves’ ophthalmopathy
  • health related quality of life
  • questionnaire

Graves’ ophthalmopathy (GO), associated with Graves’ thyroid disease (GTD), is an incapacitating eye disease, causing disfiguring proptosis, pain, redness, and swelling of the eyelids, grittiness of the eyes, diplopia, and sometimes even blindness.1 2 Several studies have shown that visual problems in general can have a major impact on daily functioning and wellbeing.3-6 Furthermore, the psychological burden of the progressive disfigurement resulting from GO is well recognised.7 8 Bartley et al9 report that after treatment 61% of the patients believed that the appearance of their eyes had not returned to baseline status, 51% thought their eyes continued to be abnormal in appearance, and 37% were dissatisfied with the appearance of their eyes. Overall, the effects of GO on physical and psychological functioning have a significant impact on a patient’s health related quality of life.10

The outcomes of GO disease and treatment are mostly assessed with biological and physiological measures—for example, combined in the NO SPECS classification.11 While these measures provide important information to clinicians, they often correlate poorly with functional capacity and perceived health as experienced by the patient.12 13 For example, Prummel et al 14 found a response rate of 50% and 46% respectively to prednisone and radiotherapy measured by the NO SPECS classification, but the benefit of both treatments on the subjective judgment of the eye condition by the patient (expressed in the subjective eye score15) was only modest. In a recent study we found low correlations between scores on a general health related quality of life questionnaire and measures of severity and activity of disease.10

These different outcomes can be regarded as different concepts on a causal pathway from biological and physiological measures to perceived symptoms, then to the functional consequences of these symptoms, and finally to more complex elements such as general health perceptions and overall quality of life.12 Following on the WHO definition of health, health related quality of life (HRQL) can be defined as the physical, psychological, and social domains of health, as perceived by the patient, which are influenced by a patient’s experiences, beliefs, and expectations of their disease and treatment.16 17

In general, HRQL measures are among the best predictors of the use of general medical and mental health services as well as strong predictors of mortality.12 Although (health related) quality of life has always been an implicit goal in medicine, it has only recently become an explicit outcome measure in medical studies. In 1992, a joint committee of thyroid associations recommended that self assessment of the eye condition by the patient should be included in evaluations of treatments for patients with GO.18 While a number of studies have assessed functional status and/or (health related) quality of life in patients with other eye diseases,4-6 19-24 no studies have so far been published on HRQL in patients with GO.

A number of instruments are available for measuring HRQL. A distinction can be made between generic and disease specific measures. Generic instruments are designed to measure the most important general concepts of HRQL and are applicable across different diseases and different populations, allowing direct comparisons among patient groups. In a recent study we have shown that the general HRQL in patients with GO is markedly decreased compared with a general population and with patients with other chronic diseases.10 However, these generic questionnaires are often too general and their questions too broadly based to detect small, but clinically important changes within diseases.25 Such generic measures may have low content validity in population specific applications because they contain items of little or no relevance to patients and they may miss concerns of particular interest for that disease.26 Disease specific measures are more focused on those aspects of quality of life particularly affected by a specific disease and may have greater detail or more items concerning specific relevant aspects. They are often designed to meet the need of clinical trials to use instruments that are most responsive to clinical changes that occur over time.

Several vision specific quality of life questionnaires are available.4 27-31 However, these are considered unsuitable for studies in patients with GO, because they were developed primarily for patients with impaired visual acuity (for example, from cataract). Visual acuity is almost never a problem in patients with GO, who suffer instead from disfiguring proptosis and diplopia. Although reduced acuity as well as diplopia can affect visual functioning, questionnaires like the VR-SIP or NEI-VFQ contain many scales or items within scales (including the psychological and social scales) that are not relevant for patients with GO. Furthermore, the length of these questionnaires is a problem (for example, the VR-SIP takes 30 minutes to complete) because disease specific quality of life questionnaires will often be used in combination with general HRQL questionnaires (such as the SF-36 or EuroQol) in a battery approach in evaluative studies.

Therefore, the aim of this study was to develop a disease specific quality of life questionnaire for patients with GO. Such a questionnaire can be used to describe the HRQL and changes in HRQL over time as a consequence of GO disease and treatment—for example, as an outcome measure in clinical trials. The questionnaire was not designed to measure clinical aspects of disease or symptoms that are usually assessed by the clinician, but to assess the perceived effects of GO by the patients on their daily physical and psychosocial functioning or HRQL, which are often neglected in clinical practice.

The questionnaire should be reliable (it should produce consistent and reproducible answers) and valid (it should measure what it is supposed to measure). Additionally, to be useful in clinical practice, the questionnaire should be short, self administered, highly acceptable to patients, and simple to score and to interpret.25



Item generation

Items from questionnaires designed for patients with other eye conditions (the most important ones being the VF-14,27 28 the Activities of Daily Vision Scale (ADVS),29 and the Vision Related SIP (VR-SIP)30) were considered for inclusion in the questionnaire if they were judged relevant for patients with GO (based on discussions with patients and experienced physicians). In addition, 24 patients completed a questionnaire with open ended questions about symptoms and problems experienced with their disease. Problems that were most often mentioned were incorporated in the questionnaire. Items were selected to reflect the full domain of aspects relevant for patients with GO.

Questionnaire construction

Sixteen questions were constructed to represent two different aspects of GO related to quality of life (two examples are presented at the end of this article).

(1) The consequences of double vision and decreased visual acuity on visual functioning. Patients were asked to indicate the degree to which they were impaired, because of their GO, in activities like driving, reading, watching TV, etc, during the past week, on a three point Likert scale (not impaired, a little impaired, severely impaired).

(2) The psychosocial consequences of a changed appearance. Patients were asked to indicate how much they felt their appearance had changed, how much they felt they got unpleasant reactions from others, how much they felt the disease influenced their self confidence and friendships, etc, all because of their GO, during the past week, also on a three point Likert scale (strongly, a little bit, or not at all). Two of these questions (“less often on photos” and “using camouflage”) had slightly different answering options (yes, no, or I don’t know/not applicable).


Eight patients were interviewed after completing the questionnaire about problems with understanding the questions, relevance of the items, omissions regarding important aspects of the disease, and other comments.


We included consecutive GO patients (age >18 years) who were referred for the first time to the combined outpatient clinic of the orbital centre and the department of endocrinology of the Academic Medical Center (AMC) in Amsterdam. Excluded were patients with current mild or overt hyper- or hypothyroidism (we included only patients with FT4 10–23 pmol/l; FT3I 1.3–2.7 nmol/l) because our primary interest was to measure the impact of the ophthalmopathy and it is possible that a disturbed thyroid function may affect the scores on some of the items. However, owing to their small number, we were not able to evaluate the reliability and validity of the GO-QOL in this subset of patients. Also excluded were patients who had problems reading Dutch. All patients received the GO-QOL, together with the MOS-24 questionnaire (Medical Outcomes Study Short-form General Health Survey,32 consisting of seven subscales) and three subscales of the SIP (Sickness Impact Profile; Social interaction, Household management, and Leisure pastimes and recreation33), both generic HRQL instruments. Additionally, information was obtained on duration of GO and GTD, current GO severity (according to the NO SPECS classification system,11 Table 1), current GO activity (according to the clinical activity score34), which is a measure of inflammation in the orbit, and demographic characteristics. Questionnaires were completed at the outpatient clinic immediately or at home, and returned by the patient on the next appointment in 2–3 weeks.

Table 1

NO SPECS classification of eye changes in Graves’ ophthalmopathy (Werner’s classification11)


Principal component analysis (PCA) was used to identify whether the GO-QOL actually measured two different aspects of HRQL, which could be summarised in two sum scores. PCA (a technique closely related to factor analysis, and often referred to by that name) is a statistical technique used for analysing the interrelations among a set of variables (or questions in the questionnaire) and for explaining these interrelations in terms of a reduced number of variables, called factors.35 (For example, the relation between questions about apples, bananas, and pears can be explained by a factor called “fruit”, which represents the underlying construct, or dimension). These factors determine the subscales defined in the questionnaire. The technique is based on the assumption that items within subscales are highly correlated with each other, while items between subscales are less correlated or not at all.
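The assumption that items correlate strongly within a subscale and weakly between subscales can be illustrated with a small sketch. It is not the authors' analysis: the 4 x 4 correlation matrix is invented for illustration, the function names are hypothetical, and a plain Jacobi rotation routine stands in for whatever statistical package was actually used. The sketch computes the eigenvalues of the correlation matrix and counts those above 1, the retention rule used in this study.

```python
import math

def symmetric_eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [list(row) for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        # Stop once the off-diagonal part is numerically zero.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q] (smaller-angle root).
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(tau * tau + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def count_retained(eigenvalues, threshold=1.0):
    """Number of factors with an eigenvalue above the threshold."""
    return sum(1 for ev in eigenvalues if ev > threshold)

# Invented example: items 1-2 and items 3-4 form two correlated pairs.
toy_correlations = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
eigenvalues = symmetric_eigenvalues(toy_correlations)
print(eigenvalues, count_retained(eigenvalues))
```

Because the items are standardised, the eigenvalues sum to the number of items (4 in this toy case, 16 in the GO-QOL), so each eigenvalue divided by that number gives the share of total variance explained by its factor.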

In the analysis, all variables are standardised with a mean of 0 and a variance of 1, which means that the total variance in the questionnaire is equal to the total number of items (16 in the GO-QOL). Initially, the number of factors to represent the data was based on the eigenvalue, which represents the total variance of the questionnaire explained by each factor. Only factors with eigenvalues higher than 1 were presented, because factors with an eigenvalue less than 1 explain no more variance than a single variable. In a second analysis we performed a forced two factor analysis, based on the a priori expectation of two dimensions. The final number of factors determining the number of subscales defined in the questionnaire was based on the interpretation of the factor solutions and the scree plot, in which the eigenvalues of all factors are plotted. As a measure of reliability, the internal consistency, based on correlations of items within a subscale, was assessed by calculating Cronbach’s alphas.35 36 Correlations of the scales with subscales of the MOS-24 and SIP and with clinical characteristics were calculated to assess construct validity (the validity of the underlying factors, or dimensions) on the basis of previously formulated hypotheses about the relation between different measures.
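As an illustration of the internal consistency measure, the following minimal sketch computes Cronbach's alpha from the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the sum score). The function name and the data layout (one list of item scores per respondent, no missing values) are assumptions made for this example, not details taken from the study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for complete data.

    responses: list of respondents, each a list of k numeric item scores.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("sum scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha approaches 1 when the items move together across respondents and falls towards 0 when they are unrelated.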


In total, 112 consecutive patients were eligible for the study. Ten patients were excluded because of current mild or overt hyperthyroidism or hypothyroidism, 11 patients were excluded because they had problems reading Dutch, and 21 patients did not return the questionnaires. In total, 70 patients completed the questionnaires, 20 male (mean age 55.2 (SD 12.3)) and 50 female (mean age 52.5 (13.5)). Patient characteristics are shown in Table 2. We do not know why the 21 non-responders did not return the questionnaire. The non-responders had less severe disease (median TES 4.5 (range 2–13)), tended to be younger (mean age 40.8 (SD 9.6)), and were more often female (18/21) than the responders.

Table 2

Patient characteristics (n=70)

Percentages of responses to the individual questions of the GO-QOL are given in Table 3. Severe impairment was reported in 35% of the patients for driving and for performing leisure activities, in 27% for reading, and in 28% for watching TV. Ninety per cent of the patients believed their appearance had changed (a little or seriously), 71% felt their eye disease had influenced their self confidence (a little or seriously), and 56% felt they were watched by other people (a little or seriously).

Table 3

Frequencies of responses on the disease specific questions of the GO-QOL (n=70)

The results of the principal component analyses are given in Table 4. The factor loadings can be interpreted as the correlation between the item and the underlying factor. Initially, a four factor structure (based on eigenvalues higher than 1) was found. Items that loaded high on the first factor referred mainly to problems with distant vision (driving, cycling, walking indoors and outside). Items that loaded high on the second factor were related to social problems (feelings of social isolation, influence on self confidence, and friendships) and items that loaded high on the third factor were related to changed appearance (using camouflage, feelings of being watched, and change in appearance). Items that loaded high on the fourth factor were related to problems with near vision (reading, watching TV, and hobbies). However, a number of items were related to two or more different factors, especially in factors 2 and 3. The scree plot showed a distinct break between the steep slope of the first two factors (with high eigenvalues) and the gradual trailing of the rest of the factors (with lower eigenvalues), which indicates that two factors might be adequate to describe the data. A forced two factor solution supported the a priori expected subdivision in eight items referring to visual functioning (near and distant vision) and eight items referring to the psychosocial consequences of a changed appearance (Table 4).

Table 4

Principal component analysis: factor structure and factor loadings after Varimax rotation of the 16 questions of the GO-QOL

The analyses were initially based on only 46 patients because eight patients had no driver’s licence, resulting in a missing value for the item “driving”, and 16 patients answered “I don’t know” on the item “less often on photos” which was also coded as a missing value (the answer “not applicable” on the item “using camouflage” was coded as “no”). We used several strategies to analyse more patients, by repeating the analyses without these two items, and by replacing missing values with the series mean of the questions or with the scale mean (with the item “social isolation” assigned to the appearance scale). All analyses gave comparable factor structures, except for the item “social isolation”, which was related to both factors depending on the analysis.
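The two replacement strategies can be sketched as follows. This is an illustration under stated assumptions: missing answers are represented as `None`, "series mean" is read as the mean of that question across all patients who answered it, and "scale mean" is read as the mean of the same patient's answered items within the scale. The text does not spell out either definition, and the function names are hypothetical.

```python
def impute_series_mean(responses):
    """Replace None by the mean of that item over respondents who answered it."""
    k = len(responses[0])
    item_means = []
    for i in range(k):
        observed = [row[i] for row in responses if row[i] is not None]
        # Raises ZeroDivisionError if nobody answered the item.
        item_means.append(sum(observed) / len(observed))
    return [
        [item_means[i] if value is None else value for i, value in enumerate(row)]
        for row in responses
    ]

def impute_scale_mean(responses):
    """Replace None by the mean of the respondent's own answered items.

    Each row should hold the items of ONE scale only (assumed reading).
    """
    completed = []
    for row in responses:
        observed = [value for value in row if value is not None]
        person_mean = sum(observed) / len(observed)
        completed.append([person_mean if value is None else value for value in row])
    return completed
```

Either function returns a complete data matrix that can then be passed to the factor and reliability analyses, which is how more than the 46 complete cases could be analysed.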

Based on these results, two subscales were defined—one scale with eight items called “visual functioning” and the other scale with eight items called “appearance”. It was decided to assign the item “social isolation” to the appearance scale on the basis of its content. Cronbach’s alphas were 0.86 for visual functioning and 0.82 for appearance and did not change much with different strategies for replacing missing values. Highest alphas if one of the items was deleted were 0.85 for visual functioning and 0.83 for appearance.

For both scales, scores of the eight questions were summed and transformed to a 0–100 scale, 0 indicating worst health state, 100 indicating best health state. Mean scores (SD) (range) were 54.7 (22.8) (6.3–100) for visual functioning and 60.1 (24.8) (12.5–100) for appearance. Mean scores on MOS-24 and SIP ranged from 46.2 (21.7) to 88.1 (13.3) (0–100).
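The transformation to a 0–100 scale can be sketched as a linear rescaling of the sum score. The coding used here (1 = severely impaired, 2 = a little impaired, 3 = not impaired) and the function name are assumptions for illustration. The text states only that the item scores were summed and transformed so that 0 is the worst and 100 the best health state, and it does not say how items left unanswered were handled in scoring.

```python
def scale_score(item_codes, low=1, high=3):
    """Rescale a sum of item codes to 0-100 (100 = best health).

    item_codes: answers coded low..high, with `high` meaning no impairment
    (assumed coding).
    """
    if not item_codes:
        raise ValueError("no items to score")
    if any(code < low or code > high for code in item_codes):
        raise ValueError("item code outside the allowed range")
    n = len(item_codes)
    raw = sum(item_codes)
    # Subtract the minimum possible sum, divide by the possible range.
    return (raw - n * low) / (n * (high - low)) * 100

# A little impaired on all eight questions gives the midpoint of the scale.
print(scale_score([2] * 8))
```

With eight three-point items the possible range of the sum is 16 points, so one category step on one question moves the score by 100/16 = 6.25 points, which is consistent with the reported minimum scores of 6.3 and 12.5.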

In Table 5 correlations of the two subscales of the GO-QOL with other measures are given. All correlations with MOS-24 and SIP scales were in the expected direction. In general, higher correlations were found with subscales of the SIP than with subscales of the MOS-24. Younger patients reported fewer visual problems, but more problems with appearance than older patients, and females reported more problems with appearance than men. Visual functioning correlated with duration and severity of GO disease, but neither scale correlated with disease activity.

Table 5

Spearman’s rank correlations of the two scales of the GO-QOL with subscales of MOS-24 and SIP scales and with clinical characteristics
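For readers unfamiliar with the statistic used in Table 5, the following minimal sketch computes Spearman's rank correlation as the Pearson correlation of the ranks, with tied values given their average rank. The function names are hypothetical and the example numbers are invented for illustration; they are not study data.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation of two equally long sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# Invented toy values: a severity score against a 0-100 functioning score.
toy_severity = [3, 5, 8, 10, 12]
toy_functioning = [90.0, 75.0, 62.5, 50.0, 31.25]
print(spearman_rho(toy_severity, toy_functioning))
```

A negative coefficient here means that higher severity goes with lower (worse) functioning scores, which is the direction in which the severity correlations in this study should be read.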


The GO-QOL is a short, simple, self administered HRQL questionnaire for patients with GO. Two different disease specific dimensions of quality of life can be distinguished, one referring to the consequences of double vision and decreased visual acuity on visual functioning, and one referring to the psychosocial consequences of changed appearance. Both scales had good reliability. Validity of the questionnaire was supported by correlations with scales of the MOS-24, SIP, and clinical characteristics.

Principal component analysis was used to identify subscales in the questionnaire. Although both scales had high face validity (which means that by just examining the items, the questionnaire appears to measure what it should measure35) and high Cronbach alphas, which support the two factor structure, the results of the principal component analysis should be considered with caution because of our relatively small patient sample.

It could be argued that the item “social isolation” should be excluded from the questionnaire because the item does not discriminate consistently between the two scales. However, an association with both scales is clinically understandable because impairments in driving, cycling, etc, can lead to social isolation, just as psychosocial problems resulting from disfiguring disease can. This effect might even be strengthened by the fact that this item is placed in the middle of the questionnaire as the first of the eight questions regarding appearance, after the eight questions about visual functioning. Perhaps a different location of this question would result in a higher factor loading on the appearance scale.

The high factor loadings and Cronbach alphas, seen in this study, could possibly be explained by a sequence effect because the questions of both scales were separated. However, because of the obvious face validity of the two scales and because separation of questions of different scales is common practice in a lot of generic HRQL questionnaires, we argue this is not a major issue here.

The best way of validating a questionnaire would be to demonstrate that its results match a gold standard. However, a gold standard for (health related) quality of life is unavailable. Therefore, one must rely on construct validity, which is based on predictions about how the results of the questionnaire should correlate with other related or non-related measures.24 For example, the psychosocial questions about the consequences of changed appearance were expected to correlate with social interactions (SIP), while the questions about impairments in activities of the visual functioning scale were expected to correlate with household management (SIP). The results confirmed these expected relations, which support construct validity. Both scales correlated with leisure pastimes (SIP), which may be explained by the social as well as the visual component of leisure activities. In general, correlations with subscales of the SIP were higher than correlations with subscales of the MOS-24, probably because the subscales of the SIP contain more relevant items for patients with GO.

Construct validity was further supported by the correlations of the two scales with age, sex, and clinical characteristics. However, GO severity (total eye score) correlated only moderately with visual functioning (r=−0.36) and low with appearance (r=−0.10). This could partly be explained by the fact that the total eye score is a compound measure, in which different factors (NO SPECS classes) are added that might have a different effect on HRQL. Higher NO SPECS classes (4 (eye motility) and 6 (visual acuity)) contribute more to the total eye score than lower classes (3 (proptosis)), which might explain the higher correlation with visual functioning than with appearance. The low correlation of GO activity (clinical activity score) with HRQL might also be explained by the fact that the clinical activity score is also a compound measure of different factors (pain, redness, swelling), but we were not able to examine these factors separately. On the other hand, HRQL was not expected to be highly correlated with clinical measurements of disease severity or activity because the perception of the impact of disease will be different for each patient, which indicates the essence of HRQL measurements. This is also illustrated by the low correlation we found between the objective measurement of proptosis and the subjective perception of changed appearance by the patient.

Overall, these correlations with different measures gave confidence that the GO-QOL is really measuring what it is supposed to measure. However, validation is not an all or nothing process. The more frequently an instrument is used, and the more situations in which it performs as expected, the greater the confidence in its validity.13 The validity of the GO-QOL should therefore be confirmed in future studies.

Also, other aspects of reliability and validity should be evaluated before the GO-QOL can be used in clinical studies. Although the questionnaire is designed for evaluative purposes (that is, to measure within subject change over time), it should also be able to discriminate between patients with more or less severe disease. A more heterogeneous patient sample is needed to examine whether the GO-QOL can discriminate between patient groups. Also, the questionnaire should give reproducible results on repeated administration in clinically stable patients and the questionnaire should be able to demonstrate clinically important changes, for example, change after an intervention of known efficacy.24 These aspects will be assessed in a further study.

Finally, the score on the questionnaire should be interpretable and meaningful. One should know whether a particular score means mild, moderate, or severe impairment on HRQL and whether a change in score should be interpreted as a small, moderate, or large change. On the GO-QOL, a score of 50 points on one scale reflects, for example, that a patient is a little impaired on all eight questions, or severely impaired on half the questions and not impaired on the other half. A change of 1 point on the three point Likert scale of 1 or 2 questions (for example, from a little to severe impairment) leads to a change in score on the scale of 6.25 or 12.5 points respectively (on a scale from 0 to 100). However, the clinical meaning of the (changes in) scores will become clearer when additional information is available about the variability in scores in stable and non-stable patients.

HRQL measures provide important additional information to traditional physiological or biological measures of health status because they rely on the experience of the individual patients about their functioning and wellbeing in daily life. When the goal of treatment is to improve functional capacities and wellbeing (rather than to prolong life) and correlations between clinical measures and patients’ experiences are low, then HRQL measures are imperative as outcome measures in the evaluations of treatments.37 We conclude that the GO-QOL is a promising tool to measure disease specific aspects of quality of life in patients with GO.


This study could not have been carried out without the enormous help of Professor L Koornneef and the other physicians of the Orbital Center of the AMC.

This work was supported by a grant (OG94/038) from the Dutch National Health Insurance Board (“Fonds Ontwikkelingsgeneeskunde”), Amstelveen, the Netherlands.


Two examples of questions from the GO-QOL (translated from Dutch)