Aims: To measure agreement and estimate sensitivity and specificity of uveitis experts' interpretation of retinal photographs for the diagnosis of toxoplasma retinochoroiditis.
Methods: The authors collated 96 retinal photographs from patients presenting with symptomatic posterior uveitis to ophthalmology clinics within a defined geographical area. Five uveitis experts independently ranked each photograph as definite, probable, possible, or not toxoplasma retinochoroiditis. They measured interobserver agreement based on an intraclass correlation coefficient and estimated sensitivity and specificity for each observer using a maximum likelihood model.
Results: The intraclass correlation coefficient for all five observers was 0.62 (95% CI: 0.54 to 0.69). Estimates of sensitivity and specificity for individual observers ranged from 71% to 96% and 58% to 100% respectively when appearances were dichotomised as definite or probable toxoplasma retinochoroiditis, versus possible or not.
Conclusion: Agreement between uveitis experts on the interpretation of retinal photographs was moderate to good, but the estimates of sensitivity and specificity of their interpretation for the diagnosis of toxoplasma retinochoroiditis were highly variable. Assuming that interpretation of photographs is similar to real time ophthalmoscopy, clinicians need to be aware that treatment decisions made on the basis of a single assessment will include patients without toxoplasma retinochoroiditis and will miss patients with the disease.
- retinal photographs
- toxoplasma retinochoroiditis
The diagnosis of recurrent toxoplasma retinochoroiditis is based on the finding of focal retinitis associated with a retinochoroidal scar. Presentation to ophthalmologists is usually prompted by symptoms of sudden visual impairment or pain.1 Given these signs and symptoms, the most likely differential diagnoses are ocular tuberculosis, viral retinitis, sarcoidosis, brucellosis, syphilis, serpiginous choroiditis, and fungal infections.2 Additional information from the clinical history or the results of invasive tests is rarely definitive. For example, a history of similar previous episodes, reported in 50% of patients with acute toxoplasma retinochoroiditis,1 makes the diagnosis more likely but not certain as recurrent episodes can also occur with sarcoidosis or serpiginous choroiditis.1 On the other hand, a history of ocular or neurological signs in early childhood associated with congenital toxoplasmosis is rare but makes the diagnosis almost certain.1 Neither systemic signs of tuberculosis, sarcoidosis, or brucellosis nor serum or skin tests for these diagnoses are highly specific.3 A positive toxoplasma specific antibody result does not add important information but a negative result rules out toxoplasma retinochoroiditis. Finally, more specific tests such as polymerase chain reaction (PCR) detection of toxoplasma DNA in ocular fluids require invasive sampling and cannot be justified as a routine investigation.
The diagnosis of toxoplasma retinochoroiditis and the decision to treat therefore depend on the clinician's interpretation of the appearance of the retina and on the baseline risk of toxoplasmosis and alternative diagnoses. As there is no reference standard, assessment by uveitis experts is regarded as the best method for diagnosing or excluding toxoplasma retinochoroiditis. We evaluated the performance of assessments made by five uveitis experts by measuring agreement and by modelling the sensitivity and specificity of their categorisation of retinal photographs.
We collated 96 fundal photographs from patients presenting to ophthalmology clinics within a defined geographical area (Birmingham, south London, or north east London). Of these, 68 were from patients with suspected acute symptomatic toxoplasma retinochoroiditis reported to a central surveillance scheme1 and, in two patients, one photograph was included for each eye. The remaining 28 photographs were from patients with alternative diagnoses based on clinical follow up and laboratory investigations.
Performance of expert assessors
As there is no reference standard for toxoplasma retinochoroiditis, we could not definitively diagnose or exclude the disease. We therefore determined the performance of expert assessments by measuring agreement between assessors. Sensitivity and specificity were estimated for each assessor using a statistical model. Finally, we used the estimated sensitivity and specificity to predict the risk of mislabelling retinal photographs as toxoplasma retinochoroiditis, and of missing the disease in photographs categorised as unaffected.
Agreement between assessors
Five uveitis experts were asked to independently classify the retinal photographs into four categories without any additional information: definitely, probably, possibly, or not toxoplasma retinochoroiditis. We calculated the level of agreement between the five assessors using an intraclass correlation coefficient.4 This coefficient expresses the variability attributable to true differences between retinal photographs as a proportion of the total variability and gives similar results to a quadratic weighted kappa statistic.5 The intraclass correlation coefficient ranges from 0 to 1; a value of 1 indicates complete agreement. Residual variability (1 − intraclass correlation coefficient) is due to true variation between assessors and measurement error. Approximate confidence intervals were calculated according to the method described by Fleiss and Shrout.6
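As an illustration of how such a coefficient can be computed, the following is a minimal sketch, not the authors' actual calculation. It assumes the two way random effects, single rater form usually written ICC(2,1), and it assumes the four categories are coded as the scores 0 (not) to 3 (definite). The function name `icc_2_1` and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """Two way random effects, single rater ICC (assumed form, ICC(2,1)).

    ratings: list of rows, one row per photograph, each row holding
    one numeric score per assessor (e.g. 0 = not ... 3 = definite).
    """
    n = len(ratings)        # number of photographs
    k = len(ratings[0])     # number of assessors
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into photographs, assessors, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between photographs mean square
    msc = ss_cols / (k - 1)               # between assessors mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores for four photographs rated by three assessors.
example = [[3, 3, 2], [0, 1, 0], [2, 2, 3], [1, 0, 1]]
print(round(icc_2_1(example), 2))
```

In this form, systematic differences between assessors (the between assessors mean square) count against agreement, which matches the description of residual variability as true variation between assessors plus measurement error.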
Estimation of sensitivity and specificity
We used a statistical method,7 which made probabilistic estimates for the true disease prevalence, to estimate sensitivity and specificity for each assessor. As calculation of sensitivity and specificity requires binary data, categories were dichotomised as definite or probable toxoplasma retinochoroiditis versus possible or not. The former group was considered to represent patients likely to be treated without further investigations, while in the latter group, alternative diagnoses would be considered. The model took account of the source of the photographs (central surveillance scheme or alternative diagnoses) and the different prevalences of toxoplasma retinochoroiditis likely in photographs from these two sources. Estimates obtained from the model were based on the assumptions that misclassification errors were independent between assessors and between retinal photographs, conditional on the true disease status.7 Confidence intervals were calculated using the profile likelihood method.8
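The general idea of estimating sensitivity and specificity without a reference standard can be sketched with a simple latent class model. The code below is an illustrative assumption, not the method of reference 7: it treats true disease status as unobserved, allows a separate prevalence for each photograph source, assumes errors are independent between assessors given true status, and maximises the likelihood with an expectation-maximisation (EM) loop. The function name `fit_latent_class`, the starting values, and the iteration count are all hypothetical choices, and confidence intervals are not computed.

```python
def fit_latent_class(labels, groups, n_iter=500, eps=1e-9):
    """EM fit of a latent class model (illustrative sketch only).

    labels: one row of 0/1 per photograph, one entry per assessor
            (1 = definite or probable, 0 = possible or not).
    groups: source index per photograph (e.g. 0 = surveillance scheme,
            1 = alternative diagnoses); every group must be non-empty.
    Returns (prevalence per group, sensitivity and specificity per assessor).
    """
    n, k = len(labels), len(labels[0])
    n_groups = max(groups) + 1
    prev = [0.5] * n_groups
    sens = [0.8] * k   # starting above 0.5 fixes "1" as the diseased label
    spec = [0.8] * k

    def clamp(p):
        # Keep probabilities strictly inside (0, 1) to avoid division by zero.
        return min(max(p, eps), 1 - eps)

    for _ in range(n_iter):
        # E-step: probability that each photograph is truly diseased.
        w = []
        for row, g in zip(labels, groups):
            a, b = prev[g], 1 - prev[g]
            for j, y in enumerate(row):
                a *= sens[j] if y else 1 - sens[j]
                b *= 1 - spec[j] if y else spec[j]
            w.append(a / (a + b))

        # M-step: update prevalence per source, then each assessor's accuracy.
        for g in range(n_groups):
            members = [wi for wi, gi in zip(w, groups) if gi == g]
            prev[g] = clamp(sum(members) / len(members))
        total_pos = sum(w)
        total_neg = n - total_pos
        for j in range(k):
            tp = sum(wi * row[j] for wi, row in zip(w, labels))
            tn = sum((1 - wi) * (1 - row[j]) for wi, row in zip(w, labels))
            sens[j] = clamp(tp / total_pos)
            spec[j] = clamp(tn / total_neg)
    return prev, sens, spec
```

Because the likelihood is built as a product over assessors within each photograph, correlated mistakes between assessors would violate the model, and estimates from a sketch like this would then tend to be too optimistic.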
Predicting the risk of mislabelling retinal appearances
We used the estimated sensitivity and specificity for assessor 3 and Bayes's formula9 to produce illustrative calculations of the risk of mislabelling retinal appearances in: (a) a community setting, where the reported prevalence of toxoplasma retinochoroiditis was 35% (95% CI: 18% to 53%); and (b) a uveitis clinic, reported prevalence 8% (4%–13%).10 We assumed the spectrum of disease to be similar in both settings. The post-assessment odds of toxoplasma retinochoroiditis in photographs classified as definite or probable toxoplasma retinochoroiditis can be calculated as the pretest odds of toxoplasma retinochoroiditis × (sensitivity/(1 − specificity)). Odds can then be converted to a risk estimate (= odds/(1 + odds)). The risk of mislabelling retinal appearances as toxoplasma retinochoroiditis is 1 − the post-assessment risk of toxoplasma retinochoroiditis.
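The odds calculation described above can be sketched in a few lines. The function names are hypothetical, and the sensitivity (0.90) and specificity (0.80) used in the example are illustrative assumptions, not the estimates for assessor 3. The second function applies the same reasoning to photographs labelled possible or not, using the negative likelihood ratio (1 − sensitivity)/specificity.

```python
def post_assessment_risk(prevalence, sensitivity, specificity):
    """Risk of disease in photographs labelled definite or probable.

    Requires specificity < 1, otherwise the likelihood ratio is undefined.
    """
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * (sensitivity / (1 - specificity))
    return post_odds / (1 + post_odds)


def risk_of_missing(prevalence, sensitivity, specificity):
    """Risk of disease in photographs labelled possible or not."""
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * ((1 - sensitivity) / specificity)
    return post_odds / (1 + post_odds)


# Hypothetical inputs: 8% prevalence, sensitivity 0.90, specificity 0.80.
risk = post_assessment_risk(0.08, 0.90, 0.80)
print(f"risk of mislabelling: {1 - risk:.2f}")   # about 0.72 for these inputs
print(f"risk of missing: {risk_of_missing(0.08, 0.90, 0.80):.3f}")
```

With a fixed sensitivity and specificity, lowering the prevalence lowers the pretest odds and therefore raises the risk of mislabelling, which is why the two settings give such different results.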
Agreement between experts
Figures 1–4 provide examples of photographs that were classified as definite, probable, possible, and not toxoplasma retinochoroiditis respectively by all five assessors. Of the 96 photographs examined, two were excluded from the analysis owing to lack of categorisation by one or more assessors. The frequency of categorisation as definitely, probably, or possibly toxoplasma retinochoroiditis varied between assessors, but all assessors categorised fewest photographs as not toxoplasma retinochoroiditis (Fig 5). The intraclass correlation coefficient for all assessors across the four categories was 0.62 (95% CI: 0.54 to 0.69). The intraclass correlation coefficients for pairs of assessors across the four categories varied between 0.53 and 0.73 (Table 1). When categories were dichotomised as definitely/probably versus possibly/not toxoplasma retinochoroiditis, the intraclass correlation coefficient for all five assessors was 0.46 (95% CI: 0.37 to 0.54).
Estimation of sensitivity and specificity
Point estimates for the sensitivity for toxoplasma retinochoroiditis, based on categories of definitely or probably toxoplasma retinochoroiditis versus possibly or not, ranged from 71% to 96% (confidence intervals shown in Table 2). Point estimates for specificity ranged from 58% to 100%.
Predicting the risk of mislabelling retinal appearances
In a community setting (prevalence of toxoplasma uveitis 34%), the risk of mislabelling retinal appearances as toxoplasma retinochoroiditis in photographs from patients without the disease would be 28%. Conversely, the diagnosis would be missed in 7% of photographs labelled as possible or not toxoplasma retinochoroiditis.
In a uveitis referral clinic (reported prevalence of toxoplasma retinochoroiditis 8%10), the estimated risk of mislabelling retinal appearances as toxoplasma retinochoroiditis would be 70%. The risk of missing the diagnosis in photographs labelled as possible or not toxoplasma retinochoroiditis would be 1%. These risks should be regarded as illustrative, as the confidence intervals around the sensitivity and specificity are wide and the calculations require several assumptions.
We found moderate agreement between uveitis experts' attribution of retinal appearances to toxoplasma retinochoroiditis or not, which resulted in substantial interobserver variation in the estimated sensitivity and specificity. Our calculations of the predicted risks of mislabelling retinal appearances indicate that, in low prevalence settings, uveitis experts are more likely to diagnose toxoplasma retinochoroiditis and possibly treat patients who do not have the disease than to miss patients who have the disease. The consequences of these errors depend on the harm from delayed or no treatment in patients with toxoplasma retinochoroiditis and from inappropriate treatment in patients with alternative diagnoses.
Toxoplasma treatment consists of antibiotics and high dose systemic corticosteroids for at least 3–4 weeks.11 Treatment aims to reduce the duration of acute symptoms and reduce the risk of permanent visual impairment.2 However, there is no evidence from randomised controlled trials that antibiotic or steroid treatment of toxoplasma retinochoroiditis, compared with no treatment, improves visual symptoms or acuity12,13 and both treatments are associated with major adverse effects. In addition, steroid treatment may exacerbate tuberculosis and the rarer infectious causes of retinitis such as viruses, brucellosis, or fungi.14 In Europe, the most common differential diagnosis is sarcoidosis,15,16 which is usually treated with corticosteroids, but treatment may nevertheless delay a definitive diagnosis.
One previous study reported clinician agreement on the interpretation of quiescent toxoplasma retinal lesions,17 but we know of no previous reports on the performance of retinal examination for toxoplasma retinochoroiditis in clinical practice. Three limitations of our study mean that our estimate of the performance of clinical interpretation of retinal photographs is likely to overestimate agreement, sensitivity, and specificity. Firstly, the assessors were all uveitis experts. Non-experts might be expected to show more disagreement.18 Secondly, the model used to estimate sensitivity and specificity required the assumption that misclassification between assessors was unrelated, conditional on the true disease status. In practice, classification is likely to be part of a continuum and may be correlated due to qualitatively similar mistakes made by assessors with similar background and training.7 Thirdly, although photographs from patients with suspected acute toxoplasma retinochoroiditis were derived from a population based study, the remaining photographs were from patients with retinochoroiditis because of alternative diagnoses. Selection of the latter photographs may have favoured those with unambiguous appearances of an alternative diagnosis, thereby introducing spectrum bias.19
Moreover, generalisation of our results, based on retinal photographs, to real time, biomicroscopic ophthalmoscopy is a matter of judgment. Studies comparing ophthalmoscopy with assessments of photographs of diabetic retinopathy reported moderate to good agreement18 but no such comparisons have been published for toxoplasma retinochoroiditis. Agreement may be increased given the clinical information available in practice, but this has not been assessed. A further limitation of our results is that they reflect interpretation of retinal photographs taken from patients in the United Kingdom. Further studies are required to evaluate clinician assessments in populations with a different spectrum of differential diagnoses and, possibly, a different spectrum of toxoplasma retinochoroiditis, as may be the case in endemic areas such as south Brazil.17 Finally, the criteria used by each of the assessors to classify toxoplasma retinochoroiditis were not investigated and may have differed. Agreement would be improved if standardised criteria could be developed and adopted by ophthalmologists.
Unless a readily usable reference test for toxoplasma retinochoroiditis becomes available, future evaluations of clinician assessments should involve comparison with a clinical reference standard which measures the evolution of retinochoroiditis in the long term.1 In addition, the methods need to take account of the fact that retinal assessment would be part of the test and reference standard.20 Such long term follow up studies would provide more valid estimates of the performance of clinician assessments together with important information on the baseline risk of toxoplasma retinochoroiditis and the differential diagnoses.
The authors thank Professor A Bird, Dr A Rothova, Professor G Dutton, Dr A Brezin, and Professor G Holland for acting as independent assessors in this study. Professors P Murray and S Lightman provided the retinal photographs of alternative diagnoses. The work was supported by the Iris Fund for Prevention of Blindness.