Purpose Comparison of optical coherence tomography (OCT) segmentation performance regarding technical accuracy and clinical relevance.
Methods 29 eyes were imaged prospectively with Spectralis (Sp), Cirrus (Ci), 3D-OCT 2000 (3D) and RS-3000 (RS) OCTs. Raw data were evaluated in validated custom software. A 1 mm diameter subfield, centred on the fovea, was investigated to compare identical regions for each case. Segmentation errors were corrected on each B-scan enclosed in this subfield. Proportions of wrongly segmented A-scans were noted for inner and outer retinal boundaries. Centre point thickness (CPT) and central macular thickness (CMT) were compared before and after correction.
Results Segmentation errors occurred in 77% and affected on average 29% of A-scans, resulting in mean differences of 24/13 µm (CPT/CMT). The incidence of segmentation errors was 48% (Sp), 79% (Ci), 86% (3D) and 93% (RS), p<0.001. Mean proportions of A-scans with wrong outer retinal boundary were 30% (Sp), 9% (Ci), 23% (3D) and 10% (RS), p=0.006; proportions for the inner retinal boundary were 11% (Sp), 12% (Ci), 6% (3D) and 21% (RS), p=0.034. Mean deviations in CPT/CMT were 41/28 µm (Sp), 17/11 µm (Ci), 30/13 µm (3D) and 18/8 µm (RS), p=0.409/0.477.
Conclusions By comparison of identical regions, substantial differences were detected between the tested OCT devices regarding technical accuracy and clinical impact. Spectralis showed lowest error incidence but highest error impact.
- Diagnostic tests/Investigation
Optical coherence tomography (OCT) is an innovative in vivo imaging technique with worldwide popularity, especially since the introduction of spectral domain (SD) OCT.1–3 A qualitative assessment of the morphological changes that OCT visualises is indispensable for the early diagnosis and accurate follow-up of macular diseases.4 However, the quantitative analysis of OCT data also plays a pivotal role in research studies and clinical practice.5,6 In particular, central macular thickness (CMT) has been heavily used as an outcome variable in major clinical trials evaluating pharmacological or laser treatment of neovascular age-related macular degeneration (AMD) or retinal vascular disease.7–10 CMT metrics are also frequently employed in clinical practice as they permit a rapid evaluation of potential disease progression and/or treatment response.
Retinal thickness reports are provided by most commercially available OCT instruments. For thickness measurements, the devices use built-in automated segmentation algorithms that detect the retinal boundaries on OCT images. However, such automated segmentation algorithms are reported to fail frequently, particularly in the presence of pathology, and consequently output incorrect thickness values.11–24 Unless segmentation errors are corrected manually, they may lead to wrong conclusions in clinical practice and may confound the results of clinical trials.25
Importantly, the presence of a segmentation error may not automatically entail a clinically relevant retinal thickness deviation. It is, therefore, necessary to objectively quantify the presence and extent of segmentation errors (ie, the technical accuracy of the segmentation algorithm), as well as the impact of segmentation errors on thickness outputs (ie, the clinical relevance). Previous studies used subjective scoring scales,11,14,16,17,21,22 assessed thickness deviations,13,18 or both,15,24 to investigate the severity of segmentation artefacts. As yet, an objective, quantitative method to evaluate the technical accuracy of segmentation algorithms is lacking. Moreover, previous studies including different instruments failed to ensure a comparison of identical regions.13,14,19,22 A fair comparison of SD-OCT machines would, however, only be possible if exactly corresponding retinal loci were compared. A compelling quantitative inter-instrument comparison would be of value for clinicians and researchers when choosing between the several available SD-OCT devices for clinical practice or a research trial.
The aim of this study was to quantitatively compare the segmentation performance across four prevalent SD-OCT devices, focusing on both a measure of the technical accuracy of segmentation algorithms and the resulting clinical impact of segmentation errors.
Patients and methods
This prospective, comparative non-interventional case series was conducted in compliance with the tenets of the Declaration of Helsinki and was prospectively approved by the ethics committee at the Medical University of Vienna. Patients seen at the macula clinic at the Department of Ophthalmology, Medical University of Vienna, Austria, were included. Inclusion criteria were as follows: age 18 years or older, patient able and willing to give written informed consent, presence of retinal disease involving thickening or thinning of the central retina in at least one eye, and clear optical media allowing OCT imaging in high quality. If a patient's compliance permitted, both eyes were eligible for inclusion. All patients underwent a complete ophthalmic examination and pupils were dilated before the imaging procedure.
OCT imaging procedure
Image acquisition followed an established standard procedure provided by the Vienna Reading Center. OCT imaging was performed by experienced, reading-centre certified operators under standardised conditions, using one each of the following commercially available instruments: 3D-OCT 2000 Mark II (Topcon, Tokyo, Japan), Cirrus HD-OCT (Carl Zeiss Meditec, Dublin, California, USA), RS-3000 (Nidek, Tokyo, Japan) and Spectralis (Heidelberg Engineering, Heidelberg, Germany). Identical or similar raster scan patterns were chosen across the devices to enable an objective comparison. The specific scan protocols were as follows. 3D-OCT 2000: ‘3D Macula’ pattern with 128 sections (512 A-scans each) in a 6×6 mm area; Cirrus HD-OCT: ‘Macular Cube’ pattern with 128 sections (512 A-scans each) in a 6×6 mm area; RS-3000: ‘Macula Map’ pattern with 128 sections (512 A-scans each) in a 6×6 mm area; and Spectralis: custom raster scan pattern with 49 or 25 sections (512 A-scans each; lower number of sections chosen in case of patient fatigue) in a 20°×20° field of view and automatic real-time averaging activated at 29 frames. In all instruments, an internal fixation target was provided. Each subject was scanned on all devices within 1 h, by the same operator and in random order, to counteract a potential systematic bias. Scans were immediately discarded and repeated if the operator noticed low scan quality upon initial review. Specifically, the operator was instructed to inspect the entire OCT volume for excellent signal strength, even illumination and absence of motion or blink artefacts.
OCT raw data were exported from each instrument and evaluated in custom, validated Vienna Reading Center software (‘OCTAVO’). The software shows B-scan images as well as the device-provided segmentation lines in standardised display conditions. The segmentation lines can be manually adjusted at any location throughout the scan and can be hidden as needed in order to facilitate B-scan assessment. OCTAVO, furthermore, allows the grader to plot an Early Treatment Diabetic Retinopathy Study (ETDRS)-like grid and to display its centre point and the actual borders of the subfields on the corresponding B-scans (figure 1). The software provides, among other variables, centre point thickness (CPT, retinal thickness at the centre of the ETDRS grid) and CMT (average retinal thickness in the central 1 mm subfield of the grid). Thickness values can be calculated using either the device-provided segmentation only or using device-provided segmentation with manual corrections after reader input.
Two masked certified reading supervisors of the Vienna Reading Center (SMW, BSG) evaluated all SD-OCT scans, strictly adhering to a standardised grading protocol. All scans of a particular eye were graded consistently by the same reader to counteract a potential annotation bias. In order to compare identical retinal loci across each tested SD-OCT device, the ETDRS grid was first manually replotted to the foveal centre. Afterwards, all errors in the device-provided segmentation lines were corrected on all B-scans enclosed within the central subfield (eg, 22 B-scans in Cirrus). Any visible deviations of the segmentation lines, however small, were corrected manually. Only the contents of the central subfield were corrected and compared, as CMT is one of the most important variables in clinical and research use and precise manual segmentation correction in the entire volume stack would be impracticable. Correction of line errors was performed separately for the inner retinal border (ie, retina-vitreous interface) and the outer retinal border (ie, either above (3D-OCT 2000) or at the retinal pigment epithelium (Cirrus, RS-3000) or at Bruch's membrane (Spectralis)).
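As a rough illustration of why about 22 B-scans fall inside the central subfield of a 128-section, 6×6 mm raster, the following Python sketch counts the B-scans lying within 0.5 mm of the grid centre. It assumes evenly spaced B-scans, each located at the middle of its slab, and a grid centred exactly in the scanned area; the function name and this geometry are illustrative assumptions, not the OCTAVO implementation.

```python
def bscans_in_central_subfield(n_bscans, scan_height_mm, centre_mm=0.0, radius_mm=0.5):
    """Count B-scans that intersect a circular subfield (illustrative geometry only).

    Coordinates are measured from the middle of the scanned area; B-scans are
    assumed to be evenly spaced and located at the middle of their slab.
    """
    spacing = scan_height_mm / n_bscans
    count = 0
    for i in range(n_bscans):
        position = -scan_height_mm / 2 + (i + 0.5) * spacing
        if abs(position - centre_mm) <= radius_mm:
            count += 1
    return count


# 128 sections over 6 mm, grid centred in the scanned area
n_central = bscans_in_central_subfield(128, 6.0)
print(n_central)  # 22
```

Under these assumptions the count is 21 or 22 depending on where the replotted grid centre falls relative to the B-scan positions.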
Assessment of technical accuracy of segmentation algorithms
A quantitative measure of the technical accuracy of the segmentation algorithms was devised as follows: each A-scan (equivalent to one pixel in x-direction on the OCT image) containing a segmentation error correction as per reader input was counted by the software. The sum of affected A-scans within the central subfield was divided by the total number of A-scans within the central subfield and expressed as a percentage of affected A-scans. The percentage values were noted separately for segmentation errors at the inner and outer retinal borders.
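The measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the OCTAVO code: it assumes that, for one retinal border, the device-provided and manually corrected boundary positions are available as flat lists with one entry per A-scan, together with a mask marking which A-scans lie inside the central subfield. All names are hypothetical.

```python
def affected_ascan_percentage(device_line, corrected_line, in_subfield):
    """Percentage of A-scans inside the subfield whose boundary was corrected.

    device_line / corrected_line: boundary position per A-scan (eg, in pixels).
    in_subfield: True for A-scans that lie inside the central subfield.
    """
    total = sum(1 for inside in in_subfield if inside)
    if total == 0:
        raise ValueError("no A-scans inside the subfield")
    affected = sum(
        1
        for device, corrected, inside in zip(device_line, corrected_line, in_subfield)
        if inside and device != corrected
    )
    return 100.0 * affected / total


# Toy example: six A-scans, the last one outside the subfield
device = [10, 10, 12, 15, 15, 14]
corrected = [10, 11, 12, 13, 15, 14]
mask = [True, True, True, True, True, False]
pct = affected_ascan_percentage(device, corrected, mask)
print(pct)  # 40.0
```

Running the function once with the inner-border lines and once with the outer-border lines gives the two separate percentages.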
Assessment of clinical impact of segmentation errors
Differences between uncorrected thickness values and manually corrected thickness values were used as a measure of the clinical impact of segmentation errors. A thickness deviation of 7 µm or more (2× average nominal z-resolution of the evaluated devices) was set as cut-off point for clinical significance of segmentation errors, assuming that deviations below this value would not warrant manual correction in clinical practice.
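A minimal Python sketch of this comparison is given below. It assumes boundary positions in pixels for the A-scans of the central subfield and a single axial scale in µm per pixel; the value of 3.5 µm per pixel and all names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import fmean

CUTOFF_UM = 7.0  # cut-off stated in the text (2x average nominal z-resolution)


def mean_thickness_um(inner_px, outer_px, um_per_px):
    """Mean retinal thickness over the given A-scans, in micrometres."""
    return fmean([(outer - inner) * um_per_px for inner, outer in zip(inner_px, outer_px)])


def cmt_deviation_um(inner_dev, outer_dev, inner_cor, outer_cor, um_per_px):
    """Absolute difference between uncorrected and manually corrected CMT."""
    uncorrected = mean_thickness_um(inner_dev, outer_dev, um_per_px)
    corrected = mean_thickness_um(inner_cor, outer_cor, um_per_px)
    return abs(corrected - uncorrected)


# Toy example: the outer border was moved by 4 pixels on two of four A-scans
inner = [100, 100, 100, 100]
outer_device = [200, 200, 200, 200]
outer_corrected = [200, 204, 204, 200]
deviation = cmt_deviation_um(inner, outer_device, inner, outer_corrected, um_per_px=3.5)
clinically_relevant = deviation >= CUTOFF_UM
print(deviation, clinically_relevant)  # 7.0 True
```

The same function applied to the single A-scan at the grid centre would give the CPT deviation.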
PASW Statistics (V.18.0.0, SPSS, Chicago, Illinois, USA) was used for statistical analysis. χ² tests were used to compare categorical variables and analysis of variance was used to compare scale variables with assumed normal distribution between the SD-OCT devices. Linear regression was used to assess the relation between technical accuracy and clinical impact of segmentation errors. p Values ≤0.05 were considered statistically significant.
Twenty-nine eyes of 19 patients were included. The retinal pathological features were as follows: seven retinal vein occlusion, three diabetic macular oedema, three epiretinal membrane (grouped as ‘oedematous disease’); six neovascular AMD, three lamellar hole, two Stargardt disease, one non-neovascular AMD (grouped as ‘degenerative disease’); and four healthy eyes. The mean age of subjects was 50±14 years; 10 were women and nine were men. Mean retinal thickness values in Cirrus HD-OCT were 296±150 µm (CPT) and 340±118 µm (CMT).
In the assessed replotted central subfield area, segmentation errors occurred in 76.7% of cases. In these cases, errors affected, on average, 22.5% of A-scans within the central subfield, resulting in a mean thickness difference of 18.8 µm for CPT and 10.2 µm for CMT. Clinically significant CMT deviations occurred in 25.9% of cases.
Incidence of segmentation errors across the devices
Proportions of cases with segmentation errors ranged from 48% (Spectralis) to 79% (Cirrus), 86% (3D-OCT 2000) and 93% (RS-3000) (p<0.001). Misidentification of the outer retinal border ranged from 27% (Spectralis) to 62% (3D-OCT 2000) (p=0.04), while misidentification of the inner retinal border ranged from 27% (Spectralis) to 93% (RS-3000) (p<0.001). Table 1 shows proportions of cases with segmentation errors across the tested devices with breakdowns for retinal borders and disease types.
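The overall incidence comparison can be approximately reproduced with a Pearson χ² test on a 2×4 table. The sketch below is illustrative only: the per-device counts are reconstructed from the reported percentages of 29 eyes (an assumption, since the article reports percentages rather than counts), and the exact test variant used in PASW is not stated in the text. The closed-form survival function is valid for 3 degrees of freedom only.

```python
import math

EYES_PER_DEVICE = 29
# Reconstructed counts (assumption): 48%, 79%, 86% and 93% of 29 eyes
with_errors = {"Spectralis": 14, "Cirrus": 23, "3D-OCT 2000": 25, "RS-3000": 27}


def pearson_chi_square(successes, n_per_group):
    """Pearson chi-square statistic for a 2 x k table with equal group sizes."""
    total = n_per_group * len(successes)
    p_pooled = sum(successes) / total
    expected_yes = n_per_group * p_pooled
    expected_no = n_per_group * (1 - p_pooled)
    stat = 0.0
    for yes in successes:
        no = n_per_group - yes
        stat += (yes - expected_yes) ** 2 / expected_yes
        stat += (no - expected_no) ** 2 / expected_no
    return stat


def chi2_sf_df3(x):
    """Upper-tail probability of the chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


counts = list(with_errors.values())
pooled_percent = 100 * sum(counts) / (EYES_PER_DEVICE * len(counts))
chi2 = pearson_chi_square(counts, EYES_PER_DEVICE)
p_value = chi2_sf_df3(chi2)
print(round(pooled_percent, 1), round(chi2, 2), p_value < 0.001)
```

With these reconstructed counts the pooled incidence is 89 of 116 cases, and the test gives a p value below 0.001, in line with the reported result.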
Technical accuracy of segmentation algorithms
Among the cases with segmentation errors, the mean proportions of A-scans with erroneous segmentation were 21.0% (Cirrus), 28.9% (3D-OCT 2000), 30.6% (RS-3000) and 40.8% (Spectralis), p=0.237. Significant differences in the technical accuracy were found between the segmentation of the outer retinal boundary (mean 9.0% error (Cirrus) up to mean 30.4% error (Spectralis), p=0.034) and inner retinal boundary (mean 5.5% error (3D-OCT 2000) up to mean 20.8% error (RS-3000), p=0.006). Table 2 provides data on the technical accuracy with regard to retinal borders and disease types.
Clinical impact of segmentation errors
Mean deviations in CPT and CMT were 17.6 µm/8.4 µm (RS-3000), 16.7 µm/10.6 µm (Cirrus), 29.7 µm/13.0 µm (3D-OCT 2000) and 41.1 µm/28.1 µm (Spectralis). One eye with a large pigment epithelial detachment (figure 2) showed an extraordinarily severe segmentation error in Spectralis, resulting in a CPT/CMT deviation of 362 µm/336 µm. After exclusion of this outlier, mean deviations in CPT/CMT were 16.5 µm/4.4 µm for the Spectralis. Clinically relevant CMT deviation (7 µm or more) occurred in 17.2% (Cirrus HD-OCT), 20.7% (Spectralis), 27.6% (RS-3000) and 37.9% (3D-OCT 2000) (p=0.278). Table 3 shows an overview of the clinical impact of segmentation errors including breakdown for disease types.
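The effect of the single outlier on the Spectralis means can be checked with a leave-one-out calculation. The sketch below assumes that the reported means refer to the 14 Spectralis eyes with segmentation errors (48% of 29); this sample size is an inference from the reported percentages, not a figure stated in the text.

```python
def mean_without(mean_all, n, excluded_value):
    """Mean of the remaining n - 1 values after removing one known value."""
    return (mean_all * n - excluded_value) / (n - 1)


N_SPECTRALIS_WITH_ERRORS = 14  # assumption: 48% of 29 eyes

cmt_without_outlier = mean_without(28.1, N_SPECTRALIS_WITH_ERRORS, 336.0)
cpt_without_outlier = mean_without(41.1, N_SPECTRALIS_WITH_ERRORS, 362.0)
print(round(cpt_without_outlier, 1), round(cmt_without_outlier, 1))
```

Under this assumption the recomputed means are close to the reported 16.5 µm/4.4 µm; small differences are expected because the inputs are rounded.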
Impact of disease type on segmentation error incidence
Table 4 provides a comparison of the incidence of segmentation errors, technical accuracy and clinical impact among the evaluated disease types (healthy eyes, degenerative disease and oedematous disease). Generally, segmentation errors were found to be more frequent, severe and impactful in degenerative retinal disease.
Correlation between technical accuracy and clinical impact
Linear regression analysis was performed to assess the correlation between technical accuracy (area affected by segmentation errors in central subfield) and clinical impact (CMT deviation). Significant correlations were detected for all devices (regression lines are shown in figure 3). The percentage of affected A-scans had minimum impact on CMT in Spectralis (after exclusion of outlier), with a coefficient of 0.089±0.016 (R²=0.74, p<0.001). 3D-OCT 2000 (R²=0.55, p<0.001) and RS-3000 (R²=0.46, p<0.001) showed moderate correlations with coefficients of 0.523±0.098 and 0.357±0.077, respectively. Cirrus HD-OCT showed a strong correlation (R²=0.75, p<0.001) with substantial impact of the area of segmentation error on CMT deviations (coefficient=0.865±0.108).
Correlation between disease severity and segmentation error incidence
Linear regression analysis was performed to assess the correlation between disease severity (ie, CMT) and the rate of segmentation errors as well as technical accuracy and clinical impact. No significant correlations were detected (all p values >0.05).
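For the regression of CMT deviation on the percentage of affected A-scans, the reported coefficients can be read as slopes. The Python sketch below assumes that each coefficient is expressed in µm of CMT deviation per percentage point of affected A-scans (the unit is not stated explicitly in the text) and compares the expected change for a 10-point increase; intercepts were not reported and are therefore left out.

```python
# Reported regression coefficients; Spectralis after exclusion of the outlier.
# Unit assumed: micrometres of CMT deviation per percentage point of affected A-scans.
SLOPES = {
    "Spectralis": 0.089,
    "RS-3000": 0.357,
    "3D-OCT 2000": 0.523,
    "Cirrus HD-OCT": 0.865,
}


def expected_cmt_change_um(device, delta_percent_points):
    """Expected change in CMT deviation for a change in affected A-scans."""
    return SLOPES[device] * delta_percent_points


changes = {device: round(expected_cmt_change_um(device, 10), 2) for device in SLOPES}
for device, change in changes.items():
    print(f"{device}: {change} µm per 10 percentage points")
```

On this reading, the same increase in error-affected area translates into roughly ten times more CMT deviation in Cirrus HD-OCT than in Spectralis.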
Our study was performed to quantitatively compare macular segmentation performance across four prevalent SD-OCT devices. We demonstrated the feasibility of a comparison of identical retinal loci and presented a novel measure for the technical accuracy of segmentation algorithms, that is, the proportion of error-affected A-scans. Our data indicate that significant differences exist between the devices regarding the incidence of errors and the technical accuracy, but not regarding the clinical impact of segmentation errors.
The incidence of any detectable alignment errors within the central subfield showed major differences between the tested devices. It was lowest in Spectralis (about one-half of the evaluated eyes affected) and highest in the Nidek RS-3000, with segmentation errors present in almost all cases (93%). Among the cases with segmentation errors, Cirrus HD-OCT showed the least technical inaccuracy, with a mean of 21% affected A-scans within the central subfield. Using this particular measure, Spectralis OCT had the worst outcomes with, on average, 41% of A-scans affected. With regard to the potential clinical impact on CMT metrics, the Spectralis instrument showed the highest mean error (28.1 µm) and the single highest CMT deviation (336 µm). However, after exclusion of this extreme case, Spectralis delivered very reliable segmentation results (average 4 µm deviation).
To the best of our knowledge, this is the first study to evaluate segmentation performance in the Nidek RS-3000 instrument. For this particular machine, our results show considerable technical inaccuracy, especially for detection of the inner retinal boundary. Of the affected cases, 27% showed clinically relevant segmentation errors. However, in terms of absolute CMT deviation, performance was generally acceptable with 8 µm error on average.
The results of our study are best interpreted by considering all outcome variables together. For the Spectralis instrument, although the chance of errors was rather low, errors consistently affected large proportions of A-scans and sometimes caused considerable thickness deviations. It can be assumed that this finding is due to a lower sampling rate in Spectralis. For the clinician, our results imply that it may be necessary to inspect Spectralis thickness outputs for errors, since these can be severe, and manual correction requires only little effort as the sampling density is low.
With regard to the other instruments, uniformly using the 512×128 pattern without B-scan averaging, a correction of segmentation errors in these machines does not seem feasible or worth the effort since thickness deviations are often within an acceptable range and manual correction of several B-scans might be too laborious. In a research setting, however, manual inspection and correction of alignment errors remains mandatory, with up to 40% clinically relevant thickness deviations detected.
Not surprisingly, eyes in the ‘degenerative disease’ subgroup, including AMD and lamellar macular holes, showed significantly poorer segmentation outcomes. It is therefore an unmet medical need to develop segmentation techniques providing robust outcomes even in the presence of severe disruptive pathology.
A limitation of our study is the use of a different scanning pattern in the Spectralis instrument. As the 512×128 on 6 mm×6 mm pattern is not available on the Spectralis machine, we employed the Vienna Reading Center standard pattern (512×49) for this instrument. Proportions were used instead of absolute numbers in all statistics in order to account for the difference in sampling density. A further limitation is the use of scan averaging (in accordance with our reading centre standard operating procedures) in the Spectralis instrument as compared with no averaging in the other SD-OCT machines. However, averaging in volume scans is currently only available in the Spectralis and it seems prudent to compare devices with regard to their maximum technical capabilities. Moreover, OCT signal quality as a potential influencing factor in segmentation performance was not evaluated in this study. Since no uniform signal quality measures are available in this study, only optimal quality scans were selected at each device during acquisition to remove such influence as much as possible.
In conclusion, our study demonstrated the feasibility of an objective comparison of segmentation performance using identical retinal loci in four SD-OCT machines. The Spectralis device showed the lowest error rates, but the highest impact of errors if present. This may be attributable to a lower sampling density in Spectralis. The remaining SD-OCT instruments demonstrated higher error rates, with generally low clinical impact of segmentation errors. Manual inspection and correction of segmentation failures may be clinically useful and feasible for Spectralis scans.
SMW and BSG contributed equally.
Contributors SMW: Design of the study, acquisition of data, analysis of data, literature research, writing, revision and approval of article. BSG: design of the study, acquisition of data, analysis of data, revision and approval of article. AM: analysis of data, revision and approval of article. CS: design of the study, provision of funding, revision and approval of article. US-E: revision and approval of article, provision of funding.
Funding The financial support by the Austrian Federal Ministry of Economy, Family and Youth and the National Foundation for Research, Technology and Development is gratefully acknowledged.
Competing interests None.
Ethics approval Ethics Committee, Medical University of Vienna.
Provenance and peer review Not commissioned; externally peer reviewed.