Purpose To validate three models for predicting proliferative vitreoretinopathy (PVR) based on the analysis of genotypic data and relevant clinical characteristics.
Methods The validation series consisted of data from 546 patients operated on for primary rhegmatogenous retinal detachment (RRD) at centres in the Netherlands, Portugal, Spain and the UK. Temporal and geographical validation was performed. The discrimination capability of each model was analysed and compared with the original series, using receiver operating characteristic (ROC) curves. Then, clinical variables were combined with the genetic data in order to improve the predictive capability. A risk reclassification analysis was performed with and without each one of the variables. Reclassification of patients was compared and models were readjusted in the original series. Readjusted models were further validated.
Results One of the models showed good predictability in the temporal sample as well as in the original series (area under the curve (AUC) original=0.7352; AUC temporal=0.6457, 95% CI 0.5017 to 0.7897). When clinical variables were included, only pre-existent PVR improved the predictability of this model in the validation series (temporal and geographical samples) (AUC original=0.7940 vs AUC temporal=0.7744 and AUC geographical=0.7152). The other models showed acceptable AUC values when clinical variables were included, although they were less accurate than in the original series.
Conclusions Genetic profiling of patients with RRD can improve the predictability of PVR in addition to the well-known clinical biomarkers. This validated formula could be a new tool in our current clinical practice in order to identify those patients at high risk of developing PVR.
- Diagnostic tests/Investigation
Proliferative vitreoretinopathy (PVR) continues to be the main cause of failure of retinal detachment (RD) surgery.1 ,2 Its management has remained basically unchanged for the last 30 years, despite the impressive changes observed in many other areas of ophthalmology. PVR reduces anatomical success rates and leads to poor functional results, which also increases the cost of RD treatment.2 For decades, attempts to overcome this complication have included, among others, the use of antiproliferative agents, but results have been unsatisfactory and none has been incorporated routinely into clinical practice.2 Thus, from a practical point of view, clinicians concentrate on improving their surgical results, which seem to have reached a limit, rather than on preventing the development of PVR.3 Additionally, most of the strategies developed to prevent PVR are not free of side effects,4 making it necessary to identify those patients at high risk of developing PVR before applying any prophylactic treatment.
In order to predict PVR development, most research has focused on the analysis of clinical characteristics of patients with RD, and some groups have developed formulae to help clinicians in the early identification of this complication.5–9 However, these formulae either have not been validated in external samples or do not provide satisfactory predictive capability;10 thus, the search for more predictive biomarkers for PVR is fully justified.
With the developments of affordable and reliable high-throughput genotyping technology, many association studies have been conducted in the last decades to identify DNA variants associated with many complex human diseases, such as cancer, autoimmune diseases, blood pressure, body mass index and several others. These studies have unveiled many novel genes and implicated unexpected pathways associated with disease mechanisms. Additionally, these works have identified some DNA variants associated with an increased risk of developing certain diseases. Knowing the ‘high risk’ DNA variants could help us in the identification of patients at high risk of suffering from a determined disorder.
Our research group confirmed the implication of a genetic component in the development of PVR following a RD surgery.11–13 Using these genotypic data, we developed three predictive models of PVR based on the analysis of genetic markers detected in a sample of patients suffering from RD, with and without PVR, obtained from Spanish centres.14 These models were developed using the genotypic characteristics of patients (original dataset) whose outcomes were already known with regard to PVR. Models were therefore developed in a way that optimally adjusted this dataset to predict the known outcome (internal validation).
In this project, we used a new sample, defined as the validation series, in order to assess the reproducibility and transportability of the previously described models (external validation). We also analysed the effect on the predictive values of adding some clinical risk factors.
A European multicentre study was performed among new patients who had undergone primary rhegmatogenous retinal detachment (RRD) surgery from 2008 to 2010. In the same way as in the original series, careful ophthalmoscopic examination by slit lamp and indirect ophthalmoscope was performed preoperatively and postoperatively to classify each patient as case or control. The same inclusion and exclusion criteria as in the original series were applied. Briefly, those patients who developed PVR grade C1 or higher according to the classification described by the Retina Society Terminology Committee15 were included as cases, and those who did not develop PVR after 3 months of follow-up were included as controls. Causes other than a primary RRD, such as traumatic, tractional, exudative or iatrogenic RD, were excluded. Exclusion criteria also included RD secondary to macular hole, giant retinal tears defined by more than three clock hours, patients with PVR grade A or B,15 and patients with RD in the affected eye and RD with PVR in the fellow eye. All patients provided written informed consent; the study followed the tenets of the Declaration of Helsinki and was approved by the institutional research ethics committees of each centre.
DNA samples and data were obtained from patients recruited in 1 centre in the Netherlands, 2 in Portugal, 14 in Spain and 1 in the UK (table 1). This sample was used for the geographical or external validation of the models.
Single nucleotide polymorphisms selection and measurements
Single nucleotide polymorphisms (SNPs) included in this study were the same as those in the original series used for building the predictive models.14 These SNPs came from those originally analysed in order to determine the contribution of the genetic component to the development of PVR.11 In that study, considering the inflammatory nature of PVR, we selected and studied 30 genes implicated in the inflammatory cascade, including the main cytokines such as tumour necrosis factor (TNF)-α, platelet derived growth factor and transforming growth factor, and their receptors and signalling pathway mediators. Common tag SNPs with correlation coefficients ≥0.8 and a minor allele frequency ≥10% were studied to explain, as much as possible, the known genetic variation of each gene. We used the Tagger method implemented in the Haploview programme (www.broadinstitute.org, accessed 27 July 2006)16 and considered genic and extragenic regions extending 10 kb upstream and downstream. A substitute SNP was considered for inclusion in case there was an error during the pipeline design. Functional SNPs, or ones previously described in association with other inflammatory diseases, were also added for analysis.
A peripheral blood sample was used for DNA extraction, which was performed using the commercial REALpure Kit protocol SSS (DURVIZ SL; Valencia, Spain) or similar. All DNA samples were shipped to the central lab of the CEGEN, the National Genotyping Centre located in Santiago de Compostela (Spain), for analysis. Those markers included in the previously developed models were investigated (table 2) using the MassARRAY SNP Genotyping System (Sequenom, San Diego, California, USA) following the manufacturer’s instructions. The principles of this method are detailed in Buetow et al.17
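The tag SNP criteria described above (minor allele frequency ≥10%, correlation ≥0.8) can be illustrated with a minimal sketch. This is not the Haploview Tagger implementation: it assumes genotypes coded as 0/1/2 copies of one allele, approximates linkage disequilibrium by the squared Pearson correlation of those codes, and uses a simple greedy selection. The SNP names and genotypes are hypothetical.

```python
from typing import Dict, List, Sequence


def maf(genotypes: Sequence[int]) -> float:
    """Minor allele frequency from genotypes coded as 0/1/2 allele copies."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation of two genotype vectors (an LD approximation)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) * (x - mean_a) for x in a)
    var_b = sum((y - mean_b) * (y - mean_b) for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)


def greedy_tags(snps: Dict[str, Sequence[int]],
                min_maf: float = 0.10, min_r2: float = 0.8) -> List[str]:
    """Pick tag SNPs so every common SNP has r^2 >= min_r2 with some tag."""
    common = {name: g for name, g in snps.items() if maf(g) >= min_maf}
    untagged = set(common)
    tags: List[str] = []
    while untagged:
        def covered(s: str) -> int:
            # Number of still-untagged SNPs that candidate s would capture.
            return sum(1 for t in untagged
                       if r_squared(common[s], common[t]) >= min_r2)
        # Sorting makes the choice deterministic when candidates tie.
        best = max(sorted(untagged), key=covered)
        tags.append(best)
        captured = {t for t in untagged
                    if r_squared(common[best], common[t]) >= min_r2}
        untagged -= captured | {best}
    return tags


# Hypothetical genotypes for eight individuals.
example = {
    "rs_a": [0, 1, 2, 1, 0, 2, 1, 0],
    "rs_b": [0, 1, 2, 1, 0, 2, 1, 0],     # in perfect LD with rs_a
    "rs_c": [1, 1, 0, 2, 1, 0, 2, 1],     # weakly correlated with rs_a
    "rs_rare": [0, 0, 0, 0, 0, 0, 0, 1],  # MAF below 10%, filtered out
}
print(greedy_tags(example))
```

In this toy input, rs_b is captured by rs_a, rs_rare is dropped by the frequency filter, and rs_c needs its own tag.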
Predictive models of PVR
Three predictive models of PVR previously developed by our group14 were validated. These models had been fitted using subsets of 197 SNPs across 30 candidate genes, analysed in 450 blood samples from patients who had undergone surgery for primary RD: 312 controls and 138 cases (original series). Machine-learning methods were used to predict the probability of developing PVR. Two of the models were built with the support vector machine algorithm, one using the linear kernel18 and the other the radial kernel18 (named model 1 and model 2). Finally, model 3 was built with the random forest algorithm.19 They worked with 42, 10 and 2 SNPs, and had accuracy values of 78.4%, 70.3% and 69.3%, respectively (table 2).
Differences between the original and the validation series were analysed. Demographic and clinical characteristics were evaluated. Sensitivity, specificity, positive predictive value, negative predictive value and diagnostic accuracy were assessed using a receiver operating characteristic (ROC) curve in order to determine the discrimination capacity of each model. Sensitivity evaluates how well the model identifies patients who develop the disease, and specificity how well it identifies those who do not, whereas accuracy is a global measure of how correctly a diagnostic tool classifies patients. It is calculated as the ratio of the number of correct predictions to the number of all assessments. These discrimination measurements were obtained by setting an optimal threshold in each ROC curve: the point with the best sensitivity and specificity values.
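The discrimination measurements defined above can be sketched as follows. The text does not state how the "best" point on the ROC curve was chosen, so this sketch assumes the Youden index (the threshold maximising sensitivity + specificity − 1). The scores and labels are hypothetical, with 1 denoting a PVR case and 0 a control.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when score >= threshold is classified as PVR."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_case = score >= threshold
        if predicted_case and label == 1:
            tp += 1
        elif predicted_case and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else float("nan")


def discrimination_measures(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV and accuracy at a given threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        # Correct predictions divided by all assessments.
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Observed score maximising sensitivity + specificity - 1 (assumed criterion)."""
    best_threshold, best_j = None, float("-inf")
    for candidate in sorted(set(scores)):
        m = discrimination_measures(scores, labels, candidate)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold


# Hypothetical model outputs for four cases and four controls.
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
cutoff = youden_threshold(scores, labels)
print(cutoff, discrimination_measures(scores, labels, cutoff))
```

Other criteria for the optimal point exist (for example, the point closest to the top-left corner of the ROC plot) and can give a different threshold on the same data.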
Predictive models combining clinical and genetic variables
In order to improve the predictive capability of the models, the following clinical variables were added: pre-existent PVR (clear signs of PVR such as retinal shortening, anterior PVR, subretinal bands, or epiretinal membranes appearing before any surgical procedure to treat the RD), personal history of RD, family history of RD, status of the lens and race. A risk reclassification analysis was performed as follows: each model was evaluated with and without each one of the variables. Reclassification of patients was compared, evaluating the changes observed in the predictive capability of the models. The net reclassification improvement (NRI) and the integrated discrimination improvement (IDI) were used to determine the significance of the change. These two measures were introduced by Pencina et al20 to assess the improvement in predictive capability when new predictors are added to existing models. A statistically significant association of a new predictor with the process under study is not a sufficient criterion for including it in a multivariate model; it is also necessary to analyse the classification improvement. While the NRI focuses on reclassification tables set separately for participants with (cases) and without (controls) events, and quantifies the correct reclassifications in terms of proportions, the IDI is based on the new model's ability to improve sensitivity without sacrificing specificity. That is, it focuses on differences between sensitivity and ‘one minus specificity’ for models with and without the new predictor. In our case, for the reclassification tables, three categories were considered based on the PVR risk: <20%, between 20% and 40%, and >40%. Movements upwards for cases and downwards for controls were defined as correct reclassifications. For that purpose the PredictABEL package from R was used.21 Those variables which significantly changed the predictive capability of the models were selected.
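The two reclassification measures can be sketched in a few lines. This is an illustrative Python sketch of the point estimates only, not the PredictABEL routine used in the study, and it omits the significance tests. It assumes that a risk of exactly 20% or 40% falls in the middle category, which the text does not specify. All predicted risks below are hypothetical.

```python
from typing import Sequence

CUTOFFS = (0.20, 0.40)  # risk categories from the text: <20%, 20-40%, >40%


def risk_category(p: float) -> int:
    """Map a predicted PVR risk to category 0, 1 or 2 (boundary handling assumed)."""
    low, high = CUTOFFS
    if p < low:
        return 0
    if p <= high:
        return 1
    return 2


def nri(old: Sequence[float], new: Sequence[float],
        labels: Sequence[int]) -> float:
    """Categorical NRI: net upward moves in cases plus net downward moves in controls."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for p_old, p_new, y in zip(old, new, labels):
        total[y] += 1
        move = risk_category(p_new) - risk_category(p_old)
        if move > 0:
            up[y] += 1
        elif move < 0:
            down[y] += 1
    return (up[1] - down[1]) / total[1] + (down[0] - up[0]) / total[0]


def idi(old: Sequence[float], new: Sequence[float],
        labels: Sequence[int]) -> float:
    """IDI: gain in mean predicted risk for cases minus the gain for controls."""
    def mean_risk(probs: Sequence[float], group: int) -> float:
        values = [p for p, y in zip(probs, labels) if y == group]
        return sum(values) / len(values)

    gain_cases = mean_risk(new, 1) - mean_risk(old, 1)
    gain_controls = mean_risk(new, 0) - mean_risk(old, 0)
    return gain_cases - gain_controls


# Hypothetical risks without (old) and with (new) an added clinical variable.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
old = [0.15, 0.30, 0.45, 0.30, 0.10, 0.25, 0.35, 0.50]
new = [0.25, 0.50, 0.45, 0.10, 0.10, 0.15, 0.45, 0.30]
print(nri(old, new, labels), idi(old, new, labels))
```

In this toy example, two cases move up and one moves down, while two controls move down and one moves up, giving an NRI of 0.5. A positive value indicates that the added variable reclassifies more patients correctly than incorrectly.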
Models were then readjusted in the original series taking into account those selected variables. Finally, these readjusted models were validated with the temporal and geographical samples. The same parameters of discrimination previously used for validation of models without clinical variables were used this time for the validation of models with clinical variables: sensitivity, specificity, positive predictive value, negative predictive value and diagnostic accuracy.
A total of 546 patients were included, 90 for the temporal sample (18 cases and 72 controls) and 456 (133 cases and 323 controls) for the geographical sample (table 1).
Differences among series (original and validation series)
PVR was significantly less frequent in the temporal sample than the original series (p=0.037). A similar trend was observed for the geographical sample, although differences were not significant. Also, there were no significant differences in the percentage of cases with pre-existent PVR for both samples, temporal and geographical. The temporal sample had more aphakic patients than the original series (p=0.02). Finally, there were fewer patients reporting a family history of RD in the geographical sample than in the original series (p<0.001) (table 3).
In the validation series (temporal and geographical samples) surgeons performed fewer pneumatic retinopexies, although the difference was not statistically significant. Scleral surgery (SS) was also less frequent (p=0.01 and <0.0001 for temporal and geographical samples, respectively). Conversely, pars plana vitrectomy (PPV) was more frequent in the validation series, although the difference was only significant for the geographical sample (p<0.0001). These results most likely reflect a general trend towards vitrectomy over SS in the last few years, particularly with the advent of small-gauge vitrectomy systems. Surgeons applied more laser in the geographical sample than in the original series (p=0.001). Cryotherapy was more frequently used in the temporal sample and less in the geographical one, although the difference was not significant. Among patients in whom PPV was performed, SF6 was used more frequently as tamponade in the validation series, although the difference was only significant for the geographical sample (p<0.0001). Similarly, silicone oil was used significantly more often in the geographical sample (p<0.0001) (table 3).
Validation of models
After comparing the ROC curves, model 2 showed the best values in the temporal sample (AUC=0.6457; 95% CI 0.5017 to 0.7897). Notably, no significant difference from the original series was observed (p=0.2980) (table 4 and figure 1), and the diagnostic accuracy was 60.98% (95% CI 50.42 to 71.53). Regarding the geographical sample, the three models showed AUC values significantly lower than the AUC obtained by internal validation (table 4).
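For readers less familiar with these quantities, the sketch below shows one common way to compute an AUC and a 95% CI for an accuracy. The text does not state which CI method was used, so the normal-approximation (Wald) interval here is an assumption. The counts of 50 correct classifications out of 82 patients are also an assumption: they are not reported in the text and were chosen only because they give 60.98%. The scores and labels are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case scores above a random control (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def wald_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical model outputs: 1 = PVR case, 0 = control.
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(round(auc(scores, labels), 4))

# Assumed counts (not given in the text): 50 of 82 correctly classified.
lower, upper = wald_ci(50, 82)
print(f"{50 / 82:.2%} (95% CI {lower:.2%} to {upper:.2%})")
```

Under these assumed counts the Wald interval comes out at approximately 50.42% to 71.53%, which agrees with the reported interval, although this does not establish that the authors used this method.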
Models with genetics and clinical variables
Adding pre-existent PVR improved the predictive capability of the three models, with significant differences in both the temporal and geographical samples (table 4 and figure 1). The remaining clinical variables evaluated did not show any improvement and were therefore not considered for reclassification of patients (table 5). When pre-existent PVR was taken into account to readjust the models in the original series, validation again showed the best values for model 2, both in the temporal sample (AUC=0.7744; 95% CI 0.6393 to 0.9094) and in the geographical sample (AUC=0.7152; 95% CI 0.6540 to 0.7765) (table 4). Notably, these values did not differ significantly from those of the original series. Most of the discrimination measurements also did not differ significantly from the original series (table 6).
In this study, we have validated the accuracy of previously described models of PVR14 using the genotypic profile and some clinical variables of patients operated on for primary RD. We then revalidated the models, improving their discrimination capability. To the best of our knowledge, this is the first time that genetic and clinical risk factors have been considered together for predicting PVR. Furthermore, the validation confirms the good diagnostic accuracy values of these models, which are also better than those obtained by other published formulae.5–8 The main advantage of this model is that it could allow us to select patients at high risk of PVR in order to apply more specific treatments, which would probably improve our real clinical practice.
As mentioned, medical approaches for preventing PVR have been attempted for almost 30 years. However, they have had limited success, and retinal toxicity and potential ocular and systemic side effects remain major concerns. Therefore, most of the tested approaches have been completely abandoned and the systematic use of preventive measures in all RD cases is not justified. Thus, the identification of patients at high risk of developing PVR becomes paramount. Most models are based on the identification of clinical risk factors of PVR; however, none of the formulae developed is accurate enough for routine clinical use.10
In this scenario, it is likely that genetic susceptibility has a role, and we developed predictive models of PVR based on genetic markers.14 As described by Bleeker et al,22 results are often accepted without sufficient regard to the importance of external validation. They demonstrated the limitations of internal validation for determining the generalisability of a diagnostic prediction model to future settings. Optimistic predictions are a common problem of many predictive models: their performance in new patients is often worse than expected from their performance in the original dataset.23 The problem worsens with small sample sizes and with complex models,24 and is frequent in genetic studies. Before applying predictive models in clinical practice, it is therefore necessary to establish whether or not the model provides realistic estimations in another unrelated cohort of patients. External validation with a new and independent sample of patients is always mandatory.25–29 Populations can differ in time and place of collection. In this sense, it is important to analyse both temporal validation and geographical or external validation.30
Machine-learning methods were used during the building of models to predict the probability of developing PVR.14 In that work, we did not consider the possibility of using traditional methods of modelling. The reasons for the use of machine-learning algorithms were the presence of highly correlated factors, with correlation structures in many cases unknown, and the large number of features in relation to the number of samples, especially the number of cases. In this scenario, it is not easy to fix an underlying functional form for the models or to estimate the required number of parameters from small samples, both of which are necessary to apply more traditional regression methodology. One primary characteristic of genomic datasets is high dimensionality along with highly correlated features. They also typically have too many features in relation to the number of samples. For these reasons, the application of traditional methods of statistical analysis, such as discriminant analysis and logistic regression, is limited. Machine-learning algorithms treat the underlying structure of data as unknown, offering the possibility of extracting hidden complex relationships and correlations. Additionally, although a small sample size can also cause important problems in the application of machine-learning algorithms, such as overfitting, the required ratio of the number of samples to the number of variables is lower than for traditional methods. Furthermore, many machine-learning tools incorporate methods of variable selection that allow us to reduce the dimension of the problem.
Limitations of this work must be discussed before analysing our results. There are two aspects that should be discussed: the quality of the samples (are both series comparable?) and the predictability of the models (are our models good predictors?). If both answers are positive, are the models better than those already published? The quality of the validation sample is an important issue in the external validation of predictive models. One of the samples analysed here, the temporal sample, could be considered small, as the low prevalence of PVR greatly reduces the number of cases. Nevertheless, this sample was not used to fit any model, but to validate models fitted using the original sample. New patients from different places were included. In the validation series, PVR was less frequent in both samples, temporal and geographical, than in the original. The significant difference observed for the temporal sample was expected, as patient inclusion was prospective. Other differences, such as the status of the lens, or even the greater tendency to perform PPV among cases, were all predictable results as already described in the literature.31 Differences observed in the surgical variables, such as the type of tamponade and retinopexy, and the type of procedure, may suggest that patients in the geographical sample were more often reoperated on. In this case, there were more referral centres participating in the study than in the original series. The higher number of reinterventions in the geographical sample does not necessarily mean that RDs were more complex, but that patients could have been included later, when primary procedures had already failed in other centres. This is one of the limitations of any retrospectively collected sample. However, the low prevalence of PVR makes it extremely difficult to collect such a number of cases in a reasonable period of time.
The other question concerns the predictive capability of our validated models. As explained above, prediction models tend to perform better on the data on which they were developed than on new data. In our work, model 2 withstands external validation, offering AUC values not significantly different from the ones obtained with the original data. It is also interesting to note that when clinical variables were added, the discrimination capability of the model improved significantly. Additionally, it predicts as accurately as in the original series, confirming the strength of the model. It is, therefore, a relevant result that one of the three validated models showed similar discrimination values to the original series, overcoming any inherent bias of the predictive models in the first series.
Finally, the AUC values obtained in this work considering both genetic and clinical variables (0.7744 for the temporal sample and 0.7152 for the geographical sample) are better than those observed for other formulae based only on clinical variables.5–8 Our research group recently performed the external validation of four published models, the AUC value being 0.5964 for the best predictive formula.10 It is important to point out that there is one recently published formula whose predictive values are better than the ones offered by our model.9 These authors investigated some biomarkers in the subretinal fluid obtained during surgery in addition to the clinical variables, whereas our model works with markers that can be analysed in a small peripheral blood sample before surgery, which could be an advantage if any prophylactic measure were to be administered during surgery. The subretinal fluid analysis may add further predictive value, but it is very important to state that it has not been validated in an external sample, which is mandatory before clinical application.
It is important to note that surgical procedures in the original series were recorded only for cases. Therefore, these variables could not be included in the reclassification analysis in this work. Nevertheless, the best model is one that helps us to take a decision without including the decision itself. Finally, this model allows us to select high-risk patients to be recruited for clinical trials assessing new procedures or compounds. The use of this diagnostic tool would dramatically reduce the required sample size, as it enables more homogeneous samples to be obtained.
It is very interesting to note that, of the markers included in model 2, 7 out of 10 SNPs are located in genes that code for proteins with anti-inflammatory actions: IL-10, IL1RN, NFKBIA, NFKBIB, and the TGFBs. Briefly, the main action of IL-10 is to limit the inflammatory response by blocking IFNG, IL-2, TNFA and IL-4 production.32 IL1RN binds the IL-1 receptor, neutralising IL1A and IL1B.33 NFKBIA and NFKBIB are responsible for inhibition of NFKB, a central transcription factor in the inflammatory cascade, keeping it inactivated in the cytoplasm.34 Finally, TGFB mediates a number of actions that contribute to the limitation of acute inflammation.35 ,36 These findings may justify future studies on the functional role of these markers.
In summary, we have validated three models for predicting PVR, one of them with good discrimination capability values that are better than those obtained from other formulae. This finding could serve in the near future as a new tool in clinical practice to identify those patients at high risk of PVR, who would benefit most from new therapies that cannot be applied systematically to every RD patient.
Contributors All authors state that they have participated in: substantial contributions to the conception or design of the work, or the acquisition, analysis or interpretation of data. Drafting the work or revising it critically for important intellectual content. Final approval of the version published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding This research was partially funded by The Special Trustees of Moorfields Eye Hospital, the Health Foundation and the NIHR Biomedical Research Centre, 162 City Road, London EC1V 2PD; the Portuguese grants from FCT (PTDC/SAU-ORG/110683/2009), through Unidade I&D Cardiovascular (51/94-FCT); the General R&D&I Office of the Government of Galicia (through grant PGIDIT06PXIB208204PR) and the FIS PI051437 project of the ‘Carlos III Health Institute’, Ministry of Health, Spanish Government.
Competing interests None.
Patient consent Obtained.
Ethics approval This study was approved by the institutional research ethics committees of each centre involved in the study.
Provenance and peer review Not commissioned; externally peer reviewed.