
Accurate detection and grading of pterygium through smartphone by a fusion training model
Yuwen Liu1,2, Changsheng Xu1,3, Shaopan Wang1,3, Yuguang Chen1,3, Xiang Lin1,4, Shujia Guo1, Zhaolin Liu1,5, Yuqian Wang1, Houjian Zhang1, Yuli Guo1, Caihong Huang1, Huping Wu6, Ying Li7, Qian Chen1, Jiaoyue Hu1,6, Zhiming Luo8, Zuguo Liu1,6

1 Eye Institute, Xiamen University, Xiamen, Fujian, China
2 Xiamen University National Institute for Data Science in Health and Medicine, Xiamen, Fujian, China
3 Institute of Artificial Intelligence, Xiamen University, Xiamen, Fujian, China
4 Department of Ophthalmology, Xiang'an Hospital of Xiamen University, Xiamen, China
5 Department of Ophthalmology, The First Affiliated Hospital of University of South China, Hengyang, Hunan, China
6 Eye Institute, Affiliated Xiamen Eye Center of Xiamen University, Xiamen, Fujian, China
7 Department of Ophthalmology, Xi'an People's Hospital, Xi'an, Shaanxi, China
8 Department of Artificial Intelligence, Xiamen University, Xiamen, Fujian, China

Correspondence to Professor Zuguo Liu, Eye Institute of Xiamen University, Xiamen University, Xiamen, Fujian 361005, China; zuguoliu@xmu.edu.cn

Abstract

Background/aims To improve the accuracy of pterygium screening and detection with smartphones, we established a fusion training model by blending a large volume of slit-lamp image data with a small proportion of smartphone data.

Method Two datasets were used: a slit-lamp image dataset containing 20 987 images and a smartphone-based image dataset containing 1094 images. The RFRC model (Faster RCNN based on ResNet101) served as the detection model, and the SRU-Net model (U-Net based on SE-ResNeXt50) served as the segmentation model. An OpenCV algorithm measured the width, length and area of the pterygium on the cornea.

Results The detection model (trained on slit-lamp images) obtained a mean accuracy of 95.24%. The fusion segmentation model (trained on smartphone and slit-lamp images) achieved a microaverage F1 score of 0.8981, sensitivity of 0.8709, specificity of 0.9668 and area under the curve (AUC) of 0.9295. For smartphone and slit-lamp images from the same group of patients, the fusion model's performance on smartphone-based images (F1 score of 0.9313, sensitivity of 0.9360, specificity of 0.9613, AUC of 0.9426, accuracy of 92.38%) is close to that of the slit-lamp-trained model on slit-lamp images (F1 score of 0.9448, sensitivity of 0.9165, specificity of 0.9689, AUC of 0.9569 and accuracy of 94.29%).

Conclusion Our fusion model method achieved high pterygium detection and grading accuracy despite insufficient smartphone data; its performance is comparable to that of experienced ophthalmologists and holds across different smartphone brands.

  • telemedicine
  • imaging
  • conjunctiva
  • ocular surface


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Given the lack of a smartphone image database, we established a fusion model combining a large set of slit-lamp images with a small number of smartphone images to enhance smartphone-based pterygium detection and grading accuracy.

WHAT THIS STUDY ADDS

  • Using smartphone images, the fusion model achieves detection and grading accuracy for pterygium close to that of the model based on slit-lamp images.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Our study supports the early detection of pterygium from smartphone images and the future establishment of a smartphone image database.

Introduction

Pterygium is a common fibrovascular degenerative disease characterised by a wing-shaped growth of conjunctival tissue over the adjacent cornea, usually on the nasal side.1 Surgery is the primary treatment for pterygium when it invades the corneal area and impairs vision.2–4 The restoration of corneal topography and the risk of recurrence after surgery are closely related to the size of the pterygium,5 6 underlining the importance of pterygium grading.

Most complications of pterygium can be managed when it is diagnosed and treated early. The global prevalence of pterygium is 12%, with the lowest in Saudi Arabia (0.07%) and the highest in China (53%).7 However, inadequate public awareness and weak healthcare information in remote rural areas8 9 often delay the diagnosis and treatment of pterygium, which can cause irregular astigmatism, limitation of eye movement, vision loss and even blindness.10–13 Moreover, the shortage of professional ophthalmologists and of medical equipment such as slit lamps further lowers the accuracy of evaluating pterygium progression in remote areas.14 15 Some recent studies have detected and graded pterygium with artificial intelligence methods.16 17 Nevertheless, their dependency on slit-lamp images limits their application where slit lamps are unavailable. Hence, with their ubiquity and portability, smartphones are well placed to become indispensable personal health devices.18 In recent years, per capita ownership of smartphones has increased significantly, even in remote and underdeveloped areas.19 With a wide variety of sensors and high-resolution cameras, smartphones also provide an innovative platform for large-scale data collection and can assist early diagnosis and management outside the hospital.

Nevertheless, compared with the large volumes of high-quality slit-lamp images in uniform formats,20 it is a huge challenge to collect abundant smartphone images with high resolution, appropriate eye position and full exposure of the eyeball, which are vital for accurate recognition, owing to privacy and security limitations,21 non-standard photography methods and the scarcity of publicly available standardised datasets. Therefore, there is an increased demand for new artificial intelligence methods that achieve accurate recognition from a small number of smartphone images.

In this study, we established a fusion training model by fusing slit-lamp data with a small amount of smartphone data to significantly improve the accuracy of pterygium detection and grading, suggesting a new approach to training-set collection for accurate smartphone-based image detection in the future.

Method

Datasets based on slit-lamp images and smartphone-based images

Two datasets were collected using a slit lamp and smartphones for training, validation and testing (table 1). Images captured under cobalt blue light or slit illumination, overexposed images, images in which the subject was not looking straight ahead, and blurred images were considered low quality. Manually selected clear full-eye panoramas with the subject looking straight ahead were considered eligible.

Table 1

Summary of datasets

The slit-lamp dataset (SLD) was collected from the Xiamen Eye Center of Xiamen University and Xiang'an Hospital of Xiamen University. After excluding 4651 low-quality images, there were 20 987 eligible images (8845 images with pterygium, 10 096 images with other abnormalities and 2046 images with normal corneas).

The smartphone-based dataset (SPB) was collected from the Xiamen Eye Center of Xiamen University and Xiang'an Hospital of Xiamen University, photographed with HUAWEI, iPhone and Xiaomi devices (specific models and the detailed collection protocol are given in online supplemental figure S1). After removing 371 low-quality images, the three smartphone brands yielded 418, 581 and 95 images, respectively, for a total of 1094 smartphone images (563 images of pterygium, 426 images of other abnormalities and 105 images of normal corneas).

Supplemental material

When training the model, the primary datasets were randomly split into a training set (70%) and a held-out set (30%); 40% of the held-out set was used as the validation set and the remainder as the test set. Thus, no image used for training or validation appeared in the test set.
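As a rough illustration of this split, the sketch below (in Python, the language the authors used for analysis) applies scikit-learn's train_test_split twice; the file lists, label encoding and the stratified option are our assumptions, not details from the paper.

```python
# Hypothetical sketch of the 70/30 split with 40% of the held-out set
# used for validation; `images` and `labels` are assumed parallel lists
# of file paths and class labels (normal/pterygium/other).
from sklearn.model_selection import train_test_split

def split_dataset(images, labels, seed=42):
    # 70% training, 30% held out
    train_x, held_x, train_y, held_y = train_test_split(
        images, labels, test_size=0.30, stratify=labels, random_state=seed)
    # 40% of the held-out portion becomes validation, 60% the test set
    val_x, test_x, val_y, test_y = train_test_split(
        held_x, held_y, test_size=0.60, stratify=held_y, random_state=seed)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```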

Pterygium grading

In our study, the primary surgical indicators are the locations of the pterygium head, corneal limbus and pupillary margin, based on the study of Maheshwari.22 In general, the average horizontal diameter of the cornea is 11.5–12 mm in adults,23 and the pupil size is approximately 4 mm in normal light.24 The pterygium was graded into three levels (figure 1A) in the SLD and SPB.

Figure 1

The grading system for pterygium. (A) Examples of pterygium images in three different grades captured by slit lamp and smartphone. Grade I: head lies between the corneal limbus and the midpoint between the limbus and the pupil margin. Grade II: head lies between that midpoint and the pupil margin. Grade III: head exceeds the pupil margin, or the width is >5 mm and the area of corneal invasion is >6.25 mm². (B) The parameters of base width, length and area of pterygium.

In grading pterygium, the length of the pterygium's invasion of the cornea was the primary consideration. In addition, a pterygium with a width >5 mm and a corneal invasion area >6.25 mm² was also classified as grade III.25
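These grading rules can be summarised in a short sketch. This is our illustrative reading of the criteria, assuming measurements already converted to millimetres and taking the 12 mm corneal diameter and 4 mm pupil from the text; it is not the authors' code.

```python
# Hypothetical grading sketch following the rules above; inputs in mm.
# Assumes a 12 mm horizontal corneal diameter and a 4 mm pupil, so the
# limbus-to-pupil-margin distance is (12 - 4) / 2 = 4 mm and its midpoint
# lies 2 mm past the limbus. These constants are assumptions from the text.
def grade_pterygium(length_mm, width_mm, area_mm2):
    limbus_to_pupil = (12.0 - 4.0) / 2.0   # 4 mm
    midpoint = limbus_to_pupil / 2.0       # 2 mm past the limbus
    if width_mm > 5.0 and area_mm2 > 6.25: # size-based criterion, grade III
        return 3
    if length_mm > limbus_to_pupil:        # head exceeds the pupil margin
        return 3
    if length_mm > midpoint:               # between midpoint and pupil margin
        return 2
    return 1                               # between limbus and midpoint
```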

Detection of pterygium

The SLD was used to train a detection model for detecting pterygium in both slit-lamp and smartphone-based images. Our detection model used Faster RCNN with a ResNet101 backbone26 27 for feature extraction (online supplemental figure S2, stage 1). The detection model trained on slit-lamp images from the SLD is referred to as DM. During training, the slit-lamp images were randomly split in a ratio of 7:3, with 14 691 for training and 6296 for testing. Both the training and test sets contained randomly distributed normal, pterygium and other-disease images. The mAP (mean average precision), mIoU (mean intersection over union) and mAcc (mean accuracy) were used to evaluate detection accuracy.
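A minimal sketch of such a detector in torchvision follows; the class count, input size and torchvision construction are our assumptions, and the paper's actual training configuration is not reproduced here.

```python
# Sketch of a Faster RCNN detector with a ResNet101 backbone, as named in
# the text, built with torchvision; num_classes (background + pterygium +
# other abnormality + normal) is an assumption, not from the paper.
import torch
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet101 with a feature pyramid network; older torchvision versions
# take `pretrained=` instead of `weights=`.
backbone = resnet_fpn_backbone(backbone_name="resnet101", weights=None)
model = FasterRCNN(backbone, num_classes=4)

model.eval()
with torch.no_grad():
    # One dummy 512x512 RGB tensor stands in for a preprocessed eye photo.
    detections = model([torch.rand(3, 512, 512)])
print(detections[0]["boxes"].shape, detections[0]["labels"])
```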

Segmentation of pterygium and cornea

To accurately segment the pterygium region invading the cornea in images flagged as showing pterygium in the first stage, the SLD and SPB were used to train segmentation models for slit-lamp and smartphone-based images, respectively. The segmentation models in this study were U-Net models28 based on SE-ResNeXt50 (SRU-Net) (online supplemental figure S2, stage 2). In this stage, the segmentation model was used to segment the cornea (including the part covered by the pterygium) and the pterygium area. Three models were trained: SM1 with the SLD, SM2 with the SPB and SM3 with the combined SLD and SPB. For the SLD training process, 2276 single-pterygium and double-pterygium images, extracted after removing blurred, severely exposed and misaligned images, were split in a ratio of 7:3, with 1693 for training and 583 for testing. For the SPB training process, 118 smartphone images were used for training and 248 for testing. The training and test sets comprised normal-eye images, pterygium-eye images and other-disease images. The mIoU, mHD (mean Hausdorff distance) and mPA (mean pixel accuracy) were used as evaluation metrics for the segmentation task.
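The sketch below shows how an SRU-Net-style model can be instantiated with the segmentation_models_pytorch package and how the two datasets might be fused for SM3-style training; the dataset classes, output class count and batch size are our assumptions.

```python
# Sketch of the SRU-Net (U-Net with an SE-ResNeXt50 encoder) using the
# segmentation_models_pytorch package, plus dataset fusion via
# ConcatDataset for SM3-style training.
import torch
import segmentation_models_pytorch as smp
from torch.utils.data import ConcatDataset, DataLoader

model = smp.Unet(
    encoder_name="se_resnext50_32x4d",  # SE-ResNeXt50 encoder
    encoder_weights="imagenet",
    in_channels=3,
    classes=2,                          # assumed: cornea and pterygium masks
)

# SM3-style fusion: train on slit-lamp and smartphone images together.
# `SlitLampSegDataset` and `SmartphoneSegDataset` are hypothetical Dataset
# classes returning (image, mask) pairs.
# fused = ConcatDataset([SlitLampSegDataset(...), SmartphoneSegDataset(...)])
# loader = DataLoader(fused, batch_size=8, shuffle=True)

with torch.no_grad():
    masks = model(torch.rand(1, 3, 256, 256))  # (1, 2, 256, 256) logits
```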

Methods of measurement

The output image from the previous stage was processed, and the indicators of the pterygium invading the cornea were measured with an OpenCV algorithm (online supplemental figure S2, stage 3). After segmentation of the cornea and the invading area of the pterygium, we assessed and assigned the corresponding grade. In this stage, we measured the base width of the pterygium (WidthP), the length of the pterygium (LengthP) and the area of the pterygium (AreaP) for risk assessment29 (figure 1B).

Assuming a horizontal corneal diameter of 12 mm for each individual, we used the find-contour function in the OpenCV toolbox to obtain the pixel length of the horizontal diameter of the cornea. LengthP and WidthP were then computed from their pixel ratios relative to the horizontal corneal diameter, and AreaP was computed from the number of pixels enclosed by the contour of the segmented pterygium.
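A sketch of this measurement step with OpenCV follows; the binary masks, the use of bounding-box extents as stand-ins for LengthP and WidthP, and the helper name are our assumptions about how such a pipeline could look.

```python
# Sketch of the measurement step with OpenCV, assuming uint8 binary masks
# for the cornea and pterygium from the segmentation stage; the 12 mm
# corneal diameter assumption from the text sets the mm-per-pixel scale.
import cv2
import numpy as np

def measure_pterygium(cornea_mask, pterygium_mask):
    # Largest contour of the cornea gives its horizontal pixel diameter.
    contours, _ = cv2.findContours(cornea_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    cornea = max(contours, key=cv2.contourArea)
    _, _, cornea_w_px, _ = cv2.boundingRect(cornea)
    mm_per_px = 12.0 / cornea_w_px          # assumed 12 mm diameter

    contours, _ = cv2.findContours(pterygium_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    ptery = max(contours, key=cv2.contourArea)
    # Bounding-box width/height stand in for LengthP/WidthP here; the
    # paper's exact geometric definitions may differ.
    _, _, length_px, width_px = cv2.boundingRect(ptery)
    area_mm2 = cv2.contourArea(ptery) * mm_per_px ** 2
    return length_px * mm_per_px, width_px * mm_per_px, area_mm2
```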

Statistical analysis

The performance of the detection model was evaluated by accuracy, F1 score and area under the curve (AUC). The performance of the grading model for grades I–III was evaluated by sensitivity, specificity, accuracy, F1 score, the receiver operating characteristic (ROC) curve and AUC with 95% CIs. The kappa test was performed to evaluate the consistency of the diagnostic test; a kappa value of 0.61–0.80 was considered significantly consistent, and a kappa value higher than 0.80 was considered highly consistent. Statistical analyses were conducted using Python V.3.7.11. ROC curves were created by plotting the true positive rate (sensitivity) against the false positive rate (1-specificity) using Scikit-Learn (V.1.0.1) and Matplotlib (V.3.3.2); a larger area under the ROC curve indicated better performance.
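As an illustration, the micro-averaged ROC/AUC and kappa computation could look like the sketch below, using the scikit-learn calls named above; the function name and the integer grade encoding are assumptions.

```python
# Sketch of the grading evaluation, assuming integer grade labels 1-3:
# micro-averaged ROC/AUC over the binarised grades, plus Cohen's kappa
# and micro F1, mirroring the metrics reported in the text.
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc, cohen_kappa_score, f1_score

def evaluate_grading(y_true, y_pred, y_score, grades=(1, 2, 3)):
    # y_score: (n_samples, 3) per-grade probabilities from the system.
    y_bin = label_binarize(y_true, classes=list(grades))
    fpr, tpr, _ = roc_curve(y_bin.ravel(), np.asarray(y_score).ravel())
    return {
        "micro_f1": f1_score(y_true, y_pred, average="micro"),
        "micro_auc": auc(fpr, tpr),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }
```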

Results

Performance in pterygium detecting and grading

Performance of detecting model

For DM (the detection model), the mAP, mIoU and mAcc were 0.9881, 0.9788 and 96.60% on the SLD, and 0.9563, 0.9100 and 95.24% on the SPB (online supplemental table S1). Detection results alongside the original images for the SLD and SPB are shown in online supplemental figure S3. These results indicate that our detection model DM also has high accuracy for pterygium images captured by smartphones.

Performance of grading model

Initially, models SM1 and SM2 were used to segment smartphone-based images, and neither performed ideally: on the SPB, the mIoU, mHD and mPA were 0.7781, 0.3507 and 0.8889 for SM1 and 0.6784, 0.5556 and 0.8317 for SM2, respectively. Therefore, a combined training set of slit-lamp and smartphone-based images was used to train a new model (SM3) for testing on the SPB, whose performance reached that of SM1 on the SLD. The slit-lamp images used for SM3 training outnumbered the smartphone images in a ratio of 83:17; owing to the limited number of smartphone images, all available smartphone training images were included. On the SPB, the mIoU, mHD and mPA of SM3 were 0.8169, 0.3139 and 0.9259 (table 2). This showed that our fusion modelling method was feasible.

Table 2

Performance of three different segmentation models based on SLD and SPB

SM3 was then chosen as our final segmentation model for smartphone-based images, with the OpenCV algorithm used for measurement and grading. Taking the result of SM1 on the SLD as the reference standard, SM1 achieved a microaverage F1 score of 0.9118, sensitivity of 0.9201, specificity of 0.9764, AUC of 0.9478 and accuracy of 92.11%; the kappa consistency coefficient between the final measurements and the ground truth was 0.9193. Testing the SPB with SM3, we achieved a microaverage F1 score of 0.8981, sensitivity of 0.8709, specificity of 0.9668, AUC of 0.9295 and accuracy of 88.31%; the kappa consistency coefficient was 0.9086 (online supplemental table S2). The AUC analysis, ROC curves and confusion matrices of grading are shown in figure 2A–D. The appearance of the preprocessed images from the SLD and SPB in the grading model is shown in online supplemental figure S3.

Figure 2

Performance of SM1 and SM3 in pterygium grading. (A–D) ROC curves, AUC and confusion matrices of the system for pterygium grading on the SLD and SPB. (E–H) ROC curves, AUC and confusion matrices for pterygium grading based on images of the same group of patients in the SLDS and SPBS. Different coloured point clouds represent the different grades. AUC, area under the curve; SLD, slit-lamp image dataset; SPB, smartphone-based image dataset; SLDS, slit-lamp image dataset from the same group of patients; SPBS, smartphone-based image dataset from the same group of patients.

These results indicate that our fusion model SM3 achieved high grading accuracy for smartphone images, approaching that of SM1 on the SLD.

Performance in pterygium grading based on the same group of patients’ slit-lamp and smartphone images

To make our results more convincing, we also collected 104 image pairs for testing, each containing a slit-lamp image and a smartphone-based image of the same patient's eye. Using SM1 on the slit-lamp images (SLDS), we achieved a microaverage F1 score of 0.9448, sensitivity of 0.9165, specificity of 0.9689, AUC of 0.9569 and accuracy of 94.29%, with a kappa consistency coefficient of 0.8972. Using SM3 on the smartphone images of the same patients (SPBS), we achieved a microaverage F1 score of 0.9313, sensitivity of 0.9360, specificity of 0.9613, AUC of 0.9426, accuracy of 92.38% and a kappa consistency coefficient of 0.8521 (online supplemental table S3). The AUC analysis and ROC curves of grading are shown in figure 2E–H, indicating that SM3's performance on smartphone-based images is close to SM1's performance on slit-lamp images.

Performance in pterygium grading based on images of different smartphone brands

To further test the applicability of our model, we collected images from the three most popular smartphone brands on the market: HUAWEI, iPhone and Xiaomi. The microaverage F1 score, sensitivity, specificity, AUC and accuracy were 0.9549, 0.8143, 0.9729, 0.9676 and 95.65% for HUAWEI; 0.9331, 0.8474, 0.9738, 0.9076 and 84.13% for iPhone; and 0.9586, 0.8839, 0.9823, 0.9411 and 90.00% for Xiaomi (online supplemental table S4). The AUC analysis and ROC curves are shown in online supplemental figure S4. These results show that model SM3 performed well on images taken with different brands of phones.

Comparison of three experienced ophthalmologists and the detecting and grading model

To further verify the diagnostic ability of the system in pterygium detection and grading, three ophthalmologists, each with more than 10 years of clinical experience, were asked to assess the images independently. They marked the pterygium lesion in each smartphone image, from which the length, width and area of the pterygium invading the cornea were obtained.

We selected 200 images (90 of pterygium, 37 of normal eyes and 73 of other abnormalities) as the dataset SPBO for the experts to detect pterygium. The experts achieved 100% detection accuracy, and the model achieved 98.50%. The AUC analysis and ROC curves are shown in figure 3A. The 90 pterygium images among these 200 samples were then used as test data for grading and for comparing performance. For the entire test, conducted without patient information, the microaverage F1 score, sensitivity, specificity, AUC and accuracy were 0.8971, 0.8129, 0.9445, 0.9543 and 93.91% for the ophthalmology experts, and 0.9248, 0.7569, 0.9624, 0.9385 and 88.52% for the model (online supplemental table S5). The AUC analysis and ROC curves are shown in figure 3B. On this random sample, our model's performance is comparable to that of experienced ophthalmologists.

Figure 3

Comparison of the performance of SM3 and the experts. (A) ROC curves and AUC of SM3 for pterygium detection in the SPBO. (B) ROC curves and AUC of SM3 for pterygium grading in the SPBO. 'Expert-Avg' and 'System-Avg' indicate the averages of the experts and the system, respectively. Different coloured point clouds represent different experts. AUC, area under the curve; SPBO, smartphone-based image dataset for ophthalmologists.

Discussion

In this study, we aimed to achieve early diagnosis of pterygium with a fusion model, and we found that the model can effectively improve the accuracy of smartphone-based detection and grading of pterygium. Using 20 987 slit-lamp images and 1094 smartphone-based images, the mAP, mIoU and mAcc of DM (trained on slit-lamp images) were 0.9563, 0.9100 and 95.24% for detecting pterygium in smartphone images. For the subsequent segmentation and grading of pterygium, the fusion model (SM3) achieved a sensitivity of 0.8709 and a specificity of 0.9668, demonstrating that SM3's performance on smartphone images was excellent and close to that of SM1 on slit-lamp images. Moreover, our model's performance was comparable to that of experienced ophthalmologists and held across different smartphone brands.

A slit lamp is a fundamental tool for ophthalmic examination that can generate plentiful high-resolution images, and it has been used to detect pterygium with artificial intelligence in previous studies.16 17 Regrettably, as a specialised medical device relying on professional medical staff, the slit lamp is not always available in primary hospitals. By contrast, the progress of societal informatisation has given smartphones superb portability and universality in the general population.19 Besides being equipped with various sensors and high-definition cameras that enable the collection, transmission and processing of information, smartphones can generate photographs close in quality to slit-lamp images. In practice, however, as an unconventional examination, smartphone data collection is difficult and image quality varies. Moreover, with no long-term accumulation of data and scarcely any publicly available standardised datasets, the accuracy of smartphone-based recognition would be unsatisfactory, which further lowers user enthusiasm and, in a vicious circle, undermines the stability of data sources and the improvement of model accuracy. Therefore, we sought to use existing high-quality slit-lamp image data to compensate for the shortage of high-quality smartphone data. Our fusion modelling method improved the accuracy of smartphone-based detection and grading, and it can also be applied in other disciplines that lack high-quality smartphone image data. Furthermore, this novel out-of-hospital diagnostic mode largely frees people from the restrictions of time and space, saving patients' time, energy and economic resources. Most importantly, given the unbalanced distribution of medical resources, our study sheds light on the timely diagnosis of pterygium in underdeveloped areas.

However, there are still several limitations to our study. First, our study makes judgements based only on image information, lacking other information such as medical history and symptoms. In the future, we will take medical history and symptoms into account and evaluate the colour, transparency and blood vessels of the pterygium to develop a more accurate classification method.29 Second, we established a standard for eligible images, according to which image data were selected manually. In the future, we expect to build an automatic quality control system for practical applications that algorithmically excludes non-eye or low-quality images rather than relying on manual effort. Third, our study focused on identifying pterygium, and a clear sorting mechanism has not yet been established for other unidentifiable diseases. We therefore plan to establish a more complete recognition system for ocular surface diseases in the future.

In conclusion, in the absence of a mature mobile image database, our fusion model method proves to be a powerful tool for improving pterygium detection and grading accuracy despite insufficient smartphone data. Remote detection and grading of pterygium by smartphone relieves pressure on hospitals, reduces economic burdens and lowers the chance of infection, especially during the COVID-19 epidemic. In addition, our approach can be applied to other diseases besides pterygium, especially in remote areas worldwide.

Data availability statement

Data are available on reasonable request. The data generated and/or analysed during the current study are available on reasonable request from the corresponding author ZuL (zuguoliu@xmu.edu.cn).

Ethics statements

Patient consent for publication

Ethics approval

Ethical approval for this study was obtained from the Medical Ethics Committee of the School of Medicine, Xiamen University (identifier XDYX2022004). The requirement for informed consent was waived because only deidentified retrospective records and images without identifying marks were used.


Footnotes

  • YL, CX and SW contributed equally.

  • Contributors Conception and design: ZuL, ZhL, JH and YL. Analysis and interpretation: YL, CX, SW, ZhL, YC, HZ, YG, CH and QC. Data collection: YL, YC, XL, SG, HW and YL. Manuscript preparation and overall responsibility: ZuL, ZhL, JH, YL, CX and SW. All authors approved the final manuscript. ZuL is the guarantor.

  • Funding This study was supported by grants from the National Key R&D Program of China (No. 2018YFA0107304, ZhL) and National Natural Science Foundation of China (No. 81870627, ZuL; No. U20A20363, JH and No. 81900825, CH).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
