This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3102176, IEEE Access

Retinal Vessel Segmentation Using Deep Learning: A Review

CHUNHUI CHEN1, JOON HUANG CHUAH1, (Senior Member, IEEE), RAZA ALI1,2, (Student Member, IEEE), YIZHOU WANG3, (Senior Member, IEEE)
1 Department of Electrical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur 50603, Malaysia
2 Faculty of Information and Communication Technology, BUITEMS, Quetta 87300, Pakistan
3 Center on Frontiers of Computing Studies, Peking University, Beijing 100871, P.R. China

Corresponding author: Joon Huang Chuah (

[email protected]

)

ABSTRACT This paper presents a comprehensive review of retinal blood vessel segmentation based on deep learning. The geometric characteristics of retinal vessels reflect the health status of patients and help to diagnose diseases such as diabetes and hypertension. The accurate diagnosis and timely treatment of these diseases can prevent blindness. Recently, deep learning algorithms have been rapidly applied to retinal vessel segmentation because of their higher efficiency and accuracy compared with manual segmentation and other computer-aided diagnosis techniques. In this work, we review recent publications on retinal vessel segmentation based on deep learning. We survey the proposed methods, especially their network architectures, and identify the trends in model design. We summarize the obstacles and key aspects of applying deep learning to retinal vessel segmentation and indicate future research directions. This article will help researchers to construct more advanced and robust models.

INDEX TERMS Retinal vessel segmentation, fundus images, deep learning, convolutional neural network.

I. INTRODUCTION
The retinal fundus is the only part of the deep microvascular system that can be observed non-invasively. The retinal vessel map contains abundant geometric characteristics, such as vessel diameters, branch angles, and branch lengths. These geometric characteristics reflect clinical and pathological features, which are used to diagnose hypertension, diabetes, and atherosclerosis [1-4]. Ophthalmologists use retinal blood vessels to diagnose diseases related to vascular and vascular-system lesions, including diabetic retinopathy (DR) and diabetic maculopathy (MD), which are among the leading causes of blindness worldwide. Retinal image assessment has therefore become an indispensable step in the identification of retinal pathology.
A retinal fundus image illustrates retinal structures such as the retinal blood vessel tree, optic disk (OD), fovea, macula, and abnormal structures, as shown in Figure 1. The retinal blood vessel tree is composed of the central retinal artery, the central retinal vein, and their branches. Abnormalities may include microaneurysms (MAs), haemorrhages, exudates and cotton wool spots [5].

Figure 1 Annotated structure of the retina [6].

Precise identification and diagnosis of eye abnormalities and their timely medication are vital in preventing blindness. Initially, trained experts segmented the retinal blood vessels manually, but that was an expensive, tedious and time-consuming process [7]. Moreover, the complexity of the images causes inconsistency between vessel maps segmented by different experts [8], owing to the low contrast between vessels and background, uneven illumination, various abnormalities and variation in vessel width and shape. These facts inspire the development of automatic retinal vessel segmentation with minimal human interference. Several supervised and unsupervised methods have been developed to automate the segmentation of retinal vessels. Earlier, unsupervised methods were the most common approach for automatically segmenting retinal vessels; they do not rely on any annotation for segmentation [9, 10]. These methods are roughly divided into matched filtering [11-13], vessel-tracing-based segmentation [14-16] and model-based segmentation methods [17]. Unsupervised methods show some defects in their performance because they cannot benefit from hand-labelled ground truth. Unlike unsupervised methods, supervised models are trained using annotations and can benefit from the ground truth. Supervised models conduct retinal vessel segmentation in two stages: feature extraction and pixel classification. Features can be further divided into handcrafted features and automatically learned features.
In machine learning, the process of feature extraction from fundus images is manual, and some typical classifiers are adopted, such as the k-nearest neighbour classifier (KNN) [18] and the support vector machine (SVM) [19]. However, selecting features manually lacks generalization ability since it is application-specific and cannot learn new features automatically [20].

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Deep learning, especially convolutional neural networks (CNNs), has gained much attention for image analysis [21, 22]. Deep learning methods learn features automatically from massive data with little human intervention. They have better generalization and recognition capability because they can learn patterns at different levels automatically and are not limited to a specific application. In 2012, Krizhevsky, et al. [23] proposed AlexNet for image recognition. For image segmentation and identification, VGGNet [24] and GoogleNet [25] were introduced. Long, et al. [26] proposed fully convolutional networks (FCNs) for semantic image segmentation, which make dense predictions for a whole image in a single forward pass and thus speed up segmentation. Several review articles on retinal blood vessel segmentation have been published [10, 27-30]. However, Mookiah, et al. [10], Khan, et al. [28], Badar, et al. [29] and Li, et al. [30] did not focus on deep learning methods for vessel segmentation, whereas the techniques discussed in Soomro, et al. [27] were published several years ago.
Therefore, in this review article, we discuss publications of the past six years on retinal blood vessel segmentation based on deep learning. All the papers were retrieved by conducting iterative and exhaustive searches in the IEEE Xplore, Springer Link and ScienceDirect databases. We applied the following search command to the index terms of both journal and conference papers: “fundus image” AND (“retinal vessel” OR “retinal blood vessel”) AND (“segmentation” OR “extraction” OR “detection” OR “identification”) AND (“deep learning” OR “convolutional neural network” OR “CNN” OR “fully convolutional network” OR “FCN” OR “generative adversarial network” OR “GAN”). We selected only original studies from 2016 onwards that formulated retinal vessel segmentation as the main task rather than an intermediate task. This article is organized as follows. In Section Ⅱ, we discuss deep learning and convolutional neural networks. In Section Ⅲ, we introduce the datasets used for retinal vessel segmentation and the performance evaluation metrics for the proposed models. In Section Ⅳ, we analyze the existing deep learning models for retinal vessel segmentation. In Section Ⅴ, we discuss retinal vessel segmentation in light of this analysis. Section Ⅵ concludes the article and points out future research directions.

II. OVERVIEW OF DEEP LEARNING
Deep learning models are composed of hierarchically structured layers that translate input information into a meaningful output. Deep learning has developed into a rich family of models since the 1990s [31], such as deep neural networks (DNNs) [31], auto-encoders (AEs) [32] and stacked auto-encoders (SAEs) [33], deep belief networks (DBNs) [34, 35], restricted Boltzmann machines (RBMs) [36], convolutional neural networks (CNNs) [37], recurrent neural networks (RNNs) [38, 39], generative adversarial networks (GANs) [40] and graph neural networks (GNNs) [41].
In this section, we discuss the CNN architectures most widely used for computer vision tasks.

A. CONVOLUTIONAL NEURAL NETWORKS (CNNS)
Convolutional Neural Networks (CNNs) are inspired by multi-layered perceptrons (MLPs) and are widely used for image processing tasks such as classification, segmentation, and localization. Hubel and Wiesel [42] conducted the early experiments behind CNNs, which indicated that cells in the cat's visual cortex are responsible for detecting light in corresponding receptive fields. LeCun, et al. [37] proposed a CNN that recognized handwritten digits. The network was composed of convolution and pooling operations and was trained by the back-propagation algorithm. Later, LeCun, et al. [43] proposed LeNet-5 for document recognition. However, these architectures were not widely used due to the lack of training data and computation power at that time. Krizhevsky, et al. [23] proposed a powerful deep CNN for image classification, called AlexNet. The model significantly outperformed all existing methods and won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [44]. The AlexNet architecture is deeper than LeNet-5 and utilizes the ReLU activation function. Figure 2 and Figure 3 show the architectures of LeNet-5 and AlexNet, respectively.

Figure 2 Architecture of LeNet-5 [43]. C: convolution layer, S: subsampling layer, F: fully connected layer.

Figure 3 Architecture of AlexNet [23].

Encouraged by AlexNet, much research has been done on CNN architectures, and several deeper architectures were proposed to improve performance. VGG Net [24] was the first to explore much deeper networks, stacking small, fixed-size kernels in each convolution layer. Simonyan and Zisserman [24] proposed deeper CNNs with different numbers of weight layers, such as 13, 16 and 19. VGG19, with 19 weight layers, achieved top results in the ImageNet challenge of 2014.
Szegedy, et al. [25] introduced GoogleNet, which contains 22 layers and adopts the Inception module [45].

1) CNN ARCHITECTURE COMPONENTS
CNN architectures are composed of hierarchically structured layers with optimized parameters. This section interprets the main components of a CNN.

a) CONVOLUTIONAL LAYER
The convolutional layer is the main layer in CNNs that extracts features from input data or feature maps. It contains several stacked convolution kernels that conduct convolution operations. In a convolution operation (see Figure 4), a convolution kernel slides from left to right and from top to bottom, and it is multiplied element-wise with a specific region of the input or feature map to produce a value; this process is known as feature extraction. The specific region is called the receptive field. These regions share kernels, which is known as weight sharing; it reduces the complexity of the model and makes the training process easier. Mathematically, the feature map z generated by a convolution kernel can be expressed as:

z = W ∗ x + b    (1)

where x is the input image, W is the convolution kernel, and b is the bias of the convolutional layer.

Figure 4 Convolution operation. Stride=1 and assume bias=0.

b) BATCH NORMALIZATION
The input or feature maps generated by convolutional layers may vary greatly, so large or small values sent to the activation function face the problem of vanishing/exploding gradients, which hampers the training process [38, 46]. To address this problem, batch normalization [47] was proposed to accelerate the training process; it scales the input of the activation function and reduces internal covariate shift by applying a normalization operation to each mini-batch. Generally, batch normalization is performed before the activation function, but it can also be applied after the activation function depending on the application.

c) ACTIVATION FUNCTION
An activation function is a mathematical function that maps its input non-linearly and is applied to improve the feature representation ability of networks. It often follows convolutional layers and takes feature maps as input. The sigmoid function [48] is a prevalent choice of activation function, defined as:

y = sigmoid(x) = 1 / (1 + e^(−x))    (2)

where x is the input and y represents the output. The sigmoid function suffers from the vanishing gradient problem for very large or very small inputs. ReLU [49] is another frequently used activation function. It is expressed as:

y = ReLU(x) = max(x, 0)    (3)

where x is the input of the ReLU function and y represents its output. It preserves the positive part of feature maps and prunes the negative part to 0. ReLU can alleviate the vanishing gradient problem since its gradient is 1 whenever the input is positive, no matter how large it is. However, when the input is negative, both the output of ReLU and its gradient are always 0. This can reduce overfitting, but it can also prevent the network from learning in some cases because of the zero gradient, i.e., disconnected neurons. LeakyReLU (LReLU) was proposed to address the problem of zero gradients for negative inputs of the ReLU function [50]. It fully preserves the positive part, but it also preserves the negative part scaled by a ratio λ (range 0 to 1). It is expressed as:

y = LReLU(x) = max(x, 0) + λ min(0, x)    (4)

When the input is negative, both the output and the gradient are non-zero.

Generally, the Softmax function is applied as the activation function of the final layer for multi-class classification tasks. It is expressed as:

y(x)_i = e^(x_i) / Σ_{k=1}^{K} e^(x_k)    (5)

where x is the input vector and x_i is its i-th component. A K-dimensional output corresponds to K-class classification, and y(x)_i is the probability that the input vector belongs to the i-th class.

d) POOLING LAYER
The feature map output by a convolutional layer records pixel positions precisely, so it is very sensitive to the location of features. This high sensitivity means that a small movement of feature positions, such as a rotation or shift, leads to a different map, which decreases the robustness of CNNs. Usually, a pooling layer is applied after the convolutional layer to reduce the reliance on precise feature positions and to ensure the shift-invariance of CNNs. At the same time, pooling operations reduce the resolution of feature maps and thus the computational burden. Pooling operations can be categorized as max-pooling [51], average pooling [52], and sum pooling. Figure 5 shows how pooling operations work: a sliding window is placed upon the feature map, and the maximum, average, or sum of the values in the window is calculated as the output. In particular, if the size of the pooling window equals the size of the feature map, it is referred to as global pooling; otherwise, it is local/regional pooling.
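To make the operations above concrete, the stride-1 convolution of Equation (1), the ReLU activation, and 2x2 max-pooling can be sketched in a few lines of NumPy. This is an illustrative toy example, not code from any of the reviewed papers; the input and kernel values are arbitrary.

```python
import numpy as np

def conv2d(x, w, b=0.0, stride=1):
    """Valid 2-D convolution (cross-correlation, as used in CNNs): z = W * x + b."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    z = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Receptive field: the input region the kernel currently covers.
            field = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            z[i, j] = np.sum(field * w) + b
    return z

def relu(z):
    # Eq. (3): keep the positive part, prune negatives to 0.
    return np.maximum(z, 0)

def max_pool(z, size=2, stride=2):
    # Sliding-window max-pooling, as in Figure 5.
    oh = (z.shape[0] - size) // stride + 1
    ow = (z.shape[1] - size) // stride + 1
    return np.array([[z[i*stride:i*stride+size, j*stride:j*stride+size].max()
                      for j in range(ow)] for i in range(oh)])

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input
w = np.array([[1., 0.], [0., -1.]])            # toy 2x2 kernel
feat = max_pool(relu(conv2d(x, w)))            # conv -> ReLU -> 2x2 max-pool
```

Note that the same kernel w is reused at every position (weight sharing), which is why the number of parameters does not grow with the input size.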
Figure 5 Pooling operations, with 2x2 filter and stride=2.

e) FULLY CONNECTED LAYER
Fully connected layers (FCs) are flattened layers that generate specific semantic information. Each neuron in a fully connected layer is connected to all the neurons in the previous layer, so all activations can be computed as a matrix multiplication followed by biases.

2) LOSS FUNCTION
The loss function evaluates the difference between the predicted result and the desired result. An appropriate loss function measures the difference between the result and the label properly and guides a fast and correct training process. The following are some popular loss functions used in CNN architectures. For multi-class classification tasks, the most used loss function is the cross-entropy loss. It is given as:

ℒ = − Σ_{i=1}^{M} c_i log(p_i)    (6)

where M is the number of classes, c_i is the actual label indicating whether the input belongs to the i-th class (so it is 0 or 1), and p_i is the probability predicted by the network. Cross-entropy can also be applied to binary classification since the sigmoid function is a special case of the Softmax function. In this case, it is known as the binary cross-entropy loss function, expressed as:

ℒ = −(y log(p) + (1 − y) log(1 − p))    (7)

where y is the ground truth and p is the predicted value.

The Dice coefficient (DSC) is a statistical indicator that can be used to evaluate the similarity between two images. It is represented as:

DSC = 2|GT ∩ SR| / (|GT| + |SR|)    (8)

where |GT| represents the ground truth magnitude, |SR| represents the segmentation result magnitude, and |GT ∩ SR| represents the common elements between GT and SR. Based on the Dice coefficient, Dice loss (DSL) is another loss function widely applied to image segmentation tasks. Dice loss is expressed as:

DSL = 1 − DSC = 1 − 2|GT ∩ SR| / (|GT| + |SR|)    (9)

B. FULLY CONVOLUTIONAL NETWORKS (FCNS)
Long, et al. [26] proposed fully convolutional networks (FCNs), which replace fully connected layers with up-sampling layers. The feature maps are up-sampled to the same size as the input images, and thus dense predictions are made by the network. The FCN architecture is shown in Figure 6. Compared with traditional CNNs, FCNs can predict every pixel in an image or image patch, so they are more suitable and faster for image segmentation tasks.

Figure 6 Architecture of an FCN [26].

C. U-NET
Ronneberger, et al. [53] proposed U-net, which has a symmetric encoder-decoder structure and skip connections from the encoding path to the decoding path. Features are extracted in the encoder and images are reconstructed in the decoder. Skip connections send low-level feature maps generated in the encoder directly to the decoder. Since low-level feature maps contain local information while high-level feature maps contain global information, U-net integrates low-level and high-level feature maps and thus makes better predictions. Figure 7 shows the U-net architecture.

Figure 7 Architecture of U-net [53].

III. DATABASE AND EVALUATION METRICS FOR RETINAL VESSEL SEGMENTATION
The retina is located in the inner layer of the eyewall. A digital fundus camera attached to a low-power microscope is used to acquire retinal fundus images. The pupil of the human eye is the entry/exit point for the fundus camera's illumination and imaging light beams on the retina. Retinal fundus images can also be obtained with an EasyScan camera based on Scanning Laser Ophthalmoscopy (SLO) [54].
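The binary cross-entropy loss of Equation (7) and the Dice loss of Equation (9) can be written directly in NumPy. This is a minimal illustrative sketch; the clipping constant eps is an implementation detail added here to avoid log(0) and division by zero, not part of the original formulas.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-7):
    """Eq. (7): mean BCE between ground-truth labels y and predicted probabilities p."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def dice_loss(gt, sr, eps=1e-7):
    """Eq. (9): DSL = 1 - DSC, with soft predictions sr in [0, 1]."""
    inter = np.sum(gt * sr)                     # |GT ∩ SR| (soft intersection)
    return float(1 - 2 * inter / (np.sum(gt) + np.sum(sr) + eps))

gt = np.array([1., 1., 0., 0.])       # toy ground-truth vessel mask
sr = np.array([0.9, 0.8, 0.2, 0.1])   # toy soft segmentation result
```

A perfect prediction drives the Dice loss to 0, while BCE penalizes each pixel's probability independently, which is why the two are often combined in segmentation work.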
SLO has the advantages of lower light exposure and better contrast between vessels and background due to its confocal design [55]. There are many publicly available databases for retinal vessel segmentation [10]; here we introduce only the main ones. DRIVE [18], STARE [56], CHASE_DB1 [57] and HRF [6] are the most used publicly available databases. DRiDB [58] and ARIA [59, 60] are also available for retinal vessel segmentation but have been less used in recent years. The images in these six databases were obtained by colour fundus photography. In addition, two other databases, IOSTAR [61] and RC-SLO [62], whose samples were obtained by SLO, can also be used for retinal vessel segmentation. Table 1 summarizes these databases.

Table 1 Summary of 2-D fundus image datasets used for retinal vessel segmentation.

Dataset      Year  Resolution  Total Images  FOV  Format
STARE        2000  605x700     20            35°  PPM
DRIVE        2004  768x584     40            45°  JPEG
ARIA         2006  768x576     143           50°  TIFF
HRF          2009  3504x2336   45            45°  JPEG
CHASE_DB1    2011  999x960     28            30°  TIFF
DRiDB        2013  720x576     50            45°  BMP
IOSTAR       2015  1024x1024   30            45°  JPEG
RC-SLO       2015  360x320     40            /    TIFF

Generally, pixels in the FOV of fundus images are classified as vessel pixels (positive) or non-vessel pixels (negative). To measure the identification of pixels, ground truth labels are compared with the pixel identifications. On this basis, there are four basic pixel measures: TP (true positives), FP (false positives), FN (false negatives), and TN (true negatives). Table 2 shows how these measures relate segmentation results to the ground truth.

Table 2 Pixel measures in vessel segmentation.

                      Ground truth
Segmentation result   Vessel   Non-vessel
Vessel                TP       FP
Non-vessel            FN       TN

Several evaluation metrics are defined to evaluate the performance of segmentation networks. Some of the prevalent metrics are listed in Table 3.

Table 3 Evaluation metrics for image segmentation.
Metric                              Expression
Sensitivity                         Sen = TP / (TP + FN)
Specificity                         Spe = TN / (TN + FP)
Precision                           Pre = TP / (TP + FP)
Accuracy                            Acc = (TP + TN) / (TP + FP + TN + FN)
F1-score                            F1 = 2TP / (2TP + FP + FN)
Jaccard Similarity                  JS = |GT ∩ SR| / |GT ∪ SR|
Matthews Correlation Coefficient    MCC = (TP/N − S·P) / √(S·P·(1 − S)·(1 − P))
G-mean                              G = √(Spe·Sen)
False Positive Rate (FPR)           FPR = FP / (TN + FP)

where GT means ground truth, SR means segmented result, N = TP + FP + TN + FN, S = (TP + FN)/N, and P = (TP + FP)/N. Here, the F1-score is equal to the Dice coefficient, and Sensitivity is also known as Recall or the True Positive Rate (TPR). In addition, the receiver operating characteristic curve (ROC curve) is a plot that summarizes the trade-off between the TPR and FPR of a model under different thresholds. Therefore, the ROC curve can be utilized to compare different models under an identical threshold or a specific model under different thresholds. Similar to the ROC curve, the Precision-Recall curve (PR curve) illustrates the trade-off between Precision and Recall. The areas under these curves, AUC_ROC (the area under the ROC curve) and AUC_PR (the area under the PR curve), can be used to evaluate the overall performance of networks.

IV. EXISTING MODELS FOR RETINAL VESSEL SEGMENTATION
In this section, we categorize and analyze various methods for retinal vessel segmentation according to their network architectures.

A. CNN FOR RETINAL VESSEL SEGMENTATION
Earlier, some researchers adopted CNNs with only a few layers to segment vessels. We review 7 CNNs and summarize their performance evaluations in Table 4. Fan and Mo [63] applied a 5-layer CNN to vessel segmentation and extracted image patches in the green
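The metrics of Table 3 follow mechanically from the four confusion counts of Table 2. A small illustrative sketch (the counts below are made-up numbers, not results from any reviewed paper):

```python
import math

def vessel_metrics(tp, fp, fn, tn):
    """Pixel-level metrics from Table 3, computed from the confusion counts of Table 2."""
    n = tp + fp + tn + fn
    s, p = (tp + fn) / n, (tp + fp) / n      # S and P as used by MCC
    sen = tp / (tp + fn)                     # Sensitivity (Recall, TPR)
    spe = tn / (tn + fp)                     # Specificity
    return {
        "Sen": sen,
        "Spe": spe,
        "Pre": tp / (tp + fp),
        "Acc": (tp + tn) / n,
        "F1":  2 * tp / (2 * tp + fp + fn),  # equal to the Dice coefficient
        "MCC": (tp / n - s * p) / math.sqrt(s * p * (1 - s) * (1 - p)),
        "G":   math.sqrt(sen * spe),
        "FPR": fp / (tn + fp),
    }

# Hypothetical counts for one segmented image
m = vessel_metrics(tp=80, fp=10, fn=20, tn=890)
```

Because vessel pixels are a small minority of the FOV, Accuracy alone can look high even for poor segmentations; Sen, F1 and MCC are more informative on such imbalanced data, which is why the tables in Section IV report several metrics side by side.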
channel as input. A comparison between the R, G and B channels shows that the green channel provides better vessel-background contrast than the red and blue channels. They used the 2-norm as the loss function and adopted an optimized threshold to generate the binary vessel map. Liskowski and Krawiec [64] proposed a CNN with 6 layers for retinal vessel segmentation. They applied global contrast normalization (GCN) and zero-phase component analysis (ZCA) whitening to the training images in the pre-processing phase. GCN reduced the uneven illumination in the images, and ZCA whitening removed universal characteristics so that the network could focus on higher-order correlations. Khalaf, et al. [65] constructed a CNN with 7 layers. They divided the pixels in an image into 3 classes, background, large vessel and small vessel, to reduce the intra-class variance. They extracted the green channel of the images and applied adaptive histogram equalization (AHE) and top-hat filtering to it in the pre-processing phase. The green channel and AHE increased image contrast and suppressed noise, and the top-hat filtering enhanced the vessels in the training images. Vengalil, et al. [66] proposed to fine-tune an existing network, DEEPLAB-COCO-LARGEFOV, using retinal fundus image patches. They replaced the last layer with a convolutional layer and applied a threshold to obtain the final vessel maps. They did not adopt any image-processing technique because they thought it might lead to undesired outcomes or harm the vessel structures. Tan, et al. [67] constructed a 7-layer CNN to make predictions for multiple objects in fundus images, including the optic disc, fovea and retinal vessels. They extracted image patches from different channels with different sizes and resized them.
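Two of the pre-processing steps mentioned above, green-channel extraction and global contrast normalization, are simple enough to sketch in NumPy. This is an illustrative sketch only (AHE/CLAHE, ZCA whitening and top-hat filtering are omitted), and the image size below is just the DRIVE resolution used as an example.

```python
import numpy as np

def green_channel(rgb):
    """Take the green channel, which gives the best vessel-background contrast."""
    return rgb[..., 1].astype(np.float64)

def global_contrast_normalize(img, eps=1e-8):
    """GCN-style normalization: zero mean, unit standard deviation per image."""
    img = img - img.mean()
    return img / (img.std() + eps)

# Hypothetical fundus image at DRIVE resolution (random values stand in for pixels)
rgb = np.random.default_rng(0).integers(0, 256, size=(584, 565, 3))
g = global_contrast_normalize(green_channel(rgb))
```

In a real pipeline these functions would be applied per training patch or per image before feeding the network, so that illumination differences between images do not dominate the learned features.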
Utilizing multiple channels can provide more information, which is helpful for multi-object classification. Guo, et al. [68] proposed a CNN with 6 layers and introduced a reinforcement sample learning scheme that trained the network on samples with poor performance. The proposed scheme allows networks to be trained with fewer epochs and less training time while increasing performance. Uysal and Güraksin [69] proposed a CNN model with several convolutional layers and introduced transposed convolution to up-sample feature maps. Their model made pixel-wise identifications but did not perform well. From Table 4 we can see that most CNNs produced only about 94% segmentation accuracy. We suppose this is because these CNNs have only a few convolutional layers and lack strong feature representation capacity: they can only segment the basic vessel structure and misclassify most vessel boundaries and thin vessels, so they have been less used in recent years.

Table 4 Performance evaluations of CNNs for retinal vessel segmentation. Reference Database Sen Spe Acc DRIVE STARE CHASE_DB1 DRIVE STARE 0.7814 0.7234 0.9702 0.7763 0.7867 0.9788 0.9799 0.9702 0.9768 0.9754 0.9612 0.9614 0.6761 0.9495 0.9566 Khalaf, et al. [65] DRIVE 0.8397 0.9562 Vengalil, et al. [66] HRF / / Tan, et al. [67] DRIVE 0.7537 0.9694 Fan and Mo [63] Liskowski and Krawiec [64] Guo, et al. [68] Uysal and Güraksin [69] DRIVE STARE DRIVE STARE / / 0.7548 0.7377 0.9682 0.9735 AU-ROC Kappa / / 0.972 0.9785 0.7781 0.7622 0.9456 / / 0.9394 0.894 / / / / 0.9199 0.9220 0.9419 0.9471 0.9652 0.9444 / / /

Table 5 Performance evaluations of FCNs for retinal vessel segmentation.

Reference                  Dataset     Pre     Sen     Spe     Acc     AU_ROC
Luo, et al. [70]           DRIVE       /       /       0.9741  0.9628  /
Dasgupta and Singh [71]    DRIVE       0.8498  0.7691  0.9801  0.9533  /
Jiang, et al. [72]         DRIVE       /       0.754   0.9825  0.9624  0.981
                           STARE       /       0.8352  0.9846  0.9734  0.9900
                           CHASE_DB1   /       0.8640  0.9745  0.9668  0.9810
                           HRF         /       0.8010  0.8010  0.9650  0.9777
Oliveira, et al. [73]      DRIVE       /       0.8039  0.9804  0.9576  0.9821
                           STARE       /       0.8315  0.9858  0.9694  0.9905
                           CHASE_DB1   /       0.7779  0.9864  0.9653  0.9855
Soomro, et al. [74]        DRIVE       /       0.870   0.985   0.956   0.986
                           STARE       /       0.848   0.986   0.968   0.988
                           CHASE_DB1   /       0.886   0.982   0.976   0.985
                           HRF         /       0.829   0.962   0.962   0.978
Li, et al. [75]            DRIVE       /       0.7752  0.9883  0.9697  0.9844
Atli and Gedik [76]        DRIVE       /       0.7987  0.9854  0.9689  0.9851
                           STARE       /       0.6574  0.9933  0.9682  0.9748
                           CHASE_DB1   /       0.7876  0.9845  0.9676  0.9892

B. FCN FOR RETINAL VESSEL SEGMENTATION
FCNs can make dense predictions for each pixel in an image patch [26]. In this survey, we review 7 FCNs and list their performance evaluations in Table 5. Oliveira, et al. [73] proposed an FCN and added skip connections to propagate features from shallow layers to deeper layers. They also explored the multiscale nature of the vascular system by using the stationary wavelet transform (SWT), which added extra channels to the input. Their results illustrated that deep learning methods can benefit from domain knowledge. Jiang, et al. [72] used a network based on a fully convolutional version of AlexNet. They applied Gaussian smoothing to reduce the discontinuity between the FOV and the replaced region.
The segmented vessels were thicker than the ground truth, so Jiang, et al. [72] applied a 9x9 filter to refine the results and reduce noise in the post-processing phase. Dasgupta and Singh [71] and Soomro, et al. [74] also proposed FCNs for retinal vessel segmentation. Soomro, et al. [74] formulated a two-class classification task, while Dasgupta and Singh [71] regarded the task as a multi-label inference task. Soomro, et al. [74] introduced principal component analysis (PCA) to convert RGB images into well-contrasted grayscale images. Li, et al. [75] constructed an FCN with skip connections and introduced active learning to retinal vessel segmentation. Active learning used fewer manually labelled samples to improve the segmentation accuracy of blood vessels, and the performance of the proposed model increased during the iterative training process. The consecutive down-sampling operations in the encoder lead to a loss of information that is critical for determining vessel boundaries and thin vessels. Luo, et al. [70] proposed a size-invariant fully convolutional neural network (SIFCN) to reduce this effect. They kept the size of the feature maps in each layer fixed through padding and stride assignment, and thus reduced the loss of information. Atli and Gedik [76] proposed a fully convolutional network and were the first to use up-sampling and down-sampling to capture thin and thick vessels, respectively. Their model produced some over-segmentation and did not perform very well, especially on the STARE database. From Table 4 and Table 5, we can see that FCNs perform better than CNNs since they have more convolutional layers and can therefore learn higher-level features. However, compared with U-net, FCNs have fewer convolutional layers and cannot reuse low-level information as well, so they have been less used in recent years. C.
U-NET FOR RETINAL VESSEL SEGMENTATION
U-net has a symmetric architecture, and skip connections are applied to send feature maps from the encoder to the decoder directly [53]. Low-level feature maps contain rich detailed information while high-level feature maps contain better global information; therefore, U-net can capture both local and global information to make better decisions. In this survey, we review 32 U-shaped networks and list their performance evaluations in Table 6. Guo, et al. [77] proposed a U-net and introduced structured dropout to regularize it. The proposed structured dropout is inspired by DropBlock [78] and discards contiguous regions of feature maps at a given ratio. Sule and Viriri [79] proposed a U-net and applied transposed convolution in the expanding path to recover the lost information. Zhang and Chung [80] regarded retinal vessel segmentation as a multi-class classification task and introduced an edge-aware mechanism. They divided pixels into 5 classes: background, thick vessels, thin vessels, background near thick vessels and background near thin vessels. In this way, the network can pay more attention to the boundary areas of vessels. They leveraged deep supervision to ease optimization. Mishra, et al. [81] proposed a simple U-net and introduced data-aware deep supervision to improve thin vessel segmentation. They computed the average input retinal vessel width and matched it with the layer-wise effective receptive fields to find the layers that extract vessel features best, and then added auxiliary layers there. Laibacher, et al. [82] proposed a U-shaped network for retinal vessel segmentation that was built on pre-trained components of MobileNetV2. It was the first network to run in real time on high-resolution images. It utilized bottleneck modules and bilinear up-sampling to reduce the number of parameters so that the model could be employed on mobile and embedded systems.
The network was trained using a hybrid loss that combined binary cross-entropy and the Jaccard index. Jin, et al. [83] introduced deformable convolution to retinal vessel segmentation. The deformable convolution block adjusts the receptive field adaptively by learning offsets and therefore captures retinal vessels of various shapes and scales. The proposed deformable U-net produced better performance than U-net and the deformable convolution network [84] on DRIVE, STARE and CHASE_DB1, as well as two other datasets: WIDE [85] and SYNTHE [86]. Similar to Luo, et al. [70], Wang, et al. [87] also wanted to reduce the information loss caused by consecutive down-sampling layers. They added a feature refinement path to U-net, which sent low-level feature maps to high-level layers in the encoder and decoder, respectively. The proposed feature refinement path can improve the detailed representation ability of the encoder and the discriminative ability of the decoder. Yin, et al. [88] instead proposed to add multi-scale grayscale images to each stage of the encoder and decoder to reduce information loss and help information recovery. Dharmawan, et al. [89] proposed a new directionally sensitive blood vessel enhancement method that combined CLAHE with a new matched filter to detect micro vessels. The new matched filter was based on a multi-scale and orientation-modified Dolph-Chebyshev type I function. Their method detected more micro vessels than common CLAHE but still produced many mistakes.
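The hybrid binary cross-entropy plus Jaccard objective mentioned above (Laibacher, et al. [82]) can be sketched as follows. This is a minimal numpy version; the soft-Jaccard formulation and the equal weighting `alpha=0.5` are assumptions for illustration, not the paper's exact combination.

```python
import numpy as np

def bce_jaccard_loss(pred, target, alpha=0.5, eps=1e-7):
    """Hybrid loss: alpha * binary cross-entropy + (1 - alpha) * soft Jaccard loss.

    pred   -- predicted vessel probabilities in (0, 1)
    target -- binary ground-truth vessel mask
    """
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    # Pixel-wise binary cross-entropy, averaged over the image.
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # Soft Jaccard (IoU) computed directly on probabilities.
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    jaccard_loss = 1.0 - (intersection + eps) / (union + eps)
    return alpha * bce + (1 - alpha) * jaccard_loss
```

The Jaccard term operates on whole-image overlap, so it complements the purely pixel-wise cross-entropy when vessel pixels are rare.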
Residual learning [90] was also introduced to increase the depth of networks and alleviate vanishing/exploding gradients. It was applied to building blocks [91-95] or skip connections [95, 96]. Dilated convolution [97] was likewise introduced to retinal vessel segmentation to enlarge the receptive field [98-101]. Lopes, et al. [98] also tested the effect of different down-sampling techniques, namely max-pooling, convolution with a 2x2 kernel and convolution with a 3x3 kernel. They obtained better results when using convolution for down-sampling, which is consistent with Soomro, et al. [74]. Jiang, et al. [99] arranged dilation rates deliberately to obtain a dense sampling of the input and thus avoid the checkerboard effect. They also introduced depthwise separable convolution [102] to reduce the computational cost and the number of parameters. Soomro, et al. [101] introduced a morphological transform and fuzzy C-means in the pre-processing of images. In the post-processing phase, they applied morphological reconstruction to remove small objects from the segmented results. Mou, et al. [95] introduced the probability regularized walk (PRW) algorithm to reconnect fractured vessels. PRW is an extension of the random walk algorithm [103] to probability maps. There is a black ring around the field of view (FOV) in fundus images. Networks should pay more attention to the FOV since the black ring does not contain any information. The attention mechanism [104] has been applied to locate the region of interest (ROI) and strengthen feature representations in retinal vessel segmentation. Luo, et al. [105], Lian, et al. [106] and Lv, et al. [107] made attention masks manually, with the same size as the original images, to locate the ROI. Wang, et al. [108], Li, et al. [109], Li, et al. [110], Fu, et al. [111] and Tang, et al. [112] designed attention modules to strengthen feature representations, and their attention maps were learned by the networks instead of assigned by experts. Yan, et al. [113] introduced a novel joint loss to alleviate the highly unbalanced pixel ratio between thick and thin vessels in fundus images. They divided vessels into thin vessels and thick vessels to alleviate the imbalance problem. The joint loss includes a pixel-wise loss and a segment-level loss, the latter placing more emphasis on the thickness consistency of thin vessels. Nasery, et al. [114] proposed a new data augmentation approach. They leveraged vignetting masks to create more annotated fundus images. Their method only adjusted the illumination condition of images and did not change their geometric and morphological characteristics. Galdran, et al. [115] proposed a new metric for retinal vessel segmentation and tested it using U-net. They introduced normalized mutual information to evaluate the segmentation quality. The new metric is applied to the raw vessel probability map and can guide the selection of the threshold used to binarize it. Alvarado-Carrillo, et al. [116] focused on the curvilinear structures of vessels, so they proposed Distorted Gaussian Matched Filters (D-GMFs) with adaptive parameters and added them to the beginning and end of a U-net. They did not conduct an ablation study, so the effect of their proposed D-GMF Adaptive Unit is unknown. Considering the large scale variation of vessels and the semantic variation in fundus images, Wu, et al. [117] proposed to adjust the receptive field adaptively to capture multi-scale features; they also adaptively fused features to extract more semantic information. They obtained a good result but still need to pay more attention to thin vessels. From Table 6 we can see that U-net can produce about 96% segmentation accuracy, which is higher than that of FCNs. U-net can reuse low-level information by concatenating feature maps, which also increases the computational burden, so the input is a small image patch cropped from the whole image.
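The dilated convolutions discussed above enlarge the receptive field without adding parameters: a kernel of size k with dilation d covers (k - 1) * d + 1 input positions while keeping only k weights. A minimal 1-D numpy sketch (purely illustrative, not any specific model from the works cited above):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D convolution with a dilated (atrous) kernel."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field per output
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        # Sample the input at dilated (strided) positions.
        out[i] = np.dot(x[i:i + span:dilation], kernel)
    return out
```

With `kernel = np.ones(3)` and `dilation=2`, each output sums three inputs spaced two apart, so the receptive field grows to 5 while the parameter count stays 3.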
The network cannot identify pixels well because an image patch contains less information than a whole image, and it is also constrained by a limited receptive field, although dilated convolution can enlarge the receptive field.

Table 6 Performance evaluations of U-nets for retinal vessel segmentation (Pre, Sen, Spe, ACC, AU_ROC, F1-score and other metrics, reported per database on DRIVE, STARE, CHASE_DB1, HRF, IOSTAR, RC-SLO and LES [118], for the 32 reviewed U-shaped networks [77, 79-83, 87-89, 91-96, 98-101, 105, 107-117, 119]).

D. MULTI-MODEL NETWORK FOR RETINAL VESSEL SEGMENTATION
Many researchers have found the prediction capability of a single model limited, so they proposed multi-model networks with stronger prediction ability. Most of them followed the spirit of U-net and FCN and employed an encoder-decoder structure to form sub-models. We review 19 multi-model networks and summarize their performance evaluations in Table 7.
Some studies segmented thin/thick vessels or vessel boundaries/centers separately and then fused the segments into a complete segmentation [120-123]. These methods can be regarded as coarse-and-fine segmentation because thick/thin vessels or boundary/center vessels were segmented concurrently and separately. Yan, et al. [120] proposed a three-stage segmentation network for retinal vessels using three sub-networks. The segmentation of the whole vessel tree was divided into three sub-tasks: thick vessel segmentation using an FCN, thin vessel segmentation using a U-net, and fusion of the segmentations. Sathananthavathi and Indumathi [121] also proposed a coarse-and-fine strategy. They constructed 2 parallel FCNs: the first FCN was larger and trained with the ground truth to extract thick and moderate vessels, while the second FCN was smaller and trained with skeletonized ground truth to extract thin vessels legibly. The outputs of both FCNs were integrated to generate the overall vessel segmentation. Yang, et al. [122] proposed an improved U-net whose encoder was used as a backbone to extract features; they arranged 2 decoders to segment thin and thick vessels, respectively. Finally, they added a fusion network to fuse the outputs of the two decoders. Wang, et al. [119] constructed a U-net composed of one encoder and three decoders. They used one decoder to generate a coarse probability map and divided an image into 'hard' and 'easy' regions according to that map. They used the two other decoders to segment vessels in 'easy' and 'hard' regions independently. Finally, they fused all feature maps produced by the 3 decoders to generate the final vessel map. They also introduced an attention gate to give more weight to vessel feature responses in the decoders. Tian, et al. [123] proposed a multi-model network to learn high- and low-frequency information, respectively.
They applied a Gaussian high-pass filter and a Gaussian low-pass filter to the original fundus images to obtain high-frequency and low-frequency information, respectively, sent the two components to two sub-networks with encoder-decoder structures, and fused the outputs of the two sub-networks to obtain the final vessel map. More researchers proposed coarse-to-fine segmentation by cascading several sub-networks, where each following sub-network can inherit the learning experience of the previous sub-models [124-130]. Generally, they added intra- and inter-network skip connections to send low-level feature maps and learned knowledge to deeper layers and sub-networks. The preceding sub-network segments vessels coarsely and the following sub-network refines the vessel maps, using the segmented results of the previous sub-networks together with the original images as input. Wu, et al. [126] added an auxiliary layer to the preceding sub-network to obtain an auxiliary loss, so their model was trained with both main and auxiliary supervision. Guo, et al. [131] introduced an incremental learning strategy by cascading five CNNs. They trained each subsequent CNN using the same samples as the previous ones and enhanced it by feeding samples on which the previous CNN did not perform well. The final decision for each pixel was made by a voting scheme over the multiple CNN results. Lian, et al. [106] observed that existing models always applied global pre-processing operations to images, which loses local information.
They applied global and local operations simultaneously to enhance the contrast of images. Tang, et al. [132] proposed a network with five identical, parallel sub-networks for ensemble learning. The input of each sub-network was a grayscale image obtained by mixing the R and G channel data in different proportions. The probability maps produced by the five sub-models were averaged to generate the final segmentation result. Zou, et al. [133] also formulated the task as a multi-class classification task to detect thin vessels with a width of less than 2 pixels. They constructed three networks: one for generating labels, one for retinal vessel segmentation and the last for label simplification. Cherukuri, et al. [134] proposed a domain-enriched network composed of two parts: a representation network to extract geometric features from fundus images and a residual task network to make pixel-level predictions using the obtained features. Their method obtained good performance, but some non-vessel pixels were still identified as vessel pixels. To consider the graphical structure of vessel shape, Shin, et al. [135] proposed a vessel graph network that combined a graph neural network (GNN) [41] with a CNN to jointly exploit local appearance and global vessel structure. They did not obtain a good result because their model misclassified many non-vessel pixels as vessel pixels. Tajbakhsh, et al. [136] proposed an error correction mechanism that can learn from segmentation mistakes. The proposed network is divided into three sub-networks: a U-net to produce an initial segmentation, a network to produce diverse but representative error patterns, and another U-net to correct the mistakes of the initial segmentation map. From Table 7 we can see that multi-model networks can produce about 96.3% segmentation accuracy, a slight improvement over single networks. However, multi-model networks are also more difficult to train and have a higher computational burden.
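The high-/low-frequency decomposition used by Tian, et al. [123] can be approximated with a Gaussian low-pass filter and its residual, so that the two components sum back to the original image. A minimal numpy sketch (the kernel width `sigma` and the reflect padding are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def split_frequencies(img, sigma=2.0):
    """Split a 2-D image into low- and high-frequency components.

    The low-pass part is a separable Gaussian blur; the high-pass part
    is the residual, so low + high reconstructs the input exactly.
    """
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img, radius, mode="reflect")
    # Separable convolution: filter rows first, then columns.
    low = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    low = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, low)
    high = img - low
    return low, high
```

Each component can then be fed to its own encoder-decoder sub-network, as in the multi-model designs described above.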
Table 7 Performance evaluations of multi-model networks for retinal vessel segmentation (Sen, Spe, Acc, AU_ROC, F1-score and other metrics, reported per database on DRIVE, STARE, CHASE_DB1, HRF, IOSTAR, RC-SLO and ARIA, for the 19 reviewed multi-model networks [106, 119-136]).

E. GENERATIVE ADVERSARIAL NETWORK (GAN) FOR RETINAL VESSEL SEGMENTATION
GAN [40] is a type of deep unsupervised learning model composed of a generator and a discriminator. In this survey, we review 13 GANs and list their performance evaluations in Table 8. Most generative models adopted an encoder-decoder structure with improvement modules such as dense blocks [137, 138], dilated convolution [137, 139], deep supervision [140], attention mechanisms [138, 141], skip connections [142], Inception modules [143] and others [144]. CNNs were widely used as generators [137-139, 142, 144-146], but U-net can also be adopted [141, 143]. Son, et al. [146] explored several models for the discriminator: pixel-GAN, patch-GAN and image-GAN. The results indicate that patch-GAN performs better than the others, including the variant with a single generator alone. Park, et al. [147] chained two U-nets in the generator and used residual convolution blocks as building blocks in both the generator and the discriminator.
They utilized automatic color equalization (ACE) to enhance images in the pre-processing phase, and the Lanczos resampling method to smooth the vessel branches and reduce false negatives in the post-processing phase. GANs based on semi-supervised learning have also been explored to address the lack of annotated data. Huo, et al. [148] proposed a semi-supervised framework that combined a GAN with a self-training scheme, and adopted the particle swarm optimization (PSO) [149] algorithm to choose the hyperparameters, since self-training is sensitive to them. They obtained 0.9550/0.8419 AUC_ROC and AUC_PR on the DRIVE database when using 0.1 labelled and 0.9 unlabelled data. Lahiri, et al. [150] also trained a GAN with semi-supervised learning to learn from both labelled and unlabelled data. They used only 3K annotated image patches to make patch-wise predictions and obtained 0.95/0.96 accuracy and 0.96/0.94 AUC on the DRIVE and STARE databases, respectively. Their models outperformed a simple U-net while using less annotated data, but they did not segment vessels as accurately as other improved models based on supervised learning. From Table 8 we can see that GANs produced about 96% segmentation accuracy, which is similar to U-net, with Park, et al. [147] obtaining the best performance. Compared with CNNs, the generator and discriminator of a GAN must be trained alternately, which is more troublesome.

Table 8 Performance evaluations of GANs for retinal vessel segmentation (Sen, Spe, Acc, AU_ROC, AU_PR and F1-score, reported per database on DRIVE, STARE, CHASE_DB1 and HRF, for the 13 reviewed GANs [137-148, 150]).

Table 9 Performance evaluations of other networks for retinal vessel segmentation (Sen, Spe, Acc, AUC_ROC, F1-score and other metrics, reported per database on DRIVE, STARE and CHASE_DB1, for the reviewed methods [20, 151-159]).

F. OTHER NETWORK FOR RETINAL VESSEL SEGMENTATION
Researchers have also proposed networks that cannot be categorized into the foregoing classes due to their unique architectures. The performance of these methods is listed in Table 9. Ngo and Han [151], Guo, et al. [152] and Li, et al. [153] adopted multiple input branches to capture multi-scale spatial information, and the feature maps generated by each branch are combined to make predictions. In addition, Guo, et al. [152] applied a K-dimensional tree integrated with the Hessian matrix to reconnect broken segments in the post-processing stage. Some broken vessels were reconnected and the vessel map became cleaner after post-processing; accuracy and sensitivity increased while specificity decreased. Li, et al. [153] introduced sparse variables into the label design and improved the cross-entropy loss function to address the imbalance of samples. Li, et al. [153] obtained better sensitivity than [151] and [152] because they addressed the class imbalance issue. The holistically-nested edge detection (HED) network made a significant advancement in edge detection [160]. It is a single-stream network with multiple side outputs, and final predictions are made by fusing the multi-scale side outputs. Inspired by the HED network, Mo and Zhang [20] and Guo, et al. [156] integrated feature maps generated at different stages of their networks to generate the final probability map. Lin, et al. [154] and Hu, et al. [155] also constructed single-stream networks based on VGG Net and applied fully-connected conditional random fields (CRFs) [161] to obtain the final binary segmentation result. The CRFs utilized multi-scale feature maps from different stages to make full use of spatial contextual information, and can mitigate noise and edge blurring by acting as a global smoothness regularizer. Feng, et al. [157] proposed a cross-connected network (CcNet) with two parallel paths. They used two CRM (convolution-ReLU-max pooling) modules as building blocks and formed cross-connections between the two paths: feature maps produced by each module in the upper path are sent to each module in the lower path, and all feature maps generated in the lower path are concatenated to generate the final vessel maps.
Owing to these cross-connections, CcNet can learn multi-scale features efficiently. Zhuo, et al. [158] used three dense blocks to form a straight network and added two bottleneck blocks between them, aiming to reduce model complexity and computational cost. Similar to Luo, et al. [70], they also maintained the size of the feature maps by cancelling down-sampling layers to reduce the information loss of tiny vessels. In addition, considering that the existing evaluation metrics should not be treated as equally important given the great imbalance between vessels and non-vessels, Zhuo, et al. [158] proposed a new evaluation index named the fusion score, which converts multiple evaluation metrics into a single target. It is expressed in Equation 9:

FS = (3 × F1 × MCC × Gmean) / (F1 × MCC + F1 × Gmean + MCC × Gmean)   (9)

They obtained fusion scores of 0.8339 and 0.8449 on the DRIVE and STARE databases, respectively. To reduce the high-frequency information loss caused by consecutive down-samplings in the encoder, Noh, et al. [159] introduced a scale-space approximated CNN (SSA-Net). It is a single-stream network with residual and skip connections. They inserted up-sampling layers in the feature generation phase to generate size-invariant feature maps and thus reduce spatial scale-space distortions. CLAHE is widely applied to fundus images to enhance image contrast; it has two parameters, the size of the contextual region and the clip limit. Most researchers simply used the default values, but Aurangzeb, et al. [162] introduced PSO to find optimal parameter values for CLAHE. They did not propose a new network but applied their method to existing models. From Table 9 it can be observed that Noh, et al. [159] obtained the best performance among these methods, which produced about 95.5% segmentation accuracy overall. These models did not have a strong learning capacity because most of them have only a few convolutional layers, and their architectures may not be very suitable for this task.

V. DISCUSSION
In this survey, we have reviewed 89 deep learning models for retinal vessel segmentation, which indicates that deep learning has been widely applied to segmenting retinal vessels. Early on, researchers applied simple CNNs for vessel segmentation on the DRIVE and STARE databases [67, 163]. Since then, many researchers have proposed various improved models. FCNs and U-nets were leveraged the most to make dense predictions because of their excellent performance [71, 77, 79]. Later, improvement modules such as residual blocks, dilated convolution and attention mechanisms were introduced into U-net to improve the performance of the proposed models [99, 100, 105]. Researchers also proposed multi-model networks to obtain a stronger identification capability [120, 126], and others introduced GANs to vessel segmentation [126]. Multi-branch networks and HED-shaped networks were also used for vessel segmentation [151, 164].

A. CHALLENGES IN RETINAL VESSEL SEGMENTATION USING DEEP LEARNING
According to the existing research, the following challenges are encountered when using deep learning for retinal vessel segmentation:
1) Lack of well-labelled training samples. Although there is a large number of fundus images, annotated data are very difficult to obtain, since annotation requires professional doctors and takes a significant amount of time and cost.
2) Low quality of existing image samples. This hinders deep learning models from learning better feature representations. Image noise, uneven illumination, low contrast (especially for thin vessels), centerline reflection and various structures (pathological regions, fovea, macula, optic disc) decrease the performance of the proposed models.
3) Class imbalance problem of training samples.
The differing numbers of positive and negative examples available for training degrade the performance of networks. The class imbalance problem exists not only between foreground and background but also between thick vessels and thin vessels. Deep learning models tend to classify pixels on vessel boundaries as non-vessel pixels because non-vessel pixels greatly outnumber vessel pixels. Networks also perform worse on thin vessels than on thick vessels, since the misclassification of pixels in thin vessels has less influence on the total loss.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

B. KEY ASPECTS FOR SUCCESSFUL RETINAL VESSEL SEGMENTATION

From the analysis of existing methods, a successful model should be able to detect vessels under uneven illumination, under low contrast and across the various regions of fundus images. At the same time, it should be robust across images and have strong generalization ability. We identify some key aspects for successful and robust retinal vessel segmentation, which are as follows:
1) Raw image enhancement. Image enhancement techniques in the pre-processing phase, such as the conversion of RGB images to grayscale, normalization, contrast limited adaptive histogram equalization (CLAHE) [165] and gamma correction, increase image quality [71, 91, 105]. We can also adopt morphological operations [166] to increase the quality of images [74, 121].
2) Data augmentation.
The publicly available databases are too small to train a network, so we can utilize regular data augmentation techniques [167] to enlarge the training dataset, such as rotating, flipping, shifting, mirroring and cropping images into image patches [23, 105, 106, 120, 154]. We can also leverage transfer learning for this task, for example with VGGNet [20, 154-156], ResNet [156] or a fully convolutional version of AlexNet [72].
3) A well-designed model. A well-designed model can capture more spatial information, reduce the loss of local information and reuse low-level feature maps for accurate segmentation. Judging from the segmentation results, U-net and multi-model networks perform better than plain CNNs and FCNs because they have more convolutional layers and can therefore extract features better. In addition, skip connections help reuse low-level information, which is very important for identification. Some proposed GANs also obtained high accuracy. Furthermore, dilated convolution is a good option to enlarge the receptive field and capture more spatial information while keeping the same number of parameters [97]. Residual learning can increase network depth and alleviate network degradation at the same time [90]. Dense connections make full use of the feature maps generated by all previous layers and thus decrease model complexity and mitigate vanishing gradients [168].
4) Proper loss function. A proper loss function can lead models to pay more attention to vessels, especially thin vessels. Researchers can adopt improved loss functions, such as a weighted cross-entropy loss, to address the imbalance problem [80, 123, 156].
5) Vessel map enhancement. The segmentation result contains noise and isolated small vessels, so we can use a matched filter or a morphological transform to eliminate them in the post-processing phase [72, 101].
The vessel segments are broken in some cases, and we can reconnect fractured vessels with techniques such as the probability regularized walk (PRW) and the k-dimensional tree [95, 152]. Better visualization of the vessel map helps ophthalmologists diagnose diseases more easily.
6) Abundant validation. We should not only verify our models on a single database but also cross-validate networks to evaluate their generalization ability. In cross-validation, a network is trained using samples from one dataset but tested using another dataset [20, 73, 121, 126]. We can even conduct mixed validation as a further check, in which a network is trained using mixed samples from several databases and tested using the remaining samples from those databases [72].
From the analysis of the reviewed articles, although several studies have proposed models and strategies to improve the performance of networks, such as an incremental learning strategy [131], various improvement modules [91, 99, 119] and coarse-to-fine segmentation [125], there is still no model that can segment vessels perfectly. Open problems include the segmentation of vessel boundaries and thin vessels, of the background between two close vessels, of vessels in the presence of abnormalities and various structures, and of vessels at cross-connections, as well as robust segmentation across different databases. In addition, the segmented vessels are still fractured and broken in most results, which invites researchers to investigate further how to reconnect fractured vessels. Although deep learning has been widely applied to retinal vessel segmentation, there are still some limitations. Compared with human beings, deep learning models have less generalization capacity. Compared with conventional methods, such as matched filtering and vessel tracing, deep learning is less interpretable, and it needs massive data and GPUs during training, which are expensive and not always available to users.
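The patch-based augmentation of key aspect 2 can be sketched as follows. This is a minimal illustration, not code from any reviewed paper: a square patch is represented as a toy 2×2 list of pixel intensities, and the helper names (`rot90`, `hflip`, `augment`) are our own.

```python
# Minimal sketch of patch-level augmentation: 90-degree rotations and
# horizontal flips of a square image patch (here a 2D list of intensities).
def rot90(patch):
    """Rotate a 2D patch 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*patch)][::-1]

def hflip(patch):
    """Mirror a 2D patch left-right."""
    return [row[::-1] for row in patch]

def augment(patch):
    """Return the patch plus its rotations and their horizontal flips."""
    out, p = [], patch
    for _ in range(4):
        out.append(p)        # current rotation
        out.append(hflip(p)) # and its mirror image
        p = rot90(p)
    return out

patches = augment([[1, 2], [3, 4]])  # eight augmented views of one patch
```

In practice the same transforms would be applied jointly to the image patch and its vessel ground-truth map so that labels stay aligned.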
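The weighted cross-entropy mentioned under key aspect 4 can be sketched as below. This is a hedged illustration, not the loss of any specific reviewed paper: the function name and the weight value are our own assumptions.

```python
import math

# Sketch of a weighted binary cross-entropy loss: vessel pixels (label 1)
# are up-weighted by `pos_weight` so that the scarce foreground contributes
# more to the total loss. The weight value 10.0 is illustrative only.
def weighted_bce(probs, labels, pos_weight=10.0):
    """Mean weighted cross-entropy over per-pixel vessel probabilities."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        w = pos_weight if y == 1 else 1.0  # up-weight vessel pixels
        total += -w * (y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(probs)
```

With `pos_weight` greater than one, misclassifying a vessel pixel costs more than misclassifying a background pixel, which counteracts the foreground/background imbalance described above and, with a large enough weight, also raises the relative cost of missing thin-vessel pixels.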
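The single-target evaluation of Equation 9 (the fusion score of Zhuo et al. [158]) can be computed directly from the three component metrics. The metric values in the example are illustrative, not results from the paper.

```python
# Sketch of the fusion score (Equation 9), which folds F1, MCC and G-mean
# into one harmonic-mean-style target.
def fusion_score(f1, mcc, g_mean):
    """FS = 3*F1*MCC*Gmean / (F1*MCC + F1*Gmean + MCC*Gmean)."""
    return 3 * f1 * mcc * g_mean / (f1 * mcc + f1 * g_mean + mcc * g_mean)

fs = fusion_score(0.82, 0.80, 0.90)  # one score from three metrics
```

Like a harmonic mean, the fusion score equals the common value when all three metrics agree and is pulled down by the weakest metric, so a model cannot score well by excelling on only one of them.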
VI. CONCLUSION

The geometric characteristics of retinal vessels reflect clinical and pathological features, and ophthalmologists use vessel maps to diagnose diseases such as DR and MD. Precise diagnosis of eye abnormalities and their timely treatment are important in preventing global blindness. Computerized automatic segmentation of retinal blood vessels is desirable because manual segmentation is expensive and time-consuming. In the past, researchers proposed different methods for automatic retinal vessel segmentation. Unsupervised models are limited by their accuracy, while conventional machine learning algorithms require handcrafted features and are thus limited in their generalization ability. Currently, deep learning models are widely applied to image segmentation, including retinal images, since they do not need handcrafted features and outperform existing unsupervised methods. This article reviews publications from the past six years on retinal vessel segmentation based on deep learning. The main contribution of our work is to analyze recent models and identify new trends in retinal vessel segmentation. It will be helpful for researchers and industrialists in developing robust models for retinal vessel segmentation.

VII. REFERENCES

[1] E. T. D. R. S. R. Group, "Fundus photographic risk factors for progression of diabetic retinopathy: ETDRS report number 12," Ophthalmology, vol. 98, no. 5, pp. 823-833, 1991.
[2] J. J. Kanski and B. Bowling, Clinical ophthalmology: a systematic approach. Elsevier Health Sciences, 2011.
[3] A. Fathi and A. R. Naghsh-Nilchi, "Automatic wavelet-based retinal blood vessels segmentation and vessel diameter estimation," Biomedical Signal Processing and Control, vol. 8, no. 1, pp. 71-80, 2013.
[4] J. W. Yau et al., "Global prevalence and major risk factors of diabetic retinopathy," Diabetes Care, vol. 35, no. 3, pp. 556-564, 2012.
[5] S. W. Franklin and S. E. Rajan, "Computerized screening of diabetic retinopathy employing blood vessel segmentation in retinal images," Biocybernetics and Biomedical Engineering, vol. 34, no. 2, pp. 117-124, 2014.
[6] T. Köhler, A. Budai, M. F. Kraus, J. Odstrčilik, G. Michelson, and J. Hornegger, "Automatic no-reference quality assessment for retinal fundus images using vessel segmentation," in Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, 2013: IEEE, pp. 95-100.
[7] X. You, Q. Peng, Y. Yuan, Y.-m. Cheung, and J. Lei, "Segmentation of retinal blood vessels using the radial projection and semi-supervised approach," Pattern Recognition, vol. 44, no. 10-11, pp. 2314-2324, 2011.
[8] M. M. Fraz et al., "Blood vessel segmentation methodologies in retinal images – A survey," Computer Methods and Programs in Biomedicine, vol. 108, no. 1, pp. 407-433, 2012, doi: 10.1016/j.cmpb.2012.03.009.
[9] M. Niemeijer, J. Staal, B. van Ginneken, M. Loog, and M. D. Abramoff, "Comparative study of retinal vessel segmentation methods on a new publicly available database," in Medical Imaging 2004: Image Processing, 2004, vol. 5370: International Society for Optics and Photonics, pp. 648-656.
[10] M. R. K. Mookiah et al., "A Review of Machine Learning Methods for Retinal Blood Vessel Segmentation and Artery/Vein Classification," Medical Image Analysis, p. 101905, 2020.
[11] M. Al-Rawi, M. Qutaishat, and M. Arrar, "An improved matched filter for blood vessel detection of digital retinal images," Computers in Biology and Medicine, vol. 37, no. 2, pp. 262-267, 2007.
[12] S. S. Kar and S. P. Maity, "Blood vessel extraction and optic disc removal using curvelet transform and kernel fuzzy c-means," Computers in Biology and Medicine, vol. 70, pp. 174-189, 2016.
[13] K. Rezaee, J. Haddadnia, and A. Tashk, "Optimized clinical segmentation of retinal blood vessels by using combination of adaptive filtering, fuzzy entropy and skeletonization," Applied Soft Computing, vol. 52, pp. 937-951, 2017.
[14] Y. Yin, M. Adel, and S. Bourennane, "Retinal vessel segmentation using a probabilistic tracking method," Pattern Recognition, vol. 45, no. 4, pp. 1235-1244, 2012.
[15] M. Nergiz and M. Akın, "Retinal vessel segmentation via structure tensor coloring and anisotropy enhancement," Symmetry, vol. 9, no. 11, p. 276, 2017.
[16] J. Kaur and D. Mittal, "A generalized method for the detection of vascular structure in pathological retinal images," Biocybernetics and Biomedical Engineering, vol. 37, no. 1, pp. 184-200, 2017.
[17] D. Kaba, A. G. Salazar-Gonzalez, Y. Li, X. Liu, and A. Serag, "Segmentation of retinal blood vessels using gaussian mixture models and expectation maximisation," in International Conference on Health Information Science, 2013: Springer, pp. 105-112.
[18] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. Van Ginneken, "Ridge-based vessel segmentation in color images of the retina," IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 501-509, 2004.
[19] P. Dai et al., "A new approach to segment both main and peripheral retinal vessels based on gray-voting and gaussian mixture model," PLoS ONE, vol. 10, no. 6, p. e0127748, 2015.
[20] J. Mo and L. Zhang, "Multi-level deep supervised networks for retinal vessel segmentation," Int J Comput Assist Radiol Surg, vol. 12, no. 12, pp. 2181-2193, Dec 2017, doi: 10.1007/s11548-017-1619-0.
[21] G. Litjens et al., "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60-88, 2017.
[22] S. Pouyanfar et al., "A survey on deep learning: Algorithms, techniques, and applications," ACM Computing Surveys (CSUR), vol. 51, no. 5, pp. 1-36, 2018.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[24] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[25] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[26] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[27] T. A. Soomro et al., "Deep Learning Models for Retinal Blood Vessels Segmentation: A Review," IEEE Access, vol. 7, pp. 71696-71717, 2019, doi: 10.1109/access.2019.2920616.
[28] K. B. Khan et al., "A review of retinal blood vessels extraction techniques: challenges, taxonomy, and future trends," Pattern Analysis and Applications, vol. 22, no. 3, pp. 767-802, 2019.
[29] M. Badar, M. Haris, and A. Fatima, "Application of deep learning for retinal image analysis: A review," Computer Science Review, vol. 35, p. 100203, 2020.
[30] T.
Li et al., "Applications of deep learning in fundus images: A review," Medical Image Analysis, p. 101971, 2021.
[31] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[32] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[33] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, and L. Bottou, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. 12, 2010.
[34] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems, 2007, pp. 153-160.
[35] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[36] G. Hinton, "A Practical Guide to Training Restricted Boltzmann Machines," Momentum, vol. 9, p. 1, 2010.
[37] Y. LeCun et al., "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems, 1990, pp. 396-404.
[38] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[39] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[40] I. Goodfellow et al., "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
[41] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61-80, 2008.
[42] D. H. Hubel and T. N.
Wiesel, "Receptive fields and functional architecture of monkey striate cortex," The Journal of Physiology, vol. 195, no. 1, pp. 215-243, 1968.
[43] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[44] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009: IEEE, pp. 248-255.
[45] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
[46] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249-256.
[47] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[48] H. N. Mhaskar and C. A. Micchelli, "How to choose an activation function," in Advances in Neural Information Processing Systems, 1994, pp. 319-326.
[49] V. Nair and G. E. Hinton, "Rectified linear units improve restricted boltzmann machines," in ICML, 2010.
[50] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, vol. 30, no. 1, p. 3.
[51] Y.-L. Boureau, J. Ponce, and Y.
LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 111-118.
[52] A. F. Khalaf, I. A. Yassine, and A. S. Fahmy, "Convolutional neural networks for deep feature learning in retinal vessel segmentation," in 2016 IEEE International Conference on Image Processing (ICIP), 25-28 Sept. 2016, pp. 385-388.
[53] S. K. Vengalil, N. Sinha, S. S. S. Kruthiventi, and R. V. Babu, "Customizing CNNs for blood vessel segmentation from fundus images," in 2016 International Conference on Signal Processing and Communications (SPCOM), 12-15 June 2016, pp. 1-4.
[54] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, "End-to-end text recognition with convolutional neural networks," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), 2012: IEEE, pp. 3304-3308.
[55] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Cham, 2015: Springer International Publishing, pp. 234-241.
[56] R. H. Webb and G. W. Hughes, "Scanning laser ophthalmoscope," IEEE Transactions on Biomedical Engineering, no. 7, pp. 488-492, 1981.
[57] M. M. Fraz et al., "An ensemble classification-based approach applied to retinal blood vessel segmentation," IEEE Transactions on Biomedical Engineering, vol. 59, no. 9, pp. 2538-2548, 2012.
[58] P. Prentašić et al., "Diabetic retinopathy image database (DRiDB): a new database for diabetic retinopathy screening programs research," in 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA), 2013: IEEE, pp. 711-716.
[59] D. J. Farnell, "Automated Retinal Image Analysis (ARIA) Data Set." http://www.damianjjfarnell.com/?page_id=276 (accessed).
[60] D. J. Farnell et al., "Enhancement of blood vessels in digital fundus photographs via the application of multiscale line operators," Journal of the Franklin Institute, vol. 345, no. 7, pp. 748-765, 2008.
[61] J. Zhang, B. Dashtbozorg, E. Bekkers, J. P. Pluim, R. Duits, and B. M. ter Haar Romeny, "Robust retinal vessel segmentation via locally adaptive derivative frames in orientation scores," IEEE Transactions on Medical Imaging, vol. 35, no. 12, pp. 2631-2644, 2016.
[62] S. Abbasi-Sureshjani, I. Smit-Ockeloen, J. Zhang, and B. T. H. Romeny, "Biologically-inspired supervised vasculature segmentation in SLO retinal fundus images," in International Conference Image Analysis and Recognition, 2015: Springer, pp. 325-334.
[63] Z. Fan and J.-J. Mo, "Automated blood vessel segmentation based on de-noising auto-encoder and neural network," in 2016 International Conference on Machine Learning and Cybernetics (ICMLC), 2016, vol. 2: IEEE, pp. 849-856.
[64] P. Liskowski and K. Krawiec, "Segmenting Retinal Blood Vessels With Deep Neural Networks," IEEE Trans Med Imaging, vol. 35, no. 11, pp. 2369-2380, Nov 2016, doi: 10.1109/TMI.2016.2546227.
[65] F. LaRocca, D. Nankivil, S. Farsiu, and J. A. Izatt, "True color scanning laser ophthalmoscopy and optical coherence tomography handheld probe," Biomedical Optics Express, vol. 5, no. 9, pp. 3204-3216, 2014.
[66] J. V. Soares, J. J. Leandro, R. M. Cesar, H. F. Jelinek, and M. J. Cree, "Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification," IEEE Transactions on Medical Imaging, vol. 25, no. 9, pp. 1214-1222, 2006.
[67] J. H. Tan, U. R. Acharya, S. V. Bhandary, K. C. Chua, and S. Sivaprasad, "Segmentation of optic disc, fovea and retinal vasculature using a single convolutional neural network," Journal of Computational Science, vol. 20, pp. 70-79, 2017.
[68] Y. Guo, Ü. Budak, L. J. Vespa, E. Khorasani, and A. Şengür, "A retinal vessel detection approach using convolution neural network with reinforcement sample learning strategy," Measurement, vol. 125, pp. 586-591, 2018.
[69] E. Uysal and G. E. Güraksin, "Computer-aided retinal vessel segmentation in retinal images: convolutional neural networks," Multimedia Tools and Applications, vol. 80, no. 3, pp. 3505-3528, 2021.
[70] Y. Luo, H. Cheng, and L. Yang, "Size-Invariant Fully Convolutional Neural Network for vessel segmentation of digital retinal images," in 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 13-16 Dec. 2016, pp. 1-7.
[71] A. Dasgupta and S. Singh, "A fully convolutional neural network based structured prediction approach towards the retinal vessel segmentation," in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), 18-21 April 2017, pp. 248-251.
[72] Z. Jiang, H. Zhang, Y. Wang, and S. B. Ko, "Retinal blood vessel segmentation using fully convolutional network with transfer learning," Comput Med Imaging Graph, vol. 68, pp. 1-15, Sep 2018, doi: 10.1016/j.compmedimag.2018.04.005.
[73] A. Oliveira, S. Pereira, and C. A. Silva, "Retinal vessel segmentation based on Fully Convolutional Neural Networks," Expert Systems with Applications, vol. 112, pp. 229-242, 2018, doi: 10.1016/j.eswa.2018.06.034.
[74] T. A. Soomro, A. J. Afifi, J. Gao, O. Hellwich, L. Zheng, and M. Paul, "Strided fully convolutional neural network for boosting the sensitivity of retinal blood vessels segmentation," Expert Systems with Applications, vol. 134, pp. 36-52, 2019, doi: 10.1016/j.eswa.2019.05.029.
[75] W. Li, M. Zhang, and D. Chen, "Fundus Retinal Blood Vessel Segmentation Based on Active Learning," in 2020 International Conference on Computer Information and Big Data Applications (CIBDA), 2020: IEEE, pp. 264-268.
[76] İ. Atli and O. S. Gedik, "Sine-Net: A fully convolutional deep learning architecture for retinal blood vessel segmentation," Engineering Science and Technology, an International Journal, vol. 24, no. 2, pp. 271-283, 2021.
[77] C. Guo, M. Szemenyei, Y. Pei, Y. Yi, and W. Zhou, "SD-Unet: A Structured Dropout U-Net for Retinal Vessel Segmentation," presented at the 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), 2019.
[78] G. Ghiasi, T.-Y. Lin, and Q. V. Le, "Dropblock: A regularization method for convolutional networks," in Advances in Neural Information Processing Systems, 2018, pp. 10727-10737.
[79] O. Sule and S. Viriri, "Enhanced Convolutional Neural Networks for Segmentation of Retinal Blood Vessel Image," in 2020 Conference on Information Communications Technology and Society (ICTAS), 11-12 March 2020, pp. 1-6.
[80] Y. Zhang and A. C. Chung, "Deep supervision with additional labels for retinal vessel segmentation task," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018: Springer, pp. 83-91.
[81] S. Mishra, D. Z. Chen, and X. S. Hu, "A Data-Aware Deep Supervised Method for Retinal Vessel Segmentation," in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020: IEEE, pp. 1254-1257.
[82] T. Laibacher, T. Weyde, and S. Jalali, "M2U-Net: Effective and Efficient Retinal Vessel Segmentation for Real-World Applications," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 16-17 June 2019, pp. 115-124.
[83] Q. Jin, Z. Meng, T. D. Pham, Q. Chen, L. Wei, and R. Su, "DUNet: A deformable network for retinal vessel segmentation," Knowledge-Based Systems, vol. 178, pp. 149-162, 2019, doi: 10.1016/j.knosys.2019.04.025.
[84] J.
Dai et al., "Deformable convolutional networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 764-773.
[85] R. Estrada, C. Tomasi, S. C. Schmidler, and S. Farsiu, "Tree topology estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 8, pp. 1688-1701, 2014.
[86] H. Zhao, H. Li, S. Maurer-Stroh, and L. Cheng, "Synthesizing retinal and neuronal images with generative adversarial nets," Medical Image Analysis, vol. 49, pp. 14-26, 2018.
[87] D. Wang, G. Hu, and C. Lyu, "FRNet: an end-to-end feature refinement neural network for medical image segmentation," The Visual Computer, pp. 1-12, 2020.
[88] P. Yin, R. Yuan, Y. Cheng, and Q. Wu, "Deep Guidance Network for Biomedical Image Segmentation," IEEE Access, vol. 8, pp. 116106-116116, 2020.
[89] D. A. Dharmawan, D. Li, B. P. Ng, and S. Rahardja, "A new hybrid algorithm for retinal vessels segmentation on fundus images," IEEE Access, vol. 7, pp. 41885-41896, 2019.
[90] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[91] P. Xiuqin, Q. Zhang, H. Zhang, and S. Li, "A Fundus Retinal Vessels Segmentation Scheme Based on the Improved Deep Learning U-Net Model," IEEE Access, vol. 7, pp. 122634-122643, 2019, doi: 10.1109/access.2019.2935138.
[92] D. Li, D. A. Dharmawan, B. P. Ng, and S.
Rahardja, "Residual U-Net for Retinal Vessel Segmentation," in 2019 IEEE International Conference on Image Processing (ICIP), 22-25 Sept. 2019, pp. 1425-1429.
[93] T. M. Khan, M. Alhussein, K. Aurangzeb, M. Arsalan, S. S. Naqvi, and S. J. Nawaz, "Residual Connection-Based Encoder Decoder Network (RCED-Net) for Retinal Vessel Segmentation," IEEE Access, vol. 8, pp. 131257-131272, 2020.
[94] C. Guo, M. Szemenyei, Y. Yi, Y. Xue, W. Zhou, and Y. Li, "Dense Residual Network for Retinal Vessel Segmentation," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4-8 May 2020, pp. 1374-1378.
[95] L. Mou, L. Chen, J. Cheng, Z. Gu, Y. Zhao, and J. Liu, "Dense Dilated Network with Probability Regularized Walk for Vessel Detection," IEEE Transactions on Medical Imaging, vol. 39, no. 5, pp. 1392-1403, 2019.
[96] R. Adarsh, G. Amarnageswarao, R. Pandeeswari, and S. Deivalakshmi, "Dense Residual Convolutional Auto Encoder For Retinal Blood Vessels Segmentation," in 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), 6-7 March 2020, pp. 280-284.
[97] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
[98] A. P. Lopes, A. Ribeiro, and C. A. Silva, "Dilated Convolutions in Retinal Blood Vessels Segmentation," in 2019 IEEE 6th Portuguese Meeting on Bioengineering (ENBENG), 22-23 Feb. 2019, pp. 1-4.
[99] Y. Jiang, N. Tan, T. Peng, and H. Zhang, "Retinal Vessels Segmentation Based on Dilated Multi-Scale Convolutional Neural Network," IEEE Access, vol. 7, pp. 76342-76352, 2019, doi: 10.1109/access.2019.2922365.
[100] R. Biswas, A. Vasan, and S. S. Roy, "Dilated Deep Neural Network for Segmentation of Retinal Blood Vessels in Fundus Images," Iranian Journal of Science and Technology, Transactions of Electrical Engineering, vol. 44, no. 1, pp. 505-518, 2019, doi: 10.1007/s40998-019-00213-7.
[101] T. A.
Soomro et al., "Impact of Image Enhancement Technique on CNN Model for Retinal Blood Vessels Segmentation," IEEE Access, vol. 7, pp. 158183-158197, 2019. [102] A. G. Howard et al., "Mobilenets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017. [103] L. Lovász, "Random walks on graphs: A survey," Combinatorics, Paul erdos is eighty, vol. 2, no. 1, pp. 1-46, 1993. [104] A. Vaswani et al., "Attention is all you need," in Advances in neural information processing systems, 2017, pp. 5998-6008. [105] Z. Luo, Y. Zhang, L. Zhou, B. Zhang, J. Luo, and H. Wu, "Micro-Vessel Image Segmentation Based on the AD-UNet Model," IEEE Access, vol. 7, pp. 143402-143411, 2019, doi: 10.1109/access.2019.2945556. [106] S. Lian, L. Li, G. Lian, X. Xiao, Z. Luo, and S. Li, "A Global and Local Enhanced Residual U-Net for Accurate Retinal Vessel Segmentation," IEEE/ACM Trans Comput Biol Bioinform, May 16 2019, doi: 10.1109/TCBB.2019.2917188. [107] Y. Lv, H. Ma, J. Li, and S. Liu, "Attention Guided U-Net With Atrous Convolution for Accurate Retinal Vessels Segmentation," IEEE Access, vol. 8, pp. 32826-32839, 2020, doi: 10.1109/access.2020.2974027. [108] B. Wang, S. Wang, S. Qiu, W. Wei, H. Wang, and H. He, "CSU-Net: A Context Spatial U-Net for Accurate Blood Vessel Segmentation in Fundus Images," IEEE J Biomed Health Inform, vol. PP, Jul 22 2020, doi: 10.1109/JBHI.2020.3011178. [109] X. Li, Y. Jiang, M. Li, and S. Yin, "Lightweight Attention Convolutional Neural Network for Retinal Vessel Segmentation," IEEE Transactions on Industrial Informatics, 2020. [110] K. Li, X. Qi, Y. Luo, Z. Yao, X. Zhou, and M. Sun, "Accurate Retinal Vessel Segmentation in Color Fundus images via Fully Attention-based Networks," IEEE Journal of Biomedical and Health Informatics, 2020. [111] Q. Fu, S. Li, and X. Wang, "MSCNN-AM: A MultiScale Convolutional Neural Network With Attention Mechanisms for Retinal Vessel Segmentation," IEEE Access, vol. 8, pp. 
163926-163936, 2020.
[112] X. Tang, B. Zhong, J. Peng, B. Hao, and J. Li, "Multi-scale channel importance sorting and spatial attention mechanism for retinal vessels segmentation," Applied Soft Computing, p. 106353, 2020.
[113] Z. Yan, X. Yang, and K. Cheng, "Joint Segment-Level and Pixel-Wise Losses for Deep Learning Based Retinal Vessel Segmentation," IEEE Transactions on Biomedical Engineering, vol. 65, no. 9, pp. 1912-1923, 2018.
[114] V. Nasery, K. B. Soundararajan, and J. Galeotti, "Learning to Segment Vessels from Poorly Illuminated Fundus Images," in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020: IEEE, pp. 1232-1236.
[115] A. Galdran, P. Costa, A. Bria, T. Araújo, A. M. Mendonça, and A. Campilho, "A no-reference quality metric for retinal vessel tree segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018: Springer, pp. 82-90.
[116] D. E. Alvarado-Carrillo, E. Ovalle-Magallanes, and O. S. Dalmau-Cedeño, "D-GaussianNet: Adaptive Distorted Gaussian Matched Filter with Convolutional Neural Network for Retinal Vessel Segmentation," Geometry and Vision, vol. 1386, p. 378, 2021.
[117] H. Wu, W. Wang, J. Zhong, B. Lei, Z. Wen, and J. Qin, "SCS-Net: A Scale and Context Sensitive Network for Retinal Vessel Segmentation," Medical Image Analysis, vol. 70, p. 102025, 2021.
[118] J. I. Orlando, J. B. Breda, K. Van Keer, M. B. Blaschko, P. J. Blanco, and C. A.
Bulant, "Towards a glaucoma risk index based on simulated hemodynamics from fundus images," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018: Springer, pp. 65-73.
[119] D. Wang, A. Haytham, J. Pottenburgh, O. Saeedi, and Y. Tao, "Hard Attention Net for Automatic Retinal Vessel Segmentation," IEEE J Biomed Health Inform, vol. PP, Jun 17, 2020, doi: 10.1109/JBHI.2020.3002985.
[120] Z. Yan, X. Yang, and K. T. Cheng, "A Three-Stage Deep Learning Model for Accurate Retinal Vessel Segmentation," IEEE J Biomed Health Inform, vol. 23, no. 4, pp. 1427-1436, Jul 2019, doi: 10.1109/JBHI.2018.2872813.
[121] V. Sathananthavathi and G. Indumathi, "Parallel Architecture of Fully Convolved Neural Network for Retinal Vessel Segmentation," Journal of Digital Imaging, vol. 33, no. 1, pp. 168-180, 2020.
[122] L. Yang, H. Wang, Q. Zeng, Y. Liu, and G. Bian, "A Hybrid Deep Segmentation Network for Fundus Vessels via Deep-Learning Framework," Neurocomputing, 2021.
[123] C. Tian, T. Fang, Y. Fan, and W. Wu, "Multi-path convolutional neural network in fundus segmentation of blood vessels," Biocybernetics and Biomedical Engineering, 2020.
[124] H. Xia, R. Zhuge, and H. Li, "Retinal vessel segmentation via a coarse-to-fine convolutional neural network," in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018: IEEE, pp. 1036-1039.
[125] K. Wang, X. Zhang, S. Huang, Q. Wang, and F. Chen, "CTF-Net: Retinal Vessel Segmentation via Deep Coarse-To-Fine Supervision Network," in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020: IEEE, pp. 1237-1241.
[126] Y. Wu, Y. Xia, Y. Song, Y. Zhang, and W. Cai, "NFN+: A novel network followed network for retinal vessel segmentation," Neural Networks, 2020.
[127] J. Hu et al., "S-UNet: A Bridge-Style U-Net Framework With a Saliency Mechanism for Retinal Vessel Segmentation," IEEE Access, vol. 7, pp. 174167-174177, 2019, doi: 10.1109/access.2019.2940476.
[128] Ü. Budak, Z. Cömert, M. Çıbuk, and A. Şengür, "DCCMED-Net: Densely connected and concatenated multi Encoder-Decoder CNNs for retinal vessel extraction from fundus images," Medical Hypotheses, vol. 134, p. 109426, 2020.
[129] G. A. Francia, C. Pedraza, M. Aceves, and S. Tovar-Arriaga, "Chaining a U-Net With a Residual U-Net for Retinal Blood Vessels Segmentation," IEEE Access, vol. 8, pp. 38493-38500, 2020.
[130] L. Li, M. Verma, Y. Nakashima, H. Nagahara, and R. Kawasaki, "IterNet: Retinal image segmentation utilizing structural redundancy in vessel networks," in The IEEE Winter Conference on Applications of Computer Vision, 2020, pp. 3656-3665.
[131] Y. Guo, Ü. Budak, and A. Şengür, "A novel retinal vessel detection approach based on multiple deep convolution neural networks," Computer Methods and Programs in Biomedicine, vol. 167, pp. 43-48, 2018.
[132] P. Tang, Q. Liang, X. Yan, D. Zhang, G. Coppola, and W. Sun, "Multi-proportion channel ensemble model for retinal vessel segmentation," Computers in Biology and Medicine, vol. 111, p. 103352, 2019.
[133] B. Zou et al., "Multi-label classification scheme based on local regression for retinal vessel segmentation," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2020.
[134] V. Cherukuri, V. K. Bg, R. Bala, and V. Monga, "Deep retinal image segmentation with regularization under geometric priors," IEEE Transactions on Image Processing, vol. 29, pp. 2552-2567, 2019.
[135] S. Y. Shin, S. Lee, I. D. Yun, and K. M.
Lee, "Deep vessel segmentation by learning graphical connectivity," Medical Image Analysis, vol. 58, p. 101556, 2019.
[136] N. Tajbakhsh, B. Lai, S. P. Ananth, and X. Ding, "ErrorNet: Learning error representations from limited data to improve vascular segmentation," in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020: IEEE, pp. 1364-1368.
[137] W. Tu, W. Hu, X. Liu, and J. He, "DRPAN: A novel Adversarial Network Approach for Retinal Vessel Segmentation," in 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2019: IEEE, pp. 228-232.
[138] C. Wu, Y. Zou, and Z. Yang, "U-GAN: Generative Adversarial Networks with U-Net for Retinal Vessel Segmentation," in 2019 14th International Conference on Computer Science & Education (ICCSE), 2019: IEEE, pp. 642-646.
[139] J. Ma, M. Wei, Z. Ma, L. Shi, and K. Zhu, "Retinal vessel segmentation based on Generative Adversarial network and Dilated convolution," in 2019 14th International Conference on Computer Science & Education (ICCSE), 2019: IEEE, pp. 282-287.
[140] Y. Dong, W. Ren, and K. Zhang, "Deep supervision adversarial learning network for retinal vessel segmentation," in 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2019: IEEE, pp. 1-6.
[141] Y. Zhou, Z. Chen, H. Shen, X. Zheng, R. Zhao, and X. Duan, "A refined equilibrium generative adversarial network for retinal vessel segmentation," Neurocomputing, vol. 437, pp. 118-130, 2021.
[142] T. Yang, T. Wu, L. Li, and C. Zhu, "SUD-GAN: Deep Convolution Generative Adversarial Network Combined with Short Connection and Dense Block for Retinal Vessel Segmentation," Journal of Digital Imaging, pp. 1-12, 2020.
[143] X. Guo et al., "Retinal Vessel Segmentation Combined With Generative Adversarial Networks and Dense U-Net," IEEE Access, vol. 8, pp. 194551-194560, 2020.
[144] S. A. Rammy, S. J. Anwar, M. Abrar, and W. Zhang, "Conditional Patch-based Generative Adversarial Network for Retinal Vessel Segmentation," in 2019 22nd International Multitopic Conference (INMIC), 2019: IEEE, pp. 1-6.
[145] J. He and D. Jiang, "Fundus Image Segmentation Based on Improved Generative Adversarial Network for Retinal Vessel Analysis," in 2020 3rd International Conference on Artificial Intelligence and Big Data (ICAIBD), 2020: IEEE, pp. 231-236.
[146] J. Son, S. J. Park, and K.-H. Jung, "Towards accurate segmentation of retinal vessels and the optic disc in fundoscopic images with generative adversarial networks," Journal of Digital Imaging, vol. 32, no. 3, pp. 499-512, 2019.
[147] K.-B. Park, S. H. Choi, and J. Y. Lee, "M-GAN: Retinal Blood Vessel Segmentation by Balancing Losses Through Stacked Deep Fully Convolutional Networks," IEEE Access, vol. 8, pp. 146308-146322, 2020.
[148] Q. Huo, G. Tang, and F. Zhang, "Particle Swarm Optimization for Great Enhancement in Semi-supervised Retinal Vessel Segmentation with Generative Adversarial Networks," in Machine Learning and Medical Engineering for Cardiovascular Health and Intravascular Imaging and Computer Assisted Stenting: Springer, 2019, pp. 112-120.
[149] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of ICNN'95 International Conference on Neural Networks, 1995, vol. 4: IEEE, pp. 1942-1948.
[150] A. Lahiri, V. Jain, A. Mondal, and P. K. Biswas, "Retinal Vessel Segmentation Under Extreme Low Annotation: A GAN Based Semi-Supervised Approach," in 2020 IEEE International Conference on Image Processing (ICIP), 2020: IEEE, pp. 418-422.
[151] L. Ngo and J. Han, "Multi-level deep neural network for efficient segmentation of blood vessels in fundus images," Electronics Letters, vol. 53, no. 16, pp. 1096-1098, 2017.
[152] J. Guo, S. Ren, Y. Shi, and H. Wang, "Automatic Retinal Blood Vessel Segmentation Based on Multi-Level Convolutional Neural Network," in 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 13-15 Oct. 2018, pp. 1-5.
[153] M. Li, Q. Yin, and M. Lu, "Retinal Blood Vessel Segmentation Based on Multi-Scale Deep Learning," in 2018 Federated Conference on Computer Science and Information Systems (FedCSIS), 9-12 Sept. 2018, pp. 1-7.
[154] Y. Lin, H. Zhang, and G. Hu, "Automatic Retinal Vessel Segmentation via Deeply Supervised and Smoothly Regularized Network," IEEE Access, vol. 7, pp. 57717-57724, 2019, doi: 10.1109/access.2018.2844861.
[155] K. Hu et al., "Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function," Neurocomputing, vol. 309, pp. 179-191, 2018.
[156] S. Guo, K. Wang, H. Kang, Y. Zhang, Y. Gao, and T. Li, "BTS-DSN: Deeply supervised neural network with short connections for retinal vessel segmentation," Int J Med Inform, vol. 126, pp. 105-113, Jun 2019, doi: 10.1016/j.ijmedinf.2019.03.015.
[157] S. Feng, Z. Zhuo, D. Pan, and Q. Tian, "CcNet: A cross-connected convolutional network for segmenting retinal vessels using multi-scale features," Neurocomputing, vol. 392, pp. 268-276, 2020, doi: 10.1016/j.neucom.2018.10.098.
[158] Z. Zhuo, J. Huang, K. Lu, D. Pan, and S.
Feng, "A size-invariant convolutional network with dense connectivity applied to retinal vessel segmentation measured by a unique index," Computer Methods and Programs in Biomedicine, p. 105508, 2020.
[159] K. J. Noh, S. J. Park, and S. Lee, "Scale-space approximated convolutional neural networks for retinal vessel segmentation," Comput Methods Programs Biomed, vol. 178, pp. 237-246, Sep 2019, doi: 10.1016/j.cmpb.2019.06.030.
[160] S. Xie and Z. Tu, "Holistically-nested edge detection," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1395-1403.
[161] H. M. Wallach, "Conditional random fields: An introduction," Technical Reports (CIS), p. 22, 2004.
[162] K. Aurangzeb, S. Aslam, M. Alhussein, R. A. Naqvi, M. Arsalan, and S. I. Haider, "Contrast Enhancement of Fundus Images by Employing Modified PSO for Improving the Performance of Deep Learning Models," IEEE Access, vol. 9, pp. 47930-47945, 2021.
[163] A. Şengür, Y. Guo, Ü. Budak, and L. J. Vespa, "A retinal vessel detection approach using convolution neural network," in 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), 2017: IEEE, pp. 1-4.
[164] J. Song and B. Lee, "Development of automatic retinal vessel segmentation method in fundus images via convolutional neural networks," in 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 11-15 July 2017, pp. 681-684.
[165] K. Zuiderveld, "VIII.5. Contrast Limited Adaptive Histogram Equalization," in Graphics Gems, P. S. Heckbert, Ed.: Academic Press, 1994, pp. 474-485.
[166] H. Hassanpour, N. Samadiani, and S. M. Mahdi Salehi, "Using morphological transforms to enhance the contrast of medical images," The Egyptian Journal of Radiology and Nuclear Medicine, vol. 46, no. 2, pp. 481-489, 2015, doi: 10.1016/j.ejrnm.2015.01.004.
[167] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
[168] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700-4708.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/