International Journal of Machine Learning and Cybernetics
https://doi.org/10.1007/s13042-020-01096-5

EDITORIAL

Recent advances in deep learning

Xizhao Wang1 · Yanxia Zhao1 · Farhad Pourpanah2

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

* Xizhao Wang ([email protected]) · Yanxia Zhao ([email protected]) · Farhad Pourpanah ([email protected])
1 College of Management, Hebei University, Baoding 071002, Hebei, China
2 College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, Guangdong, China

With the recent advancement of digital technologies, data sets have become so large that traditional data processing and machine learning techniques cannot cope with them effectively [1, 2]. Analyzing complex, high-dimensional, and noise-contaminated data sets is a huge challenge, and it is crucial to develop novel algorithms that can summarize, classify, and extract important information from such data and convert it into an understandable form [3–5]. To address these problems, deep learning (DL) models have shown outstanding performance over the recent decade.

DL has revolutionized the future of artificial intelligence (AI) and has solved many complex problems that had stood open in the AI community for many years. In essence, DL models are deeper variants of artificial neural networks (ANNs) with multiple layers, whether linear or non-linear, in which each layer is connected to its lower and upper layers through different weights. The capability of DL models to learn hierarchical features from various types of data, e.g., numerical, image, text, and audio, makes them powerful in solving recognition, regression, semi-supervised, and unsupervised problems [6–8].

In recent years, various deep architectures with different learning paradigms have been introduced to develop machines that can perform similarly to humans, or even better, in application domains such as medical diagnosis, self-driving cars, natural language and image processing, and predictive forecasting [9]. To show some of these recent advances, we have selected 14 papers from the articles accepted in this journal to organize this issue. Focusing on recent developments in DL architectures and their applications, we classify the articles in this issue into four categories: (1) deep architectures and conventional neural networks, (2) incremental learning, (3) recurrent neural networks, and (4) generative models and adversarial examples. In the following, we give a brief summary of each category and then individually introduce the related articles.

1 Category 1: deep architectures and conventional neural networks

Deep neural network (DNN) [10] is one of the most common DL models and contains multiple layers of linear and non-linear operations. A DNN extends the standard neural network with multiple hidden layers, which allows the model to learn more complex representations of the input data. In addition, the convolutional neural network (CNN) is a variant of the DNN inspired by the visual cortex of animals [11]. A CNN usually contains three types of layers: convolution, pooling, and fully connected layers. The convolution and pooling layers are placed at the lower levels. The convolution layers generate a set of linear activations, each followed by a non-linear function; in effect, the convolution layers apply filters that reduce the complexity of the input data [12]. The pooling layers then down-sample the filtered results, reducing the size of the activation maps by transforming them into smaller matrices [13]. Pooling therefore helps against over-fitting by reducing complexity [14]. The fully connected layers are located after the convolution and pooling layers, in order to learn more abstract representations of the input data.
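The layer stack described in this section can be sketched numerically. Below is a minimal, untrained NumPy sketch with toy dimensions (one 3×3 filter, an 8×8 single-channel input, ten output classes); every name and size is illustrative and not taken from any paper in this issue.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (cross-correlation) of a single-channel image."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Element-wise non-linearity applied after the linear activations."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Down-sample an activation map into a smaller matrix."""
    h, w = x.shape
    h2, w2 = h // size, w // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

def softmax(z):
    """Map the final layer's scores to class probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))           # toy single-channel input
kernel = rng.standard_normal((3, 3))          # one convolution filter
fc_w = rng.standard_normal((10, 9))           # fully connected weights, 10 classes

feat = max_pool(relu(conv2d(image, kernel)))  # 6x6 map pooled down to 3x3
probs = softmax(fc_w @ feat.ravel())          # class probabilities from the FC layer
```

A real CNN stacks many such filter banks and learns the kernel and weight values by back-propagation; the sketch only shows how the layer types compose.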
In the last layer, a loss function, e.g., a soft-max classifier, is used to map the input data to its corresponding output. CNN-based models have shown outstanding results in the areas of image processing and computer vision. This category contains four articles.

The paper "Combination of loss functions for deep text classification", authored by Hamideh Hajiabadi, Diego Molla-Aliod, Reza Monsefi and Hadi Sadoghi Yazdi, considers ensemble methods at the level of the objective function of a deep neural network. The paper proposes a novel objective function that is a linear combination of single losses and integrates this objective function into a deep neural network, so that the weights associated with the linear combination of losses are learned by back-propagation during the training stage. The impact of the proposed ensemble loss function is studied on state-of-the-art convolutional neural networks for text classification.

In the paper "A deep neural network-based recommendation algorithm using user and item basic data", authored by Jian-Wu Bi, Yang Liu and Zhi-Ping Fan, a new recommendation algorithm based on deep neural networks is proposed. The main idea of the algorithm is to build a regression model for predicting user ratings with deep neural networks. To this end, a user feature matrix and an item feature matrix are constructed from user data and item data, respectively, using four types of neural network layers: the embedding layer (EL), convolution layer (CL), pooling layer (PL) and fully connected layer (FCL). Then, based on the obtained matrices, a user-item feature matrix is further constructed using an FCL. On this basis, a regression model for predicting user ratings is trained to generate the recommendation list.

The paper "A discriminative deep association learning for facial expression recognition", authored by Xing Jin, Wenyun Sun and Zhong Jin, proposes a novel discriminative deep association learning (DDAL) framework for facial expression recognition. In this work, unlabeled data is used together with labeled data to train the DNNs simultaneously in a multi-loss deep network based on association learning. In addition, a discrimination loss is utilized to ensure intra-class clustering and the separation of inter-class centers.

In the paper "A technical view on neural architecture search", authored by Yi-Qi Hu and Yang Yu, a review of recent advances in neural architecture search (NAS) from a technical point of view is provided. The paper draws a whole picture of NAS for readers, including the problem definition, basic search frameworks, key techniques towards practice, and promising future directions.

2 Category 2: incremental learning

Incremental learning refers to continuous model adaptation based on constantly arriving input samples [15–17]. Unlike machine learning techniques with a batch learning procedure, which have to re-execute an iterative training procedure using both old and new samples, incremental learning techniques need to learn only the new samples, without re-learning previously learned samples [18, 19]. Incremental learning techniques are also useful for training the complex structures of DL models when the training samples are provided over time [20, 21]. This category contains two articles.

The paper "Cross-modal learning for material perception using deep extreme learning machine", authored by Wendong Zheng, Huaping Liu, Bowen Wang and Fuchun Sun, proposes a visual-tactile cross-modal retrieval framework that conveys tactile information of surface materials for perceptual estimation. In this work, tactile information of a new, unknown surface material is used to retrieve perceptually similar surfaces from an available set of surface visual samples. Specifically, a deep cross-modal correlation learning method is developed that incorporates the high-level nonlinear representation of the deep extreme learning machine and the class-paired correlation learning of cluster canonical correlation analysis.

The paper "DeepCascade-WR: a cascading deep architecture based on weak results for time series prediction", authored by Chunyang Zhang, Qun Dai and Gang Song, considers real-world time series prediction (TSP) tasks. In this work, a cascading deep architecture based on weak results (DeepCascade-WR) is established, which possesses deep models' marked capability of feature representation learning on complex data. Owing to the properties of OS-ELM, DeepCascade-WR possesses online learning ability and effectively avoids the retraining problem. In addition, DeepCascade-WR naturally inherits some valuable virtues from ELM, including faster training speed, better generalization ability, and the avoidance of falling into local optima.

3 Category 3: recurrent neural networks

Recurrent neural networks (RNNs) [22] have the deepest structures among the DL algorithms and are able to map sequential input data to their outputs [23]. Unlike traditional DNNs, the nodes in each RNN layer are connected to each other, and this self-connection enables RNNs to memorize information over time from a sequence of data. Long short-term memory (LSTM) [24] and gated recurrent units (GRU) [25] are two improved RNN models. Although RNNs are powerful, they are difficult to train on long sequences of data due to the vanishing or exploding gradient problem [26]. To solve this issue, LSTM and GRU use gate units to decide what information to keep or remove from the previous state. RNN-based models have been widely applied to handle sequential learning problems. This category contains five articles.
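The gating mechanism just described can be made concrete with a minimal NumPy sketch of a single LSTM step (untrained, toy dimensions; all variable names are illustrative): the forget gate decides how much of the previous cell state to keep, and the input gate decides how much new information to write.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: gate units decide what to forget, write, and expose."""
    z = W @ x + U @ h_prev + b                    # pre-activations of all four gates
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget / input / output gates
    c = f * c_prev + i * np.tanh(g)               # keep part of the old state, add new
    h = o * np.tanh(c)                            # exposed hidden state
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 4, 8
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1   # input weights
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1  # recurrent weights
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):           # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

A GRU follows the same idea with two gates and no separate cell state; in both cases the multiplicative gates are what keeps gradients from vanishing or exploding over long sequences.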
The paper "DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding", authored by Yongqing Zhang, Shaojie Qiao, Shengjie Ji and Yizhou Li, considers the prediction of DNA–protein binding sites in DNA sequences using DL methods. In this paper, DeepSite, which combines bidirectional long short-term memory (BLSTM) and a CNN, is employed to capture the long-term dependencies between the sequence motifs in DNA.

The paper "Single image rain streaks removal: a review and an exploration", authored by Hong Wang, Qi Xie, Yichen Wu, Qian Zhao and Deyu Meng, provides a detailed review of single-image-based rain removal techniques of recent years. These techniques are categorized into early filter-based, conventional prior-based, and recent deep learning-based approaches. In addition, inspired by the rationality of DL-based methods and the insightful characteristics underlying rain shapes, a specific coarse-to-fine de-raining network architecture is built. This architecture is able to model the rain structures and accordingly remove rain streaks from the input image progressively.

The paper "Learning deep hierarchical and temporal recurrent neural networks with residual learning", authored by Tehseen Zia, Assad Abbas, Usman Habib and Muhammad Sajid Khan, studies deep hierarchical and temporal structures in RNNs. The goal is to show that approximating identity mappings is crucial for optimizing both hierarchical and temporal structures. In this regard, a framework called hierarchical and temporal residual RNNs is proposed to learn RNNs by approximating identity mappings across hierarchical and temporal structures.

The paper "Weighted multi-deep ranking supervised hashing for efficient image retrieval", authored by Jiayong Li, Wing W. Y. Ng, Xing Tian, Sam Kwong and Hui Wang, focuses on deep hashing networks for large-scale image retrieval. The paper proposes weighted multi-deep ranking supervised hashing (WMDRH), which employs multiple weighted deep hash tables to improve precision and recall without increasing space usage. A loss function that contains two terms, (1) the ranking pairwise loss and (2) the classification loss, is used to generate hash codes. The former ensures discriminative hash codes by penalizing more heavily the similar image pairs with large Hamming distances and the dissimilar pairs with small ones, while the classification loss guarantees that the hash codes are effective for category prediction. Besides, the multiple hash tables are integrated by assigning each table an appropriate weight according to its mean average precision (MAP) score for image retrieval.

The paper "Pothole detection using location-aware convolutional neural networks", authored by Hanshen Chen, Minghai Yao and Qinlong Gu, proposes a new method based on location-aware convolutional neural networks to detect potholes in road images. The proposed method consists of two subnetworks: the first employs a high-recall network model to find as many candidate regions as possible, and the second performs classification on the candidates on which the network is expected to focus.

4 Category 4: generative models and adversarial examples

Generative models aim to generate new samples with some variation by learning the distribution of the training samples [27]. Variational autoencoders (VAE) [28] and generative adversarial networks (GAN) [29] are two prominent members of this family. DL models usually require a large amount of labeled samples to learn their parameters; however, obtaining sufficient labeled samples in many practical applications is difficult and computationally expensive. Generative models can be used to alleviate this problem [30], and can be applied to recognition, semi-supervised learning, unsupervised feature learning, and denoising tasks. Despite the great successes of DL models in solving many real-world problems, they can be easily fooled by adversarial examples [31]. This issue raises concerns in many fields, such as safety-critical systems and autonomous vehicles, so it is crucial to study the effects of adversarial examples on the performance of DL models. This category contains three articles.

In the paper "An adversarial non-volume preserving flow model with Boltzmann priors", authored by Jian Zhang, Shifei Ding and Weikuan Jia, an adversarial non-volume preserving flow model with Boltzmann priors (ANVP) for modeling complex high-dimensional densities is proposed. ANVP introduces an adversarial regularizer into the loss function that penalizes placing high probability in regions where the training data distribution has low density, in order to generate sharper images.

The paper "Emotion recognition using multimodal deep learning in multiple psychophysiological signals and video", authored by Wang Zhongmin, Zhou Xiaoxiao, Wang Wenlang and Liang Chen, proposes a DL-based approach that trains several specialist networks to fuse the features of individual modalities. This approach includes a multimodal deep belief network (MDBN) and two bimodal deep belief networks (BDBN). The MDBN is used to optimize and fuse unified psychophysiological features derived from the features of multiple psychophysiological signals; one BDBN focuses on representative visual features among the features of a video stream, and the other BDBN focuses on the high-level multimodal features in the unified features obtained from the two modalities.

The paper "Robustness to adversarial examples can be improved with overfitting", authored by Oscar Deniz, Noelia Vallez, Jesus Salido and Gloria Bueno, studies the effects of adversarial examples on the performance of DL methods. The paper first argues that the error on adversarial examples is caused by high bias, i.e., by regularization that has local negative effects, and then supports this idea with experiments in which the robustness to adversarial examples is measured with respect to the level of fitting to the training samples.

In summary, this issue shows some recent advances in DL from a new angle. It includes fourteen articles belonging to four categories, among which four belong to the scope of deep architectures and conventional neural networks, two to the area of incremental learning, five to the scope of recurrent neural networks, and three to the field of generative models and adversarial examples. It aims to provide readers with some useful guidelines on recent developments in DL algorithms and applications, and to offer a collection of DL articles that readers can conveniently reference.
References

1. Wang X, Joshua HZ (2015) Uncertainty in learning from big data. Fuzzy Sets Syst 258:1–4
2. Rezvani S, Wang X, Pourpanah F (2019) Intuitionistic fuzzy twin support vector machines. IEEE Trans Fuzzy Syst 27(11):2140–2151
3. Wang Z, Wang X (2018) A deep stochastic weight assignment network and its application to chess playing. J Parallel Distrib Comput 117:205–211
4. Sherkatghanad Z, Akhondzadeh M, Salari S, Zomorodi-Moghadam M, Abdar M, Acharya UR, Khosrowabadi R, Salari V (2019) Automated detection of autism spectrum disorder using a convolutional neural network. Front Neurosci 13:1325
5. Pourpanah F, Lim CP, Wang X, Tan CJ, Seera M, Shi Y (2019) A hybrid model of fuzzy min-max and brain storm optimization for feature selection and data classification. Neurocomputing 333:440–451
6. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
7. Wang X, Zhang T, Wang R (2017) Noniterative deep learning: incorporating restricted Boltzmann machine into multilayer random weight neural networks. IEEE Trans Syst Man Cybern Syst 49(7):1299–1308
8. Korsuk S, Ahmed RSE, Yee-Wah T, David SRJ, Cree IA, Rajpoot NM (2016) Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans Med Imaging 35(5):1196–1206
9. Sengupta S, Basak S, Saikia P, Paul S, Tsalavoutis V, Atiah F, Ravi V, Peters A (2020) A review of deep learning with special emphasis on architectures, applications and recent trends. Knowl-Based Syst. https://doi.org/10.1016/j.knosys.2020.105596
10. Saxe AM, McClelland JL, Ganguli S (2013) Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120
11. Hubel DH, Wiesel TN (1968) Receptive fields and functional architecture of monkey striate cortex. J Physiol 195(1):215–243
12. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580
13. Scherer D, Müller A, Behnke S (2010) Evaluation of pooling operations in convolutional architectures for object recognition. In: International conference on artificial neural networks, pp 92–101
14. Nguyen A, Yosinski J, Clune J (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 427–436
15. Pourpanah F, Wang R, Lim CP, Wang X, Seera M, Tan CJ (2019) An improved fuzzy ARTMAP and Q-learning agent model for pattern classification. Neurocomputing 359:139–152
16. Silver DL (2011) Machine lifelong learning: challenges and benefits for artificial general intelligence. In: International conference on artificial general intelligence, pp 370–375
17. Pourpanah F, Lim CP, Hao Q (2019) A reinforced fuzzy ARTMAP model for data classification. Int J Mach Learn Cybernet 10(7):1643–1655
18. Jain LC, Seera M, Lim CP, Balasubramaniam P (2014) A review of online learning in supervised neural networks. Neural Comput Appl 25:491–509
19. Pourpanah F, Zhang B, Ma R, Hao Q (2018) Non-intrusive human motion recognition using distributed sparse sensors and the genetic algorithm based neural network. In: IEEE SENSORS, pp 1–4
20. Gepperth A, Hammer B (2016) Incremental learning algorithms and applications. In: European symposium on artificial neural networks
21. Sarwar SS, Ankit A, Roy K (2020) Incremental learning in deep convolutional neural networks using partial network sharing. IEEE Access 8:4615–4628
22. Pascanu R, Gulcehre C, Cho K, Bengio Y (2013) How to construct deep recurrent neural networks. arXiv:1312.6026
23. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536
24. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
25. Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555
26. Dang A, Vu TH, Wang JC (2017) A survey of deep learning for polyphonic sound event detection. In: International conference on orange technologies, pp 75–78
27. Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge
28. Kingma DP, Welling M (2014) Auto-encoding variational Bayes. In: International conference on learning representations
29. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Neural information processing systems, pp 2672–2680
30. Ou Z (2018) A review of learning with deep generative models from perspective of graphical modeling. arXiv:1808.01630
31. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2013) Intriguing properties of neural networks. arXiv:1312.6199

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.