International Journal of Machine Learning and Cybernetics
https://doi.org/10.1007/s13042-018-0843-4

ORIGINAL ARTICLE

A reinforced fuzzy ARTMAP model for data classification

Farhad Pourpanah¹ · Chee Peng Lim² · Qi Hao¹

Received: 28 December 2017 / Accepted: 6 June 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract
This paper presents a hybrid model consisting of fuzzy ARTMAP (FAM) and reinforcement learning (RL) for tackling data classification problems. RL is used as a feedback mechanism to reward the prototype nodes of data samples established by FAM. Specifically, Q-learning is adopted to develop the hybrid model, known as QFAM. A Q-value is assigned to each prototype node, and is updated incrementally based on the prediction accuracy of the node pertaining to each data sample. To evaluate the performance of the proposed QFAM model, a series of experiments with benchmark problems and a real-world case study, i.e., human motion recognition, are conducted. The bootstrap method is used to quantify the results with 95% confidence interval estimates. The results are also compared with those from FAM as well as other models reported in the literature. The outcomes indicate the effectiveness of QFAM in tackling data classification tasks.

Keywords Data classification · Fuzzy ARTMAP · Reinforcement learning · Q-learning

1 Introduction

Artificial neural networks (ANNs) [1] are popular data-based learning models for tackling classification problems. To this end, many ANN models, such as the multi-layer perceptron (MLP) [2], probabilistic neural network (PNN) [3], radial basis function (RBF) network [4], and adaptive resonance theory (ART) [5], have been proposed. The main challenge of data-based learning models is to overcome the stability–plasticity dilemma [5], which means that a learning model should be able to absorb useful information from new data samples incrementally without forgetting or corrupting previously learned information. However, many ANN models, such as the traditional MLP and RBF [4] networks with batch-learning procedures, are not able to overcome this dilemma. When a batch-learning network is given a set of new data samples after its training phase, the network executes an iterative training procedure using both new and previous data samples for learning, in an attempt to preserve its existing knowledge base (established based on previous data samples). This procedure often results in slow convergence to the optimal network weights and in overfitting when a large data set is used [6]. To alleviate these problems, online neural networks that are able to learn incrementally have been proposed, such as PNN, ART, the Fuzzy Min–Max (FMM) network [7], and related models [8, 9]. A review of online learning in supervised neural network models is presented in Ref. [10].

Fuzzy ARTMAP (FAM) [11] is a supervised neural network that combines the capability of ART in solving the stability–plasticity dilemma with the benefits of fuzzy logic. Instead of more complex fuzzy sets, e.g., type-2 or intuitionistic fuzzy sets [12], FAM uses the conventional type-1 fuzzy set to handle vague and imprecise human linguistic information in a fast manner. FAM is an online learning model that operates by measuring the similarity between its prototype nodes and each new input sample against a threshold. If the similarity measure is not satisfied, a new prototype node can be added to encode the new data sample. While new information can be absorbed by increasing the number of prototype nodes incrementally [5], previously learned information is preserved in the network structure, therefore overcoming the stability–plasticity dilemma.

* Farhad Pourpanah
[email protected]
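The incremental, prototype-based learning style described above can be illustrated with a minimal sketch: each sample either refines the best-matching prototype of its own class (when similarity exceeds a threshold) or spawns a new prototype, with no retraining on old data. This is only an illustration of the general mechanism, not the authors' implementation; the function and parameter names are ours.

```python
def learn_incrementally(samples, labels, rho=0.9, beta=1.0):
    """Sketch of threshold-based incremental learning: refine the
    best-matching same-class prototype, or create a new one."""
    prototypes, proto_labels = [], []
    for x, y in zip(samples, labels):
        best, best_sim = None, -1.0
        for i, w in enumerate(prototypes):
            if proto_labels[i] != y:
                continue
            # fuzzy-AND match: |x ^ w| / |x|
            sim = sum(min(a, b) for a, b in zip(x, w)) / sum(x)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= rho:
            # refine the existing prototype toward the sample
            prototypes[best] = [beta * min(a, b) + (1 - beta) * b
                                for a, b in zip(x, prototypes[best])]
        else:
            # encode the sample as a new prototype
            prototypes.append(list(x))
            proto_labels.append(y)
    return prototypes, proto_labels
```

Previously created prototypes are never discarded, which is how such models sidestep the stability–plasticity dilemma at the cost of network growth.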
¹ School of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
² Institute for Intelligent Systems Research and Innovation, Deakin University, Waurn Ponds, Australia

However, it is possible for classification algorithms to incorrectly classify input samples from different classes that are located along the decision boundary [13, 14]. In Ref. [15], an optimal hyperplane is generated to maximize the margin between two classes for the SVM. Indeed, many incremental learning classifiers are susceptible to this problem, including the resource allocating network (RAN) [16], FAM, and FMM and its variant [17]. This is because the learning algorithms of incremental learning classifiers create similar prototypes that belong to different classes along the decision boundary. FAM selects the winning prototype node based on (firstly) the choice function and (then) the vigilance test. The problem of misclassification, especially from prototype nodes along the decision boundary, becomes significant in noisy environments. Many investigations to enhance the performance of FAM have been reported, e.g., hybrid FAM models that harness the merits of the constituents, including FAM with Gaussian models [18] and FAM with the genetic algorithm (GA) [19]. In this paper, we investigate the combination of FAM and reinforcement learning (RL) [20] in an attempt to enhance the classification performance of FAM. Generalization, i.e., the ratio of correctly classified samples that are not included in the training samples, is one of the most important performance indicators for evaluating a classifier. In Ref. [21], it has been shown that the generalization capability of a classification algorithm is related to the uncertainty of the classification problem.

RL is a semi-supervised machine learning technique that learns from experience by interacting with the environment. It receives a series of patterns from the environment, selects the best action, and executes it. Then, it receives a reinforcement signal that rewards or penalizes the selected action. Two important characteristics of RL are delayed reward and trial-and-error [20]. RL algorithms have been extensively applied as an effective feedback mechanism to tackle control and decision-making problems. Some successful RL applications are cart-pole balancing [22], the backgammon game [23], function approximation [24], and the elevator dispatching problem [25]. Note that, although less popular, RL algorithms have also been used to improve the performance of classifiers [26–30].

In this research, a hybrid model comprising FAM and Q-learning [20], known as QFAM, is introduced. The same idea as in Ref. [29] has been adopted, with some improvements. The reinforcement learning model proposed in Ref. [29] used a sample-based method, i.e., updating the network weights using reinforcement learning. In this research, instead of updating the network weights, a Q-value is formulated for each prototype node of FAM and is updated on a sample-by-sample basis during the learning procedure. Unlike the model proposed in Ref. [29], which needs a number of iterations to train the network, QFAM requires only one-pass learning through the data samples.

In contrast to FAM, which selects the winning prototype nodes by measuring similarity with respect to the input sample based on the choice function and vigilance test, QFAM selects the appropriate winning nodes by considering their associated strengths, which are based on both the similarity and the goodness (Q-value) of individual prototype nodes. In addition, FAM chooses only one prototype node, i.e., the one with the highest score of the choice function. In QFAM, all nodes that are able to satisfy the vigilance test form the potential winning nodes, and the one with the highest strength score is selected as the final winner. These modifications enable QFAM to produce better results, particularly for undertaking noisy data samples (as demonstrated in Sects. 5.1.2 and 5.3).

To demonstrate the effectiveness of the proposed QFAM model, several benchmark problems as well as a real-world case study, i.e., human motion recognition, are examined in this paper. The results show that QFAM is able to produce promising results in comparison with FAM and other state-of-the-art models reported in the literature. The main contributions of this research are two-fold:

• To develop a hybrid model that combines FAM and Q-learning for more efficient pattern classification;
• To evaluate the effectiveness of the proposed FAM-based hybrid model in tackling noisy and noise-free problems using benchmark and real-world data sets, and to compare the results of the proposed model with other state-of-the-art learning models.

The remainder of this paper is organized as follows. In Sect. 2, a review of FAM and RL is presented. The FAM and Q-learning methods are explained in Sect. 3. The proposed QFAM model is described in Sect. 4. In Sect. 5, the experimental study to evaluate the usefulness of QFAM is presented. Finally, conclusions and suggestions for further research are presented in Sect. 6.

2 Literature review

With respect to the ART family of neural networks, FAM is one of the most popular models for data classification. Many hybrid FAM-based models have been proposed by integrating FAM with other methods. A Gaussian ARTMAP network combining Gaussian models and FAM was proposed in Ref. [18]. The resulting model was resilient to noise, but it had a complicated learning rule. In Ref. [31], a hybrid model of FAM and the PNN was proposed. The hybrid model was able to reduce the number of nodes in the PNN and, at the same time, to provide the capability of probability estimation in FAM. To overcome missing-feature problems, a hybrid model of FAM and fuzzy c-means clustering was introduced [32]. A hybrid model of FAM and the GA was presented in Ref. [19]. Known as GFAM, the model was able to solve the category proliferation problem of FAM. FAMDDA [33] was another hybrid model, consisting of FAM and the dynamic decay adjustment (DDA) algorithm. It was able to solve the problem of overlapped hyperboxes from different classes. Later, in Ref. [34], two evolutionary networks based on FAM and evolutionary programming (EP) were proposed, i.e., FAMDDA-EP and FAM-EP. The results showed that both models were able to achieve high accuracy rates in comparison with FAMDDA and FAM. A hybrid model of FAM and the Online Extreme Learning Machine (OELM), known as FAM-OELM, was introduced in Ref. [35]. It was able to learn new information without overlapping with previously learned information. In Ref. [36], two evolutionary networks using FAM and the GA were proposed, i.e., FAM-HGA and FAMDDA-HGA. Both models were able to solve imbalanced data classification tasks.

In Ref. [37], FAM operated either as a classifier or as a probability estimator. The results showed that FAM performed better than Bayesian estimation models. A modified FAM model that solved the one-to-many mapping problem was introduced in Ref. [38]. The modified FAM model was able to closely approximate the Bayes optimal classification rates, as compared with FAM. On the other hand, FAM with the Manhattan distance for classification of noisy signals was presented in Ref. [39]. In Ref. [40], FAM was used to recognize emotion from speech signals. The GA was used to optimize the FAM parameters, e.g., the choice, vigilance, and learning parameters. The results showed that the proposed model outperformed MLP, SVM, and k-nearest neighbor classifiers. The TPPFAM model [41] used a filtering technique during the FAM training phase to avoid the category proliferation problem. It then employed the Q-max value and posterior probability during the FAM test phase to give a prediction of the target class.

On the other hand, RL is a popular learning technique that has been applied in control, classification, and decision support tasks. RL is a semi-supervised learning method that has a number of advantages over supervised learning methods under certain conditions. Unlike supervised learning, whereby the target output for each input sample is clearly known, only minimal information that indicates success or failure of a prediction is needed in RL. As such, it does not require detailed knowledge of the target output. There are two main advantages of RL. Firstly, it has the capability of learning online in a search-control-learn mode based on previous experience [42]. Secondly, it is a useful method when there is little knowledge about what and how to perform a task [43].

In the area of data mining, RL has been used with various clustering and classification algorithms to improve their performance. In Ref. [26], the concept of random hyperboxes was formulated to improve the FMM network, and RL was used for navigation of a vehicle in unknown environments. Then, a stochastic FMM network was introduced [27] to enhance the previous work further. These models performed well, as compared with MLP, in solving the pole-balancing problem. The FALCON [44] was used to solve classification problems in Ref. [28]. The proposed model, FALCON-R, consisted of two FALCON modules, one as a fuzzy predictor (critic network) and the other as a fuzzy controller (action network). Based on TD learning, the critic network was used to predict the external reinforcement signals and to provide the internal reinforcement signals for the action network. The action network operated as a stochastic exploratory procedure based on the internal reinforcement signals. The results showed that FALCON-R outperformed FALCON. The R-POPTVR (Pseudo-Outer Product Truth Value Restriction) fuzzy neural network was introduced in Ref. [29] for tackling classification problems. It formed an integration of a reinforcement clustering algorithm and POPTVR [45] for classification. The R-POPTVR model was able to outperform POPTVR in empirical studies. In Ref. [30], a memory-based RL method was proposed. Known as the modified U-Tree, it was able to learn raw sensor data with a minimum amount of knowledge.

As stated earlier, it is difficult for many models to identify the labels of samples that are located in the boundary area between two different classes, and FAM-based networks suffer from the same problem. Given a data set, either noise-free or noisy, FAM-based networks produce similar prototype nodes linked with different target classes in their decision boundary regions. In this case, the concept of Q-learning, which determines the goodness of each prototype node, is utilized to determine the strength of similar prototype nodes located along the decision boundary regions, and to select the appropriate ones for predicting a target class during the test phase.

3 Fuzzy ARTMAP and reinforcement learning

In this section, the dynamics of FAM and RL are explained. The details are as follows.

3.1 Dynamics of fuzzy ARTMAP

Fuzzy ARTMAP consists of two fuzzy ART modules (i.e., ART_a and ART_b) that are interconnected by a map field, F^ab, as shown in Fig. 1. Each ART module contains three layers of nodes, i.e., the normalization layer F0^a (F0^b), the input layer F1^a (F1^b), and the recognition layer F2^a (F2^b). In the normalization layer, complement coding is used. Complement coding avoids the problem of category proliferation [11], whereby an M-dimensional input vector, a ∈ [0, 1]^M, is transformed into a 2M-dimensional vector:

    A = (a, 1 − a) ≡ (a_1, …, a_M, 1 − a_1, …, 1 − a_M).    (1)

Fig. 1 Structure of FAM (adapted from [11])

The input layer receives the complement-coded vectors. The recognition layer is a dynamic layer containing prototype nodes, whereby the number of nodes can be increased whenever necessary. During learning, the complement-coded input pattern, A, and its target vector, B, are presented to ART_a and ART_b, respectively. To compute the similarity between input pattern A and the j-th node in F2^a, the following choice function is used [11]:

    T_j = |A ∧ w_j^a| / (α + |w_j^a|),    (2)

where w_j^a ≡ (w_j1^a, …, w_j,2M^a) is the weight vector of node j in F2^a, and α > 0 is the choice parameter. Initially, w_j1^a = ⋯ = w_j,2M^a = 1. The winning node, denoted as node J, is chosen based on Ref. [11]:

    T_J = max{T_j : j = 1, 2, …, N}.    (3)

If more than one T_j is maximal, the node with the smallest index j is chosen. Resonance occurs if the match function of the chosen prototype node meets the vigilance criterion [11]:

    |A ∧ w_J^a| / |A| ≥ ρ_a,    (4)

where ρ_a is the vigilance parameter of ART_a. However, a mismatch occurs if the condition in (4) is not satisfied. Then, the current winning node is deactivated, and a new cycle to choose another winning node is initiated. This search cycle continues until the current winning node J meets the vigilance test. If no such node exists, a new node is created in F2^a to encode the input sample. The same procedure takes place in ART_b simultaneously to find the winning target node. Once the winning nodes in ART_a and ART_b are determined, the map-field vigilance test is applied, i.e.,

    |y^b ∧ w_J^ab| / |y^b| ≥ ρ_ab,    (5)

where ρ_ab is the map-field vigilance parameter, w_J^ab is the weight vector from F2^a to F^ab, and y^b denotes the output vector of F2^b. If the winning ART_b node is K, then

    y_k^b = 1 if k = K, and 0 otherwise.    (6)

If the map-field vigilance test fails, this means that the winning node in F2^a gives an incorrect prediction of the target output. In this case, the vigilance parameter of ART_a is updated in the match-tracking process, as follows:

    ρ_a = |A ∧ w_J^a| / |A| + δ,    (7)

where δ is a small positive value. Then, a new search cycle with the new ρ_a setting ensues. This process is repeated until the prediction at the map field is correct. Once the map-field vigilance criterion is satisfied, the search process ends, and learning takes place. The weight vector, w_J^a, is updated as:

    w_J^a(new) = β_a (A ∧ w_J^a(old)) + (1 − β_a) w_J^a(old),    (8)

where β_a is the learning rate of ART_a. For fast learning, β_a = 1 is used.

3.2 Reinforcement learning

Reinforcement learning can be considered as a class of Markov decision processes [46] that contains four components, i.e., states (S), actions (A), a transition function (T), and a reward or reinforcement (R) function. S covers all possible states, A is the set of actions for each state, T is a function that maps a state-action pair to a state, i.e., T: S × A → S, and R is a reward function (which can be a scalar). Temporal difference (TD) learning [20, 47] is a learning algorithm based on generalized policy iteration (GPI) that predicts the behavior of a model at the next step by using past experience, e.g., forecasting tomorrow's weather based on current weather information. Incremental learning is one of the advantages of TD methods in comparison with other methods. SARSA, Actor-Critic, and Q-learning are three different TD methods [20]. In this respect, Q-learning has been extensively used to solve RL problems [48]. The optimal action function is approximated by the learned action function, known as the Q-function. The Q-function for policy π is defined by [20]:

    Q(s, a) = r(s, a) + γ V^π(s, a),    (9)

where γ ∈ [0, 1] is a discount factor, and V^π(s, a) is the value function of action a in state s under policy π. Q-learning usually selects the action that has the largest Q(s, a), i.e.,

    π(s) = arg max_a Q(s, a).    (10)

Once an action is selected, it is executed, and a reward signal is received. Then, the agent moves to the next state, s′. Q(s, a) is updated at each step when action a at state s leads to state s′ [20]:

    Q(s, a) ← Q(s, a) + ξ [r + γ max_{a′ ∈ A} Q(s′, a′) − Q(s, a)],    (11)

where ξ ∈ [0, 1] is the learning rate. The reward (reinforcement) signal is system dependent. It can be a function or a scalar. In classification tasks, a correct prediction means 'success,' otherwise 'failure.' In this case, the reward signal can be formulated as 1 and 0 for 'success' and 'failure,' respectively [28].

4 The proposed QFAM model

In FAM, the similarity between the input sample and the existing prototypes in F2^a is compared against the vigilance parameter. The winning node in F2^a needs to satisfy the map-field vigilance test in order to confirm its prediction and initiate the learning procedure. Otherwise, the winner is deactivated, and a new round of match-tracking to search for a new F2^a winner is triggered. In line with the rationale of RL, the winner in F2^a can be rewarded or penalized, depending on whether or not it is able to satisfy the map-field vigilance test.

A new model integrating FAM and Q-learning is therefore proposed. Known as QFAM, its operation is divided into two phases, i.e., training and test. In the training phase, FAM is used as the supervised classifier to associate the input samples and target classes through the map field. For each new F2^a node, a Q-value is assigned. In the test phase, a combination of the choice function and the Q-value is used to choose the winning node in F2^a. The details are as follows.

4.1 The training phase

In FAM, when an F2^a node is chosen as the winner, two possible scenarios can happen. One scenario is that the F2^a winner gives a correct prediction (i.e., the map-field vigilance test is satisfied), which leads to learning. The other is that the F2^a winner makes an incorrect prediction (i.e., the map-field vigilance test fails), which leads to match-tracking. As such, each F2^a node can be rewarded during learning or penalized during match-tracking, whereby its Q-value at time t is formulated as follows:

    Q(j)_t = Q(j)_{t−1} + ξ [r(j)_t + γ vig(j)_t],    (12)

where ξ ∈ [0, 1] is the learning rate, γ ∈ [0, 1] is the discount factor, vig(j) is the vigilance test value of the F2^a winner (node j), and r(j) is defined as

    r(j)_t = 1 when learning occurs, and 0 when match-tracking occurs.    (13)

The learning phase of the QFAM model is summarized in Algorithm 1.

Algorithm 1. Learning Phase
Input: training samples, parameters of QFAM
Output: parameters of trained QFAM
For each training sample:
1. Perform complement coding (Eq. 1), and then propagate the input vector to F1^a.
2. Determine the winning node based on Eqs. 2 and 3.
3. Perform the vigilance test (Eq. 4).
   If the vigilance test fails, deactivate the current winner, and go to Step 2 to choose another winner.
   (The same cycle occurs in ART_b simultaneously.)
4. Perform the map-field vigilance test (Eq. 5).
   If the map-field vigilance test fails:
      Update the Q-value of the winner using Eqs. 12 and 13.
      Initiate match-tracking using Eq. 7.
      If all existing nodes in F2^a have been checked and none of them satisfies the vigilance test, add a new node.
      Go to Step 2.
   Else:
      Update the Q-value of the winner using Eqs. 12 and 13.
      Update the weight vector of the winner using Eq. 8.
End For
Normalize the Q-values between 0 and 1 upon completion of training.

4.2 The test phase

In the test phase, FAM chooses the winner by using the choice function (Eq. 2) and by verifying again with the vigilance test (Eq. 4). In QFAM, in addition to Eqs. (2) and (4), the Q-value is used. Note that all Q-values are normalized between 0 and 1 after the training phase. Selection of the winning node in QFAM is performed as follows. The choice function is first computed, followed by the vigilance test, for all F2^a nodes. Next, all F2^a nodes that satisfy the vigilance test are chosen. Then, the strength of each selected F2^a node is computed, as follows:

    strength(j) = λ T(j) + (1 − λ) Q(j),    (14)

where λ ∈ [0, 1] is the weighting factor that specifies the proportion of the choice function, T(j), and the Q-value, Q(j), of the j-th node in determining the strength of each F2^a node that satisfies the vigilance test. Once the strengths of all nodes are calculated, the one with the highest strength score is selected as the winner to give a prediction for the current input sample. The test phase of the QFAM model is summarized in Algorithm 2.

Algorithm 2. Test Phase
Input: parameters of trained QFAM, test samples
Output: performance indicator
For each test sample:
1. Perform complement coding (Eq. 1), and then propagate the input vector to F1^a.
2. Determine the choice function for all nodes (Eq. 2).
3. Perform the vigilance test (Eq. 4), and select all nodes that satisfy the vigilance test.
4. Determine the strength of each selected node (Eq. 14).
5. Select the node with the highest strength score as the winning node.
6. Produce a prediction at ART_b based on the map-field connection of the winning node.
End For
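The node-level operations in Eqs. (1), (2), (4), (12)–(14) can be sketched in code. The following Python sketch is illustrative only (the function names are ours, and the full ART_b/map-field machinery is omitted); it shows how complement coding, the choice function, the vigilance match, the Q-value update, and the strength score fit together.

```python
def complement_code(a):
    """Eq. 1: map a in [0,1]^M to A = (a, 1 - a) in [0,1]^(2M)."""
    return list(a) + [1.0 - ai for ai in a]

def fuzzy_and_norm(A, w):
    """|A ^ w|: L1 norm of the component-wise minimum."""
    return sum(min(x, y) for x, y in zip(A, w))

def choice(A, w, alpha=0.001):
    """Eq. 2: choice function T_j of node j with weights w."""
    return fuzzy_and_norm(A, w) / (alpha + sum(w))

def vigilance(A, w):
    """Left-hand side of Eq. 4, compared against rho_a."""
    return fuzzy_and_norm(A, w) / sum(A)

def update_q(q_prev, vig, learned, xi=0.3, gamma=0.3):
    """Eqs. 12-13: r = 1 when learning occurs, r = 0 during
    match-tracking; the vigilance value scales the increment.
    (Q-values are normalized to [0, 1] after training.)"""
    r = 1.0 if learned else 0.0
    return q_prev + xi * (r + gamma * vig)

def strength(T_j, Q_j, lam=0.95):
    """Eq. 14: blend similarity and (normalized) Q-value."""
    return lam * T_j + (1 - lam) * Q_j
```

Note that complement coding gives every input a constant L1 norm, |A| = M, which is what keeps the vigilance ratio in Eq. (4) well behaved.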
13 International Journal of Machine Learning and Cybernetics 5 Experimental studies 5.1.1 Different training set size To evaluate the effectiveness of the proposed QFAM model, In this experiment, QFAM was compared with FAM. Fol- a number of benchmark problems and a real-word case study, lowing the same experimental procedure in Ref. [11], the i.e., human motion recognition, were experimented. Firstly, numbers of learning (training) data samples used were 100, the circle-in-the-square problem was used to compare the 1000, 10,000, 100,000, while the number of test samples was performance of QFAM with those from the original FAM fixed to 1000. The test was repeated 10 times, each time with model [11]. Then, a number of benchmark data sets from a different training data set. To evaluate the performance of the UCI machine learning repository [49] were employed QFAM statistically, the bootstrap method [54] with the 95% to compare QFAM with another state of the art models pub- confidence interval was adopted. lished in the literature [50–52]. Finally, a case study, i.e., Figure 2 shows the decision boundary created by QFAM human motion recognition, was conducted to evaluate the subject to different training sizes. More precise decision applicability of the proposed model with real-world data. boundaries are established with increasing number of train- The parameters of QFAM were set as follows. Both ART ing samples. This is because, according to the FAM learning a and ART b operated in the fast-learning and conserva- algorithm, the number of created prototype nodes increases tive mode, i.e., 𝛽a = 𝛽b = 1 and 𝛼a = 𝛼b = 0.001. Since the with respect to increasing number of training samples [11]. problems were related to classification, 𝜌b = 𝜌ab = 1. The The growth rate of prototype nodes is subject to the vigi- Q-learning related parameters were set to 𝜆 = 0.95, 𝛾 = 0.3, lance parameter. Given an input sample, when the winning and 𝜉 = 0.3, after some trials. 
The vigilance parameter (𝜌a ) prototype node fails to satisfy the vigilance test or it trig- was set to 0.9. gers match-tracking due to an incorrect prediction, a search cycle is initiated. The search cycle identifies another existing 5.1 Circle‑in‑the‑square prototype node to provide a new prediction. If all existing prototype nodes are not able to give a correct prediction, a The circle-in-the-square problem [53] is a benchmark classi- new one is created to encode the current input sample into fication problem. It requires a classifier to distinguish which the network. This incremental learning mechanism is par- points within a unit square are located inside or outside the ticularly useful for separating input samples that have similar circle. The area of the circle, which is located at the center patterns but belong to different classes, which normally exist of the square, is half of the square. close to the decision boundaries. As such, the number of prototype nodes increases with increasing number of input Fig. 
2 The decision boundaries created by QFAM subject to different training data sizes Table 1 Test accuracy rates Accuracy (%) (%) for the circle-in-the-square problem Training samples FAM QFAM Lower Mean Upper Lower Mean Upper 100 88.63 89.19 89.89 88.70 90.31 93.52 1000 93.26 93.89 94.51 95.30 96.04 97.10 10,000 95.39 96.14 96.54 96.64 97.15 97.46 100,000 97.89 98.28 98.66 96.55 97.16 98.08 13 International Journal of Machine Learning and Cybernetics Table 2 Test accuracy rates Noise (%) Accuracy (%) (%) for the noisy circle-in-the- square problem FAM QFAM 𝜆 = 0.90 𝜆 = 0.95 𝜆 = 0.98 Lower Mean Upper Lower Mean Upper Lower Mean Upper Lower Mean Upper 0 97.89 98.28 98.66 92.30 93.32 94.68 96.64 97.15 97.46 96.04 96.26 96.98 5 87.60 88.55 89.40 93.78 94.04 94.26 94.46 94.95 95.70 93.88 94.12 94.68 10 78.99 79.73 80.53 93.58 94.23 94.72 93.59 94.56 95.26 92.24 93.82 94.48 Bold results indicate the best classification methods Table 3 The number of prototype nodes of FAM and QFAM with For comparison, FAM was evaluated under the same con- different noise levels dition as QFAM in this experiment. As expected, the accu- Noise (%) Lower bound of 95% Mean Upper bound of racy rates of both networks (FAM and QFAM) reduced by confidence interval 95% confidence increasing the noise level. In general, QFAM with 𝜆 = 0.95 interval performed better for both noisy and noise-free data sets, as 0 362.20 379.11 402.60 compared with QFAM with 𝜆 = 0.90 and 0.98. For data set 5 2230.00 2266.20 2306.60 with 10% noise, QFAM with 𝜆 = 0.90 performed slightly 10 2642.00 3025.40 3385.20 better than QFAM with 𝜆 = 0.98. The accuracy rate of FAM dramatically reduced from 98.28% (for 0% noise) to 79.73% (for 10% noise). The accuracy rate of QFAM with 𝜆 = 0.95 samples, in an attempt to absorb new patterns from different dropped from 97.15% (for 0% noise) to 94.56% (for 10% classes into the network. As a result, a good decision bound- noise). 
Clearly, QFAM outperformed FAM in tackling noisy ary can be formed to improve generalization for undertaking data statistically, as indicated by the 95% confidence inter- data classification problems. Since QFAM adopts the same vals of the 5 and 10% noise problems. This was owing to FAM learning algorithm, its prototype nodes also increase the capability of QFAM in selecting the appropriate win- subject to incoming training samples, which in turn leads to ning nodes by considering their associated strengths (i.e., improved accuracy. This trend can be observed in Fig. 2, i.e., through their Q-values); therefore improving the perfor- increasing the number of training samples causes a clearer mance of QFAM. decision boundary to be established for the circle-in-the- In addition, Table 3 shows the number of prototype nodes square problem. of FAM and QFAM in the presence of different noise level. Table 1 shows the results of FAM and QFAM. In compar- FAM and QFAM created the same number of nodes because ison with FAM, QFAM outperformed FAM statistically for the learning algorithm is the same for both of them. It can the 1000, 10,000 and 100,000 training samples (no overlap be clearly seen that there is an upward trend in the complex- between the 95% confidence intervals of both model), while ity of both networks by increasing the noise levels. Noisy both QFAM and FAM performed at the same level statisti- samples lead to the creation of similar prototype nodes that cally (owing to the overlap of their 95% confidence inter- belong to the different class, and match-tracking happens vals) for the 100 and 100,000 cases. An additional test with many times during learning. As a result, increasing the level noise added to the training data was conducted to further of noise increases the complexity of both networks. evaluate the usefulness of QFAM in noisy environments. 
5.2 Experiments with the UCI data sets 5.1.2 Different noise levels In this section, three experiments were conducted. In the To create noisy data samples for the circle-in-the-square first experiment, QFAM was compared with different MLP problem, the noise was added to the target classes, i.e., 5 networks trained with backpropagation and a real-coded GA, and 10% of the target classes were randomly selected, and as reported in Ref. [50]. Following the same experimental flipped. The training set size was 100,000. The test samples procedure in Ref. [50], the tenfold cross-validation method remained noise-free. Table 2 shows the accuracy rates for was used for performance comparison. The data set was split noise-free (0%) and noisy (5 and 10%) data samples. Three into ten sub-sets; each sub-set included a similar proportion discounting factor (𝜆 ) settings of QFAM were used, i.e., of data samples from each class. Nine sub-sets were used for 0.90, 0.95 and 0.98. training, with the remaining for the test. 
The same process was repeated ten times, with a different data sub-set used for the test each time. This tenfold cross-validation procedure was itself repeated ten times, and the bootstrap method was used to compute the average results.

Table 4 shows the details of the UCI data sets used. These data sets have different characteristics, i.e., input types, numbers of samples and numbers of features, which were used to evaluate the effectiveness of QFAM. All input samples (continuous as well as integer) were normalized within [0, 1] in accordance with the dynamics of QFAM.

Table 4 Details of the UCI data sets

Data set                   Type of input         Samples  Features  Classes
Breast cancer              Integer                  683      10        2
Pima Indian diabetes       Integer, real            768       8        2
Iris                       Real                     150       4        3
Glass                      Real                     214       9        6
Dermatology                Categorical, integer     358      34        6
Balance scale              Categorical              625       4        3
Wine                       Integer, real            178      13        3
Ionosphere                 Integer, real            351      34        2
Sonar                      Real                     208      60        2
SPECT heart                Categorical              267      22        2
Semeion handwritten digit  Integer                 1593     256       10
Magic                      Real                  19,020      10        2
Nursery                    Categorical           12,960       8        5

Table 5 shows the overall results of QFAM and those reported in Ref. [50]. QFAM outperformed the real-coded GA and backpropagation with different model structures for all classification tasks. Note that the model structure indicates the number of hidden layers of the real-coded GA and backpropagation networks. Note also that QFAM required only one-pass learning through the data samples, while the other models required multiple iterations through the data samples during learning. With the "one-pass" learning methodology of QFAM (inherited from FAM), all training data samples were presented only once for learning during the training cycle. As compared with batch learning, this one-pass learning methodology was useful to avoid a long training cycle.

Table 5 Test accuracy rates (%) for the UCI data sets

Data set              Model structure  Real-coded GA [50]  Backpropagation [50]  QFAM (lower/mean/upper)
Breast cancer         1-2-1            95.20               92.48                 98.30/98.60/98.84
                      1-3-1            96.00               92.80
                      1-5-1            96.50               93.10
                      1-7-1            96.00               93.00
                      1-10-1           95.70               92.50
Pima Indian diabetes  1-2-1            76.00               73.20                 84.18/84.86/85.54
                      1-3-1            77.00               73.50
                      1-5-1            77.50               73.50
                      1-7-1            77.60               73.80
                      1-10-1           77.10               73.60
Iris                  1-2-1            97.00               95.30                 98.69/98.90/99.52
                      1-3-1            97.20               96.00
                      1-5-1            97.00               96.00
                      1-7-1            97.80               96.80
                      1-10-1           97.30               96.40
Dermatology           1-2-1            92.50               89.20                 96.87/97.24/97.56
                      1-3-1            94.60               89.00
                      1-5-1            95.00               89.80
                      1-7-1            93.70               90.00
                      1-10-1           94.00               90.70
Glass                 1-2-1            64.50               61.00                 68.84/70.03/71.28
                      1-3-1            66.00               62.00
                      1-5-1            67.60               61.80
                      1-7-1            67.00               63.60
                      1-10-1           66.80               63.20

Bold results indicate the best classification methods

In the second experiment, QFAM was compared with Naive Bayes [55], the fuzzy gain measure [56], and the Hybrid Higher Order Neural Classifier (HHONC) [51] using the data sets shown in Table 6. Following the same experimental procedure in Ref. [51], 75 and 25% of the data samples were used for training and test, respectively. The process was repeated 200 times, each time with different randomly selected training and test samples. The average accuracy rates and standard deviations are presented in Table 6. QFAM outperformed the other models for the Iris, Breast cancer, Glass and Balance scale problems, while the fuzzy gain measure model performed slightly better than QFAM for the Wine problem.

Table 6 Test results for the UCI data sets

Data set       Naive Bayes [51]  Fuzzy gain measure [51]  HHONC [51]    QFAM
Iris           96.00 ± 0.30      96.88 ± 2.40             97.46 ± 2.32  98.37 ± 1.96
Breast cancer  95.90 ± 0.20      98.14 ± 0.90             97.17 ± 1.17  98.50 ± 0.91
Wine           96.75 ± 2.32      98.36 ± 1.26             97.88 ± 2.29  98.03 ± 1.68
Glass          42.90 ± 1.70      69.14 ± 4.69             56.50 ± 7.58  70.50 ± 6.78
Balance scale  89.81 ± 1.29      88.65 ± 1.39             93.31 ± 2.44  94.38 ± 1.90

Bold results indicate the best classification methods

Finally, the effectiveness of QFAM was compared with the Fuzzy Lattice Reasoning Classifier (FLRC) and the FLRC based on metric distance (FLRC-MD) models, i.e., FLR-MD1, FLR-MD2, FLR-MD3 and FLR-MD4, as reported in Ref. [52]. The numbers of training and test samples are shown in Table 7. For each data set, the test was repeated 100 times, each time with randomized training and test samples. Table 8 shows the accuracy rates of QFAM and the various models in Ref. [52]. QFAM performed better than the other models for the SPECT heart, Semeion handwritten digit, Magic and Nursery problems, while it ranked second and third for the Sonar and Ionosphere problems, respectively.

Table 7 Details of UCI data sets

Data set                   Training samples  Test samples
Ionosphere                 216               135
Sonar                      151               57
SPECT heart                80                187
Semeion handwritten digit  162               1431
Magic                      1902              17,118
Nursery                    1296              11,664

Table 8 Test accuracy rates (%) for UCI data sets

Data set                   FLR    FLR-MD1  FLR-MD2  FLR-MD3  FLR-MD4  QFAM
Ionosphere                 91.24  94.07    90.37    95.56    96.30    93.35
Sonar                      92.98  91.23    89.47    91.23    96.49    93.94
SPECT heart                66.31  66.31    66.31    66.31    67.91    72.89
Semeion handwritten digit  78.97  80.08    80.08    80.08    80.08    83.83
Magic                      79.60  80.13    77.90    80.13    80.16    82.13
Nursery                    80.45  77.33    77.33    77.33    82.13    84.41

Bold results indicate the best classification methods

5.3 A case study

A case study to evaluate the usefulness of QFAM for recognizing human motions in a real-world environment was conducted. The task was to recognize three types of motions, i.e., running, climbing and walking, by using smartphones. Specifically, three smartphones were placed in three pockets of a subject (human), i.e., the front pocket, belt pocket and shirt pocket [57]. A total of 390 data samples, i.e., raw time-domain waveform signals generated by the 3-axis accelerometer embedded in the smartphones, were recorded.

Table 9 The accuracy rates and numbers of prototype nodes of QFAM and FAM with 95% confidence intervals for human motion recognition

Noise (%)  FAM (lower/mean/upper)  QFAM (lower/mean/upper)  Prototype nodes (lower/mean/upper)
0          94.71/95.97/96.97       94.17/95.48/96.59        121.88/126.80/129.36
10         87.72/89.18/90.31       92.73/93.41/93.87        147.00/149.86/151.34
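As a hedged sketch of how such raw accelerometer waveforms can be condensed into a fixed-length input vector, the code below computes nine common time-domain statistics per axis. Standard vibration-analysis definitions are assumed here, since the paper does not list its exact formulas, and all function and variable names are illustrative.

```python
import math

def axis_features(x):
    """Nine time-domain statistics for one accelerometer axis
    (definitions assumed from standard vibration analysis)."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    skewness = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * std ** 4)
    peak = max(abs(v) for v in x)
    abs_mean = sum(abs(v) for v in x) / n
    crest = peak / rms                  # peak over RMS
    latitude = peak / (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    shape = rms / abs_mean              # RMS over mean absolute value
    impulse = peak / abs_mean           # peak over mean absolute value
    return [mean, rms, std, skewness, kurtosis,
            crest, latitude, shape, impulse]

# Each sample carries three axis signals, so 9 features per axis
# give a 27-dimensional input vector.
sample = {"x": [0.1, -0.2, 0.3], "y": [0.0, 0.5, -0.1], "z": [0.9, 0.8, 1.1]}
features = [f for axis in ("x", "y", "z") for f in axis_features(sample[axis])]
assert len(features) == 27
```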
Nine statistical features were extracted from each axis, namely the mean, root mean square, standard deviation, skewness, kurtosis, crest factor, latitude factor, shape factor and impulse factor. As such, each input sample was represented by 27 features (9 statistical features for each axis). The tenfold cross-validation method was adopted. In addition, noise (10%) was added to the class labels of the training samples, in order to further evaluate the robustness of the proposed QFAM model in noisy environments.

The accuracy rates and the numbers of prototype nodes for the noise-free (0%) and noisy (10%) data samples are shown in Table 9. The network complexity increased, as expected, when noise was injected into the data samples. The 95% confidence intervals (indicated by the lower and upper bounds of the mean result) of FAM and QFAM overlapped each other in the noise-free condition, but not in the noisy condition. In other words, QFAM and FAM produced a statistically similar performance in the noise-free condition, but QFAM statistically outperformed FAM in the noisy condition for this human motion recognition problem.

6 Conclusions

This paper presented a new hybrid model, known as QFAM, for data classification. It is an incremental learning classifier integrating Q-learning and FAM. Q-learning is used to reward the prototype nodes of FAM during the learning phase. To evaluate the performance of QFAM, a number of benchmark data sets and a real-world case study, i.e., human motion recognition, have been used. Firstly, the performance of QFAM was compared with that of FAM using the circle-in-the-square problem. QFAM produced promising results, ranging from 95.56 to 97.15%, under both noise-free and noisy conditions, as compared with those from FAM. Then, the capability of QFAM was compared with those of other models reported in the literature by using several UCI benchmark problems. The outcomes indicate that QFAM is able to produce promising results as compared with those from other models in three separate experiments. The robustness of QFAM was also evaluated with a real-world case study, i.e., human motion recognition. The empirical outcomes indicate that QFAM is useful for undertaking noisy data classification problems in real-world environments.

For further research, a pruning strategy based on the Q-value can be applied to reduce the complexity of QFAM, i.e., by removing the less informative nodes in F2a. In addition, rule extraction can be implemented to elucidate the knowledge encoded in QFAM. More experimental studies with real-world data sets will also be conducted in order to ascertain the effectiveness of QFAM in real environments.

Acknowledgements This work is partially supported by the Science and Technology Innovation Committee of Shenzhen City (No. CKFW2016041415372174 and No. GJHZ201703141144) and the National Natural Science Foundation of China (No. 6177319).

References

1. Banharnsakun A (2017) Hybrid ABC–ANN for pavement surface distress detection and classification. Int J Mach Learn Cybern 8:699–710
2. Elamvazuthi I, Duy NHX, Ali Z, Su SW, Khan MKAA, Parasuraman S (2015) Electromyography (EMG) based classification of neuromuscular disorders using multi-layer perceptron. Procedia Comput Sci 76:223–228
3. Sun X, Kang F, Wang M, Bian J, Cheng J, Zou DH (2016) Improved probabilistic neural network PNN and its application to defect recognition in rock bolts. Int J Mach Learn Cybern 7:909–919
4. Raitoharju J, Kiranyaz S, Gabbouj M (2016) Training radial basis function neural networks for classification via class-specific clustering. IEEE Trans Neural Netw Learn Syst 27:2458–2471
5. Carpenter GA, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Graph Image Process 37:54–115
6. Subhi MAB, Mat INA, Zamli KZ, Azizli KA (2010) Modified recursive least squares algorithm to train the hybrid multilayered perceptron (HMLP) network. Appl Soft Comput 10:236–244
7. Simpson PK (1992) Fuzzy min–max neural networks. I. Classification. IEEE Trans Neural Netw 3:776–786
8. Seera M, Lim CP (2014) Online motor fault detection and diagnosis using a hybrid FMM-CART model. IEEE Trans Neural Netw Learn Syst 25:806–812
9. Pratama M, Lu J, Anavatti S, Lughofer E, Lim C-P (2016) An incremental meta-cognitive-based scaffolding fuzzy neural network. Neurocomputing 171:89–105
10. Jain LC, Seera M, Lim CP, Balasubramaniam P (2014) A review of online learning in supervised neural networks. Neural Comput Appl 25:491–509
11. Carpenter GA, Grossberg S, Markuzon N, Reynolds JH, Rosen DB (1992) Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Trans Neural Netw 3:698–713
12. Ananthi VP, Balasubramaniam P, Lim CP (2014) Segmentation of gray scale image based on intuitionistic fuzzy sets constructed from several membership functions. Pattern Recognit 47:3870–3880
13. Ashfaq RAR, Wang X-Z (2017) Impact of fuzziness categorization on divide and conquer strategy for instance selection. J Intell Fuzzy Syst 33:1007–1018
14. Wang X-Z, Xing H-J, Li Y, Hua Q, Dong C-R, Pedrycz W (2015) A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst 23:1638–1654
15. Wang R, Wang X-Z, Kwong S, Xu C (2017) Incorporating diversity and informativeness in multiple-instance active learning. IEEE Trans Fuzzy Syst 25:1460–1475
16. Platt J (1991) A resource-allocating network for function interpolation. Neural Comput 3:213–225
17. Mohammed MF, Lim CP (2015) An enhanced fuzzy min–max neural network for pattern classification. IEEE Trans Neural Netw Learn Syst 26:417–429
18. Williamson JR (1996) Gaussian ARTMAP: a neural network for fast incremental learning of noisy multidimensional maps. Neural Netw 9:881–897
19. Daraiseh AA, Georgiopoulos M, Anagnostopoulos G, Wu AS, Mollaghasemi M (2006) GFAM: a genetic algorithm optimization of fuzzy ARTMAP. In: IEEE international conference on fuzzy systems, pp 315–322
20. Sutton RS, Barto A (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
21. Wang X-Z, Wang R, Xu C (2018) Discovering the relationship between generalization and uncertainty by incorporating complexity of classification. IEEE Trans Cybern 48:703–715
22. Barto A, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13:834–846
23. Tesauro G (1994) TD-Gammon, a self-teaching Backgammon program, achieves master-level play. Neural Comput 6:215–219
24. Faußer S, Schwenker F (2013) Neural network ensembles in reinforcement learning. Neural Process Lett 41:55–69
25. Crites RH, Barto A (1996) Improving elevator performance using reinforcement learning. Adv Neural Inf Process Syst 8:1017–1023
26. Likas A, Blekas K (1996) A reinforcement learning approach based on the fuzzy min–max neural network. Neural Process Lett 4:167–172
27. Likas A (2001) Reinforcement learning using the stochastic fuzzy min–max neural network. Neural Process Lett 13:213–220
28. Quah KH, Quek C, Leedham G (2005) Reinforcement learning combined with a fuzzy adaptive learning control network (FALCON-R) for pattern classification. Pattern Recognit 38:513–526
29. Wong WC, Cho SY, Quek C (2009) R-POPTVR: a novel reinforcement-based POPTVR fuzzy neural network for pattern classification. IEEE Trans Neural Netw 20:1740–1755
30. Zheng L, Cho S-Y (2011) A modified memory-based reinforcement learning method for solving POMDP problems. Neural Process Lett 33:187–200
31. Lim CP, Harrison RF (1997) An incremental adaptive network for on-line supervised learning and probability estimation. Neural Netw 10:925–939
32. Lim CP, Leong JH, Kuan MM (2005) A hybrid neural network system for pattern classification tasks with missing features. IEEE Trans Pattern Anal Mach Intell 27:648–653
33. Tan SC, Rao MVC, Lim CP (2008) Fuzzy ARTMAP dynamic decay adjustment: an improved fuzzy ARTMAP model with a conflict resolving facility. Appl Soft Comput 8:543–554
34. Tan SC, Lim CP (2010) Evolutionary fuzzy ARTMAP neural networks and their applications to fault detection and diagnosis. Neural Process Lett 31:219–242
35. Wong SY, Yap KS, Yap HJ, Tan SC (2014) A truly online learning algorithm using hybrid fuzzy ARTMAP and online extreme learning machine for pattern classification. Neural Process Lett 42:585–602
36. Tan SC, Watada J, Ibrahim Z, Khalid M (2015) Evolutionary fuzzy ARTMAP neural networks for classification of semiconductor defects. IEEE Trans Neural Netw Learn Syst 26:933–950. https://doi.org/10.1109/TNNLS.2014.2329097
37. Carpenter GA, Grossberg S, Reynolds JH (1995) A fuzzy ARTMAP nonparametric probability estimator for nonstationary pattern recognition problems. IEEE Trans Neural Netw 6:1330–1336
38. Lim CP, Harrison RF (1997) Modified fuzzy ARTMAP approaches Bayes optimal classification rates: an empirical demonstration. Neural Netw 10:755–774
39. Xu Z, Xuan J, Shi T, Wu B, Hu Y (2009) A novel fault diagnosis method of bearing based on improved fuzzy ARTMAP and modified distance discriminant technique. Expert Syst Appl 36:11801–11807
40. Gharavian D, Sheikhan M, Nazerieh A, Garoucy S (2011) Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Comput Appl 21:2115–2126
41. Zhang Y, Ji H, Zhang W (2014) TPPFAM: use of threshold and posterior probability for category reduction in fuzzy ARTMAP. Neurocomputing 124:63–71
42. Lee JH, Oh SY, Choi DH (1998) TD based reinforcement learning using neural networks in control problems with continuous action space. In: IEEE international joint conference on neural networks proceedings. IEEE world congress on computational intelligence, pp 2028–2033
43. Gullapalli V (1990) A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Netw 3:671–692
44. Lin CJ, Lin CT (1997) An ART-based fuzzy adaptive learning control network. IEEE Trans Fuzzy Syst 5:477–496
45. Zhou RW, Quek C (1996) POPFNN: a pseudo outer-product based fuzzy neural network. Neural Netw 9:1569–1581
46. Howard RA (1960) Dynamic programming and Markov processes. Published jointly by the Technology Press of the Massachusetts Institute of Technology and Wiley, New York
47. Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44
48. Irodova M, Sloan R (2005) Reinforcement learning and function approximation. In: FLAIRS conference, Melbourne
49. Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed June 2017
50. Örkcü HH, Bal H (2011) Comparing performances of backpropagation and genetic algorithms in the data classification. Expert Syst Appl 38:3703–3709
51. Fallahnezhad M, Moradi MH, Zaferanlouei S (2011) A hybrid higher order neural classifier for handling classification problems. Expert Syst Appl 38:386–393
52. Jamshidi Y, Nezamabadi-pour H (2014) Rule inducing by fuzzy lattice reasoning classifier based on metric distances (FLRC-MD). Appl Soft Comput 24:603–611
53. Cai LY, Kwan HK (1998) Fuzzy classifications using fuzzy inference networks. IEEE Trans Syst Man Cybern Part B Cybern 28:334–347
54. Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
55. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc, pp 338–345
56. Chen SM, Shie JD (2009) Fuzzy classification systems based on fuzzy information gain measures. Expert Syst Appl 36:4517–4522
57. Tan CJ, Lim CP, Cheah Y (2014) A multi-objective evolutionary algorithm-based ensemble optimizer for feature selection and classification with neural network models. Neurocomputing 125:217–228

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.