A Q-learning-based Multi-Agent System for Data Classification

Farhad Pourpanah1, Choo Jun Tan2, Chee Peng Lim3, Junita Mohamad-Saleh1

1 School of Electrical & Electronic Engineering, Universiti Sains Malaysia, Malaysia
2 School of Science and Technology, Wawasan Open University, Malaysia
3 Institute for Intelligent Systems Research and Innovation, Deakin University, Australia

Accepted for publication in Applied Soft Computing (received 2 August 2015; revised 25 June 2016; accepted 14 October 2016). DOI: 10.1016/j.asoc.2016.10.016

Corresponding author: Dr. Chee Peng Lim, Institute for Intelligent Systems Research and Innovation, Deakin University, Locked Bag 20000, Geelong, Victoria 3220, Australia. Phone: +61 3 5227 3307. E-mail:
[email protected]

Graphical abstract
(a) the Trust-Negotiation-Communication (TNC) reasoning scheme
(b) the structure of the TNC-based multi-agent system
(c) the procedure of the proposed Q-learning-based multi-agent classification system

Highlights
- A multi-agent classifier system with Q-learning for data classification is proposed.
- The Trust-Negotiation-Communication reasoning scheme for agent teaming is utilised.
- A trust measurement that combines Q-learning and Bayesian formalism is formulated.
- Benchmark problems with and without noise are used to evaluate the proposed model.
- The results show that the proposed model is useful for data classification problems.

Abstract

In this paper, a multi-agent classifier system with Q-learning is proposed for tackling data classification problems. A trust measurement using a combination of Q-learning and the Bayesian formalism is formulated. Specifically, a number of learning agents comprising hybrid neural networks with Q-learning, which we have formulated in our previous work, are devised to form the proposed Q-learning Multi-Agent Classifier System (QMACS). The time complexity of QMACS is analysed using the big O-notation method. In addition, a number of benchmark problems are employed to evaluate the effectiveness of QMACS, which include small and large data sets with and without noise. To analyse the QMACS performance statistically, the bootstrap method with the 95% confidence interval is used. The results from QMACS are compared with those from its constituents and other models reported in the literature. The outcome indicates the effectiveness of QMACS in combining the predictions from its learning agents to improve the overall classification performance.

Keywords: Fuzzy ARTMAP, multi-agent system, Q-learning, trust measurement, data classification

1. Introduction

Artificial neural networks (ANNs) are useful machine learning models, which have been extensively used to solve problems in various fields [1]. However, individual ANN models have limited capabilities [2], e.g. some ANNs are ineffective in learning from noisy data sets, while other ANNs require a large number of data samples in order to learn well. To tackle these problems, the concept of the Neural Network Ensemble (NNE) provides a good solution. Indeed, ensemble models are able to improve the accuracy and stability of individual classifiers, as reported in [3,4]. The underlying rationale of an NNE is to combine a number of individual (potentially weak) ANNs to devise a robust ensemble model. A number of ANN-based ensemble models have been proposed [5] and used to handle problems in various domains, e.g. face recognition [6], medical diagnosis [7], and optimization [8].

In tackling data classification problems with an NNE, a pool of classifiers is formulated. The NNE estimates the target output of a new sample by taking an integrated prediction from all ensemble members. Some useful NNE techniques, which include bagging [9] and boosting [10], create their members by varying the training samples in different ways. On the other hand, a Multi-Agent System (MAS) requires a reasoning model to formulate the relationship among the agents (e.g. a pool of classifiers) and the environment [11], such as the belief, desire, and intention (BDI) model [12]. In this study, we focus on the MAS as a framework for combining predictions from multiple ANN-based classifiers (learning agents). The MAS model has been used to solve many different problems, e.g.
medical diagnosis [13], fault diagnosis [14], control [15], power systems [16], and robotics [17]. Similar to NNEs, we use the MAS model to improve the performance of individual ANNs. There are two main challenges in developing a robust MAS-based ensemble: (i) formulating individual ANNs that operate as effective ensemble members with different capabilities, and (ii) devising a robust decision combination algorithm to combine the predictions from the ensemble members for making the final decision. In this aspect, we exploit a new reasoning method in agent-based technology, known as the trust-negotiation-communication (TNC) scheme [18,19], to formulate the MAS-based ensemble. Based on our previous work, three hybrid Fuzzy ARTMAP (FAM) neural networks that incorporate the advantages of Q-learning and the Genetic Algorithm (GA), namely QFAM, Pruned QFAM, and QFAM-GA [20], are used as the learning agents for the TNC-based MAS framework. One important contribution of this study is that a new trust measurement for the TNC scheme using Q-learning and the Bayesian formalism is derived. The resulting model is known as the Q-learning Multi-Agent Classifier System (QMACS). The main objectives of our research work are two-fold, as follows:
- to develop FAM-based networks [20] as effective learning agents of an MAS;
- to derive a new trust measurement for the TNC scheme combining Q-learning and the Bayesian formalism for undertaking data classification problems.

To evaluate the performance of QMACS, a number of benchmark problems (various data sets with and without noise) are examined. The results are quantified statistically using the bootstrap method [21], and are compared with those from other models reported in the literature. The bootstrap method has been chosen owing to a number of its benefits in statistical parameter estimation. In particular, the bootstrap method is suitable for problems with small sample sizes, requires weaker assumptions, and is simpler in computation [22]. It is able to utilize a small sample as a virtual population to generate more data instances. Unlike classical distribution-based methods, the bootstrap method does not need strong assumptions on either the population or the sample. In addition, the bootstrap method is simple, and a user does not need a sophisticated mathematical background [22].

2. Literature Review

2.1 Neural Network-Based Ensemble Models

To formulate an effective ensemble model, each ensemble member should produce a different prediction pertaining to the same input [23]. A number of methods are available to create different predictions from data-based learning agents [23]: (i) employing different models to learn from the training samples; (ii) modifying the internal structure and/or parameters of a model; (iii) learning from a diverse sequence of data samples. One of the criticisms of online learning models, which include FAM-based networks, is that the learning process is subject to the order, or sequence, of the training data. This limitation is exploited to form an effective FAM-based MAS model in this study. Specifically, multiple FAM-based networks are assembled, whereby each network is trained with a different sequence of training samples. As such, when a test sample is provided, different predictions from the multiple FAM-based networks are produced. These predictions can then be combined using a suitable decision combination algorithm to produce the final prediction.
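The following minimal sketch is not the FAM-based implementation used in this paper; it uses a generic online learner from scikit-learn purely to illustrate the idea that training each member on a different ordering of the same samples, and then combining the member predictions with a plurality vote, yields an order-sensitive ensemble. The data in the usage example are synthetic.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def train_order_sensitive_ensemble(X, y, n_members=3, seed=0):
    """Train several incremental (online) classifiers, each on a different
    ordering of the same training samples, so that their predictions differ."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    members = []
    for m in range(n_members):
        order = rng.permutation(len(X))          # a different sample sequence per member
        clf = SGDClassifier(random_state=m)
        for i in order:                          # one-sample-at-a-time (online) updates
            clf.partial_fit(X[i:i + 1], y[i:i + 1], classes=classes)
        members.append(clf)
    return members

def majority_vote(members, X):
    """Combine member predictions with a plurality vote (assumes non-negative
    integer class labels)."""
    votes = np.stack([clf.predict(X) for clf in members])   # (n_members, n_samples)
    return np.array([np.bincount(votes[:, i]).argmax() for i in range(votes.shape[1])])

# Usage with synthetic data (hypothetical, for illustration only):
X = np.random.default_rng(1).random((200, 4))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
members = train_order_sensitive_ensemble(X, y)
print(majority_vote(members, X[:5]))
```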
In this study, each FAM-based network is treated as a learning agent, and the MAS framework is leveraged as the decision combination framework to produce an ensemble model. Before explaining the details of the proposed FAM-based ensemble model, a review of ANN-based ensemble models is first provided, as follows.

In 1990, an ensemble model to improve the performance and efficiency of single ANNs was presented [24]. Since then, ANN ensemble models have attracted the attention of many researchers for solving various problems, including classification [25] and regression [26]. In general, an ANN ensemble model consists of two phases: (i) creating individual ANNs, and (ii) synthesizing their predictions [27]. An algorithm based on the fuzzy integral to build an ensemble of BNNs for tackling data classification problems was introduced in [28]. The fuzzy integral was used to reach a collective prediction from the ensemble members. The results showed that the fuzzy integral-based BNN ensemble was able to reduce the classification error. An ensemble of single-layered complex-valued neural networks (CVNNs) for pattern classification was proposed in [29]. To train each CVNN, a gradient-based learning algorithm was employed, and a new activation function was introduced for the complex-valued neurons in each CVNN. The ensemble model improved the performance of single-layered CVNNs. A heterogeneous ensemble model based on the dynamic particle swarm optimization (PSO) algorithm for video-based face recognition was described in [30]. The model consisted of a swarm of FAM networks, a dynamic PSO algorithm, and a long-term memory component. The model outperformed individual FAM models. An ANN ensemble classifier that used regularized negative correlation learning was proposed in [31]. A regularization parameter was deployed to reduce the complexity and over-fitting of the network, as well as to improve generalization. To optimize the regularization parameters, a gradient descent-based algorithm was employed. The ANN ensemble produced better performance in comparison with the back-propagation multilayer perceptron network.

An ensemble of NN classifiers to solve the problem of distributed privacy-preserving data mining from large decentralized data locations was proposed in [32]. The affinity propagation-based confidence ratio technique was used to select the best NN classifiers. The proposed model showed promising results as compared with other pruning methods, such as reduced-error pruning, Kappa pruning, orientation pruning, and JAM's diversity pruning. In [33], a selective ensemble of ANN models to solve Reinforcement Learning (RL) problems was proposed. The experimental results demonstrated that the selective ensemble included fewer agents as compared with a full ensemble. In [34], an ensemble method was proposed to solve imbalanced data classification problems. Binary Particle Swarm Optimization (BPSO) was employed as the feature selection technique, and Adaboost was used for classification. The BPSO-Adaboost model produced more stable accuracy rates in comparison with those from other techniques, such as SMOTE-Boost, CS-MCS, and ClusterBal.

In summary, many ANN ensemble models have been proposed to solve problems in various domains. However, there are two main challenges in designing and developing a robust ANN ensemble model: (i) creating or choosing individual ANNs to operate as ensemble members, and (ii) developing an algorithm to combine the prediction from each ensemble member and make a final decision.
In this aspect, the MAS framework provides a robust and flexible methodology to combine the outputs obtained from different sources. In the next section, a review on a variety of MAS models is presented.

2.2 Multi-Agent Systems

The MAS methodology has been investigated and applied to various domains, e.g. medical diagnosis [13], manufacturing [35], and computer games [36]. In general, an MAS comprises a number of independent agents communicating among themselves through a reasoning or inference scheme in an environment [37]. The key advantage of an MAS is its ability to tackle problems that cannot be solved by individual agents. A review of the MAS methodology and applications is as follows.

For text classification, an MAS based on a decentralized method was proposed in [38]. The model demonstrated acceptable performance in terms of classification accuracy. Another multi-agent model for book classification was described in [39]. In this system, a robot recognized each book's barcode and classified it using a specific algorithm in accordance with the Chinese book classification system. An irrigation control system based on a multi-agent model for soil classification was proposed in [40]. The model employed machine learning algorithms to extract the corresponding soil features. The extracted features were used for learning the combination of the field and its soil horizons, and a multi-layered ANN was used for classification. The proposed model achieved considerable classification accuracy. In [41], a synthesis of multi-agent classifiers for learning rock facies was presented. The model enhanced classification accuracy using heterogeneous collaborative learning agents, and outperformed the Kansas geological survey method. An MAS for text classification in a distributed environment was presented in [42]. This model contained three layers: (i) an interface layer, (ii) a load distributor layer, and (iii) a classifier layer. The naïve Bayes classifier was used to categorize documents based on a set of predefined terms. The proposed model performed with good accuracy. An MAS for classification of nonlinear loads in a smart grid environment was proposed in [43]. ANNs were utilized as classifiers, and an MAS was used to manage the load classification functions. The MAS contained two agents: a smart grid agent and a substation agent. The smart grid agent was used to simulate a smart meter at the consumer side and record the load demand measurements. The substation agent received data from the smart grid agent and stored the relevant information. The proposed MAS showed promising results in classifying nonlinear loads. In [44], an MAS known as the bat-neural network multi-agent system (BNNMAS) for stock price prediction was introduced. BNNMAS achieved accurate and reliable outcomes as compared with those from the genetic algorithm neural network and the generalized regression neural network. In [45], a method based on dispersed knowledge was proposed to organize the structure of a multi-agent decision-making system. A process to remove inconsistencies in knowledge was introduced. The experimental results showed that while the classification error was reduced, the proposed method did not reduce the number of agents significantly.

3. Fuzzy ARTMAP-Based Learning Agents

FAM is a supervised ANN with the capability of incremental learning [46]. FAM harnesses the advantages of Adaptive Resonance Theory (ART) [47] and fuzzy set theory [48] into a common framework.
It is able to address the stability-plasticity dilemma [47], i.e. the network is able to learn new input patterns without corrupting previously learned patterns. Figure 1 shows the structure of FAM. It consists of two unsupervised fuzzy ART [49] networks, ART_a and ART_b. To impose supervision, both fuzzy ART networks are connected through a map field, F^ab. Each fuzzy ART network has three layers. The first layer is the normalization layer, f_0^a (f_0^b) for ART_a (ART_b), which performs complement coding of the input patterns. To overcome the problem of category proliferation [46,50], complement coding converts an M-dimensional input pattern into a 2M-dimensional input pattern, as follows:

A = (a, 1-a) = (a_1, \ldots, a_M, 1-a_1, \ldots, 1-a_M)    (1)

where a ∈ [0,1]^M. The second layer is the input layer, f_1^a (f_1^b), which propagates the complement-coded pattern to the third layer, known as the recognition layer, f_2^a (f_2^b). The f_2^a (f_2^b) layer contains the prototype nodes. During learning, ART_a and ART_b receive the complement-coded input pattern (A) and its corresponding target pattern (B), respectively. The choice function is utilized to measure the similarity between input pattern A and the j-th prototype node in f_2^a, as follows [46]:

T_j = \frac{|A \wedge w_j^a|}{\alpha_a + |w_j^a|}    (2)

where α_a > 0 is the choice parameter and w_j^a = (w_{j1}^a, \ldots, w_{j,2M}^a) is the j-th weight vector of f_2^a. Initially, w_{j1}^a = \ldots = w_{j,2M}^a = 1. The category node with the highest T_j is selected as the winning node, denoted as node J. A vigilance test to evaluate the similarity between node J and the input pattern against a vigilance threshold is carried out, as follows [46]:

\frac{|A \wedge w_J^a|}{|A|} \geq \rho_a    (3)

where ρ_a is the vigilance parameter of ART_a. When the vigilance test fails, the J-th winning node is deactivated (by setting T_J = 0). A new search cycle to select another winning node that is able to satisfy the vigilance test is triggered. If there is no such node, a new node is created in f_2^a to encode the input pattern. The same process occurs simultaneously in ART_b to identify the winning node for the target pattern. Once the winning nodes in ART_a (the J-th node) and ART_b (the K-th node) are determined, the map-field test is performed to confirm the prediction from ART_a to ART_b, as follows [46]:

\frac{|y^b \wedge w_J^{ab}|}{|y^b|} \geq \rho_{ab}    (4)

where w_J^{ab} is the weight vector from f_2^a to F^ab, ρ_ab is the map-field vigilance parameter, and y^b is the output vector of f_2^b, as follows [46]:

y_k^b = \begin{cases} 1 & \text{if } k = K \\ 0 & \text{otherwise} \end{cases}    (5)

If the winning ART_a node predicts an incorrect target in ART_b, the map-field vigilance test fails. In this case, a match-tracking process is initiated, and ρ_a is updated as follows [46]:

\rho_a = \frac{|A \wedge w_J^a|}{|A|} + \varepsilon    (6)

where ε is a small positive value. This causes the vigilance test in ART_a to fail, deactivating the current J-th winning node. As such, a new search cycle with a raised setting of the vigilance parameter ensues. This procedure is iterated until the map-field vigilance test is satisfied. Then, learning occurs to update the winning f_2^a node, as follows [46]:

w_J^{a(\mathrm{new})} = \beta_a \,(A \wedge w_J^{a(\mathrm{old})}) + (1-\beta_a)\, w_J^{a(\mathrm{old})}    (7)

where β_a is the learning rate of ART_a.

In this study, three FAM-based ANNs are proposed as the learning agents in the QMACS model. They are hybrid models of FAM with Q-learning, a pruning strategy, and the GA, denoted as QFAM, Pruned QFAM, and QFAM-GA, respectively. The details of these FAM-based learning agents are explained in the following sections.
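For illustration, the core search cycle described above (complement coding, choice function, vigilance test, and prototype update of Eqs. (1)-(3) and (7)) can be sketched in code. This is a simplified illustration under our own naming, not the full FAM used in this work: match tracking, the map field, and node creation are omitted.

```python
import numpy as np

def complement_code(a):
    """Eq. (1): map an M-dimensional input in [0,1]^M to a 2M-dimensional pattern."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

def fuzzy_art_search(A, W, alpha=0.001, rho=0.9):
    """One ART_a search cycle over prototype weights W (one row per f_2^a node).

    Returns the index of the winning node that passes the vigilance test (Eq. 3),
    or None if no committed node matches (a new node would then be created)."""
    T = np.array([np.minimum(A, w).sum() / (alpha + w.sum()) for w in W])  # Eq. (2)
    for j in np.argsort(-T):                        # try nodes from the highest T_j down
        match = np.minimum(A, W[j]).sum() / A.sum()
        if match >= rho:                            # vigilance test, Eq. (3)
            return j
        # otherwise the node is deactivated and the search continues
    return None

def fam_learn(A, w_J, beta=1.0):
    """Eq. (7): update the winning prototype with learning rate beta."""
    return beta * np.minimum(A, w_J) + (1.0 - beta) * w_J
```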
3.1 The QFAM Model

In our previous work [20], we proposed a hybrid model synthesizing FAM and Q-learning to solve data classification problems. During the learning phase of FAM, an f_2^a node can give a correct or an incorrect prediction. If the prediction is correct, learning takes place. On the other hand, an incorrect prediction leads to match-tracking. As such, Q-learning is employed to operate as a feedback component to reward (when learning occurs) or penalize (when match-tracking occurs) the winning f_2^a node. Specifically, a Q-value is assigned to each category node in f_2^a, and is updated as follows [20]:

Q(j)_t = Q(j)_{t-1} + \lambda \left[ r(j)_t + \gamma \, vig(j)_t \right]    (8)

where vig(j) is the vigilance test value of the winning node j in f_2^a, γ ∈ [0,1] is the discount factor, λ ∈ [0,1] is the learning rate, and r(j) is defined as [20]:

r(j) = \begin{cases} 1 & \text{in the event of learning} \\ 0 & \text{in the event of match-tracking} \end{cases}    (9)

In addition, the Q-value computed during learning is used to select the winning node in QFAM, as follows. The choice function between the current input pattern and all f_2^a nodes is determined using Eq. (2). All the nodes that are able to satisfy the vigilance test (Eq. (3)) are selected. The strength of each selected node is computed as follows [20]:

strength(j) = \xi \, T(j) + (1-\xi)\, Q(j)    (10)

where ξ ∈ [0,1] is a weighting factor, T(j) is the choice function value of node j, and Q(j) is the Q-value of node j. The node with the highest strength score is selected as the winning node in f_2^a.

3.2 The Pruned QFAM Model

The introduction of the Q-value in QFAM has another implication: it indicates the "goodness" of the prototype nodes in f_2^a. In essence, the "goodness" of the f_2^a nodes can be distinguished using the Q-values determined during the learning stage (based on Eq. (8)). This, in turn, leads to a pruning strategy to reduce the complexity of QFAM. As such, f_2^a nodes with Q-values smaller than a user-defined threshold are pruned, resulting in a parsimonious network. The resulting network is known as Pruned QFAM [20].

3.3 The QFAM-GA Model

To further enhance the Pruned QFAM network, the GA was used in [20]. Two aspects are covered in the GA fitness function, viz., increasing the classification accuracy and reducing the number of features used in classification. To reduce the number of features, a "don't care" scheme [51] is exploited. The GA chromosome is formulated as follows:

S = \{ D_1^1, D_2^1, \ldots, D_d^1, D_1^2, D_2^2, \ldots, D_d^2, \ldots, D_1^p, D_2^p, \ldots, D_d^p \}    (11)

where p is the number of nodes in f_2^a after pruning, d is the number of features of each f_2^a node, and each element D_d^p is initialized as follows:

D_d^p = \begin{cases} 0 & \text{for ``don't care'' features} \\ 1 & \text{for other features} \end{cases}    (12)

To achieve both objectives, the GA fitness function is formulated as follows:

\text{maximize } f(S) = W_{NCP} \cdot NCP(S) - W_S \cdot |S|    (13)

where |S| is the number of selected features, NCP(S) denotes the number of correctly classified patterns, W_NCP and W_S are the weights given to each component of the objective, and 0 < W_S < W_NCP. A summary of the GA procedure is as follows.

Step 1) Initialization: Randomly create a population of N_pop strings, with "0" indicating "don't care" antecedents and "1" otherwise.
Step 2) Selection: Choose N_pop/2 pairs of strings from the current population. The selection probability of string S, P(S), in a population Ψ is determined as follows:

P(S) = \frac{f(S) - f_{\min}(\Psi)}{\sum_{S' \in \Psi} \{ f(S') - f_{\min}(\Psi) \}}    (14)

where

f_{\min}(\Psi) = \min \{ f(S) \mid S \in \Psi \}    (15)

Step 3) Crossover: With the crossover probability of 0.5, randomly select a bit position for each chosen pair.
Step 4) Mutation: With the mutation probability of 0.1, apply mutation to the strings generated in Step 3:

S_r = 1 \rightarrow S_r = 0 \text{ with probability } P_m(1 \rightarrow 0), \qquad S_r = 0 \rightarrow S_r = 1 \text{ with probability } P_m(0 \rightarrow 1)    (16)

Step 5) Elitism: Randomly select and remove one of the generated strings, and add the string with the highest fitness value from the previous population to the current one.
Step 6) Termination: Stop if the termination condition is satisfied. Otherwise, repeat from Step 1.

4. The TNC Reasoning Scheme and the Proposed QMACS Model

Figure 2 shows the TNC reasoning scheme. The communication link ties all the agents in a multi-agent system together, either directly or indirectly, and helps the agents understand each other through negotiation and trust. Negotiation is concerned with the interaction of agents as a team among themselves as well as with other agent teams. Trust, which is the main part of TNC, characterizes the behaviour of an agent in its cooperation with other agents.

The structure of QMACS is shown in Figure 3. It contains three layers. The parent agent sits in the top layer, and produces the final decision. The team managers are placed in the middle layer, while the bottom layer comprises the individual agents. A new trust measurement is proposed (as detailed in Sections 4.1 and 4.2). In general, a decision combination algorithm based on the Bayesian and belief functions [11] is employed to determine the initial trust of each agent. Then, Q-learning is employed to update the trust score according to the correct or incorrect prediction pertaining to each data sample.

In terms of negotiation, many methods have been devised, e.g. distance [52], matchmaker [53], and auctioning [54]. Among them, auction-based techniques are popularly used for negotiation [55]. In this study, the "sealed-bid first-price auction" [56] is implemented, because the method is straightforward and can be readily merged with the proposed trust measurement for TNC. According to the "sealed-bid first-price auction", the bid with the highest price is selected as the winner. As such, an individual agent submits a bid to its manager, and a decision is made by the agent manager in accordance with the highest price received. Note that the auction method is used in two phases. Firstly, it is used at the bottom layer by each agent manager to select its winning agent. Secondly, it is used at the middle layer by the parent agent to select the winning manager for making the final decision. The proposed QMACS model contains three teams: team 1 (QFAM), team 2 (Pruned QFAM), and team 3 (QFAM-GA). Each team comprises three individual agents, i.e. teams 1, 2, and 3 consist of three QFAM, three Pruned QFAM, and three QFAM-GA models, respectively. The details of the agents and the proposed trust measurement method are described in the following sections. A flow diagram of the key processes in QMACS is shown in Figure 4.

As stated earlier, measuring trust is the most important issue in TNC. A method combining Q-learning, Bayesian, and belief functions is proposed for trust measurement in this study. The proposed method consists of two steps. In the first step, the initial trust value of each agent is calculated using a combination of the Bayesian and belief functions. In the second step, Q-learning is employed to update the trust value of each agent when it provides a prediction pertaining to each test sample. The details of the steps involved are explained below.
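Before detailing the trust computation, the negotiation step can be illustrated with a short sketch of the two-level sealed-bid first-price auction described above. The data layout (tuples of agent identifier, trust value, and predicted class) and the numbers in the example are hypothetical; the sketch only shows how the highest bid wins at the team level and then at the parent level.

```python
def first_price_auction(bids):
    """Sealed-bid first-price auction: the highest bid wins.

    `bids` is a list of (bidder_id, price, prediction) tuples; ties are broken
    by list order. Returns the winning tuple."""
    return max(bids, key=lambda b: b[1])

def two_level_decision(teams):
    """Hypothetical two-level auction: each team manager auctions among its own
    agents (bottom layer); the parent agent then auctions among the winning
    bids of the team managers (middle layer)."""
    manager_bids = []
    for team_name, agent_bids in teams.items():
        winner = first_price_auction(agent_bids)                  # bottom layer
        manager_bids.append((team_name, winner[1], winner[2]))
    team, _, predicted_class = first_price_auction(manager_bids)  # middle layer
    return team, predicted_class

# Example usage with made-up trust values and class predictions:
teams = {
    "QFAM":        [("A1", 0.71, 2), ("A2", 0.64, 2), ("A3", 0.69, 1)],
    "PrunedQFAM":  [("B1", 0.58, 1), ("B2", 0.66, 2), ("B3", 0.61, 2)],
    "QFAM-GA":     [("C1", 0.74, 2), ("C2", 0.55, 0), ("C3", 0.60, 2)],
}
print(two_level_decision(teams))   # -> ('QFAM-GA', 2)
```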
4.1 Estimating the Initial Trust Measurement of Agents

A method that combines the Bayesian and belief functions [57] is presented to determine the initial trust value of each agent. To record the performance of each agent (the number of correct or incorrect predictions), a confusion matrix is used. The confusion matrix of the k-th agent for a data set with N samples belonging to M classes is formulated as follows [11]:

CM^k = \begin{bmatrix} n_{11}^k & n_{12}^k & \cdots & n_{1(M+1)}^k \\ n_{21}^k & n_{22}^k & \cdots & n_{2(M+1)}^k \\ \vdots & \vdots & \ddots & \vdots \\ n_{M1}^k & n_{M2}^k & \cdots & n_{M(M+1)}^k \end{bmatrix}    (17)

where n_{ij}^k, i = 1, ..., M and j = 1, ..., M+1, represents the total number of input samples of class i predicted as class j by e_k (the k-th agent). Note that M+1 represents the class containing input samples that cannot be classified by an agent. The total number of input samples predicted by e_k is:

N = \sum_{i=1}^{M} \sum_{j=1}^{M+1} n_{ij}^k    (18)

The total number of input samples belonging to class i is:

n_{i\cdot}^k = \sum_{j=1}^{M+1} n_{ij}^k    (19)

while the total number of input samples predicted as class j by e_k is:

n_{\cdot j}^k = \sum_{i=1}^{M} n_{ij}^k    (20)

The uncertainty of the proposition indicating the truth of a prediction, e_k(x) = j (agent k predicts x as class j), can be calculated from the confusion matrix [11]:

P(x \in C_i \mid e_k(x) = j) = \frac{n_{ij}^k}{n_{\cdot j}^k} = \frac{n_{ij}^k}{\sum_{i=1}^{M} n_{ij}^k}, \quad i = 1, \ldots, M    (21)

According to [57], the confusion matrix supports all the target classes of the data samples, and the belief function of each classifier for M classes can be written as:

bel(x \in C_i \mid e_k(x) = j_k, EN) = P(x \in C_i \mid e_k(x) = j_k), \quad i = 1, \ldots, M    (22)

where EN indicates the operating environment containing all K active agents. The belief functions are propagated and used in accordance with the Bayesian formalism. As a result, Equation (22) can be written as [11]:

bel(i) = bel(x \in C_i \mid e_1(x) = j_1, \ldots, e_K(x) = j_K, EN) = P(x \in C_i \mid e_1(x) = j_1, \ldots, e_K(x) = j_K, EN) = \frac{P(e_1(x) = j_1, \ldots, e_K(x) = j_K, EN \mid x \in C_i, EN)\, P(x \in C_i \mid EN)}{P(e_1(x) = j_1, \ldots, e_K(x) = j_K \mid EN)}    (23)

To simplify Equation (23), it is assumed that all K agents operate independently in EN. As such,

\frac{P(e_1(x) = j_1, \ldots, e_K(x) = j_K, EN \mid x \in C_i, EN)\, P(x \in C_i \mid EN)}{P(e_1(x) = j_1, \ldots, e_K(x) = j_K \mid EN)} = \frac{\prod_{k=1}^{K} P(e_k(x) = j_k, EN \mid x \in C_i, EN)\, P(x \in C_i \mid EN)}{\prod_{k=1}^{K} P(e_k(x) = j_k \mid EN)}    (24)

By applying Bayes' rule:

\frac{P(e_k(x) = j_k, EN \mid x \in C_i, EN)}{P(e_k(x) = j_k \mid EN)} = \frac{P(x \in C_i \mid e_k(x) = j_k, EN)}{P(x \in C_i \mid EN)}    (25)

Equation (24) is updated to:

\frac{\prod_{k=1}^{K} P(e_k(x) = j_k, EN \mid x \in C_i, EN)}{\prod_{k=1}^{K} P(e_k(x) = j_k \mid EN)} = \prod_{k=1}^{K} \frac{P(x \in C_i \mid e_k(x) = j_k, EN)}{P(x \in C_i \mid EN)}    (26)

By substituting Equations (26) and (25) into Equation (23):

bel(i) = P(x \in C_i \mid EN) \prod_{k=1}^{K} \frac{P(x \in C_i \mid e_k(x) = j_k, EN)}{P(x \in C_i \mid EN)}    (27)

To approximate the combined belief function of the K classifiers, the independence assumption is used [58]:

bel(i) \approx \frac{\prod_{k=1}^{K} P(x \in C_i \mid e_k(x) = j_k)}{\sum_{i=1}^{M} \prod_{k=1}^{K} P(x \in C_i \mid e_k(x) = j_k)}    (28)

where P(x \in C_i \mid e_k(x) = j_k) can be calculated using Equation (21). The belief function resulting from integrating the K agents can be used to compute the initial trust value of each agent in TNC, as follows:

initial\_trust(i) = bel(i) = \frac{\prod_{k=1}^{K} P(x \in C_i \mid e_k(x) = j_k)}{\sum_{i=1}^{M} \prod_{k=1}^{K} P(x \in C_i \mid e_k(x) = j_k)}    (29)
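A compact sketch of this initial-trust computation is given below. It follows Eqs. (21), (28) and (29) under the independence assumption; the confusion-matrix counts in the example are made up, and the helper names are ours rather than part of the QMACS implementation.

```python
import numpy as np

def class_posteriors(cm):
    """Eq. (21): P(x in C_i | e_k(x) = j) from a confusion matrix.

    cm[i, j] counts samples of class i predicted as class j (the last column is
    the 'unclassified' outcome). Returns column-normalised probabilities."""
    col_sums = cm.sum(axis=0, keepdims=True).astype(float)
    return cm / np.where(col_sums == 0, 1.0, col_sums)

def initial_trust(confusion_matrices, predictions):
    """Eqs. (28)-(29): combined belief over the M classes, assuming independence.

    `confusion_matrices` holds one matrix per agent (from the validation set);
    `predictions` holds the class index j_k each agent assigned to the sample."""
    M = confusion_matrices[0].shape[0]
    bel = np.ones(M)
    for cm, j in zip(confusion_matrices, predictions):
        bel *= class_posteriors(cm)[:, j]        # product over agents, Eq. (28)
    return bel / bel.sum()                       # normalise over the M classes, Eq. (29)

# Example with two agents and two classes (hypothetical counts):
cm1 = np.array([[40, 10, 0], [5, 45, 0]])
cm2 = np.array([[35, 15, 0], [10, 40, 0]])
print(initial_trust([cm1, cm2], predictions=[0, 0]))
```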
4.2 Updating the Trust Measurement

Once the initial trust value of each agent is calculated, Q-learning is used to update the trust value based on the predicted outcome of each test sample, as follows:

trust^k(i) = trust^k(i-1) + \lambda \left[ r_i^k + \gamma \, Q\_Value_{C_m}^k \right]    (30)

where trust^k(i) is the trust value in response to the i-th test sample by agent k, γ ∈ [0,1] is the discount factor, λ ∈ [0,1] is the learning rate, and Q_Value_{C_m}^k is the mean of all Q-values related to class m of agent k. As all prototypes have the same probability of being the winning node,

Q\_Value_{C_m}^k = \frac{1}{Z} \sum_{z=1}^{Z} Q_{C_m}^z    (31)

where Z is the total number of prototypes that belong to class m, Q_{C_m}^z is the Q-value of the z-th prototype of class m (calculated during learning using Eq. (8)), and r_i^k is the reinforcement signal defined as:

r_i^k = \begin{cases} +1 & \text{correct prediction at } i-1 \\ -1 & \text{incorrect prediction at } i-1 \end{cases}    (32)

where the initial setting of r^k (at i = 0) is 0. The pseudo-code of QMACS is shown in Figure 5.
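The trust update of Eqs. (30)-(32) can be sketched as follows. The learning rate and discount factor used here are illustrative values rather than the tuned settings reported in Section 6, and the function names are ours.

```python
def mean_class_q(q_values):
    """Eq. (31): average the Q-values of all prototypes belonging to one class."""
    return sum(q_values) / len(q_values)

def update_trust(prev_trust, correct, q_mean, lam=0.3, gamma=0.3):
    """Eqs. (30) and (32): Q-learning-style update of an agent's trust value.

    `correct` refers to the agent's previous prediction (reward +1, penalty -1);
    `q_mean` is the mean Q-value of the agent's prototypes for the predicted class."""
    r = 1.0 if correct else -1.0                      # Eq. (32)
    return prev_trust + lam * (r + gamma * q_mean)    # Eq. (30)

# Example: an agent that was wrong on the previous sample loses trust.
trust = 0.70
trust = update_trust(trust, correct=False, q_mean=mean_class_q([0.4, 0.6, 0.5]))
print(round(trust, 3))   # 0.70 + 0.3 * (-1 + 0.3 * 0.5) = 0.445
```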
5. Complexity Analysis of QMACS

In this section, the big O-notation analysis [59] is conducted to determine the time complexity of QMACS. The O-notation captures the worst-case computational time, and is used to present asymptotic behaviours [60]. The time complexity analysis of FAM is conducted first, as it forms the backbone of QMACS, covering the QFAM, Pruned QFAM, and QFAM-GA models. As described in Section 3, FAM constitutes a pair of fuzzy ART models (ART_a and ART_b) and the map field. The computational time requirement for the training cycle of fuzzy ART was analysed in [61,62]. Let M be the dimension of the input vector of fuzzy ART (either with or without complement coding) and N be the number of prototype nodes. As presented in [61,62], fuzzy ART records a worst-case time complexity of N^2 + MN, which is asymptotically equivalent to O(N^2) when M → ∞ and N → ∞. During the map-field activity of FAM, the worst-case scenario pertaining to match-tracking is that all prototype nodes in ART_a (N_a of them) are deactivated after the search cycle. Let M_a and M_b be the dimensions of the input vectors of ART_a and ART_b, and N_a and N_b be their numbers of prototype nodes, respectively. FAM records a worst-case time complexity of (N_a^2 + M_a N_a) + (N_b^2 + M_b N_b) + N_a(N_a^2 + M_a N_a) for ART_a, ART_b, and the map field, respectively, which is asymptotically equivalent to O(N_a^3) when M_a → ∞, N_a → ∞, M_b → ∞, and N_b → ∞.

In QFAM, the time complexity is the same as that of FAM, with an additional constant time requirement for the Q-value computation. Therefore, the worst-case time complexity of QFAM is governed by FAM, which is asymptotically equivalent to O(N_a^3). In Pruned QFAM, the pruning procedure requires activation of the entire QFAM model using the prediction set, with some constant time requirement for the pruning operation. As such, the worst-case time complexity of Pruned QFAM is also governed by FAM, which is asymptotically equivalent to O(N_a^3). In QFAM-GA, the time complexity analysis can be divided into two parts: Pruned QFAM and the GA. As stated earlier, the worst-case time complexity of Pruned QFAM is asymptotically equivalent to O(N_a^3). For the GA part, the simple GA with a population size of P chromosomes is used. In addition, two-point crossover, uniform mutation, and binary tournament selection operators are adopted in the simple GA, as shown in Algorithm 1. Note that, in a population of P chromosomes, C represents the accumulated chromosomes within the simple GA evolution process. The selection operator is used to determine the non-dominated chromosome. The worst-case scenario of the binary tournament selection is when the two selected chromosomes are both the last gene in their respective chromosome strings, with P chromosomes in total. The two-point crossover operator is used to swap the selected chromosomes. More specifically, the elements between the two selected points are swapped between the parent and offspring. All chromosomes are selected for swapping when the first and last points are chosen in the worst-case scenario. The mutation operator replaces the value of the chosen chromosome with a uniform random value based on the user-specified upper and lower bounds for that particular chromosome. The operation is bounded by the probability of mutation. In the worst-case scenario, all chromosomes are replaced with uniform random values with a probability rate close to 1. Therefore, the worst-case scenario for all these operators involves all P chromosomes, and the associated worst-case time complexity is asymptotically equivalent to O(P^2).

Algorithm 1: The pseudo-code of the simple GA
1: Procedure Simple GA
2: Create a parent population of P chromosomes
3: C ≡ ∅
4: while |C| < α do
5:     Binary tournament selection operator
6:     Two-point crossover operator
7:     Uniform mutation operator
8:     C ≡ C ∪ {x}
9: end while
10: P ≡ C and goto 2

Combining the worst-case time complexity analyses of both Pruned QFAM (O(N_a^3)) and the simple GA (O(P^2)), it is clear that the overall worst-case time complexity of QFAM-GA is governed by the former, which is asymptotically equivalent to O(N_a^3) when all variables involved extend to infinity.

A number of implementation-related issues exist, which include the availability of data samples as well as the computational load and memory requirements. In essence, more data samples lead to a better learning outcome in QMACS through Q-learning. Nevertheless, increasing the number of training data samples results in a higher computational load and memory requirements. Besides that, it is always a challenge to obtain a large data set from real-world environments. One way to address the problem of a small data size is to use the bootstrap technique [63]. Bootstrapping is a useful technique for assessing the accuracy of estimators and testing hypotheses on parameters in situations with only small data samples (see the examples in [64] with 10 to 20 data samples). Our previous work has also indicated the usefulness of bootstrapping in analysing the performance stability of optimization models with 10 data samples [65,66]. An analysis of the computational load and memory requirements is presented in Section 6.3.5. In this regard, advanced digital technologies, which include multi-core processors coupled with parallel processing and grid computing platforms, can be utilized to mitigate the computational issues in real-world implementations.
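The percentile bootstrap used to quantify the results in Section 6 can be sketched as follows. The accuracy values in the usage example are made up; the helper name is ours.

```python
import numpy as np

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap estimate of the mean with a 95% confidence interval.

    `scores` are the accuracy rates from the repeated runs (a small sample);
    resampling with replacement treats them as a virtual population."""
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    means = np.array([rng.choice(scores, size=len(scores), replace=True).mean()
                      for _ in range(n_resamples)])
    lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, scores.mean(), upper

# Example with ten hypothetical accuracy rates from repeated runs:
print(bootstrap_ci([90.1, 91.3, 89.8, 90.7, 92.0, 90.4, 91.1, 89.5, 90.9, 91.6]))
```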
6. Experimental Studies

A number of noisy benchmark problems were employed to evaluate the effectiveness of QMACS. Firstly, the noisy four-circle-in-the-square problem was used. Next, two noisy benchmark problems (waveform and LED) from the UCI machine learning repository [67] were employed. Then, several biomedical benchmark problems from the UCI, Ricco, and Wisconsin Clinical Sciences Center repositories were applied. The repositories are available at http://archive.ics.uci.edu/ml (UCI), http://www.eric.univ-lyon2.fr/~ricco/dataset (Ricco), and https://www.lri.fr/~antoine/Courses/Master-ISI/TD-TP (Wisconsin Clinical Sciences Center), all last accessed 15 February 2016. Details of the data sets are shown in Table 1. The results from QMACS were compared with those from its constituents as well as other models reported in the literature.

The experimental parameters were set as follows for QFAM and Pruned QFAM: β_a = β_b = 1, α_a = α_b = 0.001, ρ_b = ρ_ab = 1, ρ_a = 0.9, γ = 0.3, λ = 0.95, ξ = 0.3, and pruning threshold = 0.3. As for QFAM-GA, the experimental parameters were set as follows: population size = 100, crossover probability = 0.9, mutation probability = 0.1, W_NCP = 0.1, W_S = 0.01, the number of prototypes replaced in each population = 20, and the stopping condition = 1000 generations or no change in the fitness function value for 50 consecutive generations. All these parameters were determined after several trials.

6.1 Case Study I

The four-circle-in-the-square problem [68] was utilized to compare the performance of QMACS with those of its constituents as well as the models reported in [68]. The data samples were generated from five classes with equal probability (0.2). Four classes were identified by four circles inside a unit square, and the remaining area formed the fifth class. All classes had the same area size. To allow a fair comparison with the methods reported in [68], the same experimental procedure was followed. As such, 2000 samples were randomly generated for both the training and test sets. To generate noisy data samples, Gaussian noise with 0 mean and 0.05 standard deviation was injected into all training samples (input data). Examples of noise-free and noisy data samples are shown in Figure 6. For QMACS, the training samples were split into two subsets (1600 and 400 samples) for learning and validation, respectively. The experiment was repeated 10 times, and the results were quantified using the bootstrap method with 95% confidence intervals.

Table 2 shows the accuracy rates from team 1 (QFAM), team 2 (Pruned QFAM), team 3 (QFAM-GA), and QMACS. As can be observed, noise caused a deterioration in performance, and team 1 outperformed the other teams. QMACS produced statistically better accuracy rates than those from team 3 for both the noise-free and noisy data sets, and than those from team 2 for the noise-free data set. Team 1 and QMACS performed equally well. Furthermore, QMACS was compared with FAM, dARTMAP, Pruned FAM, FasArt, dFasArt, and Pruned FasArt, as reported in [68]. Table 3 shows the overall results of QMACS and those extracted from [68]. QMACS outperformed FAM, dARTMAP, Pruned FAM, and dFasArt for the noise-free problem, while Pruned FasArt and FasArt performed slightly better than QMACS on this problem. On the other hand, QMACS, with 87.86% accuracy, performed the best statistically for the noisy problem as compared with the models reported in [68]. This outcome demonstrates the robustness of QMACS in dealing with noisy data.

6.2 Case Study II

Two noisy problems with artificially generated data were used: (i) the waveform database generator (version 2), and (ii) the LED display domain. To allow a fair comparison with the results reported in the literature, the experimental procedures in [69] and [70] were followed.
Firstly, the 10-fold cross validation method was repeated 10 times for the waveform problem, with 90% and 10% of the data samples used for training (80% for learning and 10% for validation) and test, respectively. This procedure followed the one used in [69]. Secondly, 66.7% and 33.3% of the LED data samples were employed for training (56.7% for learning and 10% for validation) and test, respectively, and the test was repeated 10 times. This procedure followed the one used in [70].

Tables 4 and 5 show the results for both data sets. It can clearly be seen that team 1 performed better than the other two teams for the waveform problem, while team 3 outperformed the other two teams for the LED problem. The accuracy rates of all teams as well as QMACS for the LED problem deteriorated with increasing noise levels. Table 6 shows the mean accuracy rates with standard deviations of QMACS and those reported in [69] for the waveform problem; QMACS outperformed the methods reported in [69]. For the LED problem, the error rates of QMACS were compared with those reported in [70], as shown in Table 7. For the noise-free problem, the models reported in [70] performed better than QMACS. However, as the level of noise increased, QMACS outperformed the models in [70].

6.3 Case Study III

Twelve medical benchmark problems taken from public domain repositories, covering heart disease, breast cancer, liver disease, hepatitis, and diabetes, were used in this case study. The four heart disease data sets (Eric, SPECT, SPECTF, and Statlog) were taken from the UCI machine learning repository. For the four breast cancer data sets, three of them (WBC, WDBC, and WPBC) were taken from the UCI machine learning repository, while the remaining one (UMC) was taken from the Wisconsin Clinical Sciences Center repository. The hepatitis, Pima Indian Diabetes (PID), and two liver disease data sets (BUPA liver disease and Indian Liver Patient Dataset (ILPD)) were taken from the UCI data repository. The classes of each data set were set to 0 and 1 for absence and presence of the target disease, respectively. The details of all data sets are shown in Table 1. Following the same experimental procedure as in [71], 10-fold cross validation was used, with 80%, 10%, and 10% of the samples for learning, validation, and test, respectively. In addition to accuracy, sensitivity (SENS) and specificity (SPEC) are two useful performance indicators for evaluating classifiers in two-class medical problems. SENS is the ratio of correctly classified positive samples to the total number of samples with a positive outcome, while SPEC is the ratio of correctly classified negative samples to the total number of samples with a negative outcome [72].

6.3.1 Heart Disease Prediction

Table 8 shows the mean accuracy, sensitivity, and specificity results of QMACS, as well as those from HM-BagMoov and other ensemble and voting methods reported in [71]. It can be seen that QMACS outperformed HM-BagMoov and the other ensemble and voting models. Specifically, QMACS achieved accuracy rates between 87.03% and 95.34%. The best results of QMACS were achieved on the Statlog data set, with almost equal performance (95%) for accuracy, sensitivity, and specificity.

6.3.2 Breast Cancer Prediction

Table 9 shows the mean accuracy, sensitivity, and specificity rates of QMACS, HM-BagMoov, Bagging, Majority Voting, and Accuracy-Weighting. It can be observed that QMACS has the highest accuracy rates for the UMC, WPBC, and WDBC data sets as compared with the other methods.
For the WBC data set, QMACS recorded a perfect score for sensitivity, but was inferior in terms of accuracy and specificity as compared with the other results reported in [71].

6.3.3 Liver Disease Prediction

Table 10 shows the overall results of QMACS and other models reported in [71] for the liver disease data sets. Again, QMACS yielded approximately 81% accuracy for both the ILPD and BUPA data sets, which was the best as compared with those from the other models in [71].

6.3.4 Hepatitis and Diabetes Disease Prediction

A comparison of the mean accuracy, sensitivity, and specificity rates between QMACS and other models for the Hepatitis and PID data sets is shown in Table 11. In terms of accuracy, QMACS outperformed all other methods in [71]. Comparatively, QMACS achieved inferior results in terms of sensitivity and specificity for the Hepatitis and PID data sets, respectively.

6.3.5 Computational Time and Memory Usage of QMACS

A comparison of training time between QMACS and other models reported in [71] is depicted in Table 12. QMACS consumed the longest computational time for 8 (Eric, SPECT, Statlog, BUPA, UMC, WBC, PID, and Hepatitis) out of 12 data sets. As QMACS involves nine individual classifiers, it required a longer computational time to complete training, as compared with the other models in [71]. The memory usage of QMACS is shown in Table 13. QMACS required varying amounts of memory for different data sets, depending on their respective complexities. Eric, BUPA, and Hepatitis required less memory (16.34, 24.60, and 32.10 kilobytes, respectively), while WDBC and SPECTF required more memory (176.83 and 170.49 kilobytes, respectively).

7. Conclusions

In this paper, a multi-agent classifier system, namely QMACS, has been proposed by using the TNC reasoning scheme. Three FAM-based networks (QFAM, Pruned QFAM, and QFAM-GA) have been utilized as the underlying learning agents in QMACS. A trust measurement using a combination of Q-learning, Bayesian, and belief functions has been devised. An auction scheme using the "sealed-bid first-price auction" method has been used for negotiation among the agents as well as the agent teams in QMACS. The time complexity of QMACS has been analysed using the big O-notation method, while the efficiency of QMACS has been evaluated comprehensively using a number of noisy and noise-free benchmark problems. The results have been compared with those from each agent team and also those reported in the literature. The outcomes indicate that, while the individual teams (QFAM, Pruned QFAM, and QFAM-GA) of QMACS behave differently in tackling various benchmark problems, QMACS is always able to improve the performance of the individual team members by combining their predictions, thereby producing stable classification results across various problems. In addition, the performance of QMACS is comparable with that of other individual and ensemble models reported in the literature. Nevertheless, QMACS requires a longer computational time for training owing to the use of many agents in its structure.

For further work, the effectiveness of QMACS and the proposed Q-learning trust measurement method can be evaluated using real-world data sets from different fields. In addition, instead of using FAM-based models as the learning agents, other learning models, e.g. the Fuzzy Min-Max (FMM) model, can be used in QMACS for evaluation. Both FAM and FMM networks have the capability of solving the stability-plasticity dilemma.
Therefore, we aim to carry out a comparison between QFAM and an FMM network combined with Q-learning in our future work.

References

[1] I.A. Basheer, M. Hajmeer, Artificial neural networks: fundamentals, computing, design, and application, J. Microbiol. Methods. 43 (2000) 3–31.
[2] I. Masood, A. Hassan, Issues in development of artificial neural network-based control chart pattern recognition schemes, Eur. J. Sci. Res. 39 (2010) 336–355.
[3] J. Tian, M. Li, F. Chen, J. Kou, Coevolutionary learning of neural network ensemble for complex classification tasks, Pattern Recognit. 45 (2012) 1373–1385.
[4] S. Mao, L. Jiao, L. Xiong, S. Gou, B. Chen, S.-K. Yeung, Weighted classifier ensemble based on quadratic form, Pattern Recognit. 48 (2015) 1688–1706.
[5] A.J. Sharkey, Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, Springer-Verlag New York, Inc., 1999.
[6] J.F. Connolly, E. Granger, R. Sabourin, Dynamic multi-objective evolution of classifier ensembles for video face recognition, Appl. Soft Comput. 13 (2013) 3149–3166.
[7] S.N. Qasem, S.M. Shamsuddin, Radial basis function network based on time variant multi-objective particle swarm optimization for medical diseases diagnosis, Appl. Soft Comput. 11 (2011) 1427–1438.
[8] M. Pulido, P. Melin, O. Castillo, Optimization of ensemble neural networks with fuzzy integration using the particle swarm algorithm for the US Dollar/MX time series prediction, in: IEEE Conf. Norbert Wiener 21st Century, 2014: pp. 1–7.
[9] L. Breiman, Bagging predictors, Mach. Learn. 24 (1996) 123–140.
[10] R.E. Schapire, The strength of weak learnability, Mach. Learn. 5 (1990) 197–227.
[11] A. Quteishat, C.P. Lim, J.M. Saleh, J. Tweedale, L.C. Jain, A neural network-based multi-agent classifier system with a Bayesian formalism for trust measurement, Soft Comput. 15 (2010) 221–231.
[12] M. Bratman, Intention, Plans, and Practical Reason, University of Chicago Press, Chicago, 1987.
[13] J.S. Shieh, A multi-agent prototype system for medical diagnosis, in: IEEE 3rd Int. Conf. Intell. Syst. Knowl. Eng., 2008: pp. 1265–1270.
[14] W. Gefang, F. Xizhi, C. Guoshun, Research on intellectualized fault diagnosis system based on distributed multi-agent technology, in: IEEE 8th Int. Conf. Electron. Meas. Instruments, 2007: pp. 3-405–3-409.
[15] M.L. Roloff, M.R. Stemmer, J.F. Hubner, R. Schmitt, T. Pfeifer, G. Huttemann, A multi-agent system for the production control of printed circuit boards using JaCaMo and Prometheus AEOlus, in: 12th IEEE Int. Conf. Ind. Informatics, 2014: pp. 236–241.
[16] K. Wilkosz, Utilization of multi-agent system for power system topology verification, in: IEEE Proc. 15th Int. Sci. Conf. Electr. Power Eng., 2014: pp. 3–6.
[17] S.W. Park, J.H. Kim, E.H. Kim, J.H. Oh, Development of a multi-agent system for robot soccer game, in: Proc. IEEE International Conf. Robot. Autom., 1997: pp. 626–631.
[18] K. Haider, J. Tweedale, P. Urlings, L. Jain, Intelligent decision support system in defense maintenance methodologies, in: IEEE Int. Conf. Emerg. Technol., 2006: pp. 560–567.
[19] J. Tweedale, P. Cutler, Trust in multi-agent systems, in: Knowledge-Based Intell. Inf. Eng. Syst., Springer, 2006: pp. 479–485.
[20] F. Pourpanah, C.P. Lim, J.M. Saleh, A hybrid model of fuzzy ARTMAP and genetic algorithm for data classification and rule extraction, Expert Syst. Appl. 49 (2015) 74–85.
[21] B. Efron, Bootstrap methods: Another look at the jackknife, Ann. Stat. 7 (1979) 1–26.
[22] A.C. Cameron, P.K. Trivedi, Regression Analysis of Count Data, Cambridge University Press, 1998.
[23] P.M. Granitto, P.F. Verdes, H.A. Ceccatto, Neural network ensembles: Evaluation of aggregation algorithms, Artif. Intell. 163 (2005) 139–162.
[24] L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 993–1001.
[25] R.M. Fernandez, E.C. Hernandez, S.J. Torres, Hyperspectral image classification by ensembles of multilayer feedforward networks, in: IEEE Int. Jt. Conf. Neural Networks, 2004: pp. 1145–1149.
[26] Z.Q. Shen, F.S. Kong, Dynamically weighted ensemble neural networks for regression problems, in: Proc. IEEE Int. Conf. Mach. Learn. Cybern., 2004: pp. 3492–3496.
[27] Z.H. Zhou, Y. Jiang, Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble, IEEE Trans. Inf. Technol. Biomed. 7 (2003) 37–42.
[28] A.S. Kumar, S.K. Basu, K.L. Majumdar, Robust classification of multispectral data using multiple neural networks and fuzzy integral, IEEE Trans. Geosci. Remote Sens. 35 (1997) 787–790.
[29] M.F. Amin, M.M. Islam, K. Murase, Ensemble of single-layered complex-valued neural networks for classification tasks, Neurocomputing. 72 (2009) 2227–2234.
[30] J.F. Connolly, E. Granger, R. Sabourin, An adaptive ensemble of fuzzy ARTMAP neural networks for video-based face classification, in: IEEE Congr. Evol. Comput., 2010: pp. 1–8.
[31] X. Fu, S. Zhang, Pattern classification based on neural network ensembles with regularized negative correlation learning, in: IEEE Fourth Glob. Congr. Intell. Syst., 2013: pp. 112–116.
[32] Y. Kokkinos, K.G. Margaritis, Confidence ratio affinity propagation in ensemble selection of neural network classifiers for distributed privacy-preserving data mining, Neurocomputing. 150 (2015) 513–528.
[33] S. Faußer, F. Schwenker, Selective neural network ensembles in reinforcement learning: Taking the advantage of many agents, Neurocomputing. 169 (2015) 350–357.
[34] G. Haixiang, L. Yijing, L. Yanan, L. Xiao, L. Jinling, BPSO-Adaboost-KNN ensemble learning algorithm for multi-class imbalanced data classification, Eng. Appl. Artif. Intell. 49 (2015) 176–193.
[35] J. Barbosa, P. Leitão, E. Adam, D. Trentesaux, Dynamic self-organization in holonic multi-agent manufacturing systems: The ADACOR evolution, Comput. Ind. 66 (2015) 99–111.
[36] P.C. Pendharkar, Game theoretical applications for multi-agent systems, Expert Syst. Appl. 39 (2012) 273–279.
[37] F. Wu, S. Zilberstein, X. Chen, Online planning for multi-agent systems with bounded communication, Artif. Intell. 175 (2011) 487–511.
[38] Y. Fu, W. Ke, J. Mostafa, Automated text classification using a multi-agent framework, in: Proc. 5th Jt. Conf. Digit. Libr., ACM Press, New York, USA, 2005: p. 157.
[39] P. Guo, D. Liangxian, Q. Junxia, A library book intelligence classification system based on multi-agent, Phys. Procedia. 24 (2012) 2187–2193.
[40] D. Smith, W. Peng, Machine learning approaches for soil classification in a multi-agent deficit irrigation control system, in: IEEE Int. Conf. Ind. Technol., 2009: pp. 1–6.
[41] C.M. Gifford, A. Agah, Collaborative multi-agent rock facies classification from wireline well log data, Eng. Appl. Artif. Intell. 23 (2010) 1158–1172.
[42] R. Ahmad, S. Ali, D.H. Kim, A multi-agent system for documents classification, in: IEEE Int. Conf. Open Source Syst. Technol., 2012: pp. 28–32.
[43] F. de O. Saraiva, W.M.S. Bernardes, E.N. Asada, A framework for classification of non-linear loads in smart grids using Artificial Neural Networks and Multi-Agent Systems, Neurocomputing. 170 (2015) 328–338.
[44] R. Hafezi, J. Shahrabi, E. Hadavandi, A bat-neural network multi-agent system (BNNMAS) for stock price prediction: Case study of DAX stock price, Appl. Soft Comput. 29 (2015) 196–210.
[45] M. Przybyła-Kasperek, A. Wakulicz-Deja, Global decision-making in multi-agent decision-making system with dynamically generated disjoint clusters, Appl. Soft Comput. 40 (2016) 603–615.
[46] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, D.B. Rosen, Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Trans. Neural Networks. 3 (1992) 698–713.
[47] G.A. Carpenter, S. Grossberg, A massively parallel architecture for a self-organizing neural pattern recognition machine, Comput. Vision, Graph. Image Process. 37 (1987) 54–115.
[48] L.A. Zadeh, Fuzzy sets, Inf. Control. 8 (1965) 338–353.
[49] G.A. Carpenter, S. Grossberg, D.B. Rosen, Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks. 4 (1991) 759–771.
[50] G.A. Carpenter, S. Grossberg, J.H. Reynolds, ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network, Neural Networks. 4 (1991) 565–588.
[51] H. Ishibuchi, T. Murata, I.B. Türkşen, Single-objective and two-objective genetic algorithms for selecting linguistic rules for pattern classification problems, Fuzzy Sets Syst. 89 (1997) 135–150.
[52] Y. Yan, J. Yen, T.X. Bui, A multi-agent based negotiation support system for distributed transmission cost allocation, in: IEEE Proc. 33rd Annu. Hawaii Int. Conf. Syst. Sci., 2000: p. 11.
[53] E. Farazmand, A. Shokoufi, C. Lucas, F. Habibipour, Using a learning mechanism in matchmaking process of agents' negotiation in multi-agent systems, in: IEEE 19th Int. Conf. Adv. Inf. Netw. Appl., 2005: pp. 59–64.
[54] B.Q. Li, J.C. Zeng, Wang Meng, G.M. Xia, A negotiation model through multi-item auction in multi-agent system, in: Proc. IEEE Int. Conf. Mach. Learn. Cybern., 2003: pp. 1866–1870.
[55] Z. Balogh, M. Laclavík, L. Hluchy, Multi agent system for negotiation and decision support, in: Proc. Fourth Int. Sci. Conf. Electron. Comput. Informatics, Citeseer, 2000: pp. 264–270.
[56] M. Beer, M. D'inverno, M. Luck, N. Jennings, C. Preist, M. Schroeder, Negotiation in multi-agent systems, Knowl. Eng. Rev. 14 (1999) 285–289.
[57] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1988.
[58] L. Xu, A. Krzyzak, C.Y. Suen, Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Syst. Man. Cybern. 22 (1992) 418–435.
[59] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, MIT Press, Cambridge, 2001.
[60] D.E. Knuth, Big omicron and big omega and big theta, ACM Sigact News. 8 (1976) 18–24.
[61] T. Burwick, F. Joublin, Optimal algorithmic complexity of Fuzzy ART, Neural Process. Lett. 7 (1998) 37–41.
[62] B. Kim, S.-W. Ban, M. Lee, Growing fuzzy topology adaptive resonance theory models with a push–pull learning algorithm, Neurocomputing. 74 (2011) 646–655.
[63] B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans, SIAM, 1982.
[64] A.M. Zoubir, B. Boashash, The bootstrap and its application in signal processing, IEEE Signal Process. Mag. 15 (1998) 56–76.
[65] C.J. Tan, C.P. Lim, Y. Cheah, A multi-objective evolutionary algorithm-based ensemble optimizer for feature selection and classification with neural network models, Neurocomputing. 125 (2014) 217–228.
[65] C.J. Tan, C.P. Lim, Y. Cheah, A multi-objective evolutionary algorithm-based ensemble optimizer for feature selection and classification with neural network models, Neurocomputing. 125 (2014) 217–228.
[66] C.J. Tan, C.P. Lim, Y.-N. Cheah, A modified micro genetic algorithm for undertaking multi-objective optimization problems, J. Intell. Fuzzy Syst. 24 (2013) 483–495.
[67] K. Bache, M. Lichman, UCI Machine Learning Repository, (2013). http://archive.ics.uci.edu/ml.
[68] P.H. Emilio, G.S. Eduardo, Y.A. Dimitriadis, Study of distributed learning as a solution to category proliferation in Fuzzy ARTMAP based neural systems, Neural Netw. 16 (2003) 1039–1057.
[69] E. Frank, M. Hall, B. Pfahringer, Locally weighted naive Bayes, in: Proc. Ninet. Conf. Uncertain. Artif. Intell., Morgan Kaufmann Publishers Inc., 2002: pp. 249–256.
[70] K.T. Leung, D.S. Parker, Empirical comparisons of various voting methods in bagging, in: Proc. Ninth Int. Conf. Knowl. Discov. Data Min., 2003: pp. 595–600.
[71] S. Bashir, U. Qamar, F.H. Khan, IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework, J. Biomed. Inform. 59 (2015) 185–200.
[72] J.L. Melsa, D.L. Cohn, Decision and estimation theory, 1978.

Figure 1. The structure of the Fuzzy ARTMAP (FAM) network (adapted from [46])

Figure 2. The Trust-Negotiation-Communication (TNC) reasoning scheme for a multi-agent system. An agent carries a "trust" measurement, and performs "negotiation" with other agents through the "communication" links (horizontal or vertical) in solving a complex problem.

Figure 3. The structure of the proposed QMACS model

Figure 4. A flow diagram of the key processes of the proposed QMACS model

Algorithm QMACS
Input: parameters of QFAM, Pruned QFAM and QFAM-GA (α, β, ρ, γ, λ, ξ, W_NCP, W_s, population size, crossover probability, mutation probability, number of prototypes replaced in each population), validation samples, test samples, parameters of trained agents (w^a = (w_1^a, w_2^a, ..., w_J^a), w^b = (w_1^b, w_2^b, ..., w_K^b), w^ab = (w_1^ab, w_2^ab, ..., w_J^ab), Q = (Q_1, Q_2, ..., Q_J))
Output: performance indicators

For each agent
    For each validation sample
        For each prototype node w_j^a:
            Determine T_j (Eq. 2) and vig(j) (Eq. 3)
            If vig(j) ≥ ρ_a, determine strength(j) (Eq. 10)
            Else ignore the current prototype node (strength(j) = 0)
            End if
        End for
        Winner = the prototype node with the highest strength
        Update confusion matrix
    End for
    Determine initial trust (Eq. 29) and Q_Value_ckm (Eq. 31)
End for

For each test sample:
    For each agent
        For each prototype node w_j^a:
            Determine T_j (Eq. 2) and vig(j) (Eq. 3)
            If vig(j) ≥ ρ_a, determine strength(j) (Eq. 10)
            Else ignore the current prototype node (strength(j) = 0)
            End if
        End for
        Winner = the prototype node with the highest strength
        Determine trust (Eq. 30)
        Update confusion matrix
    End for
    - Use the "sealed bid-first price auction" method to identify the winning agent at the team level.
    - Use the "sealed bid-first price auction" method to identify the team with the highest trust value as the winning team.
    - Compare the prediction with the actual target class to determine its correctness.
    - Update the performance indicators.
End for
End

Figure 5. The pseudo-code of the proposed QMACS model
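The two-level "sealed bid-first price auction" in Figure 5 can be illustrated with a short sketch. The Python example below assumes that each agent has already produced a class label (from its winning prototype node, Eq. 10) and a trust value (Eq. 30); the Bid type, the function names, and the numbers in the example are illustrative only and are not part of the QMACS implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Bid:
    label: int    # class predicted by the agent's winning prototype node (cf. Eq. 10)
    trust: float  # the agent's trust value for the current test sample (cf. Eq. 30)

def sealed_bid_first_price(bids: List[Bid]) -> Bid:
    """Sealed bid-first price auction: the highest bid wins outright."""
    return max(bids, key=lambda b: b.trust)

def qmacs_decision(team_bids: List[List[Bid]]) -> int:
    """Two-level auction as in Figure 5: pick the winning agent inside each team,
    then the team whose winning agent carries the highest trust value; that
    agent's label is the final QMACS prediction for the test sample."""
    team_winners = [sealed_bid_first_price(bids) for bids in team_bids]
    return sealed_bid_first_price(team_winners).label

# Example: three teams of agents bidding on a single test sample.
teams = [
    [Bid(label=0, trust=0.72), Bid(label=1, trust=0.64)],
    [Bid(label=1, trust=0.81), Bid(label=1, trust=0.58)],
    [Bid(label=0, trust=0.47)],
]
print(qmacs_decision(teams))  # prints 1: the second team's winning agent has the highest trust
```

The resulting prediction is then compared with the actual target class to update the performance indicators, as in the final steps of Figure 5.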
Figure 6. The four-circle-in-the-square problem (with 2000 training samples)

Table 1: Data sets from UCI, Ricco, and the Wisconsin Clinical Sciences Centre

Data set                                    Number of data samples   Number of input features   Number of classes
Waveform Database Generator (Version 2)     5000                     40                         3
LED display domain                          1000                     24                         10
Eric                                        209                      6                          2
SPECT                                       267                      22                         2
SPECTF                                      267                      44                         2
Statlog                                     270                      13                         2
UMC                                         286                      9                          2
WBC                                         699                      10                         2
WPBC                                        198                      34                         2
WDBC                                        569                      32                         2
Pima Indian Diabetes (PID)                  768                      8                          2
ILPD                                        583                      10                         2
BUPA                                        345                      6                          2
Hepatitis                                   155                      19                         2

Table 2: The accuracy rates (%) for the four-circle-in-the-square problem

Bootstrap with 95% confidence interval
            Noise-free data set             Noisy data set
Models      Lower     Mean      Upper       Lower     Mean      Upper
Team 1      90.19     91.07     92.15       86.70     87.82     89.09
Team 2      87.70     88.94     90.40       85.93     86.75     88.27
Team 3      76.24     79.46     87.02       73.08     76.75     80.43
QMACS       90.48     91.11     91.79       86.34     87.86     89.33

Table 3: The accuracy rates (%) for the four-circle-in-the-square problem

Models               Noise-free data   Noisy data
FAM [68]             88.16             76.52
dARTMAP [68]         87.19             81.07
Pruned FAM [68]      88.18             81.59
FasArt [68]          92.76             77.21
dFasArt [68]         87.95             80.40
Pruned FasArt [68]   91.74             84.69
QMACS                91.11             87.86

Table 4: The accuracy rates (%) for the waveform problem

Method     Lower     Mean      Upper
Team 1     92.89     93.28     95.43
Team 2     74.21     75.46     76.38
Team 3     87.04     87.61     88.45
QMACS      91.69     93.02     94.74

Table 5: The accuracy rates (%) for the LED problem, with 95% confidence intervals

Noise            Team 1    Team 2    Team 3    QMACS
0%     Upper     82.93     64.73     97.27     97.27
       Mean      80.80     55.97     95.34     95.46
       Lower     78.89     47.87     93.93     93.73
10%    Upper     67.87     44.92     82.42     83.22
       Mean      65.27     39.97     78.19     78.54
       Lower     62.17     34.53     73.03     72.62
25%    Upper     56.97     25.65     63.23     62.21
       Mean      54.15     22.13     58.82     58.72
       Lower     51.42     18.48     55.25     55.95

Table 6: The mean accuracy rates (%) with standard deviations for the Waveform problem

Method                                                            Waveform
Locally weighted naive Bayes (LWNB) K=50 [69]                     81.88±1.8
Locally weighted naive Bayes (LWNB) K=30 [69]                     81.79±1.9
Locally weighted naive Bayes (LWNB) K=100 [69]                    82.34±1.8
Naïve Bayes (NB) [69]                                             80.01±1.4
K-nearest neighbours with distance weighting (KNNDW) K=5 [69]     79.33±1.8
K-nearest neighbours with distance weighting (KNNDW) K=10 [69]    81.12±2.0
K-nearest neighbours with distance weighting (KNNDW) K=20 [69]    83.12±1.9
QMACS                                                             92.98±2.04

Table 7: The error rates for the LED problem

Noise   Plurality (Pl) [70]   Anti-plurality (Anti-Pl) [70]   Borda Count (BC) [70]   Plurality with Elimination (Pl-Elm) [70]   Pairwise Comparisons (PC) [70]   QMACS
0%      0                     0                               0                       0                                          0                                0.0454
10%     0.3123                0.5435                          0.3063                  0.3123                                     0.3123                           0.2146
25%     0.6937                0.7988                          0.7087                  0.6907                                     0.6877                           0.4128

Table 8: The mean accuracy (Acc), sensitivity (Sens), and specificity (Spec) scores for heart disease data sets

                           Eric Dataset                     SPECT Dataset
Method                     Acc (%)  Sens (%)  Spec (%)      Acc (%)  Sens (%)  Spec (%)
HM-BagMoov [71]            83.82    91.74     73.74         84.77    34.73     97.75
Bagging [71]               81.82    89.74     71.74         82.77    32.73     95.75
AdaBoost [71]              77.03    80.34     72.83         83.90    36.36     96.23
Majority Voting [71]       77.51    87.18     65.22         83.52    36.36     95.75
Accuracy-Weighting [71]    81.50    85.55     70.99         83.99    32.50     93.99
QMACS                      91.93    98.81     87.15         87.03    97.21     72.57

                           SPECTF Dataset                   Statlog Dataset
Method                     Acc (%)  Sens (%)  Spec (%)      Acc (%)  Sens (%)  Spec (%)
HM-BagMoov [71]            83.03    7.45      99.11         87.93    94.67     79.50
Bagging [71]               79.03    5.45      98.11         85.93    92.67     77.50
AdaBoost [71]              78.28    45.45     86.79         78.89    83.33     73.33
Majority Voting [71]       79.78    9.09      98.11         82.59    90.67     72.50
Accuracy-Weighting [71]    80.03    5.45      95.50         85.55    92.67     77.50
QMACS                      90.12    82.58     92.06         95.34    95.45     95.33
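For readers reproducing the scores in Tables 2 to 11, the reported quantities follow from standard definitions: accuracy, sensitivity, and specificity are computed from the two-class confusion matrix, and the lower/mean/upper values are bootstrap estimates of the mean accuracy. The sketch below is a minimal illustration; the function names, the number of resamples, and the percentile-bootstrap variant are assumptions made for illustration rather than details taken from the paper.

```python
import numpy as np

def binary_scores(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from a two-class confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: mean accuracy with a (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    means = [rng.choice(scores, size=len(scores), replace=True).mean()
             for _ in range(n_resamples)]
    lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(scores.mean()), float(upper)
```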
Table 9: The mean accuracy (%), sensitivity (%), and specificity (%) scores for breast cancer data sets

                           UMC                              WPBC
Method                     Acc (%)  Sens (%)  Spec (%)      Acc (%)  Sens (%)  Spec (%)
HM-BagMoov [71]            73.45    74.19     95.57         80.34    90.00     19.50
Bagging [71]               71.05    73.75     91.60         77.24    100.00    20.00
AdaBoost [71]              64.34    28.24     79.60         79.80    92.72     38.30
Majority Voting [71]       72.38    20.00     94.53         82.32    98.01     31.91
Accuracy-Weighting [71]    72.95    45.46     85.55         82.85    95.95     32.91
QMACS                      92.15    82.81     96.37         94.83    82.21     98.78

                           WDBC                             WBC
Method                     Acc (%)  Sens (%)  Spec (%)      Acc (%)  Sens (%)  Spec (%)
HM-BagMoov [71]            95.56    95.00     99.16         97.11    95.78     95.85
Bagging [71]               94.19    94.60     95.04         97.57    94.13     95.08
AdaBoost [71]              96.13    94.34     97.20         95.85    92.95     97.38
Majority Voting [71]       95.61    89.15     99.44         96.71    97.38     95.44
Accuracy-Weighting [71]    96.01    94.23     98.50         96.98    93.28     95.01
QMACS                      97.53    94.25     99.25         93.50    100.00    81.51

Table 10: The mean accuracy (%), sensitivity (%), and specificity (%) scores for liver disease data sets

                           ILPD Dataset                     BUPA Dataset
Method                     Acc (%)  Sens (%)  Spec (%)      Acc (%)  Sens (%)  Spec (%)
HM-BagMoov [71]            72.20    96.00     2.67          70.16    68.71     89.52
Bagging [71]               71.87    46.59     16.92         69.02    66.72     92.99
AdaBoost [71]              66.55    81.49     29.34         68.41    53.79     79.00
Majority Voting [71]       71.53    99.04     2.99          71.88    45.52     91.00
Accuracy-Weighting [71]    71.90    75.00     1.00          67.01    59.99     75.98
QMACS                      80.82    94.15     47.70         81.12    77.29     83.39

Table 11: The mean accuracy (%), sensitivity (%), and specificity (%) scores for hepatitis and diabetes disease data sets

                           Hepatitis Dataset                PID Dataset
Method                     Acc (%)  Sens (%)  Spec (%)      Acc (%)  Sens (%)  Spec (%)
HM-BagMoov [71]            87.04    77.27     51.67         78.21    78.65     92.60
Bagging [71]               85.50    75.63     48.99         77.99    75.96     85.00
AdaBoost [71]              83.12    95.08     37.50         76.43    52.99     89.00
Majority Voting [71]       83.12    94.26     40.63         76.30    50.00     90.40
Accuracy-Weighting [71]    85.55    75.60     50.99         77.00    65.54     85.55
QMACS                      93.49    54.84     98.59         91.13    96.31     80.33

Table 12: Comparison of computational time between QMACS and other models

Training time (ms)
                           Heart disease                               Liver disease
Method                     Eric      SPECT     SPECTF    Statlog       ILPD      BUPA
HM-BagMoov [71]            22.10     25.90     118.20    36.90         54.82     30.09
Bagging [71]               22.24     26.88     118.90    38.02         55.95     31.79
AdaBoost [71]              14.41     1.87      82.24     21.15         13.46     14.33
Majority Voting [71]       0.27      0.36      1.29      0.47          0.70      0.39
QMACS                      63.60     65.60     93.40     95.20         49.23     53.28

                           Breast cancer                               Hepatitis and diabetes
Method                     UMC       WPBC      WDBC      WBC           PID       Hepatitis
HM-BagMoov [71]            17.98     150.90    467.30    40.20         73.50     36.70
Bagging [71]               18.11     187.60    490.90    42.07         74.47     38.19
AdaBoost [71]              9.60      138.90    186.30    16.96         16.13     2.33
Majority Voting [71]       0.24      2.13      6.03      0.58          0.75      0.49
QMACS                      112.50    60.39     57.68     60.72         63.14     45.13

Table 13: Memory usage of QMACS

Memory usage (kilobytes)
            Heart disease                               Liver disease
Method      Eric      SPECT     SPECTF    Statlog       ILPD      BUPA
QMACS       16.34     77.03     170.49    59.68         49.72     24.60

            Breast cancer                               Hepatitis and diabetes
Method      UMC       WPBC      WDBC      WBC           PID       Hepatitis
QMACS       43.91     130.52    176.83    44.88         69.66     32.10