
Feature Selection Based Machine Learning to Improve Prediction of Parkinson Disease

Nazmun Nahar1, Ferdous Ara1, Md. Arif Istiek Neloy1, Anik Biswas1, Mohammad Shahadat Hossain2(B), and Karl Andersson3

1 BGC Trust University Bangladesh, Bidyanagar, Chandanaish, Bangladesh


2 University of Chittagong, Chittagong, Bangladesh


3 Luleå University of Technology, 931 87 Skellefteå, Sweden


Abstract. Parkinson’s disease (PD) is a neurodegenerative disorder characterized by the loss of dopamine-producing cells in the brain. Dopamine is a chemical that allows brain cells to communicate with one another, and the disruption of the cells that produce it causes Parkinson’s disease. These dopamine-producing cells govern the control, adaptability, and speed of movement. Researchers have been investigating techniques to identify, as early as possible, the non-motor symptoms that appear early in the disease, in order to slow its progression. This research proposes a machine learning-based detection of Parkinson’s disease that combines feature selection and classification techniques. Boruta, Recursive Feature Elimination (RFE) and a Random Forest (RF) classifier were used for feature selection. Four classification algorithms were considered for detecting Parkinson’s disease: gradient boosting, extreme gradient boosting, bagging and the Extra Tree Classifier. Bagging with recursive feature elimination outperformed the other methods, attaining 82.35% accuracy with a reduced set of voice features.

Keywords: Parkinson disease · Boruta · RFE · RF · Feature selection

1 Introduction
Parkinson’s disease is a well-known nervous system disorder characterized by difficulty walking, difficulty maintaining body balance, and shaking. The disease also manifests itself in mental and behavioral disorders. It can affect both men and women, although men appear to be more likely to be affected [19]. Once diagnosed, Parkinson’s disease worsens over time, and walking and speech become increasingly difficult for the patient.

© Springer Nature Switzerland AG 2021
M. Mahmud et al. (Eds.): BI 2021, LNAI 12960, pp. 496–508, 2021.
https://doi.org/10.1007/978-3-030-86993-9_44

Feature Selection Techniques Parkinson Disease 497

Parkinson’s disease is more likely to affect people as they get older. Although the probability of being affected at a young age is low (5% to 10%), the risk grows considerably after the age of 60. Lewy bodies, low norepinephrine levels, hereditary factors, and other factors are among the causes of Parkinson’s disease [9]. However, the disease is thought to be caused mainly by a lack of dopamine: when nerve cells die, dopamine production is reduced. Dopamine is important because it keeps the brain and nerve cells linked. Researchers are still trying to figure out why these brain cells die.

Treatment of Parkinson’s disease is mostly determined by the patient’s condition. From early diagnosis through the last phase of the disease, the condition can be divided into several stages. Depending on the stage, the most commonly prescribed drugs are Levodopa, dopamine agonists, Amantadine, COMT inhibitors, and others [15]. However, none of them can cure the disease, which is why early diagnosis is so important.

There is as yet no definitive method for diagnosing Parkinson’s disease. To identify it in its early stages, most doctors must rely on fundamental symptoms (difficulty walking, maintaining bodily balance, shaking, and so on). Researchers are therefore looking for strategies to identify non-motor symptoms as early as possible in the course of the disease, in order to slow its development. This is where machine learning enters the picture.

Machine learning (ML) is increasingly used for medical disease detection [25,26,31] because of its ease of implementation and high accuracy. The quantity and complexity of clinical datasets have grown rapidly, resulting in large, high-dimensional datasets. Dimensionality reduction attempts to decrease the number of variables.
Both feature extraction and feature selection strategies are used for dimensionality reduction. This study proposes a lightweight feature selection methodology and classifier that reduce computing time by using fewer, more effective features. The features are derived from speech signals, which makes them easier to collect than features derived from MRI or motion-based [19] approaches. The main contributions of the proposed ML-based early PD diagnostic method are listed below.

– Boruta, Recursive Feature Elimination (RFE) and Random Forest (RF) approaches were applied to identify the most relevant features for the classification task.
– The results of several popular classifiers were analyzed and the best classifier for the PD diagnosis problem was determined.
– The significance of applying feature selection (FS) in the preprocessing step of PD patient classification was demonstrated: feature selection increased the accuracy of the Bagging classifier by approximately 8% and that of the Extra Tree Classifier by approximately 10%.
– PD was diagnosed with fewer speech features and less effort, at a detection accuracy of 82.35%.

498 N. Nahar et al.

The remainder of this paper is organized as follows. Section 2 reviews current studies in this area. Section 3 describes the proposed framework and methodology. Section 4 presents the experimental results and compares the classification techniques. Section 5 concludes the paper and outlines future work.

2 Literature Review
ML has also been applied to PD in the literature. M. Al-Sarem et al. [5] detected Parkinson’s disease using several ensemble methods, including random forest, extreme gradient boosting (XGBoost), and CatBoost, to increase accuracy.
They examined the important features for each ensemble classifier using random forest, extreme gradient boosting (XGBoost) and categorical boosting (CatBoost). T. J. Wroge et al. [37] proposed a Voice Activation Detection (VAD) technique for predicting Parkinson’s disease. Raw audio was extracted, background noise was removed, and the result was passed to two distinct feature-extraction algorithms; finally, a machine learning algorithm was applied. K.R. Wan et al. [36] reviewed studies on feature selection (FS) for ML in brain surgery, where an ML-based technique is used to locate the region to be operated on during surgery for Parkinson’s disease. That review focuses on investigations conducted after Parkinson’s disease has been diagnosed. F. Cavallo et al. [11] attempted to predict Parkinson’s disease from motion data collected from people’s upper limbs. The researchers placed a device on the upper limbs of the experimental subjects (both PD patients and healthy persons) and instructed them to perform a series of activities. Spatiotemporal and frequency analyses of the data yielded the parameters, which several supervised learning approaches then classified. J.S. Almeida et al. [6] employed a variety of feature extraction and machine learning strategies to detect PD and found that phonation is the most effective task for detecting Parkinson’s disease. Their study examined K-NN, Multilayer Perceptron (MLP), Optimum Path Forest, and Support Vector Machines (SVM) as classifiers. L. Parisi et al. [32] used artificial neural networks to reduce the number of voice features for ML-based PD diagnosis and classified the data with SVM. Besides MRI, motion or speech information, PD has also been identified with ML algorithms from handwriting tasks [21]. High classification rates have been obtained in the literature on ML-based PD diagnosis.
However, these studies either employed a large number of features, which increases computation time, or relied on features that were difficult to extract even when only a few were used, which indirectly increases computation time. The goal of this study is to reduce computing time by using fewer, more effective features and a lightweight feature extraction process. The features are derived from speech signals, which makes them easier to collect than in the MRI-based [21] or motion-based approaches in the literature. The main contribution is ML-based early prediction of Parkinson’s disease.

3 Methodology
The workflow of this research is shown in Fig. 1, and the experiment consists of several steps. First, the dataset is acquired from the University of California Irvine (UCI) Machine Learning Repository. Second, the RF, Boruta and RFE methods are used to select key features. Next, several machine learning models (Gradient Boosting, Extreme Gradient Boosting, Bagging and the Extra Tree Classifier) are compared for classification. Determining an optimal subset of features from the attribute list is a mathematical challenge, and our experiment approaches it with a recursive process. Since different models have different capabilities in classification analysis, we compare four classifiers on various attribute subsets and choose the best approach based on each classifier’s accuracy. The steps of the proposed technique are as follows.

Fig. 1. Overall research workflow

3.1 Data Collection
The Parkinson Dataset with replicated acoustic features [29] was used to train and evaluate our methodology. A total of 80 participants aged over 50 took part in the research. Forty of them were healthy (22 (55%) male and 18 (45%) female), and forty had PD (27 (67.5%) male and 13 (32.5%) female).
The research is based on 44 acoustic features, which can be divided into five groups. These 44 features are listed in Table 1.

Table 1. Dataset description.
SL no. | Feature group | Feature name
1 | ID | Subject’s identifier
2 | Recording | Number of the recording
3 | Gender | 0 = Man; 1 = Woman
4 | Pitch local perturbation measures | Relative jitter (Jitter rel), absolute jitter (Jitter abs), relative average perturbation (Jitter RAP), pitch perturbation quotient (Jitter PPQ)
5 | Amplitude perturbation measures | Local shimmer (Shim loc), shimmer in dB (Shim dB), 3-point amplitude perturbation quotient (Shim APQ3), 5-point amplitude perturbation quotient (Shim APQ5)
6 | Harmonic-to-noise ratio measures | HNR05, HNR15, HNR25, HNR35, HNR38
7 | Mel frequency cepstral coefficient-based spectral measures | MFCC0, MFCC1, ..., MFCC12 and their derivatives (Delta0, Delta1, ..., Delta12)
8 | Recurrence period density entropy | RPDE
9 | Detrended fluctuation analysis | DFA
10 | Pitch period entropy | PPE
11 | Glottal-to-noise excitation ratio | GNE
12 | Status | 0 = Healthy; 1 = PD

3.2 Feature Selection
The goal of feature selection is to identify the most significant features of a problem domain, which improves both computational speed and prediction accuracy [16]. In this paper, we apply three well-known feature selection methods to identify the most relevant features: Boruta, Recursive Feature Elimination (RFE) and Random Forest (RF).

Boruta: Boruta [22] is a feature selection and feature ranking algorithm built on the random forest algorithm. Its benefits are that it determines the importance of each variable and helps choose significant variables in a statistically grounded way. In addition, setting the p-value threshold to 0.01 can increase the robustness of the method. The number of times the algorithm is executed is referred to as nEstimator.
The higher the nEstimator, the more selectively variables are chosen; the default is 100. The method takes a top-down approach, comparing the importance of each original feature with that of randomized copies of the features (shadow features).

Recursive Feature Elimination (RFE): RFE offers an accurate way to determine the key variables before they are fed into a machine learning system. In this study we apply RFE with a Decision Tree (DT) for the classification of Parkinson’s disease. RFE first builds a DT model using all the features and then repeatedly removes the features that contribute least to the model. RFE is a strong feature selection approach whose outcome depends on the underlying learning model [34].

Random Forest (RF): In data science, random forests are sometimes used to select features, because the tree-based strategies of random forests naturally rank features by how much they improve node purity [23], that is, by their overall decrease in impurity (Gini impurity). Nodes with the greatest decrease in impurity tend to appear at the beginning of the trees, and nodes with the least decrease at the end. Thus, pruning the trees below a given node yields a subset of the most important features.

3.3 Data Splitting
In this phase, the Parkinson’s disease dataset is split into a training set (80%) and a testing set (20%). The training set is used to develop the models, and the test set is used to assess them.

3.4 Training Models and Evaluation Metrics
We trained our models using four classification methods: Gradient Boosting [30], Extreme Gradient Boosting [12], Bagging [10] and the Extra Tree Classifier [13]. First, we train the models using all 44 features. Then we select the important features from the 44 using the three feature selection methods and train the models again using only the selected features.
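The shadow-feature test that Boruta performs (Sect. 3.2) can be illustrated with a minimal, dependency-free Python sketch. It is not the BorutaPy implementation used in this study. As labelled assumptions: the absolute Pearson correlation with the class label stands in for random-forest importance, the helper names (`abs_corr`, `upper_tail`, `boruta_decisions`) are hypothetical, and the real algorithm additionally stops scoring rejected features and corrects for multiple testing. In each round every column is shuffled to form a shadow feature, a real feature scores a "hit" when it beats the best shadow, and a two-sided binomial test on the hit counts decides whether the feature is confirmed, rejected or left tentative.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Importance = Callable[[Sequence[float], Sequence[int]], float]


def abs_corr(xs: Sequence[float], ys: Sequence[int]) -> float:
    """|Pearson r| between a feature and the label (assumed stand-in for RF importance)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def boruta_decisions(
    columns: List[List[float]],
    y: Sequence[int],
    importance: Importance = abs_corr,
    n_iter: int = 20,
    alpha: float = 0.01,
    seed: int = 0,
) -> Tuple[List[int], List[str]]:
    """Simplified Boruta: count hits against the best shadow, then test the counts."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadow features: each column shuffled independently, destroying any link to y.
        shadows = [rng.sample(col, len(col)) for col in columns]
        best_shadow = max(importance(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if importance(col, y) > best_shadow:
                hits[j] += 1
    decisions = []
    for h in hits:
        if upper_tail(h, n_iter) < alpha:             # significantly many hits
            decisions.append("confirmed")
        elif upper_tail(n_iter - h, n_iter) < alpha:  # P(X <= h), by symmetry
            decisions.append("rejected")
        else:
            decisions.append("tentative")
    return hits, decisions


# Made-up toy data: one perfectly informative column and one pure-noise column.
gen = random.Random(42)
labels = [i % 2 for i in range(100)]
signal = [float(v) for v in labels]
noise = [gen.random() for _ in labels]
hits, decisions = boruta_decisions([signal, noise], labels)
print(hits, decisions)  # the toy signal column is expected to be confirmed
```

Because shuffling breaks any relation between a shadow column and the label, the best shadow score estimates how important a useless feature can look by chance; comparing against it is what lets Boruta separate informative features from noise without a hand-picked cut-off.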
The proposed models are evaluated using a set of metrics, including accuracy, precision, recall and F-score [35].

4 Results and Discussion
In this section, we discuss the experimental results of the developed classification method. First, we predict Parkinson’s disease with four classifiers: Gradient Boosting, Extreme Gradient Boosting, Extra Tree Classifier and Bagging. Then we apply the three feature selection methods and, after selecting the important features, apply the same four classifiers again. The aim is to show how the feature selection method affects the prediction result.

4.1 System Configuration
Google Colaboratory was used to execute the proposed methodology. It provides free CPU and GPU resources for running Python research projects in the cloud, and its user interface is built on Jupyter notebooks. The proposed technique was implemented with the Python scikit-learn library [1], which is needed to run the classification models (Gradient Boosting, Extreme Gradient Boosting, Extra Tree Classifier and Bagging) and the RF and RFE feature selection methods. To implement Boruta, we used the Python BorutaPy library [24].

4.2 Result of Classifier Before Applying Feature Selection Method
Table 2 shows the results of the four classifiers: Gradient Boosting, Extreme Gradient Boosting, Extra Tree Classifier and Bagging. From Table 2 we can see that the accuracy of Gradient Boosting is 77.21%, that of the Extra Tree Classifier is 71.91%, that of Bagging is 75.08% and that of Extreme Gradient Boosting is 78.08%. Extreme Gradient Boosting therefore gives the best accuracy of all the classifiers before any feature selection method is applied.

Table 2. Accuracy of the classifiers before applying a feature selection method.
Classifier name | Accuracy (%) | Precision | Recall | F-Score
Gradient Boosting | 77.21 | 0.79 | 0.72 | 0.78
Extra Tree Classifier | 71.91 | 0.71 | 0.74 | 0.72
Bagging | 75.08 | 0.71 | 0.83 | 0.77
Extreme Gradient Boosting | 78.08 | 0.77 | 0.79 | 0.78

4.3 Result of Classifier After Applying Boruta Feature Selection Method
The Boruta feature selection method selects 14 important features, shown in Fig. 2. Fig. 2 shows that the DFE feature has the highest score, making it the most important feature for predicting Parkinson’s disease.

Table 3 shows the results of the four classifiers (Gradient Boosting, Extreme Gradient Boosting, Extra Tree Classifier and Bagging) after applying Boruta feature selection. From Table 3 we can see that the accuracy of Gradient Boosting is 74.16%, that of the Extra Tree Classifier is 72.91%, that of Bagging is 77.08% and that of Extreme Gradient Boosting is 75.08%. Bagging therefore gives the best accuracy of the four classifiers after Boruta feature selection.

Fig. 2. Boruta feature scores

The table also shows that the accuracy of Gradient Boosting and Extreme Gradient Boosting decreased, whereas that of the Extra Tree Classifier and Bagging increased after applying Boruta feature selection.

Table 3. Accuracy of the classifiers after applying the Boruta method.
Classifier name | Accuracy (%) | Precision | Recall | F-Score
Gradient Boosting | 74.16 | 0.74 | 0.71 | 0.72
Extra Tree Classifier | 72.91 | 0.73 | 0.70 | 0.73
Bagging | 77.08 | 0.76 | 0.79 | 0.78
Extreme Gradient Boosting | 75.00 | 0.75 | 0.75 | 0.75

4.4 Result of Classifier After Applying Recursive Feature Elimination (RFE) Feature Selection Method
The RFE feature selection approach selects 21 main features, depicted in Fig. 3. Fig. 3 shows that the HNR25 feature has the best score, indicating that it is essential for predicting Parkinson’s disease.
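How RFE arrives at a fixed-size subset such as the 21 features reported here can be sketched in plain Python. This is an illustration under stated assumptions, not the study’s code: the study ranks features with a Decision Tree (in scikit-learn, `sklearn.feature_selection.RFE` wrapped around a `DecisionTreeClassifier`, configured through its `n_features_to_select` and `step` parameters), whereas here a hypothetical `mean_gap_scorer` (absolute difference of the class means divided by the feature’s standard deviation) stands in for the model’s importances so that the example has no dependencies. The toy data are made up.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Scorer = Callable[[Matrix, Sequence[int], List[int]], Dict[int, float]]


def mean_gap_scorer(X: Matrix, y: Sequence[int], features: List[int]) -> Dict[int, float]:
    """Hypothetical stand-in for model importances: |mean(PD) - mean(healthy)| / std."""
    scores = {}
    for j in features:
        column = [row[j] for row in X]
        pos = [v for v, label in zip(column, y) if label == 1]
        neg = [v for v, label in zip(column, y) if label == 0]
        spread = statistics.pstdev(column) or 1.0  # avoid division by zero
        scores[j] = abs(statistics.mean(pos) - statistics.mean(neg)) / spread
    return scores


def rfe(X: Matrix, y: Sequence[int], scorer: Scorer,
        n_features_to_select: int, step: int = 1) -> Tuple[List[int], List[int]]:
    """Score the current subset, drop the `step` weakest features, repeat."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(remaining) > n_features_to_select:
        scores = scorer(X, y, remaining)  # "refit" on the current subset
        n_drop = min(step, len(remaining) - n_features_to_select)
        weakest = sorted(remaining, key=lambda j: scores[j])[:n_drop]
        for j in weakest:
            remaining.remove(j)
            eliminated.append(j)
    return remaining, eliminated


# Toy data: feature 0 separates the classes strongly, feature 1 weakly,
# features 2 and 3 not at all.
X = [[0, 0, 1, 5], [0, 1, 2, 3], [0, 0, 3, 5], [0, 1, 4, 3],
     [10, 1, 1, 3], [10, 2, 2, 5], [10, 1, 3, 3], [10, 2, 4, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, eliminated = rfe(X, y, mean_gap_scorer, n_features_to_select=2)
print(selected, eliminated)  # [0, 1] [2, 3]
```

With a fixed per-feature score like this one, the loop reduces to a ranking; the recursion matters when the refitted model, such as the Decision Tree used in the study, redistributes importance among the remaining features after each removal.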
Table 4 shows the results of the four classifiers (Gradient Boosting, Extreme Gradient Boosting, Extra Tree Classifier and Bagging) after applying RFE feature selection. From Table 4 we can see that the accuracy of Gradient Boosting is 70.75%, that of the Extra Tree Classifier is 81.25%, that of Bagging is 82.35% and that of Extreme Gradient Boosting is 79.16%. Bagging therefore gives the best accuracy of the four classifiers after RFE feature selection. The table also shows that the accuracy of Gradient Boosting decreased, whereas that of the Extra Tree Classifier, Bagging and Extreme Gradient Boosting increased after applying RFE feature selection.

Fig. 3. RFE feature scores

Table 4. Accuracy of the classifiers after applying the RFE method.
Classifier name | Accuracy (%) | Precision | Recall | F-Score
Gradient Boosting | 70.75 | 0.70 | 0.71 | 0.69
Extra Tree Classifier | 81.25 | 0.78 | 0.88 | 0.82
Bagging | 82.35 | 0.80 | 0.83 | 0.82
Extreme Gradient Boosting | 79.16 | 0.75 | 0.88 | 0.81

4.5 Result of Classifier After Applying Random Forest (RF) Feature Selection Method
The RF feature selection methodology chooses 16 main features, visualized in Fig. 4. Fig. 4 shows that the GNE feature has the best score, indicating that it is important for Parkinson’s disease prediction.

Table 5 shows the results of the four classifiers (Gradient Boosting, Extreme Gradient Boosting, Extra Tree Classifier and Bagging) after applying RF feature selection. From Table 5 we can see that the accuracy of Gradient Boosting is 79.16%, that of the Extra Tree Classifier is 75.00%, that of Bagging is 80.21% and that of Extreme Gradient Boosting is 78.16%. Bagging therefore gives the best accuracy of the four classifiers after RF feature selection.
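The RF ranking (Sect. 3.2) scores a split by how much it lowers the Gini impurity G = 1 - sum_k p_k^2, where p_k is the fraction of samples of class k at a node. The minimal sketch below computes G and the size-weighted impurity decrease of a split on made-up labels; the helper names and the feature names `feature_a` and `feature_b` are hypothetical. A random forest sums such decreases per feature within each tree, weighting each node by the fraction of samples that reach it, and averages over the trees to obtain mean-decrease-in-impurity importances.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity G = 1 - sum_k p_k^2 of the class labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent: Sequence[int], left: Sequence[int],
                      right: Sequence[int]) -> float:
    """Parent impurity minus the size-weighted impurity of its two children."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


# Made-up node with four healthy (0) and four PD (1) recordings.
node = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical candidate splits induced by two features.
candidates = {
    "feature_a": ([0, 0, 0, 0], [1, 1, 1, 1]),  # separates the classes perfectly
    "feature_b": ([0, 0, 0, 1], [0, 1, 1, 1]),  # leaves both children mixed
}
for name, (left, right) in candidates.items():
    print(name, impurity_decrease(node, left, right))  # feature_a 0.5, feature_b 0.125

ranking = sorted(candidates,
                 key=lambda f: impurity_decrease(node, *candidates[f]),
                 reverse=True)
print(ranking)  # ['feature_a', 'feature_b']
```

The perfect split removes all of the parent’s impurity (0.5), while the mixed split removes only 0.125, so the feature inducing the first split ranks higher.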
From Table 5 we can also see that the accuracy of every classifier increased after applying RF feature selection. Across the three feature selection methods, RFE performed best.

Fig. 4. RF feature scores

Table 5. Accuracy of the classifiers after applying the RF method.
Classifier name | Accuracy (%) | Precision | Recall | F-Score
Gradient Boosting | 79.16 | 0.79 | 0.79 | 0.79
Extra Tree Classifier | 75.00 | 0.73 | 0.79 | 0.76
Bagging | 80.21 | 0.74 | 0.96 | 0.82
Extreme Gradient Boosting | 78.16 | 0.78 | 0.78 | 0.78

5 Conclusion
In this work, a feature selection based classification system for the early identification of Parkinson’s disease was designed using characteristics of speech signals from both PD patients and healthy persons. Various feature selection methods and classification techniques were employed in the experiments. The major goal was to increase the model’s performance and accuracy while simultaneously lowering the computing cost of the classification task. The accuracy of the classification techniques was analyzed with and without feature selection, and the significant impact of feature selection was demonstrated. The results show that combining feature selection approaches with classification techniques is quite beneficial, particularly when working with voice data. Our Parkinson’s disease voice dataset is limited. In future, we will collect more data to detect Parkinson’s disease and will work with deep learning and other methods [2–4,7,8,14,17,18,20,27,28,33,38]. We will also work with Parkinson’s disease MRI data.

References
1. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016), pp. 265–283 (2016)
2. Abedin, M.Z., Akther, S., Hossain, M.S.: An artificial neural network model for epilepsy seizure detection. In: 2019 5th International Conference on Advances in Electrical Engineering (ICAEE), pp. 860–865. IEEE (2019)
3. Ahmed, T.U., Hossain, M.S., Alam, M.J., Andersson, K.: An integrated CNN-RNN framework to assess road crack. In: 2019 22nd International Conference on Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2019)
4. Ahmed, T.U., Jamil, M.N., Hossain, M.S., Andersson, K., Hossain, M.S.: An integrated real-time deep learning and belief rule base intelligent system to assess facial expression under uncertainty. In: 2020 Joint 9th International Conference on Informatics, Electronics & Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision & Pattern Recognition (icIVPR), pp. 1–6. IEEE (2020)
5. Al-Sarem, M., Saeed, F., Boulila, W., Emara, A.H., Al-Mohaimeed, M., Errais, M.: Feature selection and classification using CatBoost method for improving the performance of predicting Parkinson’s disease. In: Saeed, F., Al-Hadhrami, T., Mohammed, F., Mohammed, E. (eds.) Advances on Smart and Soft Computing. AISC, vol. 1188, pp. 189–199. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-6048-4_17
6. Almeida, J.S., et al.: Detecting Parkinson’s disease with sustained phonation and speech signals using machine learning techniques. Pattern Recogn. Lett. 125, 55–62 (2019)
7. Basnin, N., Nahar, L., Hossain, M.S.: An integrated CNN-LSTM model for micro hand gesture recognition. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 379–392. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_35
8. Basnin, N., Nahar, L., Hossain, M.S.: An integrated CNN-LSTM model for Bangla lexical sign language recognition. In: Kaiser, M.S., Bandyopadhyay, A., Mahmud, M., Ray, K. (eds.) Proceedings of International Conference on Trends in Computational and Cognitive Engineering. AISC, vol. 1309, pp. 695–707. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-4673-4_57
9. Brazier, Y.: Parkinson’s disease early signs and causes (2021). https://www.medicalnewstoday.com/articles/323396#causes. Accessed 20 May 2021
10. Bühlmann, P.: Bagging, boosting and ensemble methods. In: Gentle, J., Härdle, W., Mori, Y. (eds.) Handbook of Computational Statistics, pp. 985–1022. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-21551-3_33
11. Cavallo, F., Moschetti, A., Esposito, D., Maremmani, C., Rovini, E.: Upper limb motor pre-clinical assessment in Parkinson’s disease using machine learning. Parkinsonism Related Disord. 63, 111–116 (2019)
12. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
13. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Machine Learn. 63(1), 3–42 (2006)
14. Gosh, S., Nahar, N., Wahab, M.A., Biswas, M., Hossain, M.S., Andersson, K.: Recommendation system for E-commerce using alternating least squares (ALS) on apache spark. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 880–893. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_75
15. Holland, K.: Everything You Want to Know About Parkinson’s Disease (2021). https://www.healthline.com/health/parkinsons#treatment. Accessed 20 May 2021
16. Hsu, H.H., Hsieh, C.W., Lu, M.D.: Hybrid feature selection by combining filters and wrappers. Expert Syst. Appl. 38(7), 8144–8150 (2011)
17. Islam, R.U., Hossain, M.S., Andersson, K.: A deep learning inspired belief rule-based expert system. IEEE Access 8, 190637–190651 (2020)
18. Islam, R.U., Ruci, X., Hossain, M.S., Andersson, K., Kor, A.L.: Capacity management of hyperscale data centers using predictive modelling. Energies 12(18), 3438 (2019)
19. Michael, J.: Parkinson Disease (2017). https://www.nia.nih.gov/health/parkinsonsdisease#:~:text=Parkinson%27s%20disease%20is%20a%20brain. Accessed 20 May 2021
20. Kabir, S., Islam, R.U., Hossain, M.S., Andersson, K.: An integrated approach of belief rule base and deep learning to predict air pollution. Sensors 20(7), 1956 (2020)
21. Kotsavasiloglou, C., Kostikis, N., Hristu-Varsakelis, D., Arnaoutoglou, M.: Machine learning-based classification of simple drawing movements in Parkinson’s disease. Biomed. Signal Process. Control 31, 174–180 (2017)
22. Kursa, M.B., Jankowski, A., Rudnicki, W.R.: Boruta-a system for feature selection. Fund. Inform. 101(4), 271–285 (2010)
23. Kursa, M.B., Rudnicki, W.R.: The all relevant feature selection using random forest. arXiv preprint arXiv:1106.5112 (2011)
24. Kursa, M.B., Rudnicki, W.R., et al.: Feature selection with the boruta package. J. Stat. Softw. 36(11), 1–13 (2010)
25. Mahmud, M., Kaiser, M.S., McGinnity, T.M., Hussain, A.: Deep learning in mining biological data. Cogn. Comput. 13(1), 1–33 (2021)
26. Mahmud, M., Kaiser, M.S., Hussain, A., Vassanelli, S.: Applications of deep learning and reinforcement learning to biological data. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2063–2079 (2018)
27. Nahar, N., Ara, F., Neloy, M.A.I., Barua, V., Hossain, M.S., Andersson, K.: A comparative analysis of the ensemble method for liver disease prediction. In: 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET), pp. 1–6. IEEE (2019)
28. Nahar, N., Hossain, M.S., Andersson, K.: A machine learning based fall detection for elderly people with neurodegenerative disorders. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 194–203. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6_18
29. Naranjo, L., Perez, C.J., Campos-Roca, Y., Martin, J.: Addressing voice recording replications for Parkinson’s disease detection. Expert Syst. Appl. 46, 286–292 (2016)
30. Natekin, A., Knoll, A.: Gradient boosting machines, a tutorial. Front. Neurorobot. 7, 21 (2013)
31. Noor, M.B.T., Zenia, N.Z., Kaiser, M.S., Al Mamun, S., Mahmud, M.: Application of deep learning in detecting neurological disorders from magnetic resonance images: a survey on the detection of Alzheimer’s disease, Parkinson’s disease and schizophrenia. Brain Inform. 7(1), 1–21 (2020)
32. Parisi, L., RaviChandran, N., Manaog, M.L.: Feature-driven machine learning to improve early diagnosis of Parkinson’s disease. Expert Syst. Appl. 110, 182–190 (2018)
33. Pathan, R.K., Uddin, M.A., Nahar, N., Ara, F., Hossain, M.S., Andersson, K.: Gender classification from inertial sensor-based gait dataset. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 583–596. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_51
34. Richhariya, B., Tanveer, M., Rashid, A., Initiative, A.D.N., et al.: Diagnosis of alzheimer’s disease using universum support vector machine based recursive feature elimination (USVM-RFE). Biomed. Sig. Process. Contr. 59, 101903 (2020)
35. Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. 45(4), 427–437 (2009)
36. Wan, K.R., Maszczyk, T., See, A.A.Q., Dauwels, J., King, N.K.K.: A review on microelectrode recording selection of features for machine learning in deep brain stimulation surgery for Parkinson’s disease. Clin. Neurophysiol. 130(1), 145–154 (2019)
37. Wroge, T.J., Özkanca, Y., Demiroglu, C., Si, D., Atkins, D.C., Ghomi, R.H.: Parkinson’s disease diagnosis using machine learning and voice. In: 2018 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), pp. 1–7. IEEE (2018)
38. Zisad, S.N., Hossain, M.S., Andersson, K.: Speech emotion recognition in neurological disorders using convolutional neural network. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 287–296. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6_26
