Tourism Recommendation Using Machine Learning Approach

Anjali Dewangan and Rajdeep Chatterjee

A. Dewangan (✉) ⋅ R. Chatterjee
School of Computer Engineering, KIIT University, Bhubaneswar, India
e-mail: [email protected]

R. Chatterjee
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2018
K. Saeed et al. (eds.), Progress in Advanced Computing and Intelligent Engineering, Advances in Intelligent Systems and Computing 564, https://doi.org/10.1007/978-981-10-6875-1_44

Abstract Puri has long been regarded as the leading tourist destination in Odisha. Researchers and town planners have repeatedly looked for suitable tourism recommendation methods, and they have generally preferred machine learning approaches for tour recommendation models. Some methods give good simulation data, but artificial neural network (ANN) and regression analysis techniques sometimes give better results. In this paper, a tourism recommendation method for Puri is modelled on the SOM architecture and on a revenue management system. A comparison is made between supervised and unsupervised machine learning techniques for tourism recommendation in Puri.

Keywords ANN ⋅ SOM architecture ⋅ Regression ⋅ Simulink block diagram ⋅ Revenue management system

1 Introduction

Tourism acts like an industry for a country: it contributes not only to cultural development but also to exchange that supports growth in trade, industry and many other areas [1]. Tourism forecasting is therefore necessary for assessing the industry's contribution to the economic development of a region, and it is helpful for both managers and government. Government organizations use supervised machine learning techniques such as regression analysis to achieve marketing targets and to attain stability in marketing potential. Managers use these techniques to determine staffing and capacity, to study financial projects for building new hotels, and to carry out town planning for tourism recommendation in a country. Building on these techniques, e-Tourism helps in providing information services covering travel agents, hotels and tourist spots. The main concern is how to optimize time, money and the basic cost of food. Tourists make many demands in a short interval of time and are not willing to spend long searching with an online assistant. To overcome this situation, ANN and data mining are now widely used. The aim of this paper is to present a model of tourism based on ANN and time series.

2 Models

2.1 Time Series Model

Time series models are widely used in regression analysis for estimating trend and seasonality. They help in predicting the number of future tourists to a specific tourist spot. In this paper, a time series model is used for tourism analysis of the Puri destination [2]. The autoregressive integrated moving average (ARIMA) technique is used to build a forecasting model, to evaluate the fitness function, and to choose the error measure best suited to assessing performance.

2.2 Artificial Neural Network (ANN) Model

The two widely used families of methods are supervised and unsupervised machine learning. Neural networks are widely used in forecasting and predicting future response for the tourism industry. One of the networks used for analysis is the multi-layer perceptron (MLP). It acts as a bridge between the input and output layers: it extends the simple perceptron with layers of hidden neurons, which determine the learning capacity of the MLP network [3]. In this method, tourist numbers for the past 10 years have been collected from various agents, companies and travel agents.
These numbers are then simulated using the SOM architecture methodologies.

3 ARIMA Method

The method basically comprises a fitness function and a genetic algorithm method, which help in the regression analysis. The fitting function is obtained for each trend and for the monthly variation of tourists coming to Puri (Odisha) [4]. Three error measures have been analyzed to fit the error terms in the regression analysis [5, 6]. These three measures were selected for their performance in terms of nearness to zero. They were calculated for the first ten tourist data points, which were then forecasted and fitted to the regression accordingly.

Table 1 Tourists arriving at Puri (Odisha), in thousands

| Month | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| January | 112 | 115 | 145 | 171 | 196 | 204 | 242 | 284 | 315 | 340 | 360 | 417 |
| February | 118 | 126 | 150 | 180 | 196 | 188 | 233 | 277 | 301 | 318 | 342 | 391 |
| March | 132 | 141 | 178 | 193 | 236 | 235 | 267 | 317 | 356 | 362 | 406 | 419 |
| April | 129 | 135 | 163 | 181 | 235 | 227 | 269 | 313 | 348 | 396 | 396 | 461 |
| May | 121 | 125 | 172 | 183 | 229 | 234 | 270 | 318 | 355 | 363 | 420 | 472 |
| June | 135 | 149 | 178 | 218 | 243 | 264 | 315 | 374 | 422 | 435 | 472 | 535 |
| July | 148 | 170 | 199 | 230 | 264 | 302 | 364 | 413 | 465 | 491 | 548 | 622 |
| August | 148 | 170 | 199 | 242 | 272 | 239 | 347 | 405 | 467 | 505 | 559 | 606 |
| September | 136 | 158 | 184 | 209 | 237 | 259 | 312 | 355 | 404 | 404 | 463 | 508 |
| October | 119 | 133 | 162 | 191 | 211 | 229 | 274 | 306 | 347 | 359 | 407 | 461 |
| November | 104 | 114 | 146 | 172 | 180 | 203 | 237 | 271 | 305 | 310 | 362 | 390 |
| December | 118 | 140 | 166 | 194 | 201 | 229 | 278 | 306 | 336 | 337 | 405 | 432 |

$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(F_t - \hat{F}_t\right)^2 \tag{1}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|F_t - \hat{F}_t\right| \tag{2}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{F_t - \hat{F}_t}{F_t}\right| \tag{3}$$

where $N$ is the size of the tourist arrival time series, $F_t$ is the actual value and $\hat{F}_t$ is the forecast value at time $t$, for the period between 2000 and 2011. MAPE is employed for the comparative study of forecast model performance, whereas MSE is used for forecast analysis in the SARIMA model and the exponential model.
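The three error measures defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are ours, MAE is taken in its usual absolute-difference form, and MAPE is returned as a fraction (not a percentage), which matches the magnitude of the values in Table 2. The January series is copied from Table 1; the "previous year" forecast is a hypothetical stand-in used only to have something to score.

```python
def mse(actual, forecast):
    """Mean squared error, Eq. (1)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error, Eq. (2)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction, Eq. (3).

    Every actual value must be non-zero.
    """
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# January arrivals in thousands, 2000-2011, copied from Table 1.
january = [112, 115, 145, 171, 196, 204, 242, 284, 315, 340, 360, 417]

# Hypothetical forecast for illustration only: each year is predicted
# by the previous year's value (not the forecasting model of the paper).
actual = january[1:]
forecast = january[:-1]

if __name__ == "__main__":
    print("MSE :", mse(actual, forecast))
    print("MAE :", mae(actual, forecast))
    print("MAPE:", mape(actual, forecast))
```

Because MSE squares the differences, it is expressed in squared thousands of tourists and is not directly comparable in size with MAE or MAPE; MAPE is scale-free, which is why it is convenient for comparing models.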
The trend may be linear or quadratic, depending on the seasonal component. In the year 2005, the monthly variation was between 4.5 and 5, but one month later it rose to between 6 and 6.5 [7–9]. Table 1 gives the tourist data used in our simulation: the number of tourists, in thousands, who came to Puri (Odisha) over the past 10 years. The data show that in the year 2000 the number rose to a peak and then fell to a certain level over the months from January to December. The same pattern repeats in the following years, so the trend was fitted and found to be quadratic with a linear component. An error analysis was then carried out and fitted with the response function to give the desired trend, which helped in analyzing the yearly tourism forecast for Puri (Table 2).

We carried out the calculation for the first 10 years of data on passengers who came to Puri. The error analysis does not show a good fit: although the MAE is smaller than the MSE, neither is small enough, so the theoretical regression does not overlap with the analytical data. MAPE gives a good regression fit in this calculation [10–12].

Table 2 Error calculation

| MAE | MAPE | MSE |
|---|---|---|
| 2.4676 | 0.022032 | 543.6652 |
| 2.4076 | 0.020403 | 491.232 |
| 2.2676 | 0.017179 | 389.5462 |
| 2.2976 | 0.017811 | 409.2222 |
| 2.3776 | 0.01965 | 467.1886 |
| 2.2376 | 0.016575 | 370.8781 |
| 2.1076 | 0.014241 | 300.1336 |
| 2.1076 | 0.014241 | 300.1336 |
| 2.2276 | 0.016379 | 364.8678 |
| 2.3976 | 0.020403 | 483.066 |

4 Numerical Computation of Tourist Forecasting and Its Regression Analysis

In this paper, we have analyzed the Mackey–Glass time-delay differential equation.
Here, the tourist demand for Puri behaves like a non-periodic and non-convergent time series that is sensitive to initial conditions. The equation is:

$$\frac{dx(t)}{dt} = \frac{0.2\,x(t-\tau)}{1 + x(t-\tau)^{10}} - 0.1\,x(t), \qquad x(t) = 0 \ \text{for } t < 0 \tag{4}$$

Here, $\tau$ (tau) is the time delay for our forecasting of tourist numbers [13]. The $x$ denotes the ith row and kth column in the regression matrix for tourist number. It is then fitted with a regression model that has an explicit forecasting mechanism and well-defined stationarity and invertibility requirements. The model then takes the linear form used in statistics, from which the response form for regression fitting is obtained. The fitted equation takes the following form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2 + \varepsilon$$

Here $\beta$ denotes the regression coefficients, $\varepsilon$ is the MAPE error, $x$ is the regression matrix and $y$ is the response. In the error analysis figures, the y-axis represents the error and the x-axis represents the number of passengers in thousands [14–16]. Of all the error fits, MAPE decays most rapidly, giving less error than the others. For the error analysis, the first 10 passenger samples have been taken into account.

5 SOM Architecture Methodologies

A self-organizing map (SOM) mainly comprises three parts: an input layer, a middle competitive layer and an output layer. Here, a neural network has been used for simulation, and a trainer is applied for training. Input vectors are classified and grouped in the input space. The input layer differs from the middle competitive layer, because the middle layer helps in recognizing neighbouring sections of the input space. In this way, the methodology learns both the distribution and the topology of the input vectors. With the help of topology functions, the neurons are arranged in physical positions.
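The competitive layer and the role of neuron positions can be illustrated with a very small SOM written in plain Python. This is a sketch under stated assumptions, not the toolbox routine used for the simulations: the rectangular lattice, the Gaussian neighbourhood, the linear decay of learning rate and radius, and all function names are our choices, and the two-dimensional input vectors are made-up toy values rather than tourist data.

```python
import math
import random


def euclid(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def grid_positions(rows, cols):
    """Place neurons on a rectangular lattice (a grid-style topology)."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def best_matching_unit(x, weights):
    """Index of the neuron whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: euclid(x, weights[i]))


def train_som(data, rows=2, cols=2, epochs=40, lr0=0.5, radius0=1.5, seed=0):
    """Train a small SOM and return (weights, positions)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    positions = grid_positions(rows, cols)
    # Start each weight vector at a random point inside the data range.
    weights = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
               for _ in positions]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                      # learning rate shrinks
        radius = max(radius0 * decay, 0.5)    # neighbourhood shrinks
        for x in rng.sample(data, len(data)):
            win = best_matching_unit(x, weights)
            for i, pos in enumerate(positions):
                # Neurons close to the winner on the lattice move more.
                d = euclid(pos, positions[win])
                h = math.exp(-d * d / (2.0 * radius * radius))
                for k in range(dim):
                    weights[i][k] += lr * h * (x[k] - weights[i][k])
    return weights, positions


if __name__ == "__main__":
    # Made-up toy inputs forming two loose groups (illustration only).
    toy = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15),
           (0.8, 0.9), (0.9, 0.85), (0.85, 0.8)]
    w, pos = train_som(toy)
    for x in toy:
        print(x, "-> neuron at", pos[best_matching_unit(x, w)])
```

The weight update pulls the winning neuron and its lattice neighbours towards each input, which is how the map comes to reflect both the distribution and the topology of the input vectors.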
The topology functions are gridtop, hextop and randtop, which arrange the neurons in a grid, hexagonal or random pattern, respectively. The distance function is first calculated between neurons, and they are then positioned accordingly. A weight function is taken into consideration for each input vector. With respect to arrival rate, these vectors denote the service rate and waiting time of tourists. This design is a kind of competitive network, except that no bias is used. The competitive transfer function produces a 1 for the output element a1i corresponding to i*, the winning neuron. All other output elements in a1 are 0. Neurons close to the winning neuron are updated along with the winning neuron. Here, the competition among neurons to win is analogous to tourists arriving at their chosen hotels, and the neurons correspond to the waiting time of each tourist together with their preferred budget.

Fig. 1 SOM sample hits

Figure 1 shows the SOM sample hits, which consist of large and small weights [17, 18]. The larger weights are shown in blue. The weights connect each input to its respective neurons. If the connection patterns of two inputs are very similar, the inputs can be assumed to be highly correlated. In this case, input 1 has connections that are very different from those of input 2. If the input space is high dimensional, all the weights cannot be visualized at the same time. The black hexagons represent the neurons. The red lines link neighbouring neurons. The colours in the regions containing the red lines show the distances between neurons: dark colours represent large distances and light colours represent small distances. Here, the distances reflect tourists choosing their hotel according to their budget in the tourism recommendation model. The clustering splits the data into two parts. The smaller weights are interconnected with the larger weights; the darker bands represent larger weights.

Fig. 2 Simulink block diagram for hotel agent in tourism field

Table 3 lists the hotels in Puri used in the tourism recommendation model, according to tourists' staying and lodging choices with their respective costs.

Table 3 Hotel packages in Puri

| Hotel package | Season | Destination | Hotel stars (hotel) | Number of days | Min price |
|---|---|---|---|---|---|
| #1 | winter | India, Puri | 5 (Mayfair Waves) | 7 | 25000 |
| #2 | summer | India, Puri | 4 (Fort Mahodadhi) | 7 | 19000 |
| #3 | summer | India, Puri | 3 (Shakti International) | 7 | 10000 |
| #4 | summer | India, Puri | 5 (Mayfair Heritage) | 7 | 22000 |
| #5 | winter | India, Puri | 4 (Hans Coco Palms) | 7 | 18500 |
| #6 | winter | India, Puri | 3 (Swargadwara Hotel) | 7 | 15000 |

Fig. 3 Flowchart of tourism recommendation model

6 Tourism Recommended Model

Room allocation for a hotel is based on economic principles. The key monetary principles are applied to pricing and to functions such as optimization, forecasting and control of room inventory. In this paper, the model is divided into three parts, namely the travel agent, the hotel booking agent and governmental bodies; these correspond to the reservation system, the database (DB) and our revenue management system.
They are all linked with the hotel reservation agent and the revenue manager. Figure 2 shows the tourism recommendation model for the Puri destination, covering the passengers visiting Puri; it has been simulated in MATLAB Simulink using a first-come, first-served discipline. The important quantities are the maximum arrival rate and the arrival gain rate, which are linked to the exponential distribution of tourists or passengers arriving throughout the years 2001 to 2011. A FIFO (first in, first out) queue is then used [19]. Queue content and server utilization are linked with the waiting time through the exponential distribution associated with tourists. This exponential function has been developed from the passenger numbers for a single month, taken along each row across the different years. For example, in the case of January, the tourist number increased at an exponential rate from 112 to 417. The flowchart for the tourism recommendation model is depicted in Fig. 3.

Steps used in the algorithm

Input: 2 parameters (queue content + server utilization)
1. Initialize maximum arrival rate and arrival gain rate (0.5, 0.6 and 0.7).
2. For each gain rate, fit it to an exponential distribution and pass it to the FIFO queue through a time-based entity generator.
3. Set FIFO.
4. Generate cost for the hotel; hi = hi + 1; if hi < m, go to (queue content + server utilization).
5. Increment i = i + 1.
6. Else choose an alternative hotel package, hi = 1.
7. Compute x, which represents the tourist number.
8. Update y as the response and add the smallest possible error (MAPE).
9. End for.
10. Output: set of increment numbers corresponding to tourism growth rate.

7 Case Study

The above algorithm has been analyzed with Figs. 2 and 4. Here 'hi' and 'm' represent the cost of the hotel package and the tourist's demand cost, respectively.
If the cost of the hotel package is less than the tourist's demand, the booking works out; otherwise the tourist has to look for an alternative [20]. After simulation, the model has been compared with the analytical values. The arrival gain rate for the queuing model can be simulated in the range 0.1–0.99; we have taken arrival gain rates of 0.5, 0.6 and 0.7 for tourists arriving at the hotel for booking, and have done the forecasting accordingly. In Fig. 5, the first plot was simulated with an arrival gain rate of 0.5: the waiting time first rose to 1.5 and then settled to a steady-state waiting time of 1 [21, 22]. In the second plot, with an arrival gain rate of 0.6, the waiting time rose to 2.5, then decreased to 1.2, and then reached a steady state of 1.5. In the third plot, with an arrival gain rate of 0.7, the waiting time rose to 5. Correspondingly, the server utilization keeps increasing as the waiting time increases. In the first case, the server utilization first increased to 0.57 and then reached a steady-state value of 0.5. In the second, it increased to 0.67 and then reached a steady state of 0.6. Similarly, for an arrival gain rate of 0.7, the server utilization increased to 0.75 and then reached a steady state of 0.7. So, the agents play a very important role in the economic development of that region.

Fig. 4 Layout for hotel agent, travel agent, tourist and governmental agencies

Fig. 5 a Waiting time versus time. b Waiting time versus time. c Waiting time versus time
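The steady-state figures reported above can be cross-checked against a single-server FIFO queue. The sketch below assumes exponential inter-arrival times with rate equal to the arrival gain rate and, as an assumption that the text does not state, exponential service times with rate 1 (an M/M/1 queue). Under that assumption the utilization is ρ = λ/μ and the mean waiting time in the queue is W_q = ρ/(μ − λ), which gives 1 and 1.5 for gain rates 0.5 and 0.6, and about 2.33 for 0.7. The function names are illustrative, and this is not the Simulink model of Fig. 2.

```python
import random


def mm1_theory(arrival_rate, service_rate=1.0):
    """Steady-state utilization and mean queueing delay of an M/M/1 queue."""
    rho = arrival_rate / service_rate
    wq = rho / (service_rate - arrival_rate)
    return rho, wq


def simulate_fifo(arrival_rate, service_rate=1.0, customers=200_000, seed=1):
    """Simulate a single-server FIFO queue with the Lindley recursion.

    Returns (mean waiting time in queue, estimated server utilization).
    """
    rng = random.Random(seed)
    wait = 0.0         # waiting time of the current customer
    total_wait = 0.0
    busy_time = 0.0    # total time the server spends serving
    clock = 0.0        # arrival time of the next customer
    for _ in range(customers):
        service = rng.expovariate(service_rate)
        total_wait += wait
        busy_time += service
        gap = rng.expovariate(arrival_rate)   # time until the next arrival
        clock += gap
        # The next customer waits for whatever work is still unfinished.
        wait = max(0.0, wait + service - gap)
    return total_wait / customers, busy_time / clock


if __name__ == "__main__":
    for gain in (0.5, 0.6, 0.7):
        rho, wq = mm1_theory(gain)
        sim_wait, sim_util = simulate_fifo(gain)
        print(f"gain={gain}: theory rho={rho:.2f}, Wq={wq:.2f}; "
              f"simulated util={sim_util:.2f}, wait={sim_wait:.2f}")
```

Under these assumptions the utilization equals the arrival gain rate, in line with the steady-state utilizations of 0.5, 0.6 and 0.7 described above.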
The more tourists there are, the larger the budget will be; harmony between the tourists' native places and the travel destination will then lead to a bond of friendship through which the economic growth rate will develop. The values obtained from the simulation model are shown in Table 4.

Table 4 Values obtained from simulation model

| Waiting time | Arrival gain rate | Server utilization |
|---|---|---|
| 1 | 0.5 | 0.5 |
| 1.5 | 0.6 | 0.6 |
| 2.5 | 0.7 | 0.7 |

8 Conclusion

We have compared the above simulations, taking the artificial neural network and regression techniques into account. Among all the methods, the algorithm based on the neural network and self-organizing map is the time-saving method when there is a large amount of data. Regression, on the other hand, has helped in forecasting the tourist arrival rate in an exponential and linear manner.

References

1. Zhang, G.P.: Neural Networks in Business Forecasting. Idea Group Inc. In: Law, R., Pine, R. (eds.) Tourism Demand Forecasting for the Tourism Industry: A Neural Network Approach, Ch. 6 (2004)
2. Lim, C., McAleer, M.: Time series forecasts of international travel demand for Australia. Tour. Manage. 23, 389–396 (2002)
3. Goh, C., Law, R.: Modeling and forecasting tourism demand for arrivals with stochastic nonstationary seasonality and intervention. Tour. Manage. 23, 499–510 (2002)
4. Law, R., Au, N.: Back-propagation learning in improving the accuracy of neural network-based tourism demand forecasting. Tour. Manage. 21, 331–340 (2000)
5. Elmaghraby, W., Keskinocak, P.: Dynamic pricing in the presence of inventory considerations: research overview, current practices, and future directions. Manage. Sci. 49(10), 1287–1309 (2003)
6. Canina, L., Carvell, S.: Lodging demand for urban hotels in major metropolitan markets. J. Hospitality Tour. Res. 29(3), 291–311 (2005)
7. Gillen, D.W., Morrison, W.G., Stewart, C.: Air travel demand elasticities: concepts, issues and measurement. Technical Report, Department of Finance Canada (2004)
8. Cross, R.G., Higbie, J.A., Cross, Z.N.: Milestones in the application of analytical pricing and revenue management. J. Rev. Pricing Manage. https://doi.org/10.1057/rpm.2010.39 (2010)
9. Scholkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press (2001)
10. Smola, A.J., Scholkopf, B.: A Tutorial on Support Vector Regression. NeuroCOLT Technical Report TR-98-030 (2003)
11. Canu, S., Grandvalet, Y., Guigue, V., Rakotomamonjy, A.: SVM and Kernel Methods Matlab Toolbox, Perception Systems et Information. INSA de Rouen, Rouen, France (2005)
12. Li, G., Song, H., Witt, S.F.: Time-varying parameter and fixed parameter linear AIDS: an application to tourism demand forecasting. Int. J. Forecast. 22(1), 57–71 (2006)
13. Lim, C., McAleer, M.: Asian tourism to Australia. Ann. Tour. Res. 28(1), 68–82 (2001)
14. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press (2006)
15. U.S. Lodging Industry Results. Technical Report, Smith Travel Research (2007)
16. Monthly traffic analysis. Technical Report, International Air Transport Association (2008)
17. Rushmore, S.: Mid-rate extended-stay provides best return. Hotels 34(5), 42 (2000)
18. Baker, T.K., Collier, D.A.: The benefits of optimizing prices to manage demand in hotel revenue management systems. Prod. Oper. Manage. 12(4), 502–518 (2003)
19. Bellman, R.E.: Dynamic Programming, p. 862270. Dover Publications, Incorporated (2003)
20. Schwarz, Z.: Changes in hotel guests willingness to pay as the date of stay draws closer. J. Hospitality Tour. Res. 24(2), 180–198 (2000)
21. Jeffrey, D., Barden, R.R.D.: Monitoring hotel performance using occupancy time-series analysis: the concept of occupancy performance space. Int. J. Tour. Res. 2(6), 383–402 (2000)
22. Benghalia, M., Wang, P.P.: Intelligent system to support judgmental business forecasting: the case of estimating hotel room demand. IEEE Trans. Fuzzy Syst. 8(4), 380–397 (2000)