Practical machine learning based on cloud computing resources

Cite as: AIP Conference Proceedings 2123, 020096 (2019); https://doi.org/10.1063/1.5117023
Published Online: 17 July 2019
© 2019 Author(s).

Kyriakos N. Agavanakis1,a), George E. Karpetas2,b), Michael Taylor3,c), Evangelia Pappa4,d), Christos M. Michail5,e), John Filos4,f), Varvara Trachana6,g), Lamprini Kontopoulou7,h)

1. ATRINNO, Attica Research and Innovation, 11631, Athens, Greece
2. Department of Medical Physics, Faculty of Medicine, School of Health Sciences, University of Thessaly, 41110, Larissa, Greece
3. Department of Meteorology, University of Reading, RG6 6BB, UK
4. Department of Public Administration, School of Economy and Public Administration, Panteion University of Social and Political Sciences, 17671, Athens, Greece
5. University of West Attica, Department of Biomedical Engineering, Radiation Physics, Materials Technology and Biomedical Imaging Laboratory, Ag. Spyridonos, 12210, Athens, Greece
6. Department of Biology, Faculty of Medicine, School of Health Sciences, University of Thessaly, 41500, Larisa, Greece
7. General Department, University of Thessaly, 41110, Larissa, Greece

a) [email protected]
b) [email protected]
c) [email protected]
d) [email protected]
e) [email protected]
f) [email protected]
g) [email protected]
h) [email protected]
Abstract. Machine learning is a domain highly influenced by the rapid evolution of cloud computing and has reached a maturity point where a plethora of data processing capabilities is now widely available. The aim of the present study is to investigate the potential for building a common platform to support direct end-user application of machine learning algorithms across diverse scientific areas, emphasizing not only the suitability of appropriate tools, but also how results can be disseminated and utilized in a shared data environment. Three case studies are presented: i) quality evaluation metrics for tomographic image reconstruction in positron emission tomography (PET), ii) health impacts of surface UV radiation and iii) demographic determinants that influence the perception of fraud and corruption incidents within different industry sectors. Tests showed that commercially available cloud resources are over-sufficient to consolidate results from a variety of teams and applications and are able to contribute to the build-up of a valuable shared knowledge repository. The cloud service platform exploits machine learning models and helps automate the training and prediction process. The suggested approach makes optimization more efficient and supports the transition to a more sustainable global information environment by breaking knowledge silos.

Keywords: Machine learning, cloud computing, DaaS, PaaS, SaaS, big data, NoSQL, web services, PET, business ethics, health, knowledge sharing

INTRODUCTION

Artificial Intelligence (AI) in computer science is the area of engineering intelligent machines capable of perceiving the environment through activities such as perception, learning and reasoning, and of taking actions that maximize their chance of success at some goal [1], [2]. It is the study of how to train computers to do things that, at present, humans do better.
The subdomain of Machine Learning (ML), which evolved from data analytics and pattern recognition, infers models from data streams by combining their historical relations (often including hidden patterns) and their current trends [3]. ML is the ability to acquire knowledge or skill automatically and improve from experience to maximize performance at a certain task. Neural networks have always played a central role in ML. Inspired by the structure of the biological brain, neural networks consist of a large number of information processing units (called neurons), which work in unison, organized in layers. After the early-stage, over-enthusiastic misbelief that neural networks could acquire knowledge exclusively by example, the research community lost interest, because it proved impractical to train a neural network with more than a couple of layers. Gradually, ML evolved into a successful multidisciplinary aggregation of several areas, including statistics, information theory, the theory of algorithms, probability and functional analysis [4]. And in the last decade, as a result of technological advances and the introduction of big data and deep learning [5]–[7], ML made breakthroughs and produced quite astonishing results in many application domains such as speech recognition, image recognition, image deconvolution, language translation, game playing, bioinformatics, information retrieval, content recognition and security (e.g. intrusion detection, malware detection) [8].

FIGURE 1. Serverless architecture

Technologies and Materials for Renewable Energy, Environment and Sustainability
AIP Conf. Proc. 2123, 020096-1–020096-16; https://doi.org/10.1063/1.5117023
Published by AIP Publishing. 978-0-7354-1863-9/$30.00
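The layered organization described above can be illustrated with a single forward pass through a tiny two-layer network. This is a generic sketch, not code from the paper; the layer sizes and weights are arbitrary, chosen only to show neurons applying a weighted sum followed by a nonlinearity.

```python
# Minimal illustration of the layered neural-network structure described
# above: a two-layer feed-forward pass. Weights are random placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                           # one 3-feature input
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 1 neuron

hidden = sigmoid(W1 @ x + b1)    # each row of W1 is one neuron's weights
output = sigmoid(W2 @ hidden + b2)
print(output)
```

Stacking more such layers is what "deep" learning refers to; training them at scale is exactly what remained impractical until the advances mentioned above.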
This evolution has been supported by the rapid advances of Information and Communication Technologies (ICT) on one side, and by the explosion of the Internet of Things (IoT) and social media, which generate unprecedentedly huge amounts of data streams, on the other [9]–[11]. While the scientific community is still hesitant to use and expose ML models at scale, the private sector - driven by the immediate return on investment - is already taking advantage of their achievements for models that are difficult, if not impossible, to deduce otherwise. Commercial applications include retail shopping (personalized advertising, suggestions, campaigns), self-driving vehicles, language translators, B2B (supply planning, customer care insights, preventive maintenance), financial services (identification of important data insights, fraud detection), government (utilities), health care (wearable sensors, medical exams), and much more. In the rest of this paper, we present the practical usage of existing, commercially available cloud infrastructure for the application of machine learning (ML) technology in pragmatic solutions. The key components of our approach are presented, along with three indicative case studies that verify it through end-to-end scenarios. We investigate how exposing research results through reusable web services with a global footprint favors cooperation between teams, integration into 3rd party applications and evaluation feedback from end users.

CLOUD COMPUTING AND ENABLING TECHNOLOGIES

A valuable aid to promoting innovation is to have simple, easy and self-adjusted tools for the experimentation, dissemination and usage of research results [12]. The abundance of advanced, high-performance cloud computing resources has currently reached a maturity point at which they are often underused [13]. To create efficient ML systems, the most crucial component is of course scientific expertise.
But although machine learning and cloud computing are not necessarily converging, the latter is becoming an enabler for ML applications. Infrastructure as a Service (IaaS) allows the ad hoc use of the huge infrastructure of a data center without being limited by a team's resources and capabilities. But the real advantage comes with the Platform as a Service (PaaS) and Software as a Service (SaaS) models, which minimize the need for upfront investment in terms of cost, ICT expertise and effort. They give users (and the scientific community) the leverage to focus on their scientific field by eliminating operational distractions, while huge computational power and services remain available on demand. And building on top of them, we can proceed to a Data-as-a-Service (DaaS) approach, which is one of the first steps towards the development of shareable knowledge. Commercial and open source communities both support contemporary serverless architectures. These are flexible enough to scale with consistent performance and reliability, in order to meet the needs of computationally intensive processes within the context of the ML data preparation and model training phases, as well as of the application services built on top of them.

ML Platform Architecture

Motivated by the above observations, we designed and developed a proof-of-concept framework that aims to be a basis of inspiration for disruptive machine learning solutions in several scientific disciplines. Within the modelling framework defined above, we consider a generic processing, training and publishing platform with specialized functions useful for the ML domain, built on top of the basic semantic middleware infrastructure, so that it leverages its capabilities for mediating the connection between end nodes. It is essentially a platform offered as a bouquet of cloud-hosted web services that make extensive use of PaaS and SaaS features like workflows, big data management, ML training models, etc.
Leveraging the flexibility of cloud resources, it favors global accessibility, high availability, scalability, performance and high security standards [10]. Information and communication layers are adaptable to different levels of abstraction according to the scope under consideration. Such a tool can serve to implement realistic ML scenarios with the lowest cost and upfront investment. It also enables reasoning about software, methodologies, operations and procedures, guidelines and practices, but most importantly about the impact on the lifecycle of ML-enabled systems.

FIGURE 2. ML platform services

A cloud-based micro-service bus architecture has been selected to build up the framework model and to encapsulate the underlying network details and controlling mechanisms. This is adequate to accommodate the necessary communication patterns, transformations and interfaces of distributed applications involving multiple parties with a wide spectrum of complexity and differentiating attributes, according to their nature and the target functionality [14]. In such a framework of universal audience, a protocol-agnostic approach is mandatory for the higher levels of services, leaving to the basic infrastructure the elementary management of low-level protocols and middleware communication for data acquisition or upload. Besides being responsible for acquiring, modeling, training and using the results either online or offline, it also provides automation mechanisms to connect diverse systems that may span the boundaries of the organization itself [15]. Supported use cases vary from a single source of data (e.g. a practitioner who evaluates an image) asking for an ML assessment through a web service, up to a 3rd party application server connecting to an aggregate service with a global footprint.
Built on top of commercially available PaaS and SaaS services, and following the latest technology developments in a stable and managed environment, the platform deals with business process requirements over a variety of distributed and heterogeneous systems with different ownerships. With intelligent technologies, including event processing and data streaming techniques from the IoT domain, enterprise management can capture, aggregate and then analyze real-time and historical data of any variety, volume and velocity. This allows designing layered APIs, implementing integration flows and building connectors with these low-friction development tools. API flexibility is the catalyst for this change, unleashing information and eliminating the friction of integration for unprecedented speed and agility. It makes it possible to create more channels for new services and user experiences, and to accelerate innovation through adapted and exposed functionality in favor of interoperability with other systems and a wide diversity of local resources.

FIGURE 3. API layers

The platform provides unprecedented ease in manipulating data sets and experimenting with the optimization of mathematical models. Besides a wide variety of ready-to-use, customizable CNN models, users can very easily embed their own. R and Python scripts are currently supported for this purpose, but other languages may be added in the future. When an optimum model is found, great attention is given to how it will be used. The aim is to make it accessible worldwide with minimum effort and complexity, and to create the opportunity for it to be used not only as such, but also to be embedded and utilized by other applications. The key point of this approach is the capability to support the (re)usability of the trained models through web services.
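To illustrate what consuming such an exposed model looks like from the client side, the sketch below builds a JSON scoring request in the general style of cloud ML web services. The endpoint URL, credential, payload layout and field names are placeholders for illustration only, not the actual API of the platform described here.

```python
# Hedged sketch: constructing a REST scoring request for a trained model
# exposed as a web service. URL, key and field names are hypothetical.
import json
import urllib.request

def build_scoring_request(url: str, api_key: str, inputs: dict) -> urllib.request.Request:
    """Wrap feature values in a JSON body for a hypothetical scoring endpoint."""
    body = json.dumps({"Inputs": {"input1": [inputs]}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

# Example: ask a trained model for an MTF prediction (placeholder endpoint).
req = build_scoring_request(
    "https://example.com/score",   # placeholder URL
    "MY-API-KEY",                  # placeholder credential
    {"subsets": 3, "iterations": 8},
)
# response = urllib.request.urlopen(req)      # network call omitted here
# prediction = json.loads(response.read())
```

Because the contract is just JSON over HTTPS, the same call works identically from a web page, a mobile app or a spreadsheet plug-in, which is what enables the thin clients discussed later.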
As is evident, different scopes involve different types of parties and different interfacing and operation needs; therefore, although similar in design, the implementation has been adapted appropriately for each individual case, to make interoperability with the central and 3rd party systems possible, while coping with the wide diversity of local resources in terms of functionality, interfacing variety and complexity [16]. Communication patterns may include either sync or async mechanisms for individual data or groups of them. Multiple channels, protocols, communication patterns and data formats are supported, e.g. REST and SOAP services, single/group messages and adaptable message formats. The Service Oriented Architecture (SOA) exposes business functionality via a dedicated and reusable set of data-manipulating services:
- It includes a customizable workflow engine that gives full liberty to the model training procedures.
- It connects on-premises, hybrid and cloud applications, locally extracting the models but globally consuming their results by custom on-premises or cloud applications.
- It connects heterogeneous systems and technologies into unified business processes, thus enhancing their functionality and exposing enriched collaborative services that involve multiple systems, supporting both Enterprise Application Integration (EAI, i.e. connecting applications within a single organization) and Business-to-Business (B2B, i.e. connecting applications in different organizations) scenarios. One great side effect is that no single application needs to be aware of the complete business process or of how the data are going to be exploited.
- It offers dynamic connectivity: requests are fulfilled through dynamic system workflows that may be content and context based, i.e.
alternate execution paths and the corresponding APIs are selected at runtime for each endpoint, based on business data and rules.
- By exposing their findings, researchers allow partners and users to bring in new use cases from other areas to challenge their current processes, products and solutions, and to enrich their case studies with other similar cases, formulating a more complete training dataset with enhanced applicability.
- Another important issue is scale: a cloud-based platform achieves scalability at the lowest cost, directly addressing the obstacles imposed by the large investments in effort and cost required by the traditional approach.

The key value of this approach is the capability to support the engagement of individual users/teams while bundling their functionality into the corresponding applications, either at the local or the global level. A last side benefit of the micro-services architecture is that they fit directly into the serverless cloud application approach, adopted lately by all commercial providers in an effort to accelerate development while keeping maintenance and utilization indexes as optimal and efficient as possible. In conclusion, business logic and collaborative services may be built on top of (micro)services, which are in turn built on top of APIs, gaining performance, scalability and extensibility. Other non-functional advantages are equally important, such as security (uniform authentication/authorization control), data ownership and integrity certification, and auditability.

MACHINE LEARNING PLATFORM

In our proof-of-concept platform, the user is given an environment for easy process setup, experimentation and evaluation without the need to deal with software platform technicalities. Cloud resources are available for the user's own data experiments, capable of scaling according to the compute power needed, without requiring a support team of software infrastructure and programming experts or even dedicated, in-depth specialized knowledge.
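A local analogue of the experiment flow the platform automates - select columns, split into training and evaluation subsets, fit a model, score it - can be sketched in a few lines. scikit-learn is used here purely as a stand-in for the platform's processing blocks, and the data are the (subsets, iterations) → MTF sample of FIGURE 4.

```python
# Minimal local analogue of the FIGURE 5 workflow: train/evaluation split,
# boosted decision tree regression, and scoring. scikit-learn stands in
# for the cloud processing blocks; data come from the FIGURE 4 sample.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X = [[1, 8], [1, 14], [1, 20], [3, 2], [3, 6], [3, 8],
     [3, 14], [3, 20], [15, 2], [15, 6], [15, 10], [15, 14]]
y = [1.0, 0.9993054, 0.9994166, 0.9987505, 0.9993148, 0.9993687,
     0.9994235, 0.9994295, 0.9992808, 0.9993671, 0.9993962, 0.9994069]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
print("R^2 on held-out rows:", r2_score(y_test, pred))
```

On the platform, the same split/train/score steps are configured graphically and the fitted model is then published as a web service rather than kept in a local process.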
subsets  iterations  SF          MTF
1        8           0.00115888  1
1        14          0.00115888  0.9993054
1        20          0.00115888  0.9994166
3        2           0.00115888  0.9987505
3        6           0.00115888  0.9993148
3        8           0.00115888  0.9993687
3        14          0.00115888  0.9994235
3        20          0.00115888  0.9994295
15       2           0.00115888  0.9992808
15       6           0.00115888  0.9993671
15       10          0.00115888  0.9993962
15       14          0.00115888  0.9994069

FIGURE 4. Training dataset sample

FIGURE 5. ML workflow design

The input data set can be inserted and used as is but, as is true for every activity, it may also be processed and manipulated as required through the available customizable processing blocks or user-defined ones. As an example, the diagram in FIGURE 5 depicts a process where specific columns are selected from the input dataset, which is subsequently split into two subsets, one meant to train the model and the other to evaluate its performance (accuracy). Then, according to the methodology followed, the user decides whether to accept the model or not.

FIGURE 6. Trained model evaluation of MTF prediction for PET imaging

Following on from this, the true power of the platform comes to the surface, since it allows the trained model to be exposed in the form of web services (FIGURE 7) that are directly usable by any third party, human or machine, thus facilitating their direct integration into useful applications. For example, a simple web page as shown in FIGURE 8, a mobile application or any other UI-enabled module could easily provide the user with useful results.

FIGURE 7. Trained model exposed as web service: REST API request

This is a huge disruption of the established model, where some machine or application has to include - and of course maintain - specialized programming code that makes use of the trained models. A product life cycle that may span a year or more may be decreased to a few days.

FIGURE 8. Web page acting as ML service client

One of the most user-centric, simple and useful ways of user interaction is the use of plug-ins for common spreadsheet applications, as shown in the example of FIGURE 9. This enables anyone to utilize the exposed models without having to learn another UI, pattern or any kind of computational technicality: just by providing the necessary input data and receiving the answer inside the spreadsheet, in the most familiar and trivial way.

FIGURE 9. Spreadsheet acting as ML service client

CASE STUDIES

Three teams from different scientific areas participated in pilot studies and contributed their experience interacting with the system, in order to test its performance and functionality with real-world examples and data originating from their primary research activities:
- quality evaluation metrics for the tomographic image reconstruction of positron emission tomography (PET) images
- health implications of surface UV spectral radiation on Vitamin-D metabolism and epithelial cell DNA-damage
- investigation of the demographic determinants influencing the perception of fraud and corruption incidents within different industry sectors

The pilots are not intended to perform a component-level functionality assessment of the system, but instead are designed to test the platform and its potential for adding value under real conditions. In each case we had to:
- Deploy and configure the system
- Execute the pilots
- Identify and apply slight modifications to the system for optimal performance
- Validate the expected behavior of the entire system
- Get feedback from users

Case study 1: The influence of iterative reconstruction through MTF estimation using ML in PET

This case study extends a previously validated Monte Carlo model [17]–[19] in order to obtain PET image resolution, using the Gate package, which is an open-source extension of the Geant4 Monte Carlo toolkit [20].
Gate combined with STIR software [21] is used to estimate the MTF for different numbers of iterations and subsets in the iterative image reconstruction [22]. The major considerations when using iterative reconstruction algorithms are to restrain noise and to constrain processing time to acceptable levels. To achieve this, the number of subsets and iterations should be carefully selected in order to find the right compromise between resolution, noise and processing time [23]. In clinical practice, where the number of iterations is limited due to time restrictions, resolution may differ between different objects within an image, depending on their relative size and intensity [24]. The number of iterations affects both the resolution and the intensity of PET in the following manner: by increasing the number of iterations, within a reasonably wide range of practically encountered clinical situations [25], resolution improves and tumors appear brighter. However, this behavior reaches a point beyond which resolution cannot be further improved [26],[27]. For this purpose, we use different numbers of subsets and iterations to train a neural network to estimate the MTF [28], and then let it decide and calculate which of the available combinations is most appropriate in terms of information, complexity and time consumption. Thus, the clinician will be able to know in advance the optimal combination of image reconstruction parameters to use in the system configuration before final image acquisition. Using the proposed platform, it was very easy to experiment with different models in order to find the optimum configuration for predicting the MTF. The prior knowledge that this is a sigmoid function narrowed the candidate models and meant that the platform was able to facilitate the heuristics for the optimum configuration.
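The sigmoid prior mentioned above can be sanity-checked with an ordinary least-squares curve fit before committing to a model family. The three-parameter form below is an illustrative choice, not the study's actual parametrization, and the data are the 3-subset rows of the FIGURE 4 sample.

```python
# Illustrative check of the sigmoid prior: MTF rises with iteration count
# and saturates. The parametrization is a generic logistic, not the
# study's own model; data are the 3-subset rows of FIGURE 4.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, top, rate, midpoint):
    return top / (1.0 + np.exp(-rate * (x - midpoint)))

iterations = np.array([2, 6, 8, 14, 20])
mtf = np.array([0.9987505, 0.9993148, 0.9993687, 0.9994235, 0.9994295])

params, _ = curve_fit(sigmoid, iterations, mtf,
                      p0=[1.0, 0.5, 0.0], maxfev=20000)
top, rate, midpoint = params
print(f"fitted saturation level ~ {top:.5f}")
```

A fit like this confirms the saturating shape that the text describes (more iterations stop improving resolution beyond a point), which is what justified narrowing the candidate models on the platform.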
After some experimentation in the context of the proof of concept, we ended up selecting the boosted decision tree regression algorithm, as shown in FIGURE 5. When fed with the training data set (FIGURE 4), the system managed to achieve a coefficient of determination equal to 99.76%, as shown in FIGURE 6, and was then used to predict MTF values using simple client applications, as shown in FIGURE 8 (web) and FIGURE 9 (spreadsheet).

Case study 2: Influence of UV dose on Vitamin-D metabolism and epithelial cell DNA-damage

The appearance of skin "reddening" or "erythema" is associated with a standard dose of UV radiation equal to 100 Jm-2, as defined by ISO 17166:1999 [29]. UV radiation arriving at the ground also has other profound health implications due to its impact on vitamin D metabolism [30] and on DNA-damage in the epithelial cells of our skin [31]. In order to monitor these effects globally, the Royal Dutch Meteorological Institute (KNMI), in conjunction with the European Space Agency (ESA), has developed an operational processing algorithm [32],[33] to retrieve the erythemal UV dose (kJ m-2) at the Earth's surface from measurements of the radiance at the top of the atmosphere taken daily by polar-orbiting satellites including GOME, GOME-2, OMI and SCIAMACHY [34].

FIGURE 10. CNN model performance for bio-UV products

The algorithm calculates the vitamin D UV dose and the DNA-damage UV dose (both in units of kJ m-2) by applying a windowing function (or "action spectrum") to the UV radiance at different wavelengths, and has recently been validated with ground-based instrumentation at the monitoring station in the Laboratory of Atmospheric Physics of the Aristotle University of Thessaloniki, Greece.
This was achieved by training a back-propagation neural network using UV radiance measurements at 5 wavelengths in the UV (at 305, 312, 320, 340 and 380 nm) taken at 1-minute intervals from a NILU-UV multi-filter radiometer, plus the solar zenith angle, as inputs, and, as outputs, modelled values of the erythemal, vitamin D and DNA-damage UV doses derived from a Brewer MKIII spectrophotometer [35]. Initial simulations using the cloud-based service trained on the same data demonstrated similar levels of precision to those reported by Zempila et al. (2017) using a back-propagation neural network [35]. For the vitamin D UV dose, the coefficient of determination R2 = 0.995 obtained with the cloud service compares well with the value of R2 = 0.990 obtained in [35].

Case Study 3: Business ethics and workplace environment

In a period of growing interest in business ethics, the use of artificial neural networks is a novel interdisciplinary approach to fraud and corruption detection [36],[37],[38]. The purpose of the case study is to investigate the demographic determinants influencing the perception of business ethics, in the light of fraud and corruption incidents, within different industry sectors. The major research instrument is a structured questionnaire developed by [36],[40]. A 5-point Likert rating scale was used for the measurement of all items, from Strongly Disagree (5) to Strongly Agree (1). Primary data were collected from a randomly selected sample of individuals working in the Greek private sector. The responses are based on work carried out as part of wider research [41],[42]. The final acquired data set comprises 1000 individuals.
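The classification step this case study goes on to describe - a multi-class neural network over the Likert-coded items, evaluated by overall and average accuracy - can be sketched as follows. scikit-learn stands in for the platform's module, and the data below are synthetic placeholders, not the survey data.

```python
# Illustrative stand-in for the multi-class neural network classification
# step: 14 Likert-coded inputs (1-5) predicting a categorical target.
# The synthetic data and the 3-class layout are NOT the study's data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(1000, 14)).astype(float)  # 14 Likert items
y = rng.integers(0, 3, size=1000)                      # e.g. 3 classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=200,
                    random_state=0).fit(X_tr, y_tr)

overall = clf.score(X_te, y_te)                        # overall accuracy
pred = clf.predict(X_te)
# "average accuracy": mean of per-class one-vs-rest accuracies
per_class = [np.mean((pred == c) == (y_te == c)) for c in np.unique(y_te)]
print(overall, np.mean(per_class))
```

The distinction between the two metrics matters below: average (per-class) accuracy can sit well above overall accuracy when classes are imbalanced, which is consistent with the 77% vs 90.8% figures reported for this case study.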
The input variables (14 items) are described as follows:

Demographic Information
- Age
- Education level
- Code of Ethics
- Industry

Fraud Incidents
- Long private telephone calls
- Surfing the internet for private purposes during working hours
- Taking company resources home for private use
- Taking the credit for other people's work
- Misappropriation of monies received
- Using insider knowledge for trading in shares and other securities
- Theft of company resources/embezzling funds
- Not performing duties in exchange for money or favors from someone

Corruption Incidents
- Nepotism: favoring friends and family from outside the organization
- Using expense claims unethically

FIGURE 11. Input variables for the "Business ethics and workplace environment" case study

Subsequently, we tested the application of ML training in order to detect and model the dependencies between them. Using a multi-class neural network classification algorithm, the best results were obtained as shown in the evaluation below, with an overall accuracy of 77% and an average accuracy of 90.8%.

FIGURE 12. Multiclass Neural Network Classification model evaluation

FINDINGS AND DISCUSSION

Knowledge sharing

The pilot case studies revealed that a platform enabling the practical usage of state-of-the-art machine learning processes, without expertise in the supporting activities, is a great tool for ongoing research and for its exploitation through the exposure of the trained models as web services. However, they also revealed the implications of what could be called information silos, where activities and results are solely available to the owning group, even though multiple teams work in parallel on similar matters. Take, for example, the first case study: successful models are extracted from the knowledge and accumulated experience of the medical practitioners.

FIGURE 13. Information silos and knowledge dissemination

Although providing very promising results, their usability is restricted to the very specific conditions and experiment configuration of the training data. In this case, only one out of multiple factors has been investigated, for a specific machine type, of a particular model, under an explicit configuration and conditions. Training data are subject to quantitative factors, e.g. the Contrast-to-Noise Ratio (CNR), Point Spread Function (PSF), Line Spread Function (LSF) and Edge Spread Function (ESF), as well as to qualitative ones related to the medical context: patient movement (a typical exam lasts 30 minutes), patient anatomical characteristics (e.g. body type and fat: thin, normal, obese), clinical context and type of study, and imaging modality and acquisition technology (e.g. PET scanner operation mode, 2D/3D). On top of this, there are many - sometimes controversial - metrics that can be defined to assess the quality of a medical image with respect to the specific interest of the user; for example, quality may depend on quantitative factors such as the Modulation Transfer Function (MTF), Information Content (IC), Normalized Noise Power Spectrum (NNPS), Detective Quantum Efficiency (DQE), etc. The same is true for the experiment parameters of the other case studies: similar restrictions on the findings' usability have been identified there as well, as their results also depend on the geo-location of the studied places, the market segment, etc. Independently of the research process, the final assessment must be extracted from the knowledge and accumulated experience of the experts. The problem is that teams working in similar areas (or even in the same one) cannot easily combine their work results. Worldwide, online users of social and gaming platforms share data, communicate and collaborate as a daily practice, but scientists are not always that open yet.
Nor do they have the necessary tools and common practices to do it. Moving from information silos to a mentality of collaborating ecosystems will make possible the exchange of complementary services and/or technologies for the mutual benefit of the involved parties; in such an approach, cooperative ecosystems will share knowledge across their boundaries while keeping their privacy, integrity and ownership. For the results to be broadly useful, therefore, similar experiments have to be repeated for several different configurations/combinations, whether related to experiment conditions, geo-location or whatever other parameter is applicable. By providing external users and teams with access, the benefit is twofold: not only can they use the models directly (e.g. using the predictive models' results for their own purposes), but they also have the opportunity to upload their own datasets and combine them with the original ones, thus contributing to the creation and optimization of more generic and useful models. Furthermore, there is an additional advantage related to innovation time-to-market: commercial product life cycles usually take a long time to incorporate new technology advances and transform them into usable knowledge and functionality that gives added value to end users. In the PET case study, for example, we need similar datasets for a wide variety of the influencing factors (PET configurations, energy, model, type) in order to have a universally useful data set that industry manufacturers can incorporate into a real product, to serve the global population. The same is true for the bio-UV products research, which could be used by e.g.
smart wearable devices that provide alerts to help protect users against UV-radiation exposure.

Data integrity and ownership

Another issue that prevents cooperation is the need for a generally accepted way to confirm and guarantee the ownership of data, as well as their integrity. The contribution of the proposed approach to this subject is multifold. First, the training data are not needed after the extraction of the model. This means that they can be processed in the cloud, leveraging the available mass resources, and then removed, so that they are kept private while remaining usable (through the extracted model) by the rest of the world. This is a first step towards closer collaboration. Second, the data may be processed at the edge, in a private cloud instance or on-premises, under the control of the owning team, which can then provide the trained model to the central services. Third, the cloud platform could digitally sign the uploaded data in order to guarantee not only their integrity, but the intellectual property as well. This is in fact a continuously emerging issue that worries researchers worldwide. The suggested approach is aligned with other efforts in this domain, such as the Bloxberg project (https://bloxberg.org), an initiative formed by 12 research organizations from 10 countries and led by the Max Planck Society. Still in the pilot phase, its initial focus is to allow researchers to publish articles and create a transparent footprint of their work, without necessarily revealing its content. However, the ultimate goal is to "change the way scientific data is managed, scientific results are communicated, and scientists collaborate" [43].
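The signing step suggested above can be illustrated with a detached signature over the uploaded bytes. HMAC-SHA256 from Python's standard library stands in here for whatever scheme such a platform would actually deploy; a real service would more likely use asymmetric keys and managed key storage.

```python
# Sign an uploaded dataset so its integrity and provenance can later be
# verified without re-sharing the data. HMAC-SHA256 is a stand-in for the
# platform's (unspecified) signature scheme; the key is a placeholder.
import hashlib
import hmac

PLATFORM_KEY = b"demo-secret-key"   # placeholder, not a real deployment key

def sign_dataset(data: bytes, key: bytes = PLATFORM_KEY) -> str:
    """Return a hex signature binding the dataset bytes to the key holder."""
    return hmac.new(key, hashlib.sha256(data).digest(), hashlib.sha256).hexdigest()

def verify_dataset(data: bytes, signature: str, key: bytes = PLATFORM_KEY) -> bool:
    return hmac.compare_digest(sign_dataset(data, key), signature)

dataset = b"subsets,iterations,SF,MTF\n1,8,0.00115888,1\n"
sig = sign_dataset(dataset)
assert verify_dataset(dataset, sig)                    # intact data verifies
assert not verify_dataset(dataset + b"x", sig)         # tampering is detected
```

Publishing only the signature (or anchoring it on a ledger, as Bloxberg proposes) lets a team later prove ownership and integrity of a dataset without revealing its content.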
Knowledge sharing in the EU

The European Union identified early the need for effective knowledge sharing and tried to impose explicit rules on EU funding, to ensure not only that publicly funded research data are publicly available and usable, but also that the scientific community assembles common ground for cooperation while preserving the projects' data integrity and intellectual property [44]. However, although mandatory, the effort has not yet been fruitful. Most of the time, the related activities are carried out by project stakeholders merely to satisfy the imposed requirements and fail to provide a practical contribution [45]. This weakness in knowledge distribution and information sharing results in insufficient use and wasting of resources, lack of innovation potential and missed opportunities [46]. Taking full advantage of existing technology to break boundaries means taking advantage of the considerable benefits the digital revolution has to offer [47], in line with the statement that "in an open, peaceful democracy, knowledge shared is power multiplied" [48]. Our approach favors, organizationally and architecturally, a cross-sectoral strategy that bundles tools and processes into reusable services, accelerating innovation by leveraging knowledge from multidisciplinary and multilateral areas [49] and combining them into a shared reusable knowledge repository based on the fundamental advantages of existing, mature, highly scalable and easy-to-integrate cloud services. Instead of working in knowledge silos, we have the means to form dynamic ecosystems, where products, services or technologies developed by one party serve as foundations upon which others can build complementary ones. The outcome is clear: reduced costs, mitigated risk, streamlined compliance, improved collaboration and increased operational efficiency.
A positive side-effect is that this availability can evolve to encourage the global scientific community to better coordinate, facilitate and build products and services leveraging complementary technologies.

CONCLUSION

In this study the practical aspects and capabilities of ML services have been explored by building a pragmatic cloud-based solution that provides efficient integration in a scalable infrastructure for shared knowledge, without exposing the data themselves or putting the researchers' intellectual property rights at risk. A service-bus architecture is employed to mediate the connection between distributed nodes, supported protocols and middleware technologies. It also makes it possible to offer global PaaS and DaaS services in the form of loosely coupled interfaces. Overall, the platform acts as a test bench where ML and other computational techniques can be tested and optimized. Researchers get full access to scalable ML resources without the need for upfront investment, not only in terms of infrastructure availability and cost, but also in terms of ICT know-how and effort. With the flexibility of remote access, they can work undistracted to develop complete and functional ML prototypes, using their problem-area expertise to test, train and select the optimum models. Subsequently, they can expose these models as web services to predict results and serve similar needs with unprecedented operational robustness and scalability. The cloud-based approach can be integrated over a wide variety of distributed and heterogeneous systems. It favors the direct use of third-party models through a dedicated and reusable set of services that can be easily integrated into end users' applications, e.g. web sites, mobile apps, desktop applications and social media apps.
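The web-service exposure of a trained model can be sketched as a simple JSON-in/JSON-out contract. The example below is hypothetical (the field names `x` and `prediction`, and the linear stand-in model, are illustrative and not from this study); it shows only how fitted parameters, deployed without their training data, can answer prediction requests through a loosely coupled interface that any HTTP framework could wrap.

```python
import json

# Stand-in for a trained model: only the fitted parameters are deployed,
# never the training data (hypothetical values for illustration).
MODEL = {"slope": 2.0, "intercept": 0.1}

def handle_predict(request_body: str) -> str:
    """Service endpoint handler: parse a JSON request, run the model,
    and return a JSON response."""
    request = json.loads(request_body)
    y = MODEL["slope"] * request["x"] + MODEL["intercept"]
    return json.dumps({"prediction": y})
```

A client application (web site, mobile app, etc.) would simply POST a body such as `{"x": 3.0}` and receive the prediction, with no access to the data that produced the model parameters.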
The platform may also be used to accept complementary datasets from other working groups, allowing original training datasets to be enriched with additional variables and parameters from the problem domain. The same mechanism may be applied to the acquisition of evaluation feedback, whereby end-user data are used for reinforcement learning and continuous improvement. Another important feature is that model predictions can be provided without exposing the input data that trained the ML model, thus preserving the owners' intellectual property rights. Data integrity may also be preserved by digitally signing and securing the data. By applying the suggested approach to multiple scientific fields, it has been shown how it helps to establish a creative environment for experimentation and exploitation that breaks knowledge silos, builds higher confidence levels between cooperating teams, boosts innovation, and contributes to a more sustainable environment.

ACKNOWLEDGEMENTS

This research has been partially supported at the administrative and research levels by ATRINNO. All contributors worked unfunded in a personal capacity and declare no conflict of interest. MT would like to thank VT for useful discussions on the clinical impacts of UV on DNA damage, the National Network for the Measurement of Ultraviolet Solar Radiation (uvnet.gr) for permission to re-use ground-based NILU-UV radiance measurement data and associated UV dose data obtained from a Brewer MKIII spectrophotometer, and KNMI / ESA for the provision of UV radiation monitoring product data used in the validation study of [35].

REFERENCES

[1] J. McCarthy, "What is AI? / Basic Questions." [Online]. Available: http://jmc.stanford.edu/artificial-intelligence/what-is-ai/index.html. [Accessed: 08-May-2019].
[2] S. Bhalla, S. Singh, and S. Bhalla, "Artificial intelligence and expert systems," no. 04, p. 9, 2015.
[3] "Machine Learning: What it is and why it matters | SAS." [Online].
Available: https://www.sas.com/en_us/insights/analytics/machine-learning.html. [Accessed: 07-Apr-2019].
[4] A. Munoz, "Machine Learning and Optimization," p. 14.
[5] "Multi-Valued and Universal Binary Neurons - Theory, Learning and Applications | Igor Aizenberg | Springer." [Online]. Available: https://www.springer.com/gp/book/9780792378242. [Accessed: 07-Apr-2019].
[6] H. Wang and B. Raj, "On the Origin of Deep Learning," arXiv:1702.07800 [cs, stat], Feb. 2017.
[7] "Some Studies in Machine Learning Using the Game of Checkers - Semantic Scholar." [Online]. Available: https://www.semanticscholar.org/paper/Some-Studies-in-Machine-Learning-Using-the-Game-of-Samuel/26337762b9b06c7d8a952bebab6408a5e7f9935d. [Accessed: 07-Apr-2019].
[8] L. Zhang, S. Wang, and B. Liu, "Deep learning for sentiment analysis: A survey," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, p. e1253, Jul. 2018, doi: 10.1002/widm.1253.
[9] P. Ongsulee, "Artificial intelligence, machine learning and deep learning," in 2017 15th International Conference on ICT and Knowledge Engineering (ICT&KE), Bangkok, 2017, pp. 1–6, doi: 10.1109/ICTKE.2017.8259629.
[10] P. G. Papageorgas, K. Agavanakis, I. Dogas, and D. D. Piromalis, "IoT gateways, cloud and the last mile for energy efficiency and sustainability in the era of CPS expansion: 'A bot is irrigating my farm..,'" presented at Technologies and Materials for Renewable Energy, Environment and Sustainability (TMREES18), Beirut, Lebanon, 2018, p. 030075, doi: 10.1063/1.5039262.
[11] Ch. Hatzigeorgiou, K. Bosioli, P. Papageorgas, and K. Agavanakis, "A Low-cost Lightning sensing network using cloud technology," presented at the eRA-10 International Scientific Conference, Piraeus, Greece.
[12] K. Agavanakis, G. Panagakis, G. Flamis, and S. Koutroubinas, "HW/SW Design considerations, on porting a fuzzy logic library to embedded systems, with application in the replacement of lost neural activity through functional electrical neuromuscular stimulation," presented at the eRA-5 International Scientific Conference, Piraeus, 2010.
[13] K. Agavanakis, P. G. Papageorgas, G. A. Vokas, D. Ampatis, and C. Salame, "Energy trading market evolution to the energy internet: a feasibility review on the enabling internet of things (IoT) cloud technologies," presented at Technologies and Materials for Renewable Energy, Environment and Sustainability (TMREES18), Beirut, Lebanon, 2018, p. 030077, doi: 10.1063/1.5039264.
[14] K. Agavanakis, K. Sakellarakis, and S. Koutroubinas, "Moving Intelligent Energy applications upwards: A customer oriented cloud solution," in The 1st IEEE Global Conference on Consumer Electronics 2012, Tokyo, Japan, 2012, pp. 607–611, doi: 10.1109/GCCE.2012.6379928.
[15] K. Thrampoulidis and K. Agavanakis, "Object Interaction Diagram, a new technique in OO Analysis and Design," in Wisdom of the Gurus, C. Bowman, Ed., ch. 7, CAMBRIDGE-SIGS Publications; reprinted from Journal of Object-Oriented Programming, 1996.
[16] K. Agavanakis, G. Panagakis, N. Tatlas, and S. Koutroubinas, "Hardware virtualization for rapid and secure CE product development and life cycle management," in 2011 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2011, pp. 855–856, doi: 10.1109/ICCE.2011.5722902.
[17] G. Karpetas, C. Michail, G. Fountos, P. Valsamaki, I. Kandarakis, and G. Panayiotakis, "Towards the optimization of nuclear medicine procedures for better spatial resolution, sensitivity, scan image quality and quantitation measurements by using a new Monte Carlo model featuring PET imaging," Hell J Nucl Med, 2013, doi: 10.1967/s002449910082.
[18] C. M.
Michail et al., "A novel method for the optimization of positron emission tomography scanners imaging performance," Hell J Nucl Med, vol. 19, no. 3, pp. 231–240, 2016, doi: 10.1967/s002449910405.
[19] C. Michail, K. Agavanakis, G. Karpetas, I. G. Valais, I. S. Kandarakis, G. S. Panayiotakis, and G. P. Fountos, "Information Content in Nuclear Medicine Imaging," Energy Procedia, vol. 157, pp. 1517–1524, Jan. 2019, doi: 10.1016/j.egypro.2018.11.317.
[20] OpenGATE Collaboration, Users Guide V6.1.
[21] K. Thielemans et al., "STIR: software for tomographic image reconstruction release 2," Physics in Medicine and Biology, vol. 57, no. 4, pp. 867–883, Feb. 2012, doi: 10.1088/0031-9155/57/4/867.
[22] G. E. Karpetas et al., "Detective quantum efficiency (DQE) in PET scanners: A simulation study," Applied Radiation and Isotopes, vol. 125, pp. 154–162, Jul. 2017, doi: 10.1016/j.apradiso.2017.04.018.
[23] C. J. Jaskowiak, J. A. Bianco, S. B. Perlman, and J. P. Fine, "Influence of reconstruction iterations on 18F-FDG PET/CT standardized uptake values," J. Nucl. Med., vol. 46, no. 3, pp. 424–428, Mar. 2005.
[24] J. S. Liow and S. C. Strother, "The convergence of object dependent resolution in maximum likelihood based tomographic image reconstruction," Phys Med Biol, vol. 38, no. 1, pp. 55–70, Jan. 1993.
[25] C. Riddell et al., "Noise reduction in oncology FDG PET images by iterative reconstruction: a quantitative assessment," J. Nucl. Med., vol. 42, no. 9, pp. 1316–1323, Sep. 2001.
[26] M. Soret, S. L. Bacharach, and I. Buvat, "Partial-Volume Effect in PET Tumor Imaging," Journal of Nuclear Medicine, vol. 48, no. 6, pp. 932–945, Jun. 2007, doi: 10.2967/jnumed.106.035774.
[27] C. Michail et al., "Information Capacity of Positron Emission Tomography Scanners," Crystals, vol. 8, no. 12, p. 459, Dec. 2018, doi: 10.3390/cryst8120459.
[28] G. E. Karpetas, C. M. Michail, G. P. Fountos, I. S. Kandarakis, and G. S.
Panayiotakis, "A new PET resolution measurement method through Monte-Carlo simulations," Nuclear Medicine Communications, vol. 35, no. 9, pp. 967–976, Sep. 2014, doi: 10.1097/MNM.0000000000000151.
[29] "Erythema reference action spectrum and standard erythema dose," ISO 17166:1999(en), ISO - Online Browsing Platform (OBP).
[30] Internationale Beleuchtungskommission, Ed., Action spectrum for the production of previtamin D3 in human skin. Vienna: CIE Central Bureau, 2006.
[31] R. B. Setlow, "The Wavelengths in Sunlight Effective in Producing Skin Cancer: A Theoretical Analysis," Proceedings of the National Academy of Sciences, vol. 71, no. 9, pp. 3363–3366, Sep. 1974, doi: 10.1073/pnas.71.9.3363.
[32] J. van Geffen, R. van der A, M. van Weele, M. Allaart, and H. Eskes, "Surface UV radiation monitoring based on GOME and SCIAMACHY," in Proceedings of the ENVISAT & ERS Symposium, Salzburg, Austria, vol. SP 572, 8 pp., ESA, Paris, 2004.
[33] M. van Weele, R. J. van der A, J. van Geffen, and R. Roebeling, "Space-based surface UV monitoring for Europe using SCIAMACHY and MSG," in Proceedings of the 12th SPIE International Symposium on Remote Sensing, Bruges, Belgium, 2005.
[34] J. van Geffen, M. van Weele, M. Allaart, and R. van der A, "TEMIS UV index and UV dose operational data products, version 2," Royal Netherlands Meteorological Institute (KNMI), 2017, doi: 10.21944/temis-uv-oper-v2.
[35] M. Zempila, J. H. van Geffen, M. Taylor, I. Fountoulakis, M. E. Koukouli, M. van Weele, A. Bais, C. Meleti, and D. Balis, "TEMIS UV product validation using NILU-UV ground-based measurements in Thessaloniki, Greece," Atmospheric Chemistry and Physics, vol. 17, no. 11, pp. 7157–7174, Jun. 2017, doi: 10.5194/acp-17-7157-2017.
[36] M. Krambia-Kapardis, C. Christodoulou, and M. Agathocleous, "Neural networks: the panacea in fraud detection?," Managerial Auditing Journal, vol. 25, no. 7, pp. 659–678, Jul. 2010, doi: 10.1108/02686901011061342.
[37] J.
Perols, "Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms," AUDITING: A Journal of Practice & Theory, vol. 30, no. 2, pp. 19–50, May 2011, doi: 10.2308/ajpt-50009.
[38] M. C. Sorkun and T. Toraman, "Fraud Detection on Financial Statements Using Data Mining Techniques," Intelligent Systems and Applications in Engineering, vol. 5, no. 3, pp. 132–134, 2017.
[39] D. L. McCabe, L. K. Trevino, and K. D. Butterfield, "The Influence of Collegiate and Corporate Codes of Conduct on Ethics-Related Behavior in the Workplace," Business Ethics Quarterly, vol. 6, no. 4, p. 461, Oct. 1996, doi: 10.2307/3857499.
[40] M. Kaptein, "Business ethics in the Netherlands: a survey," Business Ethics: A European Review, vol. 12, no. 2, pp. 172–178, Apr. 2003, doi: 10.1111/1467-8608.00316.
[41] E. Pappa and J. Filos, "Embedding and Implementing Code of Ethics: An Exploration of the Greek Private Sector," presented at the 5th International Scientific Conference of the Institute of Humanities and Social Sciences (IAKE2019), Heraklion, Crete, Greece, 2019.
[42] E. Pappa, J. Filos, and V. Pirkatis, "Ethics Management in the Educational Units Workplace," presented at the 1st International Congress on Management of Educational Units (ICOMEU 2018), Thessaloniki, Greece, 2018.
[43] bloxberg. [Online]. Available: https://bloxberg.org.
[44] F. Fusaro, "Sharing knowledge in a digital age, conclusions of a strategic workshop," Brussels, European Commission's Information & Society DG, Feb. 2011.
[45] "Digital Transition | FUTURIUM | European Commission." [Online]. Available: https://ec.europa.eu/futurium/en/digital-transition. [Accessed: 06-Apr-2019].
[46] "Better Knowledge - Action 10 - Building innovation and dissemination accelerator - FUTURIUM - European Commission." [Online].
[47] "'Opening up Education' - Making the 21st century classroom a reality | Digital Single Market." [Online]. Available: https://ec.europa.eu/digital-single-market/node/67636. [Accessed: 06-Apr-2019].
[48] "Truth, trust and democracy: in a digital world, is knowledge still power? | European Parliamentary Research Service Blog." [Online].
[49] K. Agavanakis, T. Antonakopoulos, and V. Makios, "On Applying Fuzzy Sets in the Evaluation Process of Object-Oriented Supporting CASE Tools," in Proceedings of the 21st EUROMICRO Conference (EUROMICRO 95), Como, Italy, 1995.