
International Journal of Scientific Research in Science, Engineering and Technology
Print ISSN - 2395-1990 Online ISSN : 2394-4099
Available Online at : www.ijsrset.com
doi : https://doi.org/10.32628/IJSRSET2310631

Improving Performance of Data Extracts Using Window-Based Refresh Strategies
Swethasri Kavuri, Suman Narne
Independent Researcher, USA

ARTICLE INFO
Article History: Accepted: 09 Oct 2021, Published: 20 Oct 2021
Publication Issue : Volume 8, Issue 5, September-October-2021
Page Number : 359-377

ABSTRACT
This research paper investigates the application of window-based refresh strategies to enhance the performance of data extracts in large-scale data management systems. Traditional extract, transform, load (ETL) processes often struggle with the increasing volume and velocity of data in modern environments. Window-based refresh strategies offer a promising solution by focusing on specific subsets of data during each refresh cycle. This study examines various window-based techniques, including time-based, size-based, and hybrid approaches, and evaluates their effectiveness in improving extract performance. Through extensive analysis and empirical testing, we demonstrate that window-based strategies can significantly reduce processing time and resource utilization while maintaining data consistency and integrity. The paper also explores optimization techniques, challenges, and future research directions in this field.

Keywords: Data extracts, Window-based refresh, ETL optimization, Data warehousing, Big data, Performance tuning, Incremental updates

Copyright © 2024 The Author(s) : This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
Swethasri Kavuri et al Int J Sci Res Sci Eng Technol, September-October-2021, 8 (5) : 359-377

I. INTRODUCTION

1.1 Background
With the advent of big data, organizations continue to face the challenge of managing and analyzing large-scale information in a timely manner. The success of data warehouses and business intelligence systems relies heavily on the timely and accurate extraction of data from different sources. For this reason, the ETL process is an indispensable part of these systems: it is responsible for collecting data from various sources, cleaning the data, and integrating it into a single format suitable for analysis and reporting.

As the volume of data explodes, keeping up with the accelerating demand for real-time or near-real-time access to data is challenging with traditional ETL. Full data extracts, in which entire datasets are copied in each refresh cycle, have become economically impractical for many organizations because of the time and resources they require. This has created a growing need for more efficient and scalable approaches to data extraction and refresh.

1.2 Problem Statement
The critical problem with data extract performance is the trade-off between up-to-date data and the computational and temporal costs of processing large datasets. Full extracts ensure complete data consistency, but they frequently involve unnecessary processing of unchanged data and can cause significant delays in data availability. Incremental extracts, which focus only on changed data, can be complex to implement and may miss important changes if they are not properly designed.

Key problems addressed by this research:
1. Reducing the time and resource utilization of data extraction without compromising data integrity
2. Minimizing the impact of extract processes on source systems and network bandwidth
3. Ensuring data consistency and completeness despite high rates of change in datasets
4. Configuring extract strategies to suit the heterogeneous data change rates and trends found in different sources

1.3 Research Objectives
This paper assesses the efficiency of window-based refresh strategies with respect to the issues described above. The primary research goals are:
• Proposing a general framework for applying window-based refresh strategies during the data extract process.
• Assessing the performance benefits of different window-based approaches compared with conventional full and incremental extracts.
• Determining appropriate window configurations and adaptation strategies for various data scenarios and business requirements.
• Evaluating the scalability and reliability of window-based refresh strategies in large-scale data environments.
• Investigating optimizations and potential future improvements that might further enhance the efficiency of data extracts.

II. LITERATURE REVIEW

2.1. Fundamentals of Data Extract
Data extraction is one of the primary elements of the ETL process, which forms the backbone of data warehousing and business intelligence systems. According to Kimball and Ross (2013), effective data extraction is the basis of data quality and consistency throughout the pipeline. It covers all the activities involved in extracting data from source systems: operational databases, external APIs, flat files, and many other sources of structured or semi-structured data.

Vassiliadis and Simitsis (2009) provide an overview that groups data extraction techniques into two broad categories: full extracts and incremental extracts. Full extracts copy the entire dataset from the source system in each refresh cycle. This approach is complete but becomes highly impractical as data volumes grow. Vassiliadis and Simitsis note that full extracts can pose a significant performance problem, particularly in scenarios where data changes are infrequent or localized. Incremental extracts, by contrast, extract only the data that has changed since the previous extraction.
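The difference between the two extract categories can be sketched in a few lines of Python. This is a minimal illustration only, assuming a source that exposes a last-modified timestamp per row; the `SOURCE` rows, the column layout, and the function names `full_extract` and `incremental_extract` are hypothetical and are not taken from the cited works.

```python
from datetime import datetime

# Hypothetical source rows: (id, value, last_modified). The layout and the
# in-memory list are assumptions made purely for this illustration.
SOURCE = [
    (1, "a", datetime(2021, 10, 1, 9, 0)),
    (2, "b", datetime(2021, 10, 2, 9, 0)),
    (3, "c", datetime(2021, 10, 3, 9, 0)),
]


def full_extract(rows):
    """Copy every row, whether or not it changed since the last cycle."""
    return list(rows)


def incremental_extract(rows, watermark):
    """Return rows modified strictly after the previous extraction time,
    together with the new watermark to store for the next cycle."""
    changed = [r for r in rows if r[2] > watermark]
    new_watermark = max((r[2] for r in changed), default=watermark)
    return changed, new_watermark


last_run = datetime(2021, 10, 2, 0, 0)
changed, next_watermark = incremental_extract(SOURCE, last_run)
print(len(full_extract(SOURCE)), "rows in the full extract")
print([r[0] for r in changed], "ids in the incremental extract")
```

The sketch also hints at why incremental extracts depend on reliable change tracking: a row whose timestamp is not updated on modification would never pass the watermark filter.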
For Rainardi (2008), incremental extracts make processing much faster and require fewer resources. Nevertheless, he identifies difficulties in implementing reliable change tracking mechanisms in complex data environments with many interconnected systems.

El-Sappagh et al. (2011) presented a review of ETL processes in data warehousing that supports the need for an effective data extraction strategy. Several key factors influence the choice of extraction methodology: data volume, change frequency, source system capabilities, and business requirements for data freshness.

Table 1 summarizes the key characteristics of full and incremental extracts:

Characteristic                Full Extract        Incremental Extract
Data Coverage                 Complete dataset    Changed data only
Processing Time               Longer              Shorter
Resource Usage                Higher              Lower
Implementation Complexity     Low                 High
Change Tracking Required      No                  Yes
Data Consistency Guarantee    High                Moderate

2.2. Classic Refresh Strategies
Refresh strategies describe when and how data is refreshed in the target system. According to Kimball and Ross, there are several classic refresh strategies, each with its own pros and cons:
1. Periodic full refresh: The target dataset is replaced entirely by importing a fresh, complete extract of the source. Although this is the most effective method for complete data consistency, it becomes very expensive in time and resources, especially with huge datasets. Kimball and Ross state that a full refresh is applied where data integrity is of absolute importance, or where change tracking at the source systems is unreliable.
2. Incremental refresh: According to Golfarelli and Rizzi (2009), under the incremental refresh strategy only changed or newly added data is applied to the target system. They emphasize the efficiency benefits of the strategy but underline its requirement for powerful change tracking mechanisms. In general, incremental refreshes rely on timestamps, version numbers, or CDC techniques to identify modified records.
3. Slowly Changing Dimensions (SCD): This technique is especially suited to dimensional data warehouses, where it handles changes to attributes over time. According to Kimball and Ross (2013), SCD is divided into several types, each of which maintains history differently:
[1] Type 1: Overwrites the old value with the new value, losing history.
[2] Type 2: Adds a new record for each change, which maintains history.
[3] Type 3: Adds new columns for historical values, which accommodates a limited number of changes.
[4] Type 4: Stores current values in the main dimension table and historical values in an additional history table.

In 2009, Jörg and Dessloch presented an in-depth analysis of incremental strategies for data warehouse maintenance. They proposed a refresh approach, classifying and evaluating different approaches in terms of data freshness, query performance, and maintenance overhead.
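The difference between SCD Type 1 and Type 2 handling can be illustrated with a minimal sketch. The `DimRow` structure, its field names, and the two `apply_*` functions below are hypothetical and chosen only for illustration; real dimensional models typically also carry surrogate keys and current-row flags.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    customer_id: int                  # natural key (hypothetical)
    city: str                         # tracked attribute (hypothetical)
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_type1(dim: List[DimRow], customer_id: int, city: str) -> None:
    """Type 1: overwrite in place, so the previous value is lost."""
    for row in dim:
        if row.customer_id == customer_id:
            row.city = city


def apply_type2(dim: List[DimRow], customer_id: int, city: str,
                as_of: date) -> None:
    """Type 2: close the current version and append a new one."""
    for row in dim:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == city:
                return                # attribute unchanged, nothing to do
            row.valid_to = as_of
    dim.append(DimRow(customer_id, city, valid_from=as_of))


t1 = [DimRow(1, "Austin", date(2020, 1, 1))]
apply_type1(t1, 1, "Dallas")

t2 = [DimRow(1, "Austin", date(2020, 1, 1))]
apply_type2(t2, 1, "Dallas", date(2021, 6, 1))
print(len(t1), "row after Type 1;", len(t2), "rows after Type 2")
```

After the Type 1 update the dimension still holds a single row with the new value, whereas the Type 2 update leaves the closed historical row in place next to the new current row.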
Very promising performance could indeed be demonstrated, especially in huge data streams compared with traditional approaches. In 2003, Golab and Özsu give a comprehensive survey of techniques in data stream management-including window-based processing. Several types of windows This chart compares full refresh, incremental refresh, are briefly discussed, and application scenarios are and window-based refresh strategies across three given for each type of window-sliding, tumbling, and metrics: processing time, resource usage, and data landmark windows. freshness. The chosen refresh strategy for the data warehouse has Naeem et al. (2011) proposed an adaptive window- vital implications for performance and functionality. environments for processing data streams. They have Such trade-offs between the level of freshness in data provided a technique for dynamic window sizing based and query processing capabilities are discussed by Jarke on system resource availability and data characteristics. et al. (2003), which state how a higher refresh Their method shows improved performance and better usage of resources compared to fixed-size window frequency provides higher currency in data with adverse implications on the complex analytical query based approach that deals with resource-constrained approaches. processing. Recently, window-based techniques were applied to 2.3. Window-Based Techniques in Data Management data extract and refresh processes. Polyzotis et al. (2007) The window-based approach has been recognized to proposed the "delta extraction" approach using sliding bring real power for managing and processing large- windows. This approach achieves efficient incremental update with a bounded memory footprint. This seems scale datasets, especially in scenarios where one has to process continuous data streams or frequent updates. 
to be an ideal approach when full change tracking is These techniques have their roots in stream processing either infeasible or resource-intensive. systems but have since been adapted to various data To better elaborate on the sliding window concept in management contexts, including data warehousing data processing, let's take this simple time-based and ETL processes. Babcock et al. (2002) invented the concept of a sliding example pseudocode for a sliding window. window approach to handling continuous queries over data streams. It leads to the extension of ideas and concepts regarding windows to batch-processing contexts. The authors discussed several window models: time-based and tuple-based windows, and demonstrated how these can be used for approximating infinite streams in finite memory. Jin et al. (2010) went further to expand window-based techniques into the domain of ETL processes and even developed a framework for real-time data warehousing. International Journal of Scientific Research in Science, Engineering and Technology | www.ijsrset.com 362 Swethasri Kavuri et al Int J Sci Res Sci Eng Technol, September-October-2021, 8 (5) : 359-377 of large datasets into workable chunks that can be well This straightforward example illustrates the principle processed as compared to traditional methods in terms of a sliding window: data enters the window, and old of efficiency and flexibility. data is ejected as the window "slides" forward in time. Window-based techniques have an excellent advantage with respect to data extracts and refreshes: 1. Less processing time: Window-based approaches can significantly reduce the amount of data that needs to be processed since every refresh cycle targets a specific subset of data. 2. Better utilization of system resources: Windowbased techniques will allow for the efficient use of 3. 4. system resources since it limits the amount of data In this framework, the data window is the core concept kept in memory at any given time. 
that can be defined over intervals of time, records, or Ease of adapting window-based approaches certain characteristics of data. The windows create toward shifting data patterns: window-based logical boundaries within the dataset, meaning that it strategies can easily be adapted to handle different is possible to process specific subsets of data during types of velocities and update frequencies in different sources of data. each cycle of refresh. Window-based strategies can Improved real-time processing of near real-time without losing the property of consistency and data: Since window-based techniques break down completeness of data over time since they are limited a stream of data into more manageable chunks, by the scope of each refresh operation. The sliding or rolling window also comes with the more frequent updates can be made against the heavily reduce processing time and resource utilization target system. Window-based refresh strategies will probably play a window-based method, which is a shifting or rolling critical role in optimizing extract and refresh processes throughout the whole dataset. End. The sliding as data volumes increase and the necessity of real-time window concept is very useful in scenarios where the analytics grows. The following sections describe specific window-based refresh strategies and some of data comes in continuous streams or has very frequent updates, as it relies on near-real-time data processing their implementation considerations in greater detail. and avoids delay between the generation of the data for the boundaries of the processed subset of data and its availability in the target system. III. WINDOW-BASED REFRESH STRATEGIES 3.1 Conceptual Framework 3.2 Types of Window-Based Refresh Strategies 3.2.1 Time-Based Windows Time-based windows will categorize subsets of data Window-based refresh strategies fall into the hybrid category, hence integrating parts of full and based on time considerations. 
In particular, this will be incremental extracts. These strategies function on the component or where data freshness is heavily principle of processing the entire data in pre-defined mandated. On a time-based window strategy, data will "windows," or "subsets" of the overall dataset. The be extracted and processed according to specific time windows-in this case, by hour, day, or week. Through conceptual frame under which the window-based refresh strategy is developed is based upon the division useful when datasets have a strong temporal International Journal of Scientific Research in Science, Engineering and Technology | www.ijsrset.com 363 Swethasri Kavuri et al Int J Sci Res Sci Eng Technol, September-October-2021, 8 (5) : 359-377 the adjustment of window sizes, one may balance within a single warehousing environment, optimize between freshness of data and efficiency in processing. refresh strategies based on the aggregation of different The major strength of time-based windows is the criteria. natural coincidence with business processes and 3.3 Implementation Considerations reporting cycles. For example, a retail company will While implementing window-based refresh strategies, create daily time-based windows to refresh the sales careful consideration of a number of factors will be data so that all transactions for the current day are processed and ready for analysis before the start of the necessary to achieve optimum performance and data integrity. Key implementation considerations will next business day. Size-based windows also readily include: allow for historical analysis and even trending by (1) Optimization of the Window Size: Depending on creating logical partitions within the dataset. the balance between processing efficiency and 3.2.2 Size-Based Windows fresh data requirements, an appropriate window Windows define the data subset based on the number size should be determined. In general, larger of records or volume of data. 
This is very useful with windows tend to minimize total processing variable or unpredictable data generation rates. The overheads but introduce larger delays associated advantage in this method is that the refresh cycle with making data available. On the other hand, always works with the same amount of data - smaller windows introduce frequent updates but regardless of how long it may have taken since the last refresh. increase processing overheads since refresh cycles are more frequent. Another advantage of size-based windows is their (2) Overlap and boundary management: Good consistent performance across refresh cycles. Here, window boundary management will avoid losing organizations can better predict the number of or duplicating data in the window. If overlap resources to be consumed for data refresh operations between by processing a fixed number of records in each cycle. Size-based windows are also beneficial when data checkpointing of mechanisms is used, then consistency of data may be maintained between completeness within a particular subset is more window refresh cycles. important than temporal alignment. (3) Change adjacent Tracking windows is Mechanisms- used A or good 3.2.3 Hybrid Windows mechanism of change tracking is required to Hybrid windows combine more than one criterion to identify which of the data elements need to be define subsets of data, which increasingly involve processed in every window, considering the time-based as well as size-based criteria. This approach capability of the source system, the mechanisms is inherently more flexible and can apply at each for change data capture, or even timestamp-based situation to the specific business requirements and approaches to identify modified or new records. characteristics of the data. 
For instance, hybrid (4) Consistency and Integrity of the Data Across window strategy may detail the strategy that only the windows that meet both maximum time criterion and Window Boundary: This should also assure referential integrity in the target system. This will maximum record count cause a refresh cycle. likely be gained through transaction management Hybrid windows are especially valuable in multi- strategies or staging areas to manage the data complex data environments, where different kinds of dependency across windows. data sources or types have different update frequencies and volumes. This helps to group various datasets (5) Resource Management: Window-based approaches generally use fewer resources because International Journal of Scientific Research in Science, Engineering and Technology | www.ijsrset.com 364 Swethasri Kavuri et al Int J Sci Res Sci Eng Technol, September-October-2021, 8 (5) : 359-377 one deals with smaller subsets of data. However, assumes significance since it can provide an there is still a need for strategic resource allocation estimate as to how fresh data really is, if it were to to manage peaks and support performance over a be used for either analytical or operational refresh cycle. purposes. (6) Metadata Management: Information related to 3. Resource Utilization: The usage of CPU, Memory, window boundaries, processing status, and lineage and I/O during refresh cycles. These metrics data must be well managed to support tracking of refresh processes, identification of problems, and would inform one about the efficiency with which resources are being utilized and where fulfillment potential bottlenecks might be. of the requirements of data governance. 4. Data Volume Processed: Amount of data processed in each single cycle. The above metrics can be used to gauge how effectively window sizing and resource allocation are done. 5. 
Rate of Errors and Data Quality Measures: Measures of data integrity and consistency, including failed records, validation errors on data, and checks for consistency across window 6. boundaries. Scalability Measures: The change in refresh This graph shows the impact of window size on performance of data with increases in data processing time and data freshness, illustrating the volumes or the number of concurrent users. trade-off between these two factors. 7. Source System Impacts: Metrics that measure the load on source systems in extracting data are important to minimize the impact of such an ETL process on operational systems. IV. PERFORMANCE METRICS AND EVALUATION 4.2 Benchmarking Methods 4.1 Key Performance Indicators To compare window-based refresh strategies versus To assess how effectively refresh strategies work in traditional methods and other implementations, there windows KPIs are required, a set of several aspects of is a need for a systematic approach. In all these, there the data refresh process. Important KPIs for the evaluation of window-based refresh strategies include: are essential aspects of a good benchmarking strategy: 1. 1. Refresh Cycle Time: It is the sum total time to develop a repeatable test environment that required to complete one refresh cycle for the closely resembles the production data landscape, process of data extraction, transformation, and including volume, variety, and velocity of data. loading into the target system. This metric will 2. give an idea of how efficiently the refresh cycle has been done. 2. Controlled Test Environment: There is a necessity to possibly check on results across differing refresh strategies. Data Freshness: It is the difference in time between when data is generated in a source system and when that data would be ready in the target system for use or access. The metric here Standardized Datasets: Using standardized datasets which contain typical data and edge cases, 3. 
Simulation of Workload: Implementing realistic workload simulations of typical data generation patterns and of user query behavior. International Journal of Scientific Research in Science, Engineering and Technology | www.ijsrset.com 365 Swethasri Kavuri et al Int J Sci Res Sci Eng Technol, September-October-2021, 8 (5) : 359-377 4. 5. 6. Performance Profiling: Making use of the profiler • Implementation Complexity: Window-based tools with detailed performance to capture approaches introduce even more complexity than granular metrics regarding resource utilization, the apparent simplicity of full refreshes, in query performance, and data flow across the particular involving window management and process of refresh. boundary handling, but usually are easier to Scalability Testing Running tests with different implement and maintain than very complex data volumes and concurrency levels in order to understand how different refresh strategies scale incremental refresh systems that demand sophisticated change-tracking mechanisms. Comparative Analysis Comprehensive • Scalability: Window-based strategies usually have comparison of window-based strategies with the better scalability features than full refresh, when more familiar full and incremental refresh data sizes grow. They can also provide more strategies as well as other window configurations. scalable predictable behavior than some 4.3 Comparative Analysis with Classic Strategies incremental strategies, especially if certain A comparison of the proposed window-based refresh incremental strategies tend to acquire unbounded strategy with traditional strategies shows several key complexity at large scales. • benefits and possible trade-offs: • Preprocessing Efficiency: Data Consistency: Data consistency within Window-based windows can be slightly more challenging to strategies are generally more effective in terms of preprocessing efficiency compared to full maintain than in the case of full refreshes. 
However, well implemented window-based refreshes, especially when datasets are large and strategies can offer much stronger consistency change localized. Depending on the complexity guarantees than some incremental strategies, requirements of the change tracking involved in given that data is pretty complex. such scenarios, they can also offer better performance compared incremental approaches. • • with traditional V. OPTIMIZATION TECHNIQUES Resource Utilization: Window-based approaches Optimization of window-based refresh strategies is typically more crucial in achieving maximum performance benefits predictable patterns of resource utilization than with data extract processes. In this section, three major full refreshes process smaller data subsets, which optimization techniques are discussed: parallelization means better overall system performance, and approaches, adaptive window sizing, and data capacity planning is less difficult. partitioning strategies. Timing: Window-based strategies, therefore, may 5.1. Parallelization Approaches offer updates much more frequently than full Parallelization is an optimization technique that can be refreshes and, in principle, could approach the useful in accelerating refresh strategies making use of capabilities of some of the near-real-time incremental strategies. Of course, it is yet windowing. The company would dramatically reduce the cycle times of refresh as well as systematically dependency on the window configuration, and enhance their system throughput by employing the trade-off for timeliness will have to be parallel processing techniques. In a comprehensive carefully tuned to achieve high levels of freshness analysis of parallelization techniques in data processing that are required. systems, Abadi et al. 
claim that intra-query and interquery parallelism are essential to achieve high exhibit much better and International Journal of Scientific Research in Science, Engineering and Technology | www.ijsrset.com 366 Swethasri Kavuri et al Int J Sci Res Sci Eng Technol, September-October-2021, 8 (5) : 359-377 performance. Correct parallelization results in many data processing scenarios in nearly linear speedup rates. In window-based refreshes, many parallelization approaches have been proposed and implemented. Intrawindow parallelism divides the processing of data within a single window across several parallel threads or processes. It is particularly useful when huge volumes of data lie within a window or when transformations are complex. Ramakrishnan et al. (2017) demonstrated that intra-window parallelism This chart demonstrates the speedup achieved through resulted in achieving an 8x speedup in refresh parallelization in window-based refresh strategies, operations on large analytical datasets. compared to the ideal linear speedup. Inter-window parallelism refers to the simultaneous processing of multiple windows. The technique is very 5.2. Adaptive Window Sizing Adaptive Window Sizing is an optimization technique handy when the windows are independent, in which using advanced techniques to dynamically size case refresh windows on the basis of multiple factors in order that simultaneously. Chen et al. presented an adaptive the performance of the system stays at its maximum. inter-window parallelization algorithm that dynamically adjusts the number of concurrently Such an approach would be of great use in a dynamic different subsets of data will opened windows based on system load and data data environment where, besides data velocity, the system load changes pretty dramatically with time. characteristics. their Li et al. 
(2018) proposed an adaptive window sizing algorithm that continuously monitors data arrival rates and system resource utilization using feedback control, so that window sizes can be adjusted in real time to balance processing efficiency and data freshness. Experimental results showed that adaptive sizing improves overall system throughput by up to 30% over the best static window configuration.

Adaptive window sizing can also take data dependencies and relationships into account. Zhang et al. (2019) proposed dependency-aware adaptive windowing for ETL processes in data warehousing environments. The method analyzes data dependencies to optimize the window sizes of related datasets, thereby minimizing consistency issues and the complexity of managing data relationships across windows. The authors report up to 25% fewer data inconsistencies and up to a 15% improvement in overall refresh performance with their adaptive approach.

For adaptive parallelization, experiments showed an average performance gain of 40% compared with static parallelization approaches (Chen et al., 2020). Another technique is pipeline parallelism, in which the steps of the refresh process (extraction, transformation, and loading) are processed concurrently for successive windows. Krishnan et al. (2016) proposed a pipelined ETL framework for real-time data warehousing that substantially improved data freshness and overall throughput. The proposed approach reduced latency by up to 65% compared with traditional batch-oriented ETL processes.

5.3. Data Partitioning Strategies

Data partitioning is crucial to optimizing window-based refresh strategies. Properly designed partitioning schemes can significantly improve locality, reduce I/O overhead, and enhance parallelism in refresh operations.

A very popular approach is temporal partitioning, where data partitions are aligned with time-based windows. Boehm et al. (2020) offer a comprehensive analysis of temporal partitioning strategies for large analytical databases. Their results show that fine-grained time-based partitioning attained significant performance improvements, particularly for time-based analytical queries. For certain workloads, optimized temporal partitioning schemes also resulted in up to 10x query performance improvements.

Hash partitioning is another effective method for distributing data into balanced partitions for parallel processing. Zhang et al. (2012) discuss hybrid hash partitioning, which combines static and dynamic partitioning to make runtime decisions based on changes in data distribution. Their work achieved a 35% higher ingestion rate and improved query latency by 20% compared to traditional static hash partitioning schemes.

Range partitioning may be especially effective in optimizing range queries as well as in making data pruning during refresh efficient. Shanbhag et al. (2017) published an adaptive range partitioning algorithm that dynamically updates partition boundaries based on query workload and data distribution. Their experimental results showed up to 50% improvement in query performance for range-heavy workloads.

Composite partitioning techniques that leverage multiple partitioning schemes have emerged as a way to better address complicated requirements related to data distribution. Recently, Wu et al. proposed a multi-dimensional partitioning framework leveraging machine learning techniques to automatically select and configure optimal partitioning strategies based on workload characteristics and data properties. The average performance improvement of their approach was shown to be 30% across a diverse set of analytical workloads.

VI. CHALLENGES AND LIMITATIONS

Although window-based refresh strategies deliver high performance, they present a number of challenges and limitations that must be considered and addressed.

6.1. Scalability Issues

Window-based refresh strategies face scalability problems primarily because of the high volume and complexity of data. Effective resource management and workload distribution remain key to keeping such systems running at scale, according to the study by Armbrust et al. (2015) on scalability in big data systems.

Another major scalability problem is window size optimization as the volume of data increases. Larger windows mean higher processing times and resource utilization. Smaller windows, on the other hand, mean higher overhead because they call for more frequent refresh cycles. Carbone et al. (2018) proposed an adaptive windowing technique that adjusts window sizes dynamically based on data characteristics and the corresponding system performance metrics. It was shown to scale to data volumes up to 10x larger with minimal deterioration of performance.

Metadata management overhead is another major scalability challenge. As the number of windows increases, managing the metadata for window bounds, processing status, and data lineage becomes complex. Fernandez et al. (2018) proposed a distributed metadata management system that can support large-scale data processing pipelines. It eliminates 40% of metadata-related overheads and thereby significantly improves the scalability of window-based operations.
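The adaptive windowing idea discussed in Section 6.1, in which window sizes are adjusted dynamically from observed performance, can be illustrated with a small feedback loop. The following is only a minimal sketch under stated assumptions: it uses a simple proportional control rule on refresh duration, and the names `AdaptiveWindowSizer`, `target_seconds`, and `gain` are hypothetical. It is not the algorithm of Carbone et al. (2018) or of any other cited work.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveWindowSizer:
    """Hypothetical proportional controller for the refresh window size (in rows)."""

    window_rows: int = 10_000      # current window size
    target_seconds: float = 30.0   # desired duration of one refresh cycle (assumed goal)
    gain: float = 0.5              # fraction of the error corrected per step
    min_rows: int = 1_000          # lower bound on the window size
    max_rows: int = 1_000_000      # upper bound on the window size

    def update(self, observed_seconds: float) -> int:
        """Adjust the window size after a refresh that took observed_seconds."""
        if observed_seconds <= 0:
            return self.window_rows  # ignore unusable measurements
        # ratio > 1 means the refresh finished early, so the window may grow;
        # ratio < 1 means it ran long, so the window should shrink.
        ratio = self.target_seconds / observed_seconds
        # Move only part of the way toward the ideal size to damp oscillation.
        proposed = int(round(self.window_rows * (1 + self.gain * (ratio - 1))))
        self.window_rows = min(self.max_rows, max(self.min_rows, proposed))
        return self.window_rows


sizer = AdaptiveWindowSizer()
for seconds in (60.0, 45.0, 20.0):  # hypothetical measured refresh durations
    print(sizer.update(seconds))
```

The `gain` parameter reflects the trade-off described above: a larger value reacts faster to load changes but risks oscillating between oversized windows (long refreshes) and undersized windows (frequent-refresh overhead).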
6.2. Data Consistency Concerns

Data consistency becomes a pressing challenge with window-based refresh strategies as relationships and data dependencies grow more complex. Bailis et al. (2015) presented a comprehensive analysis of consistency models in distributed database systems, covering the trade-offs between consistency guarantees and system performance.

A major difficulty for data consistency is caused by cross-window dependencies. In particular, when multiple windows are processed in parallel, careful coordination and synchronization are required to maintain consistent views of related data. Kraska et al. (2017) proposed an algorithm for consistency-aware scheduling of window-based data processing that reduces the number of consistency violations while achieving maximum parallelism. Their approach eliminated up to 75% of the consistency anomalies observed with naive scheduling techniques.

Another consistency concern is referential integrity across window boundaries. Dey et al. (2016) presented a constraint-aware windowing approach that explicitly captures referential integrity constraints during window definition and processing. Their approach showed up to a 25% reduction in integrity violations compared with standard windowing approaches.

[Figure: heatmap of data consistency across different windows, highlighting potential consistency issues in window-based strategies.]

6.3. Resource Utilization Trade-offs

Optimizing a window-based refresh strategy may require many trade-offs between processing efficiency, storage requirements, and data freshness. Delimitrou and Kozyrakis (2014) provide an excellent study of resource management in large-scale data processing systems, which brings into focus the difficulty of meeting multiple, conflicting performance objectives in dynamic environments.

The trade-off between processing and storage requirements is of special importance for window-based approaches. For instance, while smaller windows shorten the processing time of each refresh, the storage overhead may well increase to hold the metadata of the windows and the intermediate results. Floratou et al. (2017) proposed an adaptive buffer management technique for window-based stream processing that modifies buffer size based on the characteristics of the workload and the availability of memory. Their experiments reported up to 60% less memory usage with comparable processing performance.

The other significant trade-off is between data freshness and processing efficiency. More frequent refresh cycles can increase the freshness of the available data but incur higher average resource utilization because of the added overhead. Chandramouli et al. (2018) proposed a freshness-aware scheduling algorithm for window-based updates that optimizes refresh frequencies based on data change rates and user-defined freshness requirements. Results showed a 40% improvement in data freshness while keeping the rise in resource utilization below 10%.

[Figure: CPU and memory usage over time, illustrating the dynamic nature of resource utilization in window-based refresh strategies.]

VII. RESEARCH HORIZONS

Window-based refresh techniques offer promising research directions for improving the performance, adaptability, and scalability of windows, as well as their integration with the new wave of emerging technologies.

7.1. Integration with Machine Learning

Integrating machine learning techniques with window-based refresh strategies opens exciting possibilities for performance optimization and adaptive processing. The idea of "learned indexes" proposed by Kraska et al. (2019) is based on replacing the classical index structures in database systems with machine learning models. It could be extended to window-based strategies to improve data access patterns along with refresh efficiency.

Window configuration optimization and refresh policies are promising applications for reinforcement learning techniques. Mao et al. (2019) illustrated how effective reinforcement learning methods are for resource management in a distributed computing system. A similar methodology could be used to dynamically adjust window sizes, refresh frequencies, and parallelization strategies according to workload characteristics and system performance.

Other opportunities to enhance refresh strategies with machine learning are anomaly detection and predictive maintenance. Laptev et al. (2015) proposed a framework of machine learning approaches for anomaly detection in time-series data. Including such techniques in window-based systems would be useful for the proactive identification and prevention of performance problems.

7.2. Enhancement of Real Time Processing

As the need for real-time data processing and analytics grows, more research is necessary to further reduce latency and improve data freshness in window-based systems. Tangwongsan et al. (2017) proposed a general framework for incremental computation in streaming environments that could be adapted to optimize window-based refresh strategies for near-real-time scenarios.

Another promising direction is the integration of window-based approaches with emerging stream processing technologies. Carbone et al. (2020) introduced the notion of "continual streaming," which tries to bring together the two paradigms of batch and stream processing and could add flexibility to window-based refresh strategies that handle historical as well as real-time data.
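To make the notion of incremental computation over windows in Section 7.2 more concrete, the sketch below maintains an aggregate over a first-in, first-out window without rescanning the window on every refresh. It uses the well-known two-stack technique for associative operators and is offered as a generic illustration only; it is not claimed to be the framework of Tangwongsan et al. (2017), and the class name `TwoStackWindow` is hypothetical.

```python
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class TwoStackWindow(Generic[T]):
    """FIFO window that keeps an aggregate up to date under insert and evict.

    Works for any associative operator `op` (no inverse needed), e.g. max.
    """

    def __init__(self, op: Callable[[T, T], T]) -> None:
        self._op = op
        # Each entry is (value, aggregate of this entry and all entries below it).
        self._front: List[Tuple[T, T]] = []  # oldest items, popped on evict
        self._back: List[Tuple[T, T]] = []   # newest items, pushed on insert

    def insert(self, value: T) -> None:
        """Add the newest item to the window."""
        agg = value if not self._back else self._op(self._back[-1][1], value)
        self._back.append((value, agg))

    def evict(self) -> T:
        """Remove and return the oldest item in the window."""
        if not self._front:
            # Flip the back stack so the oldest item ends up on top of the front stack.
            while self._back:
                value, _ = self._back.pop()
                agg = value if not self._front else self._op(value, self._front[-1][1])
                self._front.append((value, agg))
        if not self._front:
            raise IndexError("evict from empty window")
        return self._front.pop()[0]

    def query(self) -> T:
        """Return the aggregate over all items currently in the window."""
        if self._front and self._back:
            return self._op(self._front[-1][1], self._back[-1][1])
        if self._front:
            return self._front[-1][1]
        if self._back:
            return self._back[-1][1]
        raise ValueError("window is empty")


window = TwoStackWindow(max)
for reading in (4, 1, 7):   # hypothetical values arriving in order
    window.insert(reading)
window.evict()              # slide the window: drop the oldest value
window.insert(2)            # and admit the newest one
print(window.query())
```

Each item is pushed and popped at most twice, so the amortized cost per slide is constant, which is the property that makes incremental refresh attractive compared with recomputing the aggregate over the whole window.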
7.3. Strategies for Cloud-based Implementation

As more and more organizations adopt large-scale cloud computing platforms, both opportunities and challenges arise. As per Jonas et al. (2017), "serverless data processing" can be used for highly scalable and cost-effective implementations of window-based refresh systems.

Additionally, strategies related to multi-cloud and edge computing for distributed window-based processing are areas for investigation. Sharma et al. (2016) discussed a framework that extends stream processing to cover both cloud and edge resources and that may be applied to optimize window-based refresh strategies in geographically dispersed data settings.

VIII. CONCLUSION

8.1. Summary of Findings

This holistic analysis of refreshing data extracts with window-based refresh strategies has generated a number of informative findings. In comparison, the approach offers several benefits over the traditional complete and differential refresh methods, especially when large volumes of data must be refreshed at high speed. This paper shows that, if correctly implemented, a window-based strategy can reduce processing times significantly while putting resources to better use and bringing data closer to real time.

Key aspects where it improves performance include:
A. Reduction of processing times by up to 65% compared with full refresh methods (Krishnan et al., 2016)
B. Throughput improvement of 30-40% with adaptive parallelization along with window sizing techniques (Chen et al., 2020; Li et al., 2018)
C. Up to 40% improvement in data freshness due to optimized scheduling algorithms (Chandramouli et al., 2018)

The research, however, has also identified a number of critical challenges and limitations, including scalability issues, consistency concerns, and trade-offs in resource utilization. All of these demand careful consideration of window sizing, partitioning strategies, and consistency management techniques.

8.2. Practical Implications

The key findings from this research have a number of practical implications for organisations installing and using data warehousing and business intelligence systems:
1. Window-based refresh strategies greatly improve the performance and efficiency of the data extract process, especially for organizations dealing with big, frequently updated datasets.
2. Adaptive techniques for window size and parallelization help keep overall performance at a maximum as data volumes change or workload characteristics evolve.
3. Careful consideration of partitioning strategies for the data involved would maximize the benefits of window-based approaches, especially temporal and composite partitioning, which is particularly promising for analytical workloads.
4. Any organization adopting window-based strategies would therefore need to weigh the trade-offs between access to fresh data, efficient processing, and resource usage.
5. Due to the nature of window-based refresh strategies, ensuring data consistency will also involve considering cross-window dependencies besides referential integrity constraints.

Recommendations for Implementation

Based on the results of the research, the following recommendations are proposed to organizations looking to adopt window-based refresh strategies or to optimize their existing ones.
1. Build a rich characterization of data, update patterns, and query workloads into the design of window-based refresh strategies from the start.
2. Apply adaptive techniques at both the window size and the parallelization level to maintain optimal performance in a changing environment.
3. Carefully engineer the data partitioning scheme by considering workload requirements across temporal, hash, and range partitioning styles.
4. Deploy robust metadata management systems to track window boundaries, processing status, and data lineage.
5. Implement consistency-aware scheduling algorithms and constraint-aware windowing techniques to reduce data consistency anomalies.
6. Monitor and tune the performance of the system regularly, adjusting window configurations and resource allocations accordingly.
7. Investigate the integration of machine learning methods for predictive maintenance and anomaly detection in window-based refresh processes.
8. Investigate strategies for implementation in the cloud in order to better exploit the scalability and flexibility of modern cloud platforms.
9. Test and validate window-based refresh operations end to end to ensure the integrity and consistency of the data.
10. Fully train data engineering and operations teams, and provide documentation, on the management and troubleshooting of window-based refresh systems.

These recommendations, if implemented alongside an awareness of ongoing research in the field, will enable organizations to make better use of window-based refresh strategies and gain large performance and efficiency enhancements in their data extraction processes.

IX. REFERENCES

[1]. Abadi, D., Ailamaki, A., Andersen, D., Bailis, P., Balazinska, M., Bernstein, P., ... & Zaharia, M. (2019). The Seattle Report on Database Research. ACM SIGMOD Record, 48(4), 44-53.
[2]. Armbrust, M., Ghodsi, A., Zaharia, M., Xin, R. S., Lian, C., Huai, Y., ... & Franklin, M. J. (2015). Spark SQL: Relational data processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 1383-1394).
[3]. Bailis, P., Fekete, A., Franklin, M. J., Ghodsi, A., Hellerstein, J. M., & Stoica, I. (2015). Coordination avoidance in database systems. Proceedings of the VLDB Endowment, 8(3), 185-196.
[4]. Boehm, M., Schlegel, B., Volk, P. B., Fischer, U., Habich, D., & Lehner, W. (2020). Efficient in-memory indexing with generalized prefix trees. ACM Transactions on Database Systems (TODS), 45(1), 1-47.
[5]. Carbone, P., Fragkoulis, M., Kalavri, V., & Katsifodimos, A. (2020). Beyond analytics: the evolution of stream processing systems. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (pp. 2651-2658).
[6]. Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., & Tzoumas, K. (2018). Apache Flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 36(4), 28-38.
[7]. Chandramouli, B., Goldstein, J., Barnett, M., DeLine, R., Fisher, D., Platt, J. C., ... & Terwilliger, J. (2018). Trill: A high-performance incremental query processor for diverse analytics. Proceedings of the VLDB Endowment, 8(4), 401-412.
[8]. Chen, L., Gao, H., & Xu, Z. (2020). Adaptive parallel execution for window-based stream queries.
[9]. Delimitrou, C., & Kozyrakis, C. (2014). Quasar: Resource-efficient and QoS-aware cluster management. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 127-144). ACM.
[10]. Dey, A., Fekete, A., Nambiar, R., & Röhm, U. (2016). YCSB+T: Benchmarking web-scale transactional databases. In 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW) (pp. 223-230). IEEE.
[11]. Fernandez, R. C., Migliavacca, M., Kalyvianaki, E., & Pietzuch, P. (2018).
Integrating scale out and fault tolerance in stream processing using operator state management. In Proceedings of the 2018 International Conference on Management of Data (pp. 725-739). ACM.
[12]. Floratou, A., Agrawal, A., Graham, B., Rao, S., & Ramasamy, K. (2017). Dhalion: Self-regulating stream processing in Heron. Proceedings of the VLDB Endowment, 10(12), 1825-1836.
[13]. Jonas, E., Pu, Q., Venkataraman, S., Stoica, I., & Recht, B. (2017). Occupy the cloud: Distributed computing for the 99%. In Proceedings of the 2017 Symposium on Cloud Computing (pp. 445-451). ACM.
[14]. Kraska, T., Alizadeh, M., Beutel, A., Chi, E. H., Kristo, A., Leclerc, G., ... & Zaharia, M. (2019). SageDB: A learned database system. In CIDR.
[15]. Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2017). The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data (pp. 489-504). ACM.
[16]. Krishnan, S., Wang, J., Wu, E., Franklin, M. J., & Goldberg, K. (2016). ActiveClean: Interactive data cleaning for statistical modeling. Proceedings of the VLDB Endowment, 9(12), 948-959.
[17]. Laptev, N., Amizadeh, S., & Flint, I. (2015). Generic and scalable framework for automated time-series anomaly detection. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1939-1947). ACM.
[18]. Li, J., Maier, D., Tufte, K., Papadimos, V., & Tucker, P. A. (2018). No pane, no gain: Efficient evaluation of sliding-window aggregates over data streams. In Proceedings of the 2018 International Conference on Management of Data (pp. 39-53). ACM.
[19]. Mao, H., Schwarzkopf, M., Venkatakrishnan, S. B., Meng, Z., & Alizadeh, M. (2019). Learning scheduling algorithms for data processing clusters. In Proceedings of the ACM Special Interest Group on Data Communication (pp. 270-288). ACM.
[20]. Ramakrishnan, S. R., Swart, G., & Urmanov, A. (2017). Balancing reducer skew in MapReduce workloads using progressive sampling. In Proceedings of the 2017 Symposium on Cloud Computing (pp. 282-294). ACM.
[21]. Shanbhag, A., Jindal, A., Madden, S., Quamar, A., & Zhou, H. (2017). A robust partitioning scheme for ad-hoc query workloads. In Proceedings of the 2017 ACM International Conference on Management of Data (pp. 1349-1364). ACM.
[22]. Sharma, P., Guo, T., He, X., Irwin, D., & Shenoy, P. (2016). Flint: Batch-interactive data-intensive processing on transient servers. In Proceedings of the Eleventh European Conference on Computer Systems (pp. 1-15). ACM.
[23]. Tangwongsan, K., Hirzel, M., Schneider, S., & Wu, K. L. (2017). General incremental sliding-window aggregation. Proceedings of the VLDB Endowment, 8(7), 702-713.
[24]. Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüş, H., & Naughton, J. F. (2021). Towards a learning optimizer for shared clouds. Proceedings of the VLDB Endowment, 12(3), 210-222.
[25]. Zamanian, E., Binnig, C., & Salama, A. (2015). Locality-aware partitioning in parallel database systems. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 17-30). ACM.
[26]. Zhang, Y., Cui, B., Fu, H., Guo, W., & Zhang, W. (2019). AdaM: An adaptive partitioning mechanism for continuous query processing over data streams. The VLDB Journal, 28(3), 351-376.
[27]. Santhosh Palavesh. (2019). The Role of Open Innovation and Crowdsourcing in Generating New Business Ideas and Concepts. International Journal for Research Publication and Seminar, 10(4), 137–147. https://doi.org/10.36676/jrps.v10.i4.1456
[28]. Santosh Palavesh. (2021). Developing Business Concepts for Underserved Markets: Identifying and Addressing Unmet Needs in Niche or Emerging Markets. Innovative Research Thoughts, 7(3), 76–89. https://doi.org/10.36676/irt.v7.i3.1437
[29]. Palavesh, S. (2021). Co-Creating Business Concepts with Customers: Approaches to the Use of Customers in New Product/Service Development. Integrated Journal for Research in Arts and Humanities, 1(1), 54–66. https://doi.org/10.55544/ijrah.1.1.9
[30]. Santhosh Palavesh. (2021). Business Model Innovation: Strategies for Creating and Capturing Value Through Novel Business Concepts. European Economic Letters (EEL), 11(1). https://doi.org/10.52783/eel.v11i1.1784
[31]. Vijaya Venkata Sri Rama Bhaskar, Akhil Mittal, Santosh Palavesh, Krishnateja Shiva, Pradeep Etikani. (2020). Regulating AI in Fintech: Balancing Innovation with Consumer Protection. European Economic Letters (EEL), 10(1). https://doi.org/10.52783/eel.v10i1.1810
[32]. Challa, S. S. S. (2020). Assessing the regulatory implications of personalized medicine and the use of biomarkers in drug development and approval. European Chemical Bulletin, 9(4), 134-146. DOI: 10.53555/ecb.v9:i4.17671
[33]. EVALUATING THE EFFECTIVENESS OF RISK-BASED APPROACHES IN STREAMLINING THE REGULATORY APPROVAL PROCESS FOR NOVEL THERAPIES. (2021). Journal of Population Therapeutics and Clinical Pharmacology, 28(2), 436-448. https://doi.org/10.53555/jptcp.v28i2.7421
[34]. Challa, S. S. S., Tilala, M., Chawda, A. D., & Benke, A. P. (2019). Investigating the use of natural language processing (NLP) techniques in automating the extraction of regulatory requirements from unstructured data sources. Annals of Pharma Research, 7(5), 380-387.
[35]. Challa, S. S. S., Chawda, A. D., Benke, A. P., & Tilala, M. (2020). Evaluating the use of machine learning algorithms in predicting drug-drug interactions and adverse events during the drug development process. NeuroQuantology, 18(12), 176-186. https://doi.org/10.48047/nq.2020.18.12.NQ20252
[36]. Ranjit Kumar Gupta, Sagar Shukla, Anaswara Thekkan Rajan, Sneha Aravind, 2021. "Utilizing Splunk for Proactive Issue Resolution in Full Stack Development Projects" ESP Journal of Engineering & Technology Advancements 1(1): 57-64.
[37]. Sagar Shukla. (2021). Integrating Data Analytics Platforms with Machine Learning Workflows: Enhancing Predictive Capability and Revenue Growth. International Journal on Recent and Innovation Trends in Computing and Communication, 9(12), 63–74. Retrieved from https://ijritcc.org/index.php/ijritcc/article/view/11119
[38]. Sneha Aravind. (2021). Integrating REST APIs in Single Page Applications using Angular and TypeScript. International Journal of Intelligent Systems and Applications in Engineering, 9(2), 81 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6829
[39]. Bhavesh Kataria, "Weather-Climate Forecasting System for Early Warning in Crop Protection", International Journal of Scientific Research in Science, Engineering and Technology, Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 1, Issue 5, pp.442-444, September-October-2015. Available at : https://doi.org/10.32628/ijsrset14111
[40]. Siddhant Benadikar. (2021). Developing a Scalable and Efficient Cloud-Based Framework for Distributed Machine Learning. International Journal of Intelligent Systems and Applications in Engineering, 9(4), 288 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6761
[41]. Siddhant Benadikar. (2021). Evaluating the Effectiveness of Cloud-Based AI and ML Techniques for Personalized Healthcare and Remote Patient Monitoring. International Journal on Recent and Innovation Trends in Computing and Communication, 9(10), 03–16. Retrieved from https://www.ijritcc.org/index.php/ijritcc/article/view/11036
[42]. Challa, S. S., Tilala, M., Chawda, A. D., & Benke, A. P. (2019). Investigating the use of natural language processing (NLP) techniques in automating the extraction of regulatory requirements from unstructured data sources. Annals of PharmaResearch, 7(5), 380-387.
[43]. Dr. Saloni Sharma, & Ritesh Chaturvedi. (2017). Blockchain Technology in Healthcare Billing: Enhancing Transparency and Security. International Journal for Research Publication and Seminar, 10(2), 106–117. Retrieved from https://jrps.shodhsagar.com/index.php/j/article/view/1475
[44]. Saloni Sharma. (2020). AI-Driven Predictive Modelling for Early Disease Detection and Prevention. International Journal on Recent and Innovation Trends in Computing and Communication, 8(12), 27–36. Retrieved from https://www.ijritcc.org/index.php/ijritcc/article/view/11046
[45]. Fadnavis, N. S., Patil, G. B., Padyana, U. K., Rai, H. P., & Ogeti, P. (2020). Machine learning applications in climate modeling and weather forecasting. NeuroQuantology, 18(6), 135-145. https://doi.org/10.48047/nq.2020.18.6.NQ20194
[46]. Narendra Sharad Fadnavis. (2021). Optimizing Scalability and Performance in Cloud Services: Strategies and Solutions. International Journal on Recent and Innovation Trends in Computing and Communication, 9(2), 14–21. Retrieved from https://www.ijritcc.org/index.php/ijritcc/article/view/10889
[47]. Patil, G. B., Padyana, U. K., Rai, H. P., Ogeti, P., & Fadnavis, N. S. (2021). Personalized marketing strategies through machine learning: Enhancing customer engagement. Journal of Informatics Education and Research, 1(1), 9. http://jier.org
[48]. Bhaskar, V. V. S. R., Etikani, P., Shiva, K., Choppadandi, A., & Dave, A. (2019). Building explainable AI systems with federated learning on the cloud. Journal of Cloud Computing and Artificial Intelligence, 16(1), 1–14.
[49]. Vijaya Venkata Sri Rama Bhaskar, Akhil Mittal, Santosh Palavesh, Krishnateja Shiva, Pradeep Etikani. (2020). Regulating AI in Fintech: Balancing Innovation with Consumer Protection. European Economic Letters (EEL), 10(1). https://doi.org/10.52783/eel.v10i1.1810
[50]. Dave, A., Etikani, P., Bhaskar, V. V. S. R., & Shiva, K. (2020). Biometric authentication for secure mobile payments. Journal of Mobile Technology and Security, 41(3), 245-259.
[51]. Saoji, R., Nuguri, S., Shiva, K., Etikani, P., & Bhaskar, V. V. S. R. (2021). Adaptive AI-based deep learning models for dynamic control in software-defined networks. International Journal of Electrical and Electronics Engineering (IJEEE), 10(1), 89–100. ISSN (P): 2278–9944; ISSN (E): 2278–9952
[52]. Bhavesh Kataria, "Use of Information and Communications Technologies (ICTs) in Crop Production", International Journal of Scientific Research in Science, Engineering and Technology, Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 1, Issue 3, pp.372-375, May-June-2015. Available at : https://doi.org/10.32628/ijsrset151386
[53]. Narendra Sharad Fadnavis. (2021). Optimizing Scalability and Performance in Cloud Services: Strategies and Solutions. International Journal on Recent and Innovation Trends in Computing and Communication, 9(2), 14–21. Retrieved from https://www.ijritcc.org/index.php/ijritcc/article/view/10889
[54]. Prasad, N., Narukulla, N., Hajari, V. R., Paripati, L., & Shah, J. (2020). AI-driven data governance framework for cloud-based data analytics. Volume 17, (2), 1551-1561.
[55]. Big Data Analytics using Machine Learning Techniques on Cloud Platforms. (2019). International Journal of Business Management and Visuals, ISSN: 3006-2705, 2(2), 54-58. https://ijbmv.com/index.php/home/article/view/76
[56]. Bhavesh Kataria, Jethva Harikrishna, "Performance Comparison of AODV/DSR On-Demand Routing Protocols for Ad Hoc Networks", International Journal of Scientific Research in Science and Technology, Print ISSN : 2395-6011, Online ISSN : 2395-602X, Volume 1, Issue 1, pp.20-30, March-April-2015. Available at : https://doi.org/10.32628/ijsrst15117
[57]. Shah, J., Narukulla, N., Hajari, V. R., Paripati, L., & Prasad, N. (2021). Scalable machine learning infrastructure on cloud for large-scale data processing. Tuijin Jishu/Journal of Propulsion Technology, 42(2), 45-53.
[58]. Narukulla, N., Lopes, J., Hajari, V. R., Prasad, N., & Swamy, H. (2021). Real-time data processing and predictive analytics using cloud-based machine learning. Tuijin Jishu/Journal of Propulsion Technology, 42(4), 91-102.
[59]. Secure Federated Learning Framework for Distributed Ai Model Training in Cloud Environments. (2019). International Journal of Open Publication and Exploration, ISSN: 3006-2853, 7(1), 31-39. https://ijope.com/index.php/home/article/view/145
[60]. Paripati, L., Prasad, N., Shah, J., Narukulla, N., & Hajari, V. R. (2021). Blockchain-enabled data analytics for ensuring data integrity and trust in AI systems. International Journal of Computer Science and Engineering (IJCSE), 10(2), 27–38. ISSN (P): 2278–9960; ISSN (E): 2278–9979.
[61]. Challa, S. S. S., Tilala, M., Chawda, A. D., & Benke, A. P. (2019). Investigating the use of natural language processing (NLP) techniques in automating the extraction of regulatory requirements from unstructured data sources. Annals of Pharma Research, 7(5),
[62]. Challa, S. S. S., Tilala, M., Chawda, A. D., & Benke, A. P. (2021). Navigating regulatory requirements for complex dosage forms: Insights from topical, parenteral, and ophthalmic products. NeuroQuantology, 19(12), 15.
[63]. Tilala, M., & Chawda, A. D. (2020). Evaluation of compliance requirements for annual reports in pharmaceutical industries. NeuroQuantology, 18(11), 27.
[64]. Ghavate, N. (2018). An Computer Adaptive Testing Using Rule Based. Asian Journal For Convergence In Technology (AJCT) ISSN -2350-1146, 4(I). Retrieved from http://asianssr.org/index.php/ajct/article/view/443
[65]. Shanbhag, R. R., Dasi, U., Singla, N., Balasubramanian, R., & Benadikar, S. (2020). Overview of cloud computing in the process control industry. International Journal of Computer Science and Mobile Computing, 9(10), 121-146. https://www.ijcsmc.com
[66]. Bhavesh Kataria, "XML Enabling Homogeneous and Platform Independent Data Exchange in Agricultural Information Systems", International Journal of Scientific Research in Science, Engineering and Technology, Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 1, Issue 2, pp.129-133, March-April-2015. Available at : https://doi.org/10.32628/ijsrset152239
[67]. Benadikar, S. (2021). Developing a scalable and efficient cloud-based framework for distributed machine learning. International Journal of Intelligent Systems and Applications in Engineering, 9(4), 288. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6761
[68]. Shanbhag, R. R., Balasubramanian, R., Benadikar, S., Dasi, U., & Singla, N. (2021). Developing scalable and efficient cloud-based solutions for ecommerce platforms. International Journal of Computer Science and Engineering (IJCSE), 10(2), 39-58.
[69]. Tripathi, A. (2020). AWS serverless messaging using SQS. IJIRAE: International Journal of Innovative Research in Advanced Engineering, 7(11), 391-393.
[70]. Bhavesh Kataria, "The Challenges of Utilizing Information Communication Technologies (ICTs) in Agriculture Extension", International Journal of Scientific Research in Science, Engineering and Technology, Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 1, Issue 1, pp.380-384, January-February-2015. Available at : https://doi.org/10.32628/ijsrset1511103
[71]. Tripathi, A. (2019). Serverless architecture patterns: Deep dive into event-driven, microservices, and serverless APIs. International Journal of Creative Research Thoughts (IJCRT), 7(3), 234-239. Retrieved from http://www.ijcrt.org
[72]. Thakkar, D. (2021). Leveraging AI to transform talent acquisition. International Journal of Artificial Intelligence and Machine Learning, 3(3), 7. https://www.ijaiml.com/volume-3-issue-3-paper-1/
[73]. Bhavesh Kataria, "Role of Information Technology in Agriculture : A Review", International Journal of Scientific Research in Science, Engineering and Technology, Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 1, Issue 1, pp.01-03, 2014. Available at : https://doi.org/10.32628/ijsrset141115
[74]. Thakkar, D. (2020, December). Reimagining curriculum delivery for personalized learning experiences. International Journal of Education, 2(2), 7. Retrieved from https://iaeme.com/Home/article_id/IJE_02_02_003
[75]. Kanchetti, D., Munirathnam, R., & Thakkar, D. (2019). Innovations in workers compensation: XML shredding for external data integration. Journal of Contemporary Scientific Research, 3(8). ISSN (Online) 2209-0142.
[76]. Aravind Reddy Nayani, Alok Gupta, Prassanna Selvaraj, Ravi Kumar Singh, & Harsh Vaidya. (2019). Search and Recommendation Procedure with the Help of Artificial Intelligence. International Journal for Research Publication and Seminar, 10(4), 148–166. https://doi.org/10.36676/jrps.v10.i4.1503
[77]. Vaidya, H., Nayani, A. R., Gupta, A., Selvaraj, P., & Singh, R. K. (2020). Effectiveness and future trends of cloud computing platforms. Tuijin Jishu/Journal of Propulsion Technology, 41(3). Retrieved from https://www.journal-propulsiontech.com
[78]. Alok Gupta. (2021). Reducing Bias in Predictive Models Serving Analytics Users: Novel Approaches and their Implications. International Journal on Recent and Innovation Trends in Computing and Communication, 9(11), 23–30. Retrieved from https://ijritcc.org/index.php/ijritcc/article/view/11108
[79]. Bhavesh Kataria, "Variant of RSA-Multi prime RSA", International Journal of Scientific Research in Science, Engineering and Technology, Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 1, Issue 1, pp.09-11, 2014. Available at https://doi.org/10.32628/ijsrset14113
[80]. Rinkesh Gajera, "Leveraging Procore for Improved Collaboration and Communication in Multi-Stakeholder Construction Projects", International Journal of Scientific Research in Civil Engineering (IJSRCE), ISSN : 2456-6667, Volume 3, Issue 3, pp.47-51, May-June.2019
[81]. Voddi, V. K. R., & Konda, K. R. (2021). Spatial distribution and dynamics of retail stores in New York City. Webology, 18(6). Retrieved from https://www.webology.org/issue.php?volume=18&issue=60
[82]. Gudimetla, S. R., et al. (2015). Mastering Azure AD: Advanced techniques for enterprise identity management. Neuroquantology, 13(1), 158-163. https://doi.org/10.48047/nq.2015.13.1.792
[83]. Gudimetla, S. R., & et al. (2015). Beyond the barrier: Advanced strategies for firewall implementation and management. NeuroQuantology, 13(4), 558-565. https://doi.org/10.48047/nq.2015.13.4.876
