A Spark Based Frequent Itemset Mining Using Resource Management for Implementation of FP-Growth Algorithm in Cloud Environment
Abstract
The volume of data generated in day-to-day life by sources such as mobile devices, sensors and web cameras is growing exponentially, and this data is processed with big data platforms. The processed data have become highly important in all major domains, such as research, business and industry. One such big data processing platform is Apache Spark, which can handle both batch processing and real-time streaming data. Cloud computing, which can provide the required resources, is used to meet the real-time processing requirements of streaming applications. In a virtualized cloud environment where multiple big data applications are deployed, performance interference among them can also affect the streaming tasks, degrading the performance of the jobs. Association Rule Mining is a technique for finding strongly related patterns among itemsets, and FP-Growth is a widely used Association Rule Mining algorithm. In this paper, the parallel FP-Growth algorithm is implemented in the Spark framework. Execution time is reduced and cost is optimized through effective utilization of Spark resources with heterogeneous resource allocation.
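To make the FP-Growth step concrete, here is a minimal single-machine sketch in Python. It is for illustration only and is not the authors' parallel Spark implementation: the names `FPNode`, `build_tree`, `fp_growth` and `frequent_itemsets` are hypothetical, items are assumed to be hashable and mutually comparable (for example strings), and support is an absolute count `min_count` rather than a fractional threshold. The sketch builds an FP-tree whose paths hold frequent items in descending-support order, then recursively mines each item's conditional pattern base. Spark MLlib ships its own parallel FP-Growth as `pyspark.ml.fpm.FPGrowth` (parameters `itemsCol`, `minSupport`, `minConfidence`); the abstract does not say whether the paper builds on it.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its accumulated count, and tree links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs, keeping only frequent items.

    Returns the frequent-item supports and a header table mapping each
    item to every tree node that holds it.
    """
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Insert frequent items in descending-support order (ties broken by item),
        # so transactions with common frequent prefixes share a tree path.
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return frequent, header


def fp_growth(weighted_transactions, min_count, suffix=()):
    """Recursively yield (itemset, support) pairs for all frequent itemsets."""
    frequent, header = build_tree(weighted_transactions, min_count)
    for item, item_support in frequent.items():
        new_suffix = suffix + (item,)
        yield frozenset(new_suffix), item_support
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            prefix = []
            parent = node.parent
            while parent.parent is not None:  # stop before the root node
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                base.append((prefix, node.count))
        yield from fp_growth(base, min_count, new_suffix)


def frequent_itemsets(transactions, min_count):
    """Map each frequent itemset (frozenset) to its support count."""
    return dict(fp_growth([(t, 1) for t in transactions], min_count))


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]
    for itemset, count in sorted(frequent_itemsets(baskets, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With a threshold of 3 on these five example baskets, the sketch reports the four frequent single items plus {bread, milk}, {bread, diapers}, {milk, diapers} and {diapers, beer}; {bread, milk, diapers} occurs in only two baskets and is pruned. A parallel version such as the one the abstract describes would partition the conditional pattern bases across workers instead of recursing on one machine.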
