Original scientific paper

https://doi.org/10.17559/TV-20250807002887

An Optimized Apriori-Based Frequent Itemset Mining Approach Using Apache Spark for Large-Scale Datasets

D. Elavarasi ; Department of Computer Science and Engineering, Mount Zion College of Engineering and Technology, Pudukkottai, India *
R. Kavitha ; Department of Information and Technology, Velammal College of Engineering and Technology, Madurai, India

* Corresponding author.


Full text: English, PDF (519 KB)

page 1071-1078


Abstract

Frequent itemset mining, the foundation of association rule mining, is a widely used technique for extracting valuable patterns from large corporate datasets. Among the early algorithms, Apriori is well known, yet it suffers from two major limitations: it scans the dataset repeatedly, and it must generate all candidate itemsets before calculating their support. These drawbacks significantly degrade performance, particularly in large-scale and distributed environments. To address these challenges, we propose an enhanced approach, USAHFAPIM (Uplift Scale Apriori-Based High Frequent Association Pruning Item Sets Mining), which leverages the Apache Spark framework to process massive datasets efficiently with minimal memory consumption. The approach introduces two key innovations. First, it extracts itemsets by dynamically assessing the input data and directly computing their support and confidence, which are then used to calculate lift and identify strong associations. Second, it improves search efficiency by pruning redundant or duplicate data with a frequency-based filtering mechanism that reduces data loss. Through these mechanisms, USAHFAPIM improves data analysis efficiency and significantly reduces execution time for large-scale and sparse datasets. Experimental results demonstrate that USAHFAPIM outperforms traditional algorithms such as Eclat, FP-Growth, and standard Apriori, achieving an accuracy of 94%, a precision of 93%, a recall of 92%, a false positive rate (FPR) of 0.08, and an execution time of 25-32 seconds at a minimum support threshold of 0.36%. These results confirm that USAHFAPIM is highly efficient and scalable for both dense and sparse datasets in big data environments.
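
For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of the standard definitions of support, confidence, and lift for a rule A -> C. It is not the USAHFAPIM implementation and does not use Spark. The function names (`support`, `rule_metrics`) and the toy transactions are assumptions made only for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Standard textbook definitions (hypothetical helper, not from the paper):
      support    = supp(A u C)
      confidence = supp(A u C) / supp(A)
      lift       = confidence / supp(C)
    """
    a, c = frozenset(antecedent), frozenset(consequent)
    s_rule = support(a | c, transactions)
    s_a = support(a, transactions)
    s_c = support(c, transactions)
    confidence = s_rule / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return s_rule, confidence, lift


# Toy data, invented for this example only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

metrics = rule_metrics({"bread"}, {"milk"}, transactions)
print(metrics)
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would if independent; a lift below 1, as in this toy rule, indicates the opposite.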

Keywords

Apache Spark; Apriori algorithm; association rule pruning; big data analytics; frequent itemset mining

Hrčak ID:

346719

URI

https://hrcak.srce.hr/346719

Publication date:

30.4.2026.
