An Optimized Apriori-Based Frequent Itemset Mining Approach Using Apache Spark for Large-Scale Datasets

Elavarasi, D.; Kavitha, R.

doi:10.17559/TV-20250807002887

Technical gazette, Vol. 33 No. 3, 2026.

Original scientific paper

D. Elavarasi, Department of Computer Science and Engineering, Mount Zion College of Engineering and Technology, Pudukkottai, India*
R. Kavitha, Department of Information and Technology, Velammal College of Engineering and Technology, Madurai, India

* Corresponding author.

Pages: 1071-1078


APA 6th Edition

Elavarasi, D., & Kavitha, R. (2026). An Optimized Apriori-Based Frequent Itemset Mining Approach Using Apache Spark for Large-Scale Datasets. Tehnički vjesnik, 33(3), 1071-1078. https://doi.org/10.17559/TV-20250807002887

MLA 8th Edition

Elavarasi, D. and R. Kavitha. "An Optimized Apriori-Based Frequent Itemset Mining Approach Using Apache Spark for Large-Scale Datasets." Tehnički vjesnik, vol. 33, no. 3, 2026, pp. 1071-1078. https://doi.org/10.17559/TV-20250807002887. Accessed 19 Jul. 2026.

Chicago 17th Edition

Elavarasi, D. and R. Kavitha. "An Optimized Apriori-Based Frequent Itemset Mining Approach Using Apache Spark for Large-Scale Datasets." Tehnički vjesnik 33, no. 3 (2026): 1071-1078. https://doi.org/10.17559/TV-20250807002887

Harvard

Elavarasi, D., and Kavitha, R. (2026). 'An Optimized Apriori-Based Frequent Itemset Mining Approach Using Apache Spark for Large-Scale Datasets', Tehnički vjesnik, 33(3), pp. 1071-1078. https://doi.org/10.17559/TV-20250807002887

Vancouver

Elavarasi D, Kavitha R. An Optimized Apriori-Based Frequent Itemset Mining Approach Using Apache Spark for Large-Scale Datasets. Tehnički vjesnik [Internet]. 2026 [cited 2026 July 19];33(3):1071-1078. https://doi.org/10.17559/TV-20250807002887

IEEE

D. Elavarasi and R. Kavitha, "An Optimized Apriori-Based Frequent Itemset Mining Approach Using Apache Spark for Large-Scale Datasets", Tehnički vjesnik, vol. 33, no. 3, pp. 1071-1078, 2026. [Online]. https://doi.org/10.17559/TV-20250807002887

Abstract

Frequent itemset mining, the foundation of association rule mining, is widely used to extract valuable patterns from large corporate datasets. Apriori, one of the earliest and best-known algorithms for this task, has two major limitations: it scans the dataset repeatedly, and it must generate all candidate itemsets before their support can be calculated. These drawbacks significantly degrade performance, particularly in large-scale and distributed environments. To address them, we propose USAHFAPIM (Uplift Scale Apriori-Based High Frequent Association Pruning Item Sets Mining), an enhanced approach that leverages the Apache Spark framework to process massive datasets efficiently with minimal memory consumption. The approach introduces two key innovations. First, it extracts itemsets by dynamically assessing the input data and directly computing their support and confidence, which are then used to calculate lift and identify strong associations. Second, it improves search efficiency by pruning redundant or duplicate data with a frequency-based filtering mechanism that limits data loss. Through these mechanisms, USAHFAPIM improves data analysis efficiency and significantly reduces execution time on large-scale and sparse datasets. Experimental results show that USAHFAPIM outperforms traditional algorithms such as Eclat, FP-Growth, and standard Apriori, achieving an accuracy of 94%, a precision of 93%, a recall of 92%, a false positive rate (FPR) of 0.08, and an execution time of 25-32 seconds at a minimum support threshold of 0.36%. These results confirm that USAHFAPIM is highly efficient and scalable for both dense and sparse datasets in big data environments.
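The two Apriori limitations named in the abstract (one full pass over the data for every itemset size, and candidate generation before support counting), together with the support, confidence, and lift measures, can be illustrated with a small single-machine sketch. This is the classic Apriori baseline, not USAHFAPIM and not Spark code; the function names `collapse_duplicates`, `apriori`, and `rule_metrics` are hypothetical. Collapsing identical transactions into weighted counts is shown as one assumed, lossless way to prune duplicates by frequency; the paper's own filtering mechanism may work differently.

```python
from collections import Counter
from itertools import combinations


def collapse_duplicates(transactions):
    """Merge identical transactions into {itemset: count} (assumed pruning step).

    Support counts are unchanged, so no information is lost, but every later
    scan touches fewer rows when the data contains many repeated baskets.
    """
    return Counter(frozenset(t) for t in transactions)


def apriori(transactions, min_support):
    """Classic level-wise Apriori; min_support is a fraction in [0, 1]."""
    weighted = collapse_duplicates(transactions)
    n = sum(weighted.values())
    min_count = min_support * n

    # Level 1: one scan to count single items.
    counts = Counter()
    for basket, w in weighted.items():
        for item in basket:
            counts[frozenset([item])] += w
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets ...
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            # ... and keep only candidates whose (k-1)-subsets are all frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Another full scan per level: the repeated-scan cost Apriori pays.
        counts = Counter()
        for basket, w in weighted.items():
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += w
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result, n


def rule_metrics(freq, n, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    support = freq[a | c] / n
    confidence = freq[a | c] / freq[a]
    lift = confidence / (freq[c] / n)  # > 1 suggests a positive association
    return support, confidence, lift


# Toy data for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]
freq, n = apriori(baskets, min_support=0.4)
support, confidence, lift = rule_metrics(freq, n, {"bread"}, {"milk"})
print(round(support, 2), round(confidence, 2), round(lift, 4))  # 0.6 0.75 0.9375
```

In this toy data the lift of bread -> milk is below 1, so the two items co-occur slightly less often than independence would predict, and the rule would not count as a strong association despite its 75% confidence. A threshold written as a percentage, such as the 0.36% in the abstract, would be passed to this sketch as `min_support=0.0036`.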

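The abstract reports accuracy, precision, recall, and FPR but does not state here how mined rules are labeled as true or false positives. Assuming the standard confusion-matrix definitions, the four figures relate to the underlying counts as in the sketch below; the function name `classification_metrics` and the counts are hypothetical, chosen only to make the arithmetic easy to check. Under these definitions, the reported FPR of 0.08 would mean that 8% of actual negatives were flagged as positive.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics (assumed definitions)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # share of all cases labeled correctly
        "precision": tp / (tp + fp),  # share of predicted positives that are real
        "recall": tp / (tp + fn),  # share of real positives that were found
        "fpr": fp / (fp + tn),  # share of real negatives wrongly flagged
    }


# Hypothetical counts for illustration only; they are not the paper's data.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print({k: round(v, 3) for k, v in metrics.items()})
# {'accuracy': 0.85, 'precision': 0.889, 'recall': 0.8, 'fpr': 0.1}
```

Note that precision and FPR use different denominators (predicted positives versus actual negatives), so a low FPR alone does not guarantee high precision when negatives greatly outnumber positives.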
Keywords

Apache Spark; Apriori algorithm; association rule pruning; big data analytics; frequent itemset mining

Hrčak ID: 346719

URI: https://hrcak.srce.hr/346719

Publication date: 30.4.2026.
