
Original scientific paper

https://doi.org/10.32985/ijeces.16.7.1

A Scalable Distributed Approach for Exploration Global Frequent Patterns

Houda Essalmi; Laboratory of Engineering Sciences, Polydisciplinary Faculty of Taza, University of Sidi Mohamed Ben Abdellah Fez, Morocco *
Anass El Affar (ORCID: orcid.org/0009-0009-9545-0373); Laboratory of Engineering Sciences, Polydisciplinary Faculty of Taza, University of Sidi Mohamed Ben Abdellah Fez, Morocco

* Corresponding author.



Pages: 553-564



Abstract

Mining frequent patterns in transactional databases is an essential data mining task, because it makes it easier to identify significant associations and recurring patterns in datasets. As data volumes continue to grow, processing large databases efficiently requires scalable, high-performance solutions that use parallel computing systems to optimize resource usage and data analysis. To address these issues, this paper presents Exploration Global Frequent Patterns (EGFP), a new parallel algorithm designed to generate global frequent patterns from multiple distributed datasets. By distributing workloads and partitioning data, the approach reduces communication costs and ensures efficient parallel execution. EGFP uses two prefix-tree structures to produce a highly compact and structured representation of frequent patterns. The first structure, the local-tree, stores local support values so that transaction data are collected and organized efficiently. Global prefix counts are then aggregated and ranked to improve frequency-based analysis and to give a more organized and useful representation of frequent patterns. To find the globally frequent patterns, a Master site then builds the second structure, the global-tree, for each prefix from this ranked data. Experimental results on large-scale benchmark datasets show that EGFP outperforms existing methods, including CD and PFP-tree, in execution time and scalability, while incurring considerably lower communication cost.
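The pipeline summarized above (local support counting at each site, aggregation and ranking of global counts, then per-prefix mining at a Master site) can be illustrated with a minimal, single-process Python sketch. It is not the authors' EGFP implementation. The function names (`local_item_counts`, `global_rank`, `build_local_tree`, `tree_paths`, `mine_prefix`, `egfp_sketch`), the use of a plain `Counter` of rank-ordered paths in place of a real global-tree, the alphabetical tie-breaking, and the absolute `min_support` threshold are all hypothetical choices made for illustration. Sites are simulated as in-memory lists of transactions.

```python
from collections import Counter


def local_item_counts(transactions):
    """Site step: count how many local transactions contain each item."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def global_rank(site_counts, min_support):
    """Master step (assumed): sum per-site counts, keep globally frequent items,
    and rank them by descending global support (ties broken alphabetically)."""
    total = Counter()
    for c in site_counts:
        total.update(c)
    frequent = [item for item, n in total.items() if n >= min_support]
    frequent.sort(key=lambda item: (-total[item], item))
    return {item: r for r, item in enumerate(frequent)}


class Node:
    """Prefix-tree node: `count` = number of transactions passing through it."""
    def __init__(self):
        self.count = 0
        self.children = {}


def build_local_tree(transactions, rank):
    """Site step: insert each transaction, filtered to globally frequent items
    and sorted by global rank, into a local prefix tree (hypothetical local-tree)."""
    root = Node()
    for t in transactions:
        path = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        node = root
        for item in path:
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def tree_paths(root):
    """Flatten a prefix tree into {path: number of transactions ending there}."""
    out = Counter()

    def walk(node, prefix):
        ended = node.count - sum(ch.count for ch in node.children.values())
        if prefix and ended > 0:
            out[prefix] += ended
        for item, child in node.children.items():
            walk(child, prefix + (item,))

    walk(root, ())
    return out


def mine_prefix(paths, min_support, pattern, results):
    """Master step: for each frequent item (a prefix of the rank-ordered paths),
    record its support, project the path tails that follow it, and recurse."""
    counts = Counter()
    for path, n in paths.items():
        for item in path:
            counts[item] += n
    for item, n in counts.items():
        if n < min_support:
            continue
        extended = pattern + (item,)
        results[frozenset(extended)] = n
        projected = Counter()
        for path, m in paths.items():
            if item in path:
                tail = path[path.index(item) + 1:]
                if tail:
                    projected[tail] += m
        mine_prefix(projected, min_support, extended, results)


def egfp_sketch(partitions, min_support):
    """Run the simulated pipeline; returns {frozenset(itemset): global support}."""
    rank = global_rank([local_item_counts(p) for p in partitions], min_support)
    merged = Counter()
    for part in partitions:  # each simulated site ships only its compacted paths
        merged.update(tree_paths(build_local_tree(part, rank)))
    results = {}
    mine_prefix(merged, min_support, (), results)
    return results


if __name__ == "__main__":
    sites = [
        [["a", "b", "c"], ["a", "b"], ["b", "d"]],
        [["a", "c"], ["a", "b", "c"], ["c", "d"]],
    ]
    result = egfp_sketch(sites, min_support=3)
    print(sorted((tuple(sorted(k)), v) for k, v in result.items()))
    # [(('a',), 4), (('a', 'b'), 3), (('a', 'c'), 3), (('b',), 4), (('c',), 4)]
```

In this sketch each simulated site sends the Master only its item counts and a compacted set of rank-ordered paths rather than its raw transactions, which is the kind of communication saving the abstract describes. How EGFP actually encodes, ranks, and exchanges its local-trees and global-trees is defined in the full paper and is not reproduced here.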

Keywords

Data mining; Parallel processing; Frequent pattern tree; Communication costs

Hrčak ID: 335131

URI: https://hrcak.srce.hr/335131

Publication date: 20.8.2025.
