Original scientific paper

https://doi.org/10.32985/ijeces.17.1.5

Parallel and Distributed Multi-level Entropy-Based Approach for Adaptive Global Frequent Pattern Mining in Large Datasets

Houda Essalmi ; Laboratory of Engineering Sciences, Polydisciplinary Faculty of Taza, University of Sidi Mohamed Ben Abdellah Fez, Morocco *
Anass El Affar ; Laboratory of Engineering Sciences, Polydisciplinary Faculty of Taza, University of Sidi Mohamed Ben Abdellah Fez, Morocco

* Corresponding author.


page 49-64

Abstract

Frequent pattern mining in distributed settings remains a significant challenge because of high computational cost and high communication overhead. This paper presents AGFPM (Adaptive Global Frequent Pattern Mining), a novel solution that integrates an extensible Master-Slave architecture with a pruning technique based on binary entropy and statistical quartiles. AGFPM introduces two primary data structures: the LP-Tree (Local Prefix Tree) and the GP-Tree (Global Prefix Tree). Each Slave site builds one LP-Tree in a single pass over its local data, and branches with low information value are pruned early using entropy and quartile thresholds. Rather than transferring complete trees, each Slave sends only succinct metadata to the Master site, where the GP-Tree is built from items sorted globally by their entropy rankings. A significant aspect of AGFPM is its flexible pruning approach: the GP-Tree is either pruned or left unpruned, depending on user criteria. This allows a dynamic trade-off between performance and generality of results, giving control over the level of compression applied when generating global patterns. Global frequent patterns are then mined recursively from the GP-Tree using conditional sub-GP-Trees. Frequent patterns are extended at each level of the hierarchy by intersecting the common prefix paths, guided by a Global Header Table. AGFPM demonstrates improved execution time, scalability, and robustness at low support thresholds relative to existing methods.
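The entropy-and-quartile pruning idea can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: each item's probability is taken to be its relative support at one site, the score is the standard binary entropy H(p) = -p log2(p) - (1-p) log2(1-p), and items scoring below the first quartile are dropped. The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from statistics import quantiles


def binary_entropy(p):
    """Standard binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def item_entropies(transactions):
    """Score each item by the binary entropy of its relative support.

    Assumption: p is the fraction of local transactions containing the item.
    """
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: binary_entropy(c / n) for item, c in counts.items()}


def prune_by_first_quartile(entropies):
    """Keep items whose entropy is at least the first quartile (Q1).

    Assumption: Q1 is the cut-off; the paper may use a different quartile
    or combine several thresholds. Requires at least two scored items.
    """
    q1 = quantiles(list(entropies.values()), n=4)[0]
    return {item: h for item, h in entropies.items() if h >= q1}


# Toy local dataset for one hypothetical Slave site.
transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["a", "d"]]
kept = prune_by_first_quartile(item_entropies(transactions))
print(sorted(kept))  # ['b', 'c', 'd']
```

In this toy run, item `a` occurs in every transaction, so its binary entropy is 0 and it falls below Q1. Whether AGFPM treats such items this way is not stated in the abstract.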

Keywords

Data mining; Distributed Datasets; FP-tree; Communication Overhead; Frequent patterns mining; Binary Entropy; Quartile-based Pruning

Hrčak ID:

342322

URI

https://hrcak.srce.hr/342322

Publication date:

5.1.2026.