Original scientific paper

https://doi.org/10.2498/cit.1001706

A Recovery Scheme for Cluster Federations Using Sender-based Message Logging

Bidyut Gupta ; Southern Illinois University, Carbondale, IL, USA
Ruslan Nikolaev ; Southern Illinois University, Carbondale, IL, USA
Raja Chirra ; Southern Illinois University, Carbondale, IL, USA


page 127-139

Abstract

A cluster federation is a heterogeneous union of clusters. Each cluster contains a certain number of processes. An application running in such a computing environment is divided into communicating modules so that these modules can run on different clusters. To achieve fault tolerance, different clusters may employ different checkpointing schemes. For example, some may use coordinated schemes, while others may use communication-induced schemes. This heterogeneity may complicate the recovery process. In this paper, we address the complex problem of recovery in cluster computing environments. Unlike existing works in this area, the proposed approach handles both inter-cluster orphan messages and inter-cluster lost messages. We first propose an algorithm to determine a recovery line such that no inter-cluster orphan message exists between any pair of the cluster-level checkpoints belonging to the recovery line. The main feature of the proposed algorithm is that it can be executed simultaneously by all clusters in the cluster federation. Next, we apply the sender-based message logging idea to handle all inter-cluster lost messages effectively and thereby ensure correctness of computation.
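The two notions the abstract relies on, orphan messages and lost messages, can be illustrated with a small sketch. This is not the algorithm proposed in the paper (which is executed simultaneously by all clusters); it is a generic, centralized rollback-propagation loop written under stated assumptions. The names `Message`, `sent_after`, `recv_after`, `recovery_line`, and `lost_messages` are hypothetical, and each cluster's checkpoints are assumed to be numbered 0, 1, 2, ... with checkpoint 0 being the initial state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One inter-cluster message (hypothetical model, not from the paper)."""
    sender: str
    receiver: str
    sent_after: int  # index of the sender's latest checkpoint before the send
    recv_after: int  # index of the receiver's latest checkpoint before the receive


def recovery_line(latest, messages):
    """Roll clusters back until no message is an orphan.

    A message is an orphan with respect to a line when its receive is
    recorded in the receiver's checkpoint but its send is not recorded
    in the sender's checkpoint.
    """
    line = dict(latest)  # start from each cluster's latest checkpoint
    changed = True
    while changed:
        changed = False
        for m in messages:
            send_undone = m.sent_after >= line[m.sender]
            recv_kept = m.recv_after < line[m.receiver]
            if send_undone and recv_kept:
                # Roll the receiver back to the last checkpoint
                # taken before it received the message.
                line[m.receiver] = m.recv_after
                changed = True
    return line


def lost_messages(line, messages):
    """Messages whose send is recorded but whose receive is not.

    With sender-based logging, these are the messages the sender
    would replay from its log after recovery.
    """
    return [
        m for m in messages
        if m.sent_after < line[m.sender] and m.recv_after >= line[m.receiver]
    ]
```

In this model, rolling one cluster back can turn further messages into orphans, which is why the loop repeats until nothing changes. It terminates because every change strictly lowers some cluster's checkpoint index. Messages returned by `lost_messages` are not undone by rollback; they must be resent, which is the role the abstract assigns to sender-based message logging.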

Keywords

cluster federation; cluster level; checkpoint; recovery

Hrčak ID:

71050

URI

https://hrcak.srce.hr/71050

Publication date:

30.6.2011.
