Original scientific paper

https://doi.org/10.17559/TV-20200520034015

Data Deduplication Technology for Cloud Storage

Qinlu He ; School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710043, China
Genqing Bian* ; School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710043, China
Bilin Shao ; School of Management, Xi'an University of Architecture and Technology, Xi'an 710043, China
Weiqi Zhang ; School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710043, China


Full text: english pdf 861 Kb

page 1444-1451




Abstract

With the explosive growth of data, storage systems have entered the cloud storage era. Although distributed file systems form the core of cloud storage and address the problem of mass data storage, a large amount of duplicate data exists in all storage systems. File systems are designed to control how files are stored and retrieved. Few studies focus on deduplication technologies for cloud file systems at the application level, especially for the Hadoop Distributed File System (HDFS). In this paper, we design a file deduplication framework on HDFS for cloud application developers. The two proposed data deduplication solutions, RFD-HDFS and FD-HDFS, perform deduplication online, which improves storage space utilisation and reduces redundancy. At the end of the paper, we test disk utilisation and file upload performance on RFD-HDFS and FD-HDFS, and compare the disk utilisation of the two frameworks with that of HDFS. The results show that the two frameworks not only implement data deduplication but also effectively reduce the disk space consumed by duplicate files. The proposed framework can therefore reduce storage space by eliminating redundant HDFS files.
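
The abstract does not say how RFD-HDFS or FD-HDFS detect duplicate files, so the following is only a minimal sketch of the general idea of online file-level deduplication, assuming duplicates are identified by a SHA-256 digest of the file content. The `DedupStore` class and its methods are hypothetical names introduced for illustration; the sketch uses an in-memory dictionary in place of HDFS and is not the authors' implementation.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy of each distinct file content.

    Hypothetical illustration only: it stands in for a real file system
    and does not reproduce the RFD-HDFS or FD-HDFS designs.
    """

    def __init__(self):
        self.blobs = {}  # content digest -> file bytes (stored once)
        self.index = {}  # logical path -> content digest

    def upload(self, path, data):
        """Register data under path; return True if new content was stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            # First time this content is seen: store the bytes.
            self.blobs[digest] = data
        # Duplicate or not, the path only records a reference to the content.
        self.index[path] = digest
        return is_new

    def read(self, path):
        """Return the content that path refers to."""
        return self.blobs[self.index[path]]

    def stored_bytes(self):
        """Total bytes physically kept, counting each distinct content once."""
        return sum(len(blob) for blob in self.blobs.values())
```

Uploading the same bytes under two different paths stores the content once, so `stored_bytes()` grows only with distinct content while both paths remain readable. The sketch deliberately ignores deletion, overwrites of an existing path, and hash collisions, all of which a real framework would need to handle.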

Keywords

cloud storage; data deduplication; distributed; file deletion; HDFS

Hrčak ID:

244744

URI

https://hrcak.srce.hr/244744

Publication date:

17.10.2020.
