Technical gazette, Vol. 27 No. 5, 2020.
Original scientific paper
https://doi.org/10.17559/TV-20200520034015
Data Deduplication Technology for Cloud Storage
Qinlu He; School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710043, China
Genqing Bian*; School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710043, China
Bilin Shao; School of Management, Xi'an University of Architecture and Technology, Xi'an 710043, China
Weiqi Zhang; School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710043, China
Abstract
With the explosive growth of data, storage systems have entered the cloud storage era. Although distributed file systems, which form the core of cloud storage systems, address the problem of mass data storage, a large amount of duplicate data exists in all storage systems. File systems are designed to control how files are stored and retrieved. Few studies focus on deduplication technologies for cloud file systems at the application level, especially for the Hadoop Distributed File System (HDFS). In this paper, we design a file deduplication framework on HDFS for cloud application developers. The two proposed data deduplication solutions, RFD-HDFS and FD-HDFS, perform deduplication online, which improves storage space utilisation and reduces redundancy. At the end of the paper, we test the disk utilisation and file upload performance of RFD-HDFS and FD-HDFS, and compare their disk utilisation with that of plain HDFS. The results show that both frameworks not only implement data deduplication but also effectively reduce the disk space consumed by duplicate files. The proposed framework can therefore reduce storage space by eliminating redundant HDFS files.
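As an illustration of the general idea of online file-level deduplication, the following is a minimal sketch and not the RFD-HDFS or FD-HDFS design, whose details are not given in this abstract. It assumes files are fingerprinted with SHA-256 at upload time and that an in-memory dictionary stands in for the distributed store; the class name `DedupStore` and the example paths are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy in-memory file store that keeps one copy per distinct content.

    Hypothetical illustration only: a dict stands in for the file system.
    """

    def __init__(self):
        self.blobs = {}   # fingerprint -> file bytes (stored once)
        self.index = {}   # logical path -> fingerprint

    def upload(self, path, data):
        """Store data under path; return True if new bytes were written."""
        fingerprint = hashlib.sha256(data).hexdigest()
        is_new = fingerprint not in self.blobs
        if is_new:
            # First time this content is seen: write the bytes.
            self.blobs[fingerprint] = data
        # Duplicate or not, the logical path points at the fingerprint.
        self.index[path] = fingerprint
        return is_new

    def read(self, path):
        """Return the bytes that the logical path refers to."""
        return self.blobs[self.index[path]]

    def stored_bytes(self):
        """Total bytes physically held, counting each content once."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
first = store.upload("/user/a/report.txt", b"same content")
second = store.upload("/user/b/copy.txt", b"same content")
third = store.upload("/user/b/other.txt", b"different")
```

In this sketch the second upload writes no new bytes because its fingerprint is already present, which is the effect on disk utilisation that the abstract describes for duplicate files.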
Keywords
cloud storage; data deduplication; distributed; file deletion; HDFS
Hrčak ID: 244744
Publication date: 17.10.2020.