Abstract:
Distributed storage systems use low-cost equipment to provide scalable and flexible data storage services, which greatly reduces system construction cost. However, in a Ceph storage system, heterogeneous device capacities and diverse storage pools lead to an unbalanced distribution of replica data, which poses new challenges to the performance and reliability of the system. To solve this problem, a data balancing method based on cluster topology and storage pool awareness is proposed. While optimizing the balance and performance of the storage system, it prevents devices with small storage capacity from bearing too much workload and reduces the waste of resources on devices with large storage capacity. Specifically, a dynamic global weight balancer, DGWBalancer, is designed. It jointly considers the cluster topology, cluster storage utilization, device storage utilization, and storage pools; derives a data distribution node selection strategy with a greedy algorithm; and dynamically and globally adjusts the weights of the devices in the cluster and of the storage devices in each storage pool. As a result, data is distributed more reasonably and evenly across devices, which improves the reliable storage utilization and performance of cloud storage services. Experimental results show that, compared with Ceph's existing mgr balancer, DGWBalancer achieves better data balance, improving the balance degree by 350% and the cluster's reliable storage utilization by 13.5%; in terms of performance, it improves throughput by 10% and IOPS by 17%.
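
To make the idea of greedy, capacity-aware node selection concrete, the following is a minimal sketch and not the DGWBalancer algorithm itself. It assumes a simplified model in which each device belongs to one host (the failure domain) and each replica of an object must land on a different host; the names `Device` and `place_replicas` are hypothetical and do not correspond to any Ceph API. The weight-adjustment step described above is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical storage device; not a Ceph data structure."""
    name: str
    host: str          # failure domain (assumed single topology level)
    capacity: float    # total capacity, in arbitrary units
    used: float = 0.0  # space currently consumed

    def utilization_after(self, size: float) -> float:
        # Projected utilization if an object of `size` were stored here.
        return (self.used + size) / self.capacity


def place_replicas(devices, size, replicas):
    """Greedily pick `replicas` devices on distinct hosts,
    preferring the lowest projected utilization."""
    chosen = []
    used_hosts = set()
    for dev in sorted(devices, key=lambda d: d.utilization_after(size)):
        if dev.host in used_hosts:
            continue  # keep replicas in separate failure domains
        if dev.used + size > dev.capacity:
            continue  # device cannot hold the object
        chosen.append(dev)
        used_hosts.add(dev.host)
        if len(chosen) == replicas:
            break
    if len(chosen) < replicas:
        raise ValueError("not enough failure domains with free space")
    for dev in chosen:
        dev.used += size
    return [dev.name for dev in chosen]
```

Because the choice is driven by projected utilization rather than by raw free space, small devices in this sketch fill at the same relative rate as large ones instead of being overloaded, which is the qualitative behavior the proposed method aims for.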