In this thesis, we present our work on resource management in large-scale systems, with a focus on enterprise cluster and storage systems. Large-scale cluster systems have become popular among broad user communities by offering a variety of resources. Such systems require complex resource management schemes for multi-objective optimization, tailored to different system requirements. In addition, burstiness is common in enterprise workloads and is a key factor in performance degradation. Managing heterogeneous resources (e.g., computing, networking, and storage) in such large-scale systems under bursty conditions, while providing performance guarantees and cost efficiency, is therefore an extremely challenging problem.

To address this problem, we first investigate the limitations of classic load balancers under bursty workloads and explore new algorithms for effective resource allocation in cluster systems. We demonstrate that burstiness in user demands diminishes the benefits of several existing load balancing algorithms. Motivated by this observation, we develop a new class of burstiness-aware load balancing algorithms. First, we present a static version of our new load balancer, named ArA, which tunes the load balancing scheme by adjusting the degree of randomness and greediness in the selection of computing sites. We also develop an online version of ArA, which predicts the beginning and end of workload bursts and automatically adjusts the load balancer to compensate. Both simulation and real experiments show that this new load balancer adapts quickly to changes in user demands and thus improves performance.

Second, we study data management in enterprise storage systems. Tiered storage architectures provide shared storage resources to a large variety of applications, which may demand different service level agreements (SLAs).
Furthermore, a single user query from a data-intensive application can easily trigger a burst of disk I/Os to the back-end storage system, eventually causing performance degradation. We therefore present a new approach for automated data movement in multi-tiered storage systems that aims to support multiple SLAs for applications with dynamic workloads at minimal cost.

In addition, Flash technology can be leveraged in virtualized environments as a second-level, host-side cache for I/O acceleration. We present a new Flash Resource Manager, named vFRM, which aims to maximize the utilization of Flash resources at minimal I/O cost. It identifies the data blocks that benefit most from being placed on Flash, and updates Flash lazily and asynchronously. Further, we investigate the benefits of a global version of vFRM, named g-vFRM, for managing Flash resources across multiple heterogeneous VMs. Experimental evaluation shows that both vFRM and g-vFRM achieve better cost-effectiveness than traditional caching solutions, while requiring orders of magnitude less memory and I/O bandwidth.
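To illustrate the lazy, asynchronous update idea behind vFRM, the following is a minimal sketch, not the thesis's actual implementation: block accesses are only counted on the I/O path, and Flash placement is revised once per epoch by keeping the blocks with the highest observed access counts. All class and method names here are illustrative assumptions.

```python
from collections import Counter

class LazyFlashManager:
    """Toy sketch of epoch-based, lazy Flash placement (names are
    illustrative; this is not the vFRM implementation itself)."""

    def __init__(self, flash_capacity_blocks):
        self.capacity = flash_capacity_blocks
        self.flash = set()      # block ids currently placed on Flash
        self.heat = Counter()   # per-epoch access counts

    def record_access(self, block_id):
        # On the I/O path we only count accesses; Flash is not touched here.
        self.heat[block_id] += 1
        return block_id in self.flash   # True -> Flash hit

    def epoch_update(self):
        # Deferred placement: keep the hottest blocks, evict the rest.
        hottest = {b for b, _ in self.heat.most_common(self.capacity)}
        migrated = len(hottest - self.flash)  # blocks newly copied to Flash
        self.flash = hottest
        self.heat.clear()
        return migrated
```

Batching placement decisions per epoch, rather than updating Flash on every access, is what keeps the I/O cost of managing the cache low; in a real system the `epoch_update` step would run asynchronously in the background.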