Running a dataflow in a cloud environment usually generates many useful intermediate datasets. A strategy for running a dataflow is to decide which datasets should be stored, while the rest of them are regenerated. The intermediate dataset storage (IDS) problem asks to find a strategy for running a dataflow, such that the total cost is minimized. The current best algorithm for linear-structure IDS takes O(n~4) time, where "linear-structure" means that the structure of the datasets in the dataflow is a pipeline. In this paper, we present a new algorithm for this problem, and improve the time complexity to O(n~3), where n is the number of datasets in the pipeline.
展开▼