Last week, my paper on randomized task graph scheduling was accepted at the Workshop on Workflows in Support of Large-Scale Science collocated with the SC18 in Dallas.

My idea behind this paper was to improve on the extremely good performance of the HEFT [1] scheduler I had observed in various experiments. My attack on the problem was to allocate a larger time budget to allow exploring variations of HEFT’s usually already good schedules.

One way to do this is to alter the task priorities that HEFT uses to schedule tasks. The default is to prioritize tasks according to the amount of downstream work that follows up a task. Then, tasks with more downstream work are considered first and placed such that they finish as early as possible. While this is a good starting point, both fixed priorities and greedy finish time minimization can lead to sub-optimal decisions. In the paper, I took a randomized approach that alters the relative priorities of tasks in certain parts of the task graph in a random fashion. By combining modifications in different parts of the task graph, we open up a large search space that contains better schedules that are ignored by HEFT for the sake of scheduling more quickly. My algorithm searches for a predefined number of seconds, giving a user flexibility in balancing the conflicting goals of scheduling quality and speed.

Since the number of possible relative task priorities is in O(n!) I used an adaptive search mechanism that’s based on estimating the probabilities of randomly sampling better task priorities in certain areas of the search space. I used the estimates to balance exploration of the search space and exploitation of the information already collected in the search space. The result is a budgeted randomized adaptive search algorithm for task priorities that minimize workflow execution time.

My abstract on the paper [2]:

[1] H. Topcuoglu, S. Hariri, and Min-You Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” TPDS, vol. 13, no. 3, pp. 260–274, Mar. 2002. [2] C. Witt, S. Wheating, U. Leser, “LOS: Level Order Sampling for Task Graph Scheduling on Heterogeneous Resources,” Proceedings of the 13th Workshop on Workflows in Support of Large-Scale Science. ACM, 2018.List scheduling is an approach to task graph scheduling that has been shown to work well for scheduling tasks with data dependencies on heterogeneous resources. Key to the performance of a list scheduling heuristic is its method to prioritize the tasks, and various ranking schemes have been proposed in the literature. We propose a method that combines multiple random rankings instead of a using a deterministic ranking scheme.

We introduce L-Orders, which are a subset of all topological orders of a directed acyclic graph. L-Orders can be used to explore targeted regions of the space of all topological orders. Using the observation that the makespans in one such region are often approximately normal distributed, we estimate the expected time to solution improvement in certain regions of the search space. We combine targeted search and improvement time estimations into a time budgeted search algorithm that balances exploration and exploitation of the search space. In 40,500 experiments, our schedules are 5% shorter on average and up to 40% shorter in extreme cases than schedules produced by HEFT.