MIT researchers have developed a groundbreaking system that automatically learns how to schedule data-processing operations across thousands of servers, a task traditionally reserved for human-designed algorithms. This innovation promises to dramatically enhance the efficiency of today's power-hungry data centers.
Data centers house tens of thousands of servers that constantly process tasks from developers and users. Cluster scheduling algorithms allocate these incoming tasks across servers in real time, optimizing the use of computing resources and accelerating job completion.
Conventionally, humans fine-tune scheduling algorithms based on basic guidelines and various tradeoffs. They might code an algorithm to prioritize certain jobs or to distribute resources equally. But workloads, the groups of combined tasks arriving at a cluster, vary greatly in size and complexity, making it virtually impossible for humans to optimize scheduling algorithms for specific workloads. As a result, efficiency suffers.
The MIT researchers delegated all manual coding to machines. In a paper presented at SIGCOMM, they describe a system that leverages reinforcement learning (RL), a trial-and-error machine-learning technique, to customize scheduling decisions for specific workloads in particular server clusters.
To achieve this, they developed novel RL techniques capable of training on complex workloads. During training, the system explores various ways to allocate incoming workloads across servers, ultimately finding an optimal balance between resource utilization and processing speed. No human intervention is needed beyond a simple instruction like "minimize job-completion times."
Compared to the best human-designed scheduling algorithms, the researchers' system completes jobs 20-30% faster and twice as fast during high-traffic periods. The system excels at efficiently compacting workloads, minimizing waste. Results suggest the system could enable data centers to handle the same workload at higher speeds using fewer resources.
"When machines handle trial and error, they can explore different scheduling strategies and automatically identify superior approaches," explains Hongzi Mao, a PhD student in the Department of Electrical Engineering and Computer Science (EECS). "This automatically improves system performance. Even a 1% improvement in utilization can save millions of dollars and substantial energy in data centers."
"There's no universal solution for scheduling decisions," adds co-author Mohammad Alizadeh, an EECS professor and researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL). "In existing systems, these are hard-coded parameters determined upfront. Our system learns to tune its scheduling policy characteristics based on the data center and workload."
Joining Mao and Alizadeh on the paper are postdocs Malte Schwarzkopf and Shaileshh Bojja Venkatakrishnan, and graduate research assistant Zili Meng, all of CSAIL.
Reinforcement Learning for Scheduling
Typically, data-processing jobs arrive at data centers represented as graphs of "nodes" and "edges." Each node represents a computation task, with larger nodes requiring more computational power. The edges connect tasks that depend on one another's results. Scheduling algorithms assign nodes to servers based on various policies.
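To make that structure concrete, a job's task graph can be sketched roughly as follows. This is an illustrative sketch only; the TaskNode, Job, and runnable names are assumptions made for this article, not the paper's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    node_id: int
    work: float                                    # rough amount of computation the task needs
    children: list = field(default_factory=list)   # downstream tasks that must wait for this one

@dataclass
class Job:
    nodes: dict                                    # node_id -> TaskNode

    def runnable(self, finished: set) -> list:
        """Nodes not yet finished whose parent nodes have all finished."""
        parents = {nid: set() for nid in self.nodes}
        for node in self.nodes.values():
            for child in node.children:
                parents[child.node_id].add(node.node_id)
        return [node for nid, node in self.nodes.items()
                if nid not in finished and parents[nid] <= finished]
```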
Traditional RL systems aren't designed to process such dynamic graphs. They employ a software "agent" that makes decisions and receives feedback as rewards; by seeking out the actions that maximize its reward, the agent learns ideal behavior in a given context. Such systems can help robots learn tasks like picking up objects by processing video or images through grids of pixels, but they struggle with graph-structured data.
To create their RL-based scheduler, called Decima, the researchers developed a model capable of processing graph-structured jobs and scaling to large numbers of jobs and servers. Their system's "agent" is a scheduling algorithm that leverages a graph neural network, commonly used for processing graph-structured data. They implemented a custom component that aggregates information across graph paths—such as quickly estimating computation needed to complete specific graph sections. This is crucial for job scheduling because "child" nodes cannot execute until their "parent" nodes finish, making anticipation of future work along different graph paths essential for effective scheduling decisions.
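To give a flavor of that kind of aggregation, the toy pass below, which builds on the Job sketch above and only illustrates the idea rather than Decima's actual graph neural network, sums the computation remaining downstream of each node:

```python
def remaining_work(job: Job) -> dict:
    """For each node, its own work plus the work of everything downstream of it."""
    memo = {}

    def descendants(node):
        # Collect all downstream node IDs, memoized so shared sub-paths are
        # traversed once and no task is double-counted.
        if node.node_id not in memo:
            downstream = set()
            for child in node.children:
                downstream.add(child.node_id)
                downstream |= descendants(child)
            memo[node.node_id] = downstream
        return memo[node.node_id]

    return {nid: node.work + sum(job.nodes[d].work for d in descendants(node))
            for nid, node in job.nodes.items()}
```

A scheduler with this kind of summary can, for example, favor the node whose downstream path carries the most remaining work.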
To train their RL system, the researchers simulated numerous graph sequences mimicking data center workloads. The agent makes decisions about allocating each graph node to servers. For each decision, a component calculates a reward based on performance at specific tasks—such as minimizing average processing time for a single job. The agent continues refining its decisions until achieving the highest possible reward.
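Conceptually, a single training episode might look like the loop below. The simulator and agent interfaces are assumptions made for illustration, not the authors' code, and the per-step reward is one plausible choice that, summed over an episode, penalizes long average job-completion times.

```python
def run_episode(simulator, agent):
    """One simulated episode: the agent places runnable nodes onto servers."""
    trajectory, rewards = [], []
    observation = simulator.reset()           # a fresh sequence of simulated job graphs
    done = False
    while not done:
        node, server = agent.act(observation)                  # pick a runnable node and a server
        observation, reward, done = simulator.step(node, server)
        trajectory.append((observation, node, server))
        rewards.append(reward)                # e.g., -(unfinished jobs) * time elapsed this step
    return sum(rewards), trajectory           # total return and the decisions that produced it
```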
Input-Dependent Baselining
One concern is that some workload sequences are more difficult to process due to larger tasks or more complex structures. These will inherently take longer to process—and thus yield lower rewards—than simpler ones. However, this doesn't necessarily indicate poor system performance: the system might perform well on a challenging workload but still be slower than on an easier one. This variability in difficulty makes it challenging for the model to evaluate its actions.
To address this, the researchers adapted a technique called "baselining" for this context. This technique calculates averages across scenarios with numerous variables and uses those averages as baselines for comparing future results. During training, they computed a baseline for each input sequence. They then allowed the scheduler to train on each workload sequence multiple times. Next, the system averaged performance across all decisions made for the same input workload. This average serves as the baseline against which the model can compare future decisions to determine their effectiveness. They call this new technique "input-dependent baselining."
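Reusing the hypothetical run_episode sketch above, and assuming a simulator.load call that replays a given job sequence and a policy-gradient agent.update step, input-dependent baselining could look roughly like this:

```python
import statistics

def train_on_input(simulator, agent, job_sequence, num_rollouts=8):
    """Sketch of input-dependent baselining for one input workload sequence."""
    # Run the current policy several times on the *same* input workload.
    results = [run_episode(simulator.load(job_sequence), agent)
               for _ in range(num_rollouts)]
    returns = [total for total, _ in results]

    # The baseline is the average return for this particular input sequence, so a
    # hard workload is judged against other attempts on that same workload,
    # not against easier workloads.
    baseline = statistics.mean(returns)

    for total, trajectory in results:
        advantage = total - baseline            # better or worse than typical for this input?
        agent.update(trajectory, advantage)     # policy-gradient step weighted by the advantage
```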
The researchers note that this innovation is applicable to many other computer systems. "This represents a general approach to reinforcement learning in environments where input processes affect the environment, and you want each training event to consider one sample of that input process," Alizadeh explains. "Almost all computer systems operate in constantly changing environments."
Aditya Akella, a computer science professor at the University of Wisconsin at Madison whose group has designed several high-performance schedulers, believes the MIT system could further enhance their own policies. "Decima can go further and identify optimization opportunities that are simply too onerous to realize through manual design and tuning processes," Akella states. "The schedulers we designed achieved significant improvements over production techniques in terms of application performance and cluster efficiency, but there remained a gap with the ideal improvements we could potentially achieve. Decima demonstrates that an RL-based approach can discover policies that help bridge this gap further. Decima improved on our techniques by approximately 30%, which was a huge surprise."
Currently, their model is trained on simulations that recreate incoming online traffic in real time. Next, the researchers hope to train the model on real-time traffic, which could potentially crash servers. Consequently, they're developing a "safety net" that will stop the system before it causes a crash. "We think of it as training wheels," Alizadeh explains. "We want this system to continuously train, but it needs certain training wheels that prevent it from falling over if it goes too far."