Schedulers in large-scale Kubernetes (K8s) clusters, such as the Gödel Scheduler, often have to schedule a large number of Pods within a short period. To meet service level objectives, such as minimizing the job waiting time that users experience, the scheduler’s top priority is to maximize scheduling throughput while keeping scheduling quality acceptable. Under this pressure, the scheduler often cannot make optimal scheduling decisions. Moreover, even placements that are optimal when they are made rarely stay optimal, because the cluster state changes continually. As a result, the production environment may suffer from severe resource fragmentation, uneven node loads, and high inter-application communication costs caused by suboptimal instance placements.
Previously, the only viable way to improve these suboptimal placements was manual operation. We introduced the Gödel Rescheduler to address these issues more effectively.
The objectives of Gödel Rescheduler are:
Define the rescheduling process to standardize interactions between the rescheduler and other components, such as the Gödel Scheduler.
Build a standardized rescheduling framework that ensures extensibility, ease of integration with new rescheduling strategies, and support for additional rescheduling scenarios.
Develop robust error-handling and migration constraint mechanisms to maintain cluster stability.
Overall Architecture
Policy Manager
Policy Configurator: Reads the ReschedulerPolicy config YAML file to configure detectors, algorithms, triggers, and other necessary settings.
Detector: Identifies nodes and Pods that meet migration criteria, with options to set a maximum limit on the number of nodes and Pods to migrate. The migration scope can be configured as either local (specific nodes or node level) or global.
Algorithm Provider: Reassigns Pods identified by the detector using the specified placement algorithm.
Validator: Checks and validates the placement results generated by the Algorithm Provider.
Policy Controller: Manages the overall workflow of the Policy Manager, utilizing the capabilities of related modules to produce globally or locally optimal placement results.
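The way these modules cooperate can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the rescheduler's actual code: the `Pod` type, the plugin signatures, and `run_policy` are hypothetical names chosen for this example, and the real plugin interfaces live in the project's source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Pod:
    name: str
    node: str
    cpu: float  # requested CPU cores


# Hypothetical plugin signatures, for illustration only.
Detector = Callable[[List[Pod]], List[Pod]]              # Pods worth migrating
Algorithm = Callable[[List[Pod]], Dict[str, str]]        # Pod name -> target node
Validator = Callable[[Dict[str, str], List[Pod]], bool]  # accept or reject a plan


def run_policy(pods: List[Pod], detector: Detector, algorithm: Algorithm,
               validator: Validator, max_pods: int) -> Dict[str, str]:
    """Policy Controller sketch: detect, cap, place, validate."""
    candidates = detector(pods)[:max_pods]  # honor the migration limit
    if not candidates:
        return {}
    placement = algorithm(candidates)
    # A rejected plan is discarded entirely rather than partially applied.
    return placement if validator(placement, candidates) else {}


pods = [Pod("a", "n1", 1.0), Pod("b", "n1", 1.0), Pod("c", "n2", 2.0)]
plan = run_policy(
    pods,
    detector=lambda ps: [p for p in ps if p.node == "n1"],
    algorithm=lambda ps: {p.name: "n2" for p in ps},
    validator=lambda placement, ps: all(placement[p.name] != p.node for p in ps),
    max_pods=1,
)
print(plan)  # {'a': 'n2'}
```

Passing the three stages as plain callables mirrors the plugin idea: a new strategy only has to supply a detector and an algorithm, while the controller loop stays unchanged.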
Movement Manager
Movement Generator: Partitions the placement results recommended by the Policy Manager into multiple migration batches, respecting PodDisruptionBudgets (PDBs) and other restrictions, and organizes each batch as a Movement object.
Task Killer: Deletes corresponding Pods based on migration decisions.
Movement Recycler: Periodically cleans up migration decisions (Movement); the trigger conditions for cleanup are customizable.
Movement Controller: Manages the overall workflow of the Movement Manager.
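The batching done by the Movement Generator can be pictured as a greedy split. The sketch below is an assumption-laden illustration, not the project's implementation: `Move`, `generate_movements`, and the per-application `max_unavailable` map are hypothetical stand-ins for Movement objects and PDB limits, and a missing entry is assumed to mean "one Pod at a time".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Move:
    pod: str
    app: str          # application the Pod belongs to
    target_node: str


def generate_movements(moves: List[Move],
                       max_unavailable: Dict[str, int]) -> List[List[Move]]:
    """Greedy split: a batch evicts at most max_unavailable[app] Pods per app."""
    batches: List[List[Move]] = []
    counts: List[Dict[str, int]] = []  # per-batch eviction counts, keyed by app
    for move in moves:
        limit = max_unavailable.get(move.app, 1)  # assumed default
        if limit < 1:
            continue  # the budget forbids any disruption, so the Pod stays put
        for batch, used in zip(batches, counts):
            if used[move.app] < limit:
                batch.append(move)
                used[move.app] += 1
                break
        else:
            # No existing batch has room for this app: open a new one.
            batches.append([move])
            used = defaultdict(int)
            used[move.app] = 1
            counts.append(used)
    return batches


moves = [
    Move("web-0", "web", "node-a"),
    Move("web-1", "web", "node-a"),
    Move("web-2", "web", "node-a"),
    Move("db-0", "db", "node-b"),
]
batches = generate_movements(moves, {"web": 2, "db": 1})
print([[m.pod for m in batch] for batch in batches])
# [['web-0', 'web-1', 'db-0'], ['web-2']]
```

Each inner list corresponds to one batch that could be executed together; the next batch would only start once the previous one has settled.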
Gödel Scheduler
Receives Movement objects created by the Movement Manager, giving priority to the recommended nodes in Movement when scheduling the same type of Pod.
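One way to read "giving priority to the recommended nodes" is as a reordering of the candidate node list. The following sketch assumes that "the same type of Pod" means Pods sharing an owner (for example, one Deployment); `order_candidates` and the `suggestions` map are hypothetical names for this illustration and do not reflect the scheduler's real data structures.

```python
from typing import Dict, List


def order_candidates(owner: str, feasible_nodes: List[str],
                     suggestions: Dict[str, List[str]]) -> List[str]:
    """Put nodes recommended for this owner ahead of other feasible nodes."""
    suggested = dict.fromkeys(suggestions.get(owner, []))  # de-duplicate, keep order
    preferred = [n for n in suggested if n in feasible_nodes]
    rest = [n for n in feasible_nodes if n not in suggested]
    return preferred + rest


feasible = ["godel-demo-default-worker", "godel-demo-default-control-plane"]
suggestions = {"nginx-deployment": ["godel-demo-default-control-plane"]}

print(order_candidates("nginx-deployment", feasible, suggestions))
# ['godel-demo-default-control-plane', 'godel-demo-default-worker']
print(order_candidates("other-app", feasible, suggestions))
# ['godel-demo-default-worker', 'godel-demo-default-control-plane']
```

A recommended node that is not feasible is simply dropped, so the recommendation acts as a preference rather than a hard constraint.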
We will demonstrate the rescheduling process using the BinPacking strategy, which consolidates Pods as much as possible. Specifically, the BinPacking strategy identifies nodes with low resource utilization and attempts to move Pods from these nodes to those with the highest resource utilization. In the previous deployment step, we have enabled the following BinPacking rescheduling strategy in the configuration file.
This file configures the BinPacking detector and algorithm plugins and declares that the policy check can be triggered periodically or by signal 12 (SIGUSR2 on most Linux platforms). Additionally, the detector plugin defines the parameters the policy needs; the configuration above indicates that Pods on nodes with a CPU utilization of 0.5 or lower will be rescheduled.
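The BinPacking idea described above can be illustrated with a small simulation. This is a sketch under assumptions rather than the plugin's real logic: utilization is computed here from CPU requests divided by capacity, and the `Node` type, the `detect` and `binpack` functions, the Pod names, and the CPU figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    cpu_capacity: float
    pods: Dict[str, float] = field(default_factory=dict)  # Pod name -> CPU request

    @property
    def utilization(self) -> float:
        return sum(self.pods.values()) / self.cpu_capacity


def detect(nodes: List[Node], threshold: float = 0.5) -> List[Node]:
    """Non-empty nodes at or below the threshold, emptiest first."""
    low = [n for n in nodes if n.pods and n.utilization <= threshold]
    return sorted(low, key=lambda n: n.utilization)


def binpack(nodes: List[Node], threshold: float = 0.5) -> Dict[str, str]:
    """Drain low-utilization nodes into the fullest node that still has room.

    Mutates `nodes` to simulate the moves and returns Pod name -> target node.
    """
    plan: Dict[str, str] = {}
    drained, receivers = set(), set()
    for source in detect(nodes, threshold):
        if source.name in receivers:
            continue  # do not drain a node that has just received Pods
        drained.add(source.name)
        # Place the largest Pods first; sorted() copies, so popping is safe.
        for pod, cpu in sorted(source.pods.items(), key=lambda kv: kv[1], reverse=True):
            targets = [n for n in nodes
                       if n.name not in drained
                       and n.cpu_capacity - sum(n.pods.values()) >= cpu]
            if not targets:
                continue  # nowhere to go, so the Pod stays where it is
            target = max(targets, key=lambda n: n.utilization)
            target.pods[pod] = source.pods.pop(pod)
            receivers.add(target.name)
            plan[pod] = target.name
    return plan


worker = Node("godel-demo-default-worker", 8.0, {"nginx-a": 1.0, "nginx-b": 1.0})
control_plane = Node("godel-demo-default-control-plane", 8.0, {"system": 3.0})
plan = binpack([worker, control_plane])
print(plan)
# {'nginx-a': 'godel-demo-default-control-plane', 'nginx-b': 'godel-demo-default-control-plane'}
```

In this toy setup both nodes start at or below the 0.5 threshold, so the emptier one (the worker) is drained into the fuller one, and the receiving node is then left alone.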
Submit a Sample Workload
$ kubectl apply -f docs/examples/binpacking/workload.yaml
$ kubectl get pod -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE                        NOMINATED NODE   READINESS GATES
nginx-deployment-856557487b-92qvn   1/1     Running   0          3s    10.244.1.8   godel-demo-default-worker   <none>           <none>
nginx-deployment-856557487b-kgv7f   1/1     Running   0          3s    10.244.1.9   godel-demo-default-worker   <none>           <none>
Gödel Rescheduler
Quick Start
1. Set Up the Local Gödel Scheduler Environment
Prepare the Gödel Cluster
Follow the guide Local Gödel Environment Setup with KIND, and make sure your checkout includes commit 2309f09a9f38b9da7acbe99085445120c0c64a4e.
Switch kubectl Context
Enable Rescheduling for Gödel Components: dispatcher, scheduler and binder
2. Deploy Gödel Rescheduler
Clone the Gödel Scheduler repo to your machine
Change to the Gödel Rescheduler directory and build the rescheduler image
Load the rescheduler image into the kind cluster
Install the rescheduler component
3. BinPacking Rescheduling Example
Submit a Sample Workload
Untaint the godel-demo-default-control-plane Node
Manually Trigger Rescheduling
Observe Rescheduling Results
Two Pods have been moved to the node godel-demo-default-control-plane, as expected.
Contribution Guide
Please refer to: Contributing to Gödel Rescheduler