Introducing the Scheduler Algorithm

The pod scheduler determines the placement of new pods onto nodes in the OKD cluster. It is designed to be highly configurable and adaptable to different clusters.

The default configuration shipped with OKD supports the common data center concepts of zones and regions by using node labels, affinity rules, and anti-affinity rules.

The OKD pod scheduler algorithm follows a three-step process:

1. Filtering nodes.

The scheduler filters the list of running nodes by evaluating each node against a set of predicates, such as the availability of host ports, or whether a pod can be scheduled to a node experiencing either disk or memory pressure.

Additionally, a pod can define a node selector that matches the labels in the cluster nodes.

Nodes whose labels do not match are not eligible.

A pod can also define resource requests for compute resources such as CPU, memory, and storage. Nodes that have insufficient free computer resources are not eligible.

Another filtering check evaluates if the list of nodes have any taints, and if so whether the pod has an associated toleration that can accept the taint. If a pod cannot accept the taint of a node, then the node is not eligible. By default, master nodes include the taint A pod that does not have a matching toleration for this taint will not be scheduled to a master node.

The end result of the filtering step is typically a shorter list of node candidates that are eligible to run the pod. In some cases, none of the nodes are filtered out, which means the pod could run on any of the nodes. In other cases, all of the nodes are filtered out, which means the pod cannot be scheduled until a node with the desired prerequisites becomes available.

A full list of predicates and their descriptions can be found in the references section.

2. Prioritizing the filtered list of nodes.

The list of candidate nodes is evaluated using multiple priority criteria that add up to a weighted score. Nodes with higher values are better candidates to run the pod.

Among the criteria are affinity and anti-affinity rules. Nodes with higher affinity for the pod have a higher score, and nodes with higher anti-affinity have a lower score.

A common use for affinity rules is to schedule related pods to be close to each other, for performance reasons. An example is to use the same network backbone for pods that synchronize with each other.

A common use for anti-affinity rules is to schedule related pods that are not too close to each other, for high availability. An example is to avoid scheduling all pods from the same application to the same node.

3. Selecting the best fit node.

The candidate list is sorted based on these scores, and the node with the highest score is selected to host the pod. If multiple nodes have the same high score, then one is selected in a round-robin fashion.

The scheduler is flexible and can be customized for advanced scheduling situations. Additionally, although this course will focus on pod placement using node labels and node selectors, pods can also be placed using pod affinity and anti-affinity rules, as well as node affinity and antiaffinity rules.

Customizing the scheduler and covering these alternative pod placement scenarios is outside the scope of this course.