Introducing the Machine API

OKD can scale the cluster in and out by adding or removing worker nodes through the Machine API. These scaling capabilities let the cluster provide enough computing power for your applications. Scaling can be manual or automatic, depending on your needs.

The following diagram shows the Machine API (running as an operator) and the API resources that it manages. The operator provides a variety of controllers that interact with the cluster resources, such as the MachineSet controller, which manages MachineSet resources (groups of worker nodes).

The API resources for automatic scaling (ClusterAutoscaler and MachineAutoscaler) are not managed by the Machine API operator. The Cluster Autoscaler operator manages these two resources.

The Machine API provides the following custom resources:

  • Machines are the fundamental compute unit in a cluster. Each Machine resource corresponds to a physical or virtual node.

  • MachineSets describe a group of machines. MachineSets are to machines what ReplicaSets are to pods: you can scale the number of replicas (machines) in a machine set in or out.

    In a default installation, the worker nodes are managed by machine sets, but the master nodes are not. When the cluster runs on AWS, OKD creates one machine set per availability zone by default. You can use additional machine sets to customize your cluster topology, for example, to define a set per region.

  • MachineAutoscaler and ClusterAutoscaler automatically scale resources in a cloud deployment. Automatic scaling also improves availability: if a worker node becomes unresponsive, its pods are quickly evacuated to another worker node.

    Scaling is useful when the cluster is under stress, or for workloads whose computing requirements change over time. The ClusterAutoscaler resource lets you set limits on cluster-wide resources such as the total number of nodes, cores, and memory.

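    For example, the following ClusterAutoscaler resource caps the cluster at 20 nodes and enables scale-down, waiting 10 seconds after a node is added, after a node is deleted, or after a scale-down failure before it evaluates another scale-down:

    apiVersion: ""
    kind: "ClusterAutoscaler"
    metadata:
      name: "default"
    spec:
      resourceLimits:
        maxNodesTotal: 20
      scaleDown:
        enabled: true
        delayAfterAdd: 10s
        delayAfterDelete: 10s
        delayAfterFailure: 10s

  • MachineHealthCheck verifies the health of a machine (such as a worker node) and takes action when the machine is unhealthy, for example by deleting it so that its machine set can replace it.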
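The MachineSets item above compares machine sets to ReplicaSets. The excerpt below is a minimal, non-authoritative sketch of the `replicas` field that manual scaling changes. The name `example-worker-us-east-1a` is hypothetical. The `selector` and `template` sections that a real machine set requires, including its platform-specific provider configuration, are deliberately omitted, so this excerpt cannot be applied as-is.

```yaml
# Excerpt of a MachineSet: only the fields relevant to manual scaling are shown.
# The required selector and template sections are intentionally omitted.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-worker-us-east-1a   # hypothetical machine set name
  namespace: openshift-machine-api
spec:
  replicas: 3   # desired number of machines; raise it to scale out, lower it to scale in
```

On a running cluster, the same change is typically made by editing the existing machine set or by running `oc scale machineset <name> --replicas=<n> -n openshift-machine-api`.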
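A MachineAutoscaler targets a single machine set, while the ClusterAutoscaler sets cluster-wide limits. The sketch below assumes a machine set named `example-worker-us-east-1a` (hypothetical) in the `openshift-machine-api` namespace. It lets the autoscaler vary that machine set between 1 and 6 machines. These replica bounds are illustrative, not recommendations. A MachineAutoscaler only takes effect when a ClusterAutoscaler resource also exists in the cluster.

```yaml
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: example-worker-us-east-1a   # hypothetical name
  namespace: openshift-machine-api
spec:
  minReplicas: 1   # never scale the target machine set below 1 machine
  maxReplicas: 6   # never scale the target machine set above 6 machines
  scaleTargetRef:  # the machine set this autoscaler controls
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: example-worker-us-east-1a   # hypothetical machine set name
```

Creating one MachineAutoscaler per machine set keeps the scaling bounds for each availability zone independent.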
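The following sketch illustrates the MachineHealthCheck item. It assumes that worker machines carry the `machine.openshift.io/cluster-api-machine-role: worker` label. It treats a machine as unhealthy when its node's `Ready` condition has been `False` or `Unknown` for 300 seconds. The name, timeouts, and `maxUnhealthy` threshold are illustrative assumptions, not defaults.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: example-worker-health   # hypothetical name
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      # assumed label that identifies worker machines
      machine.openshift.io/cluster-api-machine-role: worker
  unhealthyConditions:
    - type: Ready
      status: "False"   # quoted so YAML does not parse it as a boolean
      timeout: 300s
    - type: Ready
      status: Unknown
      timeout: 300s
  maxUnhealthy: "40%"   # stop remediating if more than 40% of matched machines are unhealthy
```

The `maxUnhealthy` threshold keeps the health check from deleting many machines at once when a cluster-wide problem, rather than individual machine failures, makes nodes look unhealthy.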