This article covers the fundamental concepts of overprovisioning within a Kubernetes cluster. We will explore what overprovisioning is, why it is needed, and how to calculate the related capacity and configuration values. So, without further delay, let’s dive right in.
The Need for Overprovisioning
Overprovisioning is a technique for preparing your cluster for future demand from hosted applications, so that capacity shortages do not become bottlenecks.
Let’s consider a scenario in which a Kubernetes-hosted application needs to increase its number of pods (horizontal scaling) beyond the cluster’s available resources. The additionally spawned pods end up in a Pending state because there are not enough resources on the cluster to schedule them. Even if you are running the Cluster Autoscaler (CA) on an Elastic Kubernetes Service (EKS) cluster, there is a minimum 10-second delay for the CA to recognize the need for more capacity and communicate this requirement to the Auto Scaling Group (ASG). There is a further delay while the ASG scales out, launches a new EC2 instance, completes the boot-up process, executes the necessary bootstrap scripts, and has the node marked Ready by Kubernetes. This entire process typically takes a minute or two, during which time application pods remain in a Pending state.
To avoid these delays and ensure capacity is immediately available for unschedulable pods, you can employ overprovisioning. This is accomplished through the use of pause pods.
Concept of pause pods
Pause pods are non-essential, low-priority pods created to reserve cluster resources such as CPU, memory, and IP addresses. When critical pods require this reserved capacity, the scheduler evicts the low-priority pause pods, allowing the critical pods to utilize the freed-up resources. But what happens to these evicted pause pods?
After being evicted, the pause pods are automatically re-created by their ReplicaSet and initially sit in a Pending state. At this point, the Cluster Autoscaler (CA) intervenes, as explained earlier, to provide the additional capacity required. Since pause pods do not serve any application traffic, it is acceptable for them to remain Pending for a while. Once the new capacity becomes available, the pause pods consume it, effectively reserving it for future requirements.
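To make this concrete, here is a minimal sketch of what such a setup can look like: a negative-priority PriorityClass and a Deployment running the standard pause image. The names, namespace, and replica count are illustrative assumptions; in practice the replica count is managed by the cluster-proportional-autoscaler described below.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning            # illustrative name
value: -1                           # below the default priority of 0, so these pods are preempted first
globalDefault: false
description: "Low priority class for pause pods that reserve spare cluster capacity."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-pause      # illustrative name; this is the Deployment the autoscaler resizes
  namespace: default
spec:
  replicas: 1                       # placeholder; the cluster-proportional-autoscaler adjusts this
  selector:
    matchLabels:
      app: overprovisioning-pause
  template:
    metadata:
      labels:
        app: overprovisioning-pause
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          # resources.requests are set according to the calculation later in this article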
How does scale-in work with Pause pods?
Now that we’ve grasped how pause pods assist in scenarios requiring cluster scale-out, the next question arises: could these pause pods hold onto resources unnecessarily and block your cluster’s scale-in actions? Here’s the scenario: when the Cluster Autoscaler (CA) identifies lightly utilized nodes (perhaps containing only pause pods), it evicts those low-priority pause pods as part of the node termination process (a scale-in action). The evicted pods are then re-created in a Pending state. However, the node count has now decreased by one, and the cluster-proportional-autoscaler recalculates the required number of pause pods. This number is typically lower, so the newly Pending pause pods are simply terminated instead of blocking the scale-in.
Pause pod calculations
The pause pod Deployment should be scaled by the cluster-proportional-autoscaler (a standalone controller, not to be confused with the Horizontal Pod Autoscaler). Set it to use linear mode by defining the following configuration in its ConfigMap (a sketch of deploying the autoscaler itself follows the replica calculation below):
linear: |-
  {
    "coresPerReplica": 1,
    "nodesPerReplica": 1,
    "min": 1,
    "max": 50,
    "preventSinglePointFailure": true,
    "includeUnschedulableNodes": true
  }
This configuration means:
- coresPerReplica: one pause pod per CPU core.
- nodesPerReplica: one pause pod per node.
- min: at least one pause pod.
- max: at most 50 pause pods.
When both coresPerReplica and nodesPerReplica are set, the autoscaler calculates both values (replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), bounded by min and max) and selects the greater of the two. Let’s calculate for a cluster with 4 nodes of the m7g.xlarge instance type, which has 4 cores per node:
- 4 nodes, meaning 4 pause pods (one per node).
- 16 cores, which equates to 16 pause pods (one per core).
So, in this case, the cluster-proportional-autoscaler will spawn a total of 16 pause pods for the cluster.
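Before moving on to sizing the pause pods, here is the promised sketch of how the cluster-proportional-autoscaler itself might be deployed so that it reads the linear ConfigMap and resizes the pause pod Deployment. The names, namespace, and image tag are assumptions; the --namespace, --configmap, and --target flags come from the cluster-proportional-autoscaler project.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-autoscaler                      # illustrative name
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: overprovisioning-autoscaler
  template:
    metadata:
      labels:
        app: overprovisioning-autoscaler
    spec:
      serviceAccountName: cluster-proportional-autoscaler   # needs RBAC to read nodes and scale the target
      containers:
        - name: autoscaler
          image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.9   # tag is an assumption
          command:
            - /cluster-proportional-autoscaler
            - --namespace=default
            - --configmap=overprovisioning-autoscaler        # ConfigMap holding the linear config above
            - --target=deployment/overprovisioning-pause     # the pause pod Deployment to resize
            - --logtostderr=true
            - --v=2

The autoscaler periodically reads the cluster’s node and core counts and adjusts the replica count of the target Deployment according to the linear configuration.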
Now, let’s calculate the CPU request configuration for the pause pods and, from that, the overprovisioned capacity of the cluster.
Suppose each individual pause pod is set to request 200 milliCPU (200m); from the cluster’s compute perspective, that amounts to 20% of a single CPU core’s capacity. Given that we are running one pause pod per CPU core, this effectively overprovisions 20% of the entire cluster’s CPU capacity.
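As a rough sketch, the container resources section of the pause pod Deployment shown earlier would then carry this request; the comments restate the arithmetic above:

resources:
  requests:
    # 200 milliCPU = 0.2 of a single core; with one pause pod per core
    # (coresPerReplica: 1), the reserved headroom is 0.2 x total cores,
    # i.e. roughly 20% of the cluster's total CPU capacity.
    cpu: 200m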
Depending on how critical the applications running on the cluster are and how frequently their load spikes, you can decide how much overprovisioned capacity you need and derive the corresponding pause pod configuration.