
What is PDB in Kubernetes?

Ever wondered what PDB, i.e. Pod Disruption Budget, is in the Kubernetes world? Then this small post is just for you!

PDB foundation!

PDB, i.e. Pod Disruption Budget, is a way to make sure a minimum number of Pods is always available for a certain application in a Kubernetes cluster. That's the one-liner explanation of PDB 🙂 Let's dive deeper and understand what a PDB is, what it offers, and whether you should define one for your applications.

What is Pod Disruption Budget?

A ReplicaSet in Kubernetes lets us keep multiple replicas of the same Pod to handle load or add an extra layer of availability to containerized applications. But those replicas can be evicted during cluster maintenance or scaling actions if you don't tell the control plane (Kubernetes master / API server) how they should be terminated.

The PDB is a way to let the control plane know how the Pods of a certain ReplicaSet should be evicted. The PDB is its own Kubernetes kind, and it is associated with the application's Pods (for example, a Deployment's Pods) through a label selector.

How is a PDB defined?

It’s a very small kind and offers only three fields to configure:

  • spec.selector: Defines the Pods to which PDB will be applied
  • spec.minAvailable: An absolute number or percentage. It’s the number of Pods that should always remain in a running state during evictions.
  • spec.maxUnavailable: An absolute number or percentage. It’s the maximum number of Pods that can be unavailable during evictions.
  • You can specify either spec.minAvailable or spec.maxUnavailable, but not both

A sample Kubernetes manifest for PDB looks like this –

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
  namespace: <namespace> #optional
  annotations:           #optional
    key: value 
  labels:                #optional
    key: value
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web

Here –

  • metadata: The PDB name, the namespace in which PDB lives, annotations and labels that are applied to the PDB itself.
  • spec: The PDB configuration we discussed above.

How does PDB work?

Let's look at how this configuration takes effect. For better understanding, we will consider a simple application that has 5 replicas.

Case 1: PDB is configured with minAvailable set to 3.

This means we are telling the control plane that it can evict at most 2 Pods at a time (5 running – 3 minAvailable). In other words, we are allowing 2 disruptions at a time. This value is also called disruptionsAllowed. So, in a situation where the control plane needs to move all 5 Pods, it will evict 2 Pods first; once those 2 evicted Pods respawn on new nodes and go into the Running state, it will evict the next 2, and lastly the remaining 1. In the process, it makes sure that there are always at least 3 Pods in the Running state.

Case 2: PDB is configured with maxUnavailable set to 2.

It has the same effect as above! Basically, you are telling the control plane that at any given point in time 2 Pods can be evicted, meaning 5 – 2 = 3 Pods should be running.
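For reference, a PDB for the same 5-replica example using maxUnavailable could look like the manifest below (a sketch modeled on the earlier sample; the app: web label is assumed to match your Pods) –

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: web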

The allowed disruptions value is calculated on the fly, and it only counts Pods in the Running state. Continuing with the above example, if 2 out of the 5 Pods are not in a Running state (for whatever reason), then disruptionsAllowed is calculated as 3 – 3 = 0. Only 3 Pods are in the Running state, and none of them can be evicted, since the PDB says it wants a minimum of 3 Pods in the Running state at all times.

In a nutshell: disruptionsAllowed = Number of RUNNING Pods – minAvailable value

How to check Pod Disruption Budget?

One can use the below command to check the PDB –

$ kubectl get poddisruptionbudgets -n <namespace>

Then, kubectl describe can be used to get the details of each PDB listed in the output of the previous command.
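For example, with the sample PDB above applied to a couple of running web Pods, the output looks roughly like this (values are illustrative) –

$ kubectl get poddisruptionbudgets -n default
NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
sample-pdb   1               N/A               1                     5m

$ kubectl describe poddisruptionbudget sample-pdb -n default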

Should I define PDB for my applications?

Yes, you should! It’s a good practice to calculate and properly define the PDB to make your application resilient to Cluster maintenance/scaling activities.

At a minimum, set minAvailable to 1 with at least 2 replicas; in general, make sure that minAvailable is always less than the replica count. A wrongly configured PDB will not allow Pod evictions and may block cluster activities. Obviously, cluster admins can force their way through, but that means downtime for your applications.

You can also implement cluster constraints for PDB so that new applications won't be allowed to deploy unless they include a PDB manifest in their code as well.

How to install Cluster Autoscaler on AWS EKS

A quick rundown on how to install Cluster Autoscaler on AWS EKS.

CA on EKS!

What is Cluster Autoscaler (CA)?

Cluster Autoscaler is not a new word in the Kubernetes world. It's a program that scales the Kubernetes cluster out or in as per capacity demands. It is available on GitHub at https://github.com/kubernetes/autoscaler.

For scale-out, it looks for any unschedulable Pods in the cluster and adds nodes to make sure they can be scheduled. With default settings, CA runs this check every 10 seconds, so it detects and acts on a scale-out need within roughly 10 seconds.

For scale-in, it watches node utilization, and any underutilized node is elected for scale-in. The elected node has to remain in an unneeded state for 10 minutes (by default) before CA terminates it.
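Both timings come from Cluster Autoscaler flags; the defaults mentioned above correspond to the arguments below (listed for reference only, you normally don't need to change them):

  • --scan-interval=10s : how often CA checks for unschedulable Pods
  • --scale-down-unneeded-time=10m : how long a node must stay unneeded before CA terminates it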

CA on AWS EKS

As you now know, CA's core functionality is spawning new nodes or terminating unneeded ones, so it must have access to the underlying infrastructure to perform these actions.

In AWS EKS, Kubernetes nodes are EC2 or Fargate compute. Hence, Cluster Autoscaler running on an EKS cluster needs access to the respective service APIs to perform scale-out and scale-in. This is achieved by creating an IAM role with the appropriate IAM policies attached to it.

Cluster Autoscaler runs as a Kubernetes Deployment in a separate namespace (kube-system by default) on the same EKS cluster. Let's look at the installation.

How to install Cluster Autoscaler on AWS EKS

Creating IAM role

The Cluster Autoscaler's IAM role needs an IAM policy attached to it with the below permissions –

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sts:AssumeRole",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
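If you save the above JSON as, say, cluster-autoscaler-policy.json (a file name chosen here just for illustration), the policy can be created with the AWS CLI, and the ARN from its output noted for the next step –

# aws iam create-policy --policy-name blog-eks-policy --policy-document file://cluster-autoscaler-policy.json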

You will need to use this policy ARN in the eksctl command below. Also, make sure you have an IAM OIDC provider associated with your EKS cluster; the AWS EKS documentation covers this in detail.
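If your cluster doesn't have the OIDC provider associated yet, eksctl can do that too –

# eksctl utils associate-iam-oidc-provider --cluster=<CLUSTER-NAME> --region=<CLUSTER-REGION> --approve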

As mentioned above, we need an IAM role in place that Cluster Autoscaler can leverage to create or terminate resources on AWS services like EC2. It can be created manually, but it's recommended to use the eksctl command for its convenience: it takes care of the trust relationship policy and related conditions while setting up the role. If you prefer not to use eksctl, refer to the AWS documentation to create the role using the AWS CLI or console.

Run the following from a terminal where the AWS CLI is configured.

# eksctl create iamserviceaccount --cluster=<CLUSTER-NAME> --namespace=<NAMESPACE> --name=cluster-autoscaler --attach-policy-arn=<MANAGED-POLICY-ARN> --override-existing-serviceaccounts --region=<CLUSTER-REGION> --approve

where –

  • CLUSTER-NAME: Name of the EKS Cluster
  • NAMESPACE: ns under which you plan to run CA. Preference: kube-system
  • CLUSTER-REGION: Region in which EKS Cluster is running
  • MANAGED-POLICY-ARN: IAM policy ARN created for this role
# eksctl create iamserviceaccount --cluster=blog-cluster --namespace=kube-system --name=cluster-autoscaler --attach-policy-arn=arn:aws:iam::xxxxxxxxxx:policy/blog-eks-policy --override-existing-serviceaccounts --region=us-east-1 --approve
2022-01-26 13:45:11 [ℹ]  eksctl version 0.80.0
2022-01-26 13:45:11 [ℹ]  using region us-east-1
2022-01-26 13:45:13 [ℹ]  1 iamserviceaccount (kube-system/cluster-autoscaler) was included (based on the include/exclude rules)
2022-01-26 13:45:13 [!]  metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
2022-01-26 13:45:13 [ℹ]  1 task: {
    2 sequential sub-tasks: {
        create IAM role for serviceaccount "kube-system/cluster-autoscaler",
        create serviceaccount "kube-system/cluster-autoscaler",
    } }
2022-01-26 13:45:13 [ℹ]  building iamserviceaccount stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:14 [ℹ]  deploying stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:14 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:33 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:50 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:52 [ℹ]  created serviceaccount "kube-system/cluster-autoscaler"

The above command prepares a CloudFormation template and deploys it in the same region. You can visit the CloudFormation console to check it.

Installation

If you choose to run CA in a different namespace by defining a custom namespace in the manifest file, then replace kube-system with the appropriate namespace name in all the commands below.

Download and prepare the Kubernetes manifest file.

# curl -o cluster-autoscaler-autodiscover.yaml https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
# sed -i 's/<YOUR CLUSTER NAME>/cluster-name/g' cluster-autoscaler-autodiscover.yaml

Replace cluster-name with your EKS cluster name.
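For example, for the blog-cluster used in this post, the command would be –

# sed -i 's/<YOUR CLUSTER NAME>/blog-cluster/g' cluster-autoscaler-autodiscover.yaml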

Apply the manifest to your EKS cluster. Make sure you have the proper context set for your kubectl command so that kubectl is targeted to the expected EKS cluster.

# kubectl apply -f cluster-autoscaler-autodiscover.yaml
serviceaccount/cluster-autoscaler configured
clusterrole.rbac.authorization.k8s.io/cluster-autoscaler created
role.rbac.authorization.k8s.io/cluster-autoscaler created
clusterrolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
rolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
deployment.apps/cluster-autoscaler created

Add an annotation to the cluster-autoscaler service account with the ARN of the IAM role we created in the first step. Replace ROLE-ARN with the IAM role ARN.

# kubectl annotate serviceaccount cluster-autoscaler -n kube-system eks.amazonaws.com/role-arn=<ROLE-ARN>
$ kubectl annotate serviceaccount cluster-autoscaler -n kube-system eks.amazonaws.com/role-arn=arn:aws:iam::xxxxxxxxxx:role/eksctl-blog-cluster-addon-iamserviceaccount-Role1-1X55OI558WHXF --overwrite=true
serviceaccount/cluster-autoscaler annotated
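You can verify the annotation landed on the service account by describing it –

# kubectl describe serviceaccount cluster-autoscaler -n kube-system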

Patch the CA deployment to add the eviction-related annotation –

# kubectl patch deployment cluster-autoscaler -n kube-system -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}}}}}'
deployment.apps/cluster-autoscaler patched

Edit the CA container command to include the below two arguments –

  • --balance-similar-node-groups
  • --skip-nodes-with-system-pods=false
# NEW="        - --balance-similar-node-groups\n        - --skip-nodes-with-system-pods=false"
# kubectl get -n kube-system deployment.apps/cluster-autoscaler -o yaml | awk "/- --node-group-auto-discovery/{print;print \"$NEW\";next}1" > autoscaler-patch.yaml
# kubectl patch deployment.apps/cluster-autoscaler -n kube-system --patch "$(cat autoscaler-patch.yaml)"
deployment.apps/cluster-autoscaler patched
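To confirm both arguments made it into the container command, you can print it with jsonpath –

# kubectl get deployment cluster-autoscaler -n kube-system -o jsonpath='{.spec.template.spec.containers[0].command}'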

Make sure the CA container image in your deployment definition is the latest one. If not, you can set a new image by running –

# kubectl set image deployment cluster-autoscaler -n kube-system cluster-autoscaler=k8s.gcr.io/autoscaling/cluster-autoscaler:vX.Y.Z

Replace X.Y.Z with the latest Cluster Autoscaler version that matches your cluster's Kubernetes minor version.

$ kubectl set image deployment cluster-autoscaler -n kube-system cluster-autoscaler=k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.1
deployment.apps/cluster-autoscaler image updated

Verification

Cluster Autoscaler installation is now complete. Verify the logs to make sure Cluster Autoscaler is not throwing any errors.

# kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler