How to install Cluster Autoscaler on AWS EKS

A quick rundown on how to install Cluster Autoscaler on AWS EKS.

What is Cluster Autoscaler (CA)

Cluster Autoscaler is a familiar name in the Kubernetes world. It is a program that scales the Kubernetes cluster out or in according to capacity demand. It is available on GitHub in the kubernetes/autoscaler repository.

For scale-out, it looks for unschedulable pods in the cluster and adds nodes so that they can be scheduled. With default settings, CA checks every 10 seconds, so it detects the need for scale-out and acts on it within about 10 seconds.

For scale-in, it watches node utilization, and any underutilized node becomes a candidate for removal. With default settings, a candidate node has to remain unneeded for 10 minutes before CA terminates it.
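
The two timings above correspond to the Cluster Autoscaler flags --scan-interval (default 10s) and --scale-down-unneeded-time (default 10m). As a minimal sketch, the helper below reads a flag from a space-separated argument string and falls back to the default when the flag is not set. The function name ca_flag_value and the sample argument string are made up for illustration.

```shell
# ca_flag_value ARGS FLAG DEFAULT
# Print the value of --FLAG=VALUE found in ARGS, or DEFAULT if it is not set.
ca_flag_value() {
  value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^--$2=//p" | head -n 1)
  printf '%s\n' "${value:-$3}"
}

# Sample arguments (illustrative only, not taken from a real deployment).
args="--cloud-provider=aws --scan-interval=30s"
ca_flag_value "$args" scan-interval 10s              # prints 30s
ca_flag_value "$args" scale-down-unneeded-time 10m   # prints 10m (default)
```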

CA on AWS EKS

Since CA's core job is to spawn new nodes and terminate unneeded ones, it must have access to the underlying infrastructure to perform these actions.

In AWS EKS, Kubernetes nodes run on EC2 or Fargate compute. Hence, Cluster Autoscaler running on an EKS cluster needs access to the respective service APIs to scale out and scale in. This is achieved by creating an IAM role with the appropriate IAM policies attached to it.

Cluster Autoscaler runs as a Kubernetes deployment on the same EKS cluster, in its own namespace (kube-system by default). Let's look at the installation.

How to install Cluster Autoscaler on AWS EKS

Creating IAM role

The IAM role for Cluster Autoscaler needs an IAM policy attached to it with the below permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sts:AssumeRole",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
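
One way to turn the policy document above into a managed policy is the AWS CLI. The following is a minimal sketch, not the only way to do it: the function names and the file name ca-policy.json are made up here, and blog-eks-policy is simply the policy name used in the example later in this post.

```shell
# Write the policy document shown above to the file given as $1.
write_ca_policy() {
  cat > "$1" <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sts:AssumeRole",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
EOF
}

# Create the managed policy named $1 from file $2 and print its ARN.
# Requires a configured AWS CLI with permission to create IAM policies.
create_ca_policy() {
  aws iam create-policy \
    --policy-name "$1" \
    --policy-document "file://$2" \
    --query 'Policy.Arn' --output text
}

# Usage:
#   write_ca_policy ca-policy.json
#   create_ca_policy blog-eks-policy ca-policy.json
```

The ARN printed by create_ca_policy is the value to pass as the policy ARN in the eksctl command.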

You will need this policy's ARN in the eksctl command. Also, make sure an IAM OIDC provider is associated with your EKS cluster. Read more in detail here.
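
A rough way to check the OIDC provider association from the command line is sketched below. The function names are made up for illustration, and blog-cluster and us-east-1 are the example values used in this post. Treat it as a starting point rather than a definitive check.

```shell
# Extract the OIDC provider ID (the last path segment) from an issuer URL.
oidc_id_from_issuer() {
  printf '%s\n' "${1##*/}"
}

# Print the OIDC issuer URL of cluster $1 in region $2.
# Requires a configured AWS CLI.
cluster_oidc_issuer() {
  aws eks describe-cluster --name "$1" --region "$2" \
    --query 'cluster.identity.oidc.issuer' --output text
}

# Usage:
#   issuer=$(cluster_oidc_issuer blog-cluster us-east-1)
#   aws iam list-open-id-connect-providers | grep "$(oidc_id_from_issuer "$issuer")"
#
# If grep finds nothing, a provider can be associated with:
#   eksctl utils associate-iam-oidc-provider --cluster blog-cluster --region us-east-1 --approve
```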

As mentioned above, Cluster Autoscaler needs an IAM role that it can use to create or terminate resources on AWS services like EC2. The role can be created manually, but using eksctl is recommended for its convenience and accuracy: it takes care of the trust relationship policy and related conditions while setting up the role. If you prefer not to use eksctl, refer to this document to create the role using the AWS CLI or console.

Run the command from a terminal where the AWS CLI is configured.

# eksctl create iamserviceaccount --cluster=<CLUSTER-NAME> --namespace=<NAMESPACE> --name=cluster-autoscaler --attach-policy-arn=<MANAGED-POLICY-ARN> --override-existing-serviceaccounts --region=<CLUSTER-REGION> --approve

where:

  • CLUSTER-NAME: Name of the EKS cluster
  • NAMESPACE: Namespace in which you plan to run CA. Preference: kube-system
  • CLUSTER-REGION: Region in which the EKS cluster is running
  • MANAGED-POLICY-ARN: ARN of the IAM policy created for this role

For example:

# eksctl create iamserviceaccount --cluster=blog-cluster --namespace=kube-system --name=cluster-autoscaler --attach-policy-arn=arn:aws:iam::xxxxxxxxxx:policy/blog-eks-policy --override-existing-serviceaccounts --region=us-east-1 --approve
2022-01-26 13:45:11 [ℹ]  eksctl version 0.80.0
2022-01-26 13:45:11 [ℹ]  using region us-east-1
2022-01-26 13:45:13 [ℹ]  1 iamserviceaccount (kube-system/cluster-autoscaler) was included (based on the include/exclude rules)
2022-01-26 13:45:13 [!]  metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
2022-01-26 13:45:13 [ℹ]  1 task: {
    2 sequential sub-tasks: {
        create IAM role for serviceaccount "kube-system/cluster-autoscaler",
        create serviceaccount "kube-system/cluster-autoscaler",
    } }
2022-01-26 13:45:13 [ℹ]  building iamserviceaccount stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:14 [ℹ]  deploying stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:14 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:33 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:50 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:52 [ℹ]  created serviceaccount "kube-system/cluster-autoscaler"

The above command prepares a JSON CloudFormation template and deploys it as a stack in the same region. You can check the stack in the CloudFormation console.
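
The stack can also be checked from the terminal. The sketch below rebuilds the stack name from the pattern visible in the eksctl output above and asks CloudFormation for its status. The function names are made up for illustration, and the naming pattern is inferred from that output rather than from eksctl documentation.

```shell
# Build the CloudFormation stack name for an IAM service account from
# cluster name ($1), namespace ($2) and service account name ($3).
iamsa_stack_name() {
  printf 'eksctl-%s-addon-iamserviceaccount-%s-%s\n' "$1" "$2" "$3"
}

# Print the stack status (for example CREATE_COMPLETE) in region $4.
# Requires a configured AWS CLI.
iamsa_stack_status() {
  aws cloudformation describe-stacks \
    --stack-name "$(iamsa_stack_name "$1" "$2" "$3")" --region "$4" \
    --query 'Stacks[0].StackStatus' --output text
}

# Usage:
#   iamsa_stack_status blog-cluster kube-system cluster-autoscaler us-east-1
```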

Installation

If you choose to run CA in a different namespace by defining a custom namespace in the manifest file, replace kube-system with that namespace name in all the commands below.

Download and prepare the Kubernetes manifest file.

# curl -o cluster-autoscaler-autodiscover.yaml https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
# sed -i 's/<YOUR CLUSTER NAME>/cluster-name/g' cluster-autoscaler-autodiscover.yaml

Replace cluster-name in the sed command with your EKS cluster name.
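
The substitution is easy to get wrong silently, so it can help to confirm that no placeholder is left behind. Below is a minimal sketch of the same replacement with a check. The function name set_cluster_name is made up, and it writes through a temporary file instead of sed -i, because the -i syntax differs between GNU and BSD sed.

```shell
# Replace the <YOUR CLUSTER NAME> placeholder in manifest $1 with cluster
# name $2, and fail if any placeholder is still present afterwards.
set_cluster_name() {
  manifest=$1
  cluster=$2
  sed "s/<YOUR CLUSTER NAME>/$cluster/g" "$manifest" > "$manifest.tmp" &&
    mv "$manifest.tmp" "$manifest" &&
    ! grep -q '<YOUR CLUSTER NAME>' "$manifest"
}

# Usage:
#   set_cluster_name cluster-autoscaler-autodiscover.yaml blog-cluster
#   grep -- '--node-group-auto-discovery' cluster-autoscaler-autodiscover.yaml
```

The final grep shows the auto-discovery argument, which should now end with your cluster name.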

Apply the manifest to your EKS cluster. Make sure kubectl has the proper context set so that it targets the expected EKS cluster.

# kubectl apply -f cluster-autoscaler-autodiscover.yaml
serviceaccount/cluster-autoscaler configured
clusterrole.rbac.authorization.k8s.io/cluster-autoscaler created
role.rbac.authorization.k8s.io/cluster-autoscaler created
clusterrolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
rolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
deployment.apps/cluster-autoscaler created

Annotate the cluster-autoscaler service account with the ARN of the IAM role created in the first step. Replace ROLE-ARN with the IAM role ARN.

# kubectl annotate serviceaccount cluster-autoscaler -n kube-system eks.amazonaws.com/role-arn=<ROLE-ARN>

For example:

# kubectl annotate serviceaccount cluster-autoscaler -n kube-system eks.amazonaws.com/role-arn=arn:aws:iam::xxxxxxxxxx:role/eksctl-blog-cluster-addon-iamserviceaccount-Role1-1X55OI558WHXF --overwrite=true
serviceaccount/cluster-autoscaler annotated
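
To confirm the annotation landed, the value can be read back and sanity-checked. This is a sketch under assumptions: the function names are made up, and the shape check only tests that the string looks like an IAM role ARN, not that the role exists.

```shell
# Succeed only if $1 looks like an IAM role ARN (any AWS partition).
is_role_arn() {
  case $1 in
    arn:aws*:iam::*:role/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the role ARN annotated on service account $1 in namespace $2.
# Requires kubectl access to the cluster.
sa_role_arn() {
  kubectl get serviceaccount "$1" -n "$2" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Usage:
#   arn=$(sa_role_arn cluster-autoscaler kube-system)
#   is_role_arn "$arn" && echo "annotation OK: $arn"
```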

Patch the CA deployment to add the eviction-related annotation, which keeps CA from removing the node that its own pod runs on:

# kubectl patch deployment cluster-autoscaler -n kube-system -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}}}}}'
deployment.apps/cluster-autoscaler patched

Edit the CA container command to add the below two arguments:

  • --balance-similar-node-groups
  • --skip-nodes-with-system-pods=false

# NEW="        - --balance-similar-node-groups\n        - --skip-nodes-with-system-pods=false"
# kubectl get -n kube-system deployment.apps/cluster-autoscaler -o yaml | awk "/- --node-group-auto-discovery/{print;print \"$NEW\";next}1" > autoscaler-patch.yaml
# kubectl patch deployment.apps/cluster-autoscaler -n kube-system --patch "$(cat autoscaler-patch.yaml)"
deployment.apps/cluster-autoscaler patched
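
To see what the awk step does before touching a live deployment, the same insertion can be tried on a small sample. This sketch hard-codes the two flags instead of passing them through $NEW. The function name insert_ca_flags and the sample input are made up for illustration.

```shell
# Read a deployment YAML on stdin and print it with the two extra flags
# inserted right after the --node-group-auto-discovery argument.
insert_ca_flags() {
  awk '
    /- --node-group-auto-discovery/ {
      print
      print "        - --balance-similar-node-groups"
      print "        - --skip-nodes-with-system-pods=false"
      next
    }
    { print }
  '
}

# Try it on a two-line sample (illustrative input, not a full manifest).
printf '%s\n' \
  '        - --cloud-provider=aws' \
  '        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled' |
  insert_ca_flags
```

The sample prints four lines: the two original arguments followed by the two inserted ones, at the same indentation.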

Make sure the CA container image in your deployment definition is the latest release for your cluster's Kubernetes minor version. If it is not, you can set a new image by running:

# kubectl set image deployment cluster-autoscaler -n kube-system cluster-autoscaler=k8s.gcr.io/autoscaling/cluster-autoscaler:vX.Y.Z

Replace X.Y.Z with the latest version. For example:

# kubectl set image deployment cluster-autoscaler -n kube-system cluster-autoscaler=k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.1
deployment.apps/cluster-autoscaler image updated
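
When picking the image tag, it is commonly recommended to keep the Cluster Autoscaler minor version in line with the cluster's Kubernetes minor version. A small helper like the one below (the name k8s_minor is made up, and the sample version string is illustrative) extracts the part to compare.

```shell
# Print the MAJOR.MINOR part of a version string such as "v1.21.5-eks-bc4871b".
k8s_minor() {
  printf '%s\n' "$1" | sed -E 's/^v?([0-9]+)\.([0-9]+).*/\1.\2/'
}

k8s_minor v1.21.1              # prints 1.21
k8s_minor v1.21.5-eks-bc4871b  # prints 1.21

# Usage: compare the image tag against the server's gitVersion, which is
# reported under serverVersion by:
#   kubectl version -o json
```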

Verification

Cluster Autoscaler installation is now complete. Verify the logs to make sure Cluster Autoscaler is not throwing any errors.

# kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
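
The log stream is busy, so filtering for error lines can make problems easier to spot. The sketch below relies on the klog line format, in which error and fatal lines start with E or F followed by the month and day. The function name ca_errors is made up for illustration.

```shell
# Keep only error (E) and fatal (F) lines from klog-formatted input,
# for example a line starting with "E0126 13:45:12".
ca_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Usage:
#   kubectl -n kube-system rollout status deployment/cluster-autoscaler
#   kubectl -n kube-system logs deployment.apps/cluster-autoscaler | ca_errors
```

No output from the filter means no error or fatal lines were logged so far.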