Basics of Overprovisioning in an EKS Cluster

This article talks about the fundamental concepts of overprovisioning within a Kubernetes Cluster. We will explore the definition of overprovisioning, its necessity, and how to calculate various aspects related to it. So, without further delay, let’s dive right in.

Overprovisioning basics!

The need for overprovisioning

Overprovisioning is a methodology for preparing your cluster for future demand from hosted applications, preventing potential scheduling bottlenecks.

Let’s consider a scenario in which the Kubernetes-hosted application needs to increase the number of pods (horizontal scaling) beyond the cluster’s available resources. As a result, additionally spawned pods end up in a pending state because there are not enough resources on the cluster to schedule them. Even if you are using the Elastic Kubernetes Service (EKS) Cluster Autoscaler (referred to as CA), there is a minimum 10-second delay for CA to recognize the need for more capacity and communicate this requirement to the Auto Scaling Group (ASG). Furthermore, there is an additional delay as the ASG scales out, launches a new EC2 instance, goes through the boot-up process, executes necessary bootstrap scripts, and is marked as READY by Kubernetes in the cluster. This entire process typically takes a minute or two, during which time application pods remain in a pending state.

To avoid these delays and ensure immediate capacity availability for unscheduled pods, overprovisioning can be employed. This is accomplished through the use of pause pods.

Concept of pause pods

Pause pods are non-essential, low-priority pods that are created to reserve cluster resources, such as CPU, memory, or IP addresses. When critical pods require this reserved capacity, the scheduler evicts these low-priority pause pods, allowing the critical pods to utilize the freed-up resources. But, what happens to these evicted pause pods?

After being evicted, these pause pods are automatically re-created by their respective ReplicaSet and initially start in a pending state. At this point, the Cluster Autoscaler (CA) intervenes, as explained earlier, to provide the additional capacity required. Since pause pods do not serve any specific application, it is acceptable for them to remain in a pending state for a certain period. Once the new capacity becomes available, these pause pods consume it, effectively reserving it for future requirements.

How does scale-in work with Pause pods?

Now that we’ve grasped how pause pods assist in scenarios requiring cluster scale-out, the next question arises: could these pause pods potentially hold onto resources unnecessarily and block your cluster’s scale-in actions? Here’s the scenario: when the Cluster Autoscaler (CA) identifies nodes with light utilization (perhaps containing only pause pods), it evicts these low-priority pause pods as part of the node termination process (a scale-in action). The evicted pods are then re-created in a pending state. However, the node count has now decreased by one, and the cluster-proportional-autoscaler (CPA) recalculates the required number of pause pods. This number is typically lower, resulting in the termination of the newly pending pause pods.

Pause pod calculations

The pause pod Deployment should be scaled by the cluster-proportional-autoscaler (CPA). Set it to use linear mode by defining the below configuration in its ConfigMap:

linear:
  {
    "coresPerReplica": 1,
    "nodesPerReplica": 1,
    "min": 1,
    "max": 50,
    "preventSinglePointFailure": true,
    "includeUnschedulableNodes": true
  }

This configuration means:

  • coresPerReplica: One pause pod per CPU core in the cluster.
  • nodesPerReplica: One pause pod per node.
  • min: At least one pause pod.
  • max: A maximum of 50 pause pods.
  • preventSinglePointFailure: When true, keeps at least two replicas once the cluster has more than one node.
  • includeUnschedulableNodes: When true, cordoned/unschedulable nodes are still counted in the calculation.

When both coresPerReplica and nodesPerReplica are used, the system calculates both values and selects the greater of the two. Let’s calculate for a cluster with 4 nodes, each using the m7g.xlarge instance type, which has 4 cores per node:

  • 4 nodes, meaning 4 pause pods (one per node).
  • 16 cores, which equates to 16 pause pods (one per core).

So, in this case, the cluster-proportional-autoscaler will spawn a total of 16 pause pods for the cluster.

Now, let’s explore the process of calculating the CPU request configuration for Pause pods and, as a result, determine the overprovisioned capacity of the cluster.

Let’s say each individual pause pod is set to request 200 millicores (200m); from the cluster’s compute capacity point of view, that amounts to 20% of a single CPU core. Given that we are using one pause pod per CPU core, this effectively overprovisions 20% of the entire cluster’s computational resources.
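
For illustration, here is a minimal sketch of what such a setup could look like. The PriorityClass and Deployment names are illustrative, and in practice the replica count is managed by the cluster-proportional-autoscaler rather than set by hand:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning      # illustrative name
value: -1                     # below the default priority (0), so these pods are preempted first
globalDefault: false
description: "Priority class for pause pods"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 1                 # managed by the cluster-proportional-autoscaler in practice
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 200m       # 20% of one core, as discussed above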

Depending on the criticality and frequency of spikes in the applications running on the cluster, you can assess the overprovisioning capacity and compute the corresponding configurations for the pause pods.

GitHub Do’s and Don’ts

A curated list of GitHub Do’s and Don’ts

GitHub guidelines

Here are some pointers that can help you define GitHub usage guidelines.

GitHub Do’s

It is recommended to create a new GitHub repository for each new project.

To ensure proper code management, a CODEOWNERS file should be included in the repository to indicate the list of individuals/GitHub teams responsible for approving the code changes. Additionally, a README.md file should be added to provide information about the repository and its code.
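
For example, a minimal CODEOWNERS file could look like the sketch below; the team names and path patterns are illustrative:

# Default owners for everything in the repo
*        @my-org/platform-team
# Infrastructure templates additionally require the infra team's review
*.yaml   @my-org/infra-team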

A .gitignore file should also be included to prevent unwanted files from being committed.
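
As a small illustrative example, a .gitignore might exclude build output and local environment files:

# Dependencies and build artifacts
node_modules/
dist/
# Local environment and editor files
.env
.vscode/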

Regularly committing and pushing changes to the remote branch will prevent loss of work and ensure that the branch name is reserved.

A consistent branching strategy should be used, with new bug fixes or features branching off the master branch.

Before creating a pull request, review your changes thoroughly, lint your code, and run tests to ensure it is error-free and does not contain any unnecessary code.

Make use of PR templates to ensure the contributor submits all the necessary information for PR review.

When creating pull requests, it is important to include detailed commit messages and mention any relevant Ticket IDs or information about testing or proof of concepts. This will help reviewers understand the changes and the reliability of the code.

It is always necessary to create a pull request when merging changes from one branch to another, and the master branch should be protected to prevent direct editing. Pull request standards should be followed to merge code even in repositories without protection.

During the PR review process, when changes to the code are requested, do not delete the branch; instead, make the new changes on the same branch, aligned with the original pull request, so that the history remains visible to future reviewers.

Having multiple people review a pull request is recommended, although it is not required.

For large projects, using git-submodule can be beneficial. Keep your branches up to date with development branches and consult with peers to ensure consistency in your project.

Be cautious when merging code on repositories integrated with automation, as it can automatically alter related resources. Ensure you understand the potential impacts of your code commits.

After each release, it is best practice to merge changes from the release branch to the master.

Regularly maintain your repositories by deleting branches that are no longer needed after merging a feature or bug fix.

Consider using Git Hooks for automation where appropriate.
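
As a sketch, a pre-commit hook (saved as .git/hooks/pre-commit and made executable) can run quick checks before every commit; the linter command here is illustrative and should be swapped for your project's tooling:

#!/usr/bin/env bash
# Abort the commit if any staged YAML file fails linting
set -e
files=$(git diff --cached --name-only --diff-filter=ACM | grep -E '\.ya?ml$' || true)
if [ -n "$files" ]; then
  yamllint $files   # illustrative linter choice
fi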

Access to repositories is controlled using GitHub teams; regularly check that only the intended teams and users have access.

After creating a GitHub user ID, enabling multi-factor authentication (MFA) from the settings is crucial.

GitHub Don’ts

Do not share your GitHub credentials with new team members. They should only have access to the codebase after they have secured their credentials through the established process.

The organization should implement the policy of protecting the master branch in GitHub. Even if you encounter an unprotected repository, do not commit directly to the master/main branch. Instead, please take action to have it protected as soon as possible.

Do not let local work pile up; push changes made on local branches to remote branches regularly.

Avoid working on outdated master branches. Always use the most recent version by taking a fresh pull.

Never include sensitive information, such as application secrets or proprietary data, in public repositories. Implement secret scanning in PR checks via GitHub Actions, e.g. Trufflehog.
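
As a hedged sketch, such a check can be wired up with the community TruffleHog action (verify the action name and pinned version against its official repository before use):

name: Secret Scan
on: pull_request
jobs:
  trufflehog:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0   # full history so commit diffs can be scanned
      - name: Scan for secrets
        uses: trufflesecurity/trufflehog@main   # pin to a release tag per your policy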

Avoid committing large files to the repository, as GitHub has a default upload limit of 100MB. Consult with peers or leads to determine the best strategy for a specific project. Git LFS is another option to consider.
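
If you go the Git LFS route, tracking large binary types is a short setup; the file pattern below is illustrative:

$ git lfs install          # one-time setup per machine
$ git lfs track "*.iso"    # route matching files through LFS
$ git add .gitattributes   # commit the tracking rules with your code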

Avoid creating one pull request that addresses multiple issues or environments.

Before resetting a branch, make sure to commit or stash your changes to avoid losing them.

Be cautious when using force push, as it can overwrite remote history; it should only be used by those who are experienced with it.

Do not alter or delete the public history.

Avoid manually copying code from GitHub private repositories. Instead, use the Fork functionality to duplicate the code safely.

Avoid using public GitHub actions on repositories without going through defined procedures and approvals.

Securing AWS credentials in WSL using aws-vault

This article will walk you through step-by-step procedures to secure AWS CLI credentials using the aws-vault tool.

aws-vault configuration in WSL

Why aws-vault?

Configuring the AWS CLI saves the AWS access key and secret key in plain text under the ~/.aws/credentials file. This is a security concern: the location is well known, anybody with access to the machine can read the keys, and the credentials can be compromised.

Hence, to store AWS keys more securely, we need a tool that keeps the keys in encrypted form rather than plain text. Here we discuss aws-vault, a widely used open-source tool for securing AWS keys.

Also read: Setting up WSL for Sysadmin work.

How to install and configure aws-vault in WSL?

We will use pass as the secure backend for aws-vault in WSL, since pass is a native Linux password manager. The list of compatible secure backends for aws-vault can be found here.

Install the pass utility in WSL, using the package manager for your Linux flavour.

# apt install pass

The next step is to generate the gpg key. You will be prompted to enter a password for the key in the process. This password is required to unlock the password store when fetching the passwords within.

$ gpg --generate-key
...
GnuPG needs to construct a user ID to identify your key.

Real name: shrikant
Email address: info@kerneltalks.com
You selected this USER-ID:
    "shrikant <info@kerneltalks.com>"

Change (N)ame, (E)mail, or (O)kay/(Q)uit? O
...

pub   rsa3072 2022-09-25 [SC] [expires: 2024-09-24]
      110577AFEE906D84224719BCD0F123xxxxxxxxxx
uid                      shrikant <info@kerneltalks.com>
sub   rsa3072 2022-09-25 [E] [expires: 2024-09-24]

Initialize the new password store with gpg key encryption. Use the gpg key ID created in the previous step in the below command –

$ pass init -p aws-vault 110577AFEE906D84224719BCD0F123xxxxxxxxxx
mkdir: created directory '/root/.password-store/aws-vault'
Password store initialized for 110577AFEE906D84224719BCD0F123xxxxxxxxxx (aws-vault)

Now, the pass backend is ready to use with aws-vault.

Install aws-vault. Simply grab the latest release from here. Move and rename it to one of the binary directories and assign execute permission.

$ wget https://github.com/99designs/aws-vault/releases/download/v6.6.0/aws-vault-linux-amd64
$ mv aws-vault-linux-amd64 /usr/local/sbin/aws-vault
$ chmod +x /usr/local/sbin/aws-vault

You have pass configured and aws-vault installed. Your system is ready to save the AWS credentials in aws-vault.

By default, aws-vault uses the keyctl backend. Export the below variables to configure it with the pass backend instead. It’s good to export them in the login profile of WSL so they are available in all your future shell sessions.

$ export AWS_VAULT_BACKEND=pass
$ export AWS_VAULT_PASS_PASSWORD_STORE_DIR=/root/.password-store/aws-vault
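
For example, to persist them across sessions, append both to your shell profile:

$ echo 'export AWS_VAULT_BACKEND=pass' >> ~/.bashrc
$ echo 'export AWS_VAULT_PASS_PASSWORD_STORE_DIR=/root/.password-store/aws-vault' >> ~/.bashrc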

Add your AWS credentials in aws-vault with the below command –

# aws-vault add my-aws-account
Enter Access Key ID: XXXXXXXXXXXXX
Enter Secret Access Key: 
Added credentials to profile "my-aws-account" in vault

my-aws-account is an account alias/name for identification purposes only.

Verify if aws-vault saved the credentials in pass password storage.

$ pass show
Password Store
└── aws-vault
    └── my-aws-account

At this point, you have successfully secured your AWS credentials, and you can safely remove ~/.aws/credentials file. If you have more than one AWS account configured in the credentials file, add all of them into aws-vault before deleting it.

pass saves the password storage in ~/.password-store in encrypted format.

$ ls -lrt ~/.password-store/aws-vault/
total 8
-rw------- 1 root root   41 Sep 25 22:01 .gpg-id
-rw------- 1 root root  831 Sep 25 14:22 my-aws-account.gpg

How to use aws-vault?

When you want to run any command using AWS CLI, you can execute it like this –

$ aws-vault exec <profile-name> -- aws <cli-command>
# aws-vault exec my-aws-account -- aws s3 ls

Here: my-aws-account is the AWS profile name. aws-vault also reads the ~/.aws/config file to figure out AWS profiles configured on the machine.

You can now authenticate with credentials stored securely in encrypted form! aws-vault also handles IAM role assumption and MFA prompts defined in AWS profiles.
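
For instance, a role-assuming profile with MFA in ~/.aws/config might look like the sketch below (account IDs, role, and MFA ARNs are placeholders); running aws-vault exec my-role-account -- aws s3 ls would then assume the role and prompt for an MFA token:

[profile my-aws-account]
region = us-east-1

[profile my-role-account]
source_profile = my-aws-account
role_arn = arn:aws:iam::111111111111:role/admin-role
mfa_serial = arn:aws:iam::111111111111:mfa/my-user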

Error troubleshooting

gpg: decryption failed: No secret key

The pass password store is locked and needs to be unlocked. Run pass show <password-name>, and it should prompt for the gpg key passphrase.

$ pass show my-aws-account
┌───────────────────────────────────────────────────────────────┐
│ Please enter the passphrase to unlock the OpenPGP secret key: │
│ "shrikant <info@kerneltalks.com>"                             │
│ 3072-bit RSA key, ID 0DB81DDFxxxxxxxx,                        │
│ created 2022-09-25 (main key ID D0F123Bxxxxxxxxx).            │
│                                                               │
│                                                               │
│ Passphrase: _________________________________________________ │
│                                                               │
│         <OK>                                   <Cancel>       │
└───────────────────────────────────────────────────────────────┘

Now, try the aws-vault command, and it should work fine.


aws-vault: error: Specified keyring backend not available, try --help

Ensure you have exported both parameters mentioned in the above configuration steps so that aws-vault will use the pass backend.


aws-vault: error: exec: Failed to get credentials for kerneltalks-test: operation error STS: GetSessionToken, https response error StatusCode: 403, RequestID: e514d0e5-b8b6-4144-8616-05061eeed00c, api error SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

It points to a wrong access/secret key combination added to the secure vault. Delete the respective key from the vault and re-add it. If you have lost your secret key, create a new key pair from the AWS console and add it to the vault.

$ aws-vault remove kerneltalks-test
Delete credentials for profile "kerneltalks-test"? (y|N) y
Deleted credentials.
$ aws-vault add kerneltalks-test
Enter Access Key ID: xxxxxxxxxx
Enter Secret Access Key: 
Added credentials to profile "kerneltalks-test" in vault

aws-vault: error: rotate: Error creating a new access key: operation error IAM: CreateAccessKey, https response error StatusCode: 403, RequestID: 2392xxxx-x452-4bd4-80c3-452403baxxxx, api error InvalidClientTokenId: The security token included in the request is invalid

If the profile has MFA enabled and you are trying to run some IAM modification, you may encounter the above error. Use the --no-session flag with the command. Read more about the no-session flag here.

What is PDB in Kubernetes?

Ever wondered what PDB, i.e. Pod Disruption Budget, is in the Kubernetes world? Then this small post is just for you!

PDB foundation!

PDB, i.e. Pod Disruption Budget, is a way to make sure a minimum number of Pods is always available for a given application in a Kubernetes cluster. That is a kind of one-liner for explaining PDB 🙂 Let’s dive deeper and understand what PDB is, what it offers, whether you should define it for your applications, etc.

What is Pod Disruption Budget?

The ReplicaSet in Kubernetes helps us keep multiple replicas of the same Pod to handle load or add an extra layer of availability to containerized applications. But those replicas can be evicted during cluster maintenance or scaling actions if you don’t tell the control plane (the Kubernetes API server) how they should be terminated.

The PDB is a way to let the control plane know how the Pods in a certain ReplicaSet may be evicted. PDB is a Kubernetes kind of its own, typically associated with a Deployment’s Pods via a label selector.

How is PDB defined?

It’s a very small kind and offers only three fields to configure:

  • spec.selector: Defines the Pods to which the PDB applies, via a label selector.
  • spec.minAvailable: An absolute number or percentage. The number of Pods that should always remain in a running state during evictions.
  • spec.maxUnavailable: An absolute number or percentage. The maximum number of Pods that can be unavailable during evictions.
  • Note: you can specify only one of spec.minAvailable or spec.maxUnavailable.

A sample Kubernetes manifest for PDB looks like this –

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
  namespace: <namespace> #optional
  annotations:           #optional
    key: value 
  labels:                #optional
    key: value
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web

Here –

  • metadata: The PDB name, the namespace in which PDB lives, annotations and labels that are applied to the PDB itself.
  • spec: The PDB configuration we discussed above.

How does PDB work?

Let’s look at how this configuration takes effect. For better understanding, we will consider a simple application that has 5 replicas.

Case 1: PDB is configured with minAvailable set to 3.

This means we are telling the control plane that it can evict at most 2 Pods at a time (5 running - 3 minAvailable). In other words, we are allowing 2 disruptions at a time; this value is also called disruptionsAllowed. So, if the control plane needs to move all 5 Pods, it evicts 2 Pods first; once those 2 evicted Pods respawn on new nodes and reach the Running state, it evicts the next 2, and lastly 1. In the process, it makes sure there are always at least 3 Pods in the Running state.

Case 2: PDB is configured with maxUnavailable set to 2.

It has the same effect as above! Basically, you are telling the control plane that at any given point in time 2 Pods can be evicted, meaning 5 - 2 = 3 Pods must be running!

Allowed disruptions is calculated on the fly, and it considers only Pods in the Running state. Continuing with the above example: if 2 of the 5 Pods are not in a Running state (for whatever reason), then disruptionsAllowed is calculated as 3 - 3 = 0. Only 3 Pods are in the Running state, and none of them may be evicted, since the PDB demands a minimum of 3 Pods in the Running state at all times.

In a nutshell: disruptionsAllowed = Number of RUNNING Pods – minAvailable value

How to check Pod Disruption Budget?

One can use the below command to check the PDB –

$ kubectl get poddisruptionbudgets -n <namespace>
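
Illustrative output for the Case 1 example above (actual values depend on your cluster state):

$ kubectl get poddisruptionbudgets -n default
NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
sample-pdb   3               N/A               2                     5d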

Then, kubectl describe can be used to get the details of each PDB fetched in the output of the previous command.

Should I define PDB for my applications?

Yes, you should! It’s a good practice to calculate and properly define the PDB to make your application resilient to Cluster maintenance/scaling activities.

The bare minimum is to have minAvailable as 1 with 2 replicas; more generally, make sure minAvailable is always less than the replica count. A wrongly configured PDB will never allow Pod evictions and can block cluster activities. Cluster admins can obviously force their way in, but that means downtime for your applications.

You can also implement cluster constraints for PDB so that new applications won’t be allowed to deploy unless they include a PDB manifest in their code.

Coding GitHub action for automated CloudFormation template linting

GitHub action code for automated CloudFormation template linting on PR

CloudFormation Template Linting

GitHub is a popular version control platform widely used by companies, and it is a great place to manage your AWS IaC, i.e. CloudFormation templates! With ever-growing AWS infrastructure, and hence template versions, it’s always good practice to lint your CloudFormation templates for syntax errors. It saves a lot of time, since you learn about errors beforehand rather than at deployment time on AWS! It saves even more time if templates are linted when someone raises a pull request (PR), so that both the code owner and the developer know the code modifications in the PR are linted and sane to approve.

cfn-lint is the best CloudFormation linter available. It can be wired into GitHub Actions for automated linting on PR submission. I will walk you through the process of setting it up.

Understanding the flow

First of all, you need the list of files modified in the PR so that you can run the linter against them. This can be managed using ready-made actions or git commands. There are a couple of actions available, like tj-actions, Get all changed files, etc.

Once we have the list, we need to filter out files that are not CloudFormation templates. You don’t want to feed non-template files to the linter, as that would result in failures. I did this using grep, and also allowed the shell to continue even if grep exits with a non-zero exit code. This prevents the GitHub Action from failing when no template files are modified in a given PR.

Lastly, lint all templates one by one using cfn-lint. I am ignoring warnings using the -i W flag to avoid failing the GitHub Action due to warnings.

Code

All these points are summarized in the GitHub Action code below –

name: Lint CloudFormation Templates
on:
  pull_request:
    branches:
      - main
jobs:
  cloudformation-linter:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Setup Cloud Formation Linter with Latest Version
        uses: scottbrenner/cfn-lint-action@v2
      - name: Fetch changed file list
        uses: tj-actions/changed-files@v17.2
        id: changed-files
      - name: Run Linter
        shell: bash
        run: |
          > list
          for file in ${{ steps.changed-files.outputs.all_changed_files }}; do
            echo $file >> list
          done
          set +e
          cat list | grep -e json -e yaml -e yml | grep -v .github > lint_list
          set -e
          if [ -s lint_list ]; then
            for i in `cat lint_list`; do
              echo "Linting template: $i"
              cfn-lint -t $i -i W
            done
          else
            echo "No Cloudformation template detected in commit!"
            exit 0
          fi

You need to place this file, with a name of your choice, under the <repo>/.github/workflows directory. If you have different default branch naming conventions or different strategies for when code should be linted, make the necessary changes in the on: section.

GitHub Action

Once the action config is in place, PR submissions will show the automated check.

Linter github action

If you click on Details, you will see the details about the action.

GitHub actions details

Your CloudFormation templates are now linted whenever a PR is raised!

Setting up WSL for Sysadmin work

A list of tools/configurations to make sysadmin life easy on Windows workstation!

Linux lovers on Windows!

This article is intended for sysadmins who use Windows workstations for their job and yet would love to have a Linux experience on them. It is best suited for those who interact with CLI-based tools like the AWS CLI, Git, etc. on a daily basis. I list all the tools and their respective configurations you should have in your arsenal to make the journey peaceful, less frustrating, and free of non-Linux-workstation issues. I expect the audience to be comfortable with Linux.

Without further ado, let’s get started.

Windows Subsystem for Linux

First of all, let’s get Linux on Windows 🙂 WSL is a Windows feature available from Windows 10 (WSL Install steps). Install Ubuntu 20.04 LTS (the latest at the time of this draft) from the Microsoft Store. Post-installation, you can run it just like any other Windows app. On first login, you will be prompted to set a username and password. This user is configured to switch to root using sudo.

Now, you have a Linux subsystem running on your Windows! Let’s move on to configure it to ease up daily activities.

Install the necessary packages using apt-get, depending on the tools you frequently use.

I even configured the WSL normal user to perform passwordless sudo into root at login, to save the hassle of typing the command and password to switch to root. I love working at the root # prompt!
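
A hedged sketch of that setup (the username is a placeholder): grant passwordless sudo via a sudoers drop-in, then switch to root from the shell profile at login:

# echo 'shrikant ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/shrikant   # run as root; validate with visudo -c
$ echo 'sudo -i' >> ~/.bashrc                                        # drop into a root shell at login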

Avoid sound beeps from Linux terminal

With WSL, one thing you might like to avoid is workstation speaker beeps/bells caused by the Linux terminal prompt or the vi editor. Here is how you can avoid them:

# echo set bell-style none >>/etc/inputrc # Stops prompt bells
# echo set visualbell >> ~/.vimrc # Stops vi bells

Setting up Git on WSL

Personal Access Tokens (PAT) or SSH keys can be leveraged for configuring Git on WSL. I prefer to use SSH keys, so I’m listing the steps here –

  • Create and add SSH keys to your GitHub account. Steps here.
  • Authorize the organizations for the public key you are uploading by visiting the key settings on GitHub.
  • Start the ssh-agent and add the key identity at login via your user/shell profile. A quick-and-dirty way in bash is to add the below lines to the ~/.bashrc file.
eval "$(ssh-agent -s)"
ssh-add /root/.ssh/git_id_rsa
  • Add an alias for your Git folder on the Windows drive so you can navigate to it quickly when running Git commands like repo clone. Add the below command to your user/shell profile. You can pick your own alias name (gitdir here) and destination cd <path>.
alias gitdir='cd /mnt/c/Users/<username>/Downloads/Github'    

Setting up prompt to show current Git branch

It’s easy. You need to tweak your prompt PS1 with git branch command output!

The git branch output looks like this –

# git branch
* master

With the help of sed, you can extract the branch name from it. Obviously, you also want to redirect errors (the command fails in a non-git directory) and add brackets around the branch name to match the look of the Git Bash prompt. That sums up to the code below –

# git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
(master)

Add this to a function and call the function in your PS1! Ta-da. Below is a sample prompt with colours from Ubuntu. Don’t forget to set this in your shell profile (e.g. ~/.bashrc) so that it is loaded at your login.

git_branch() {
  git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}
export PS1="\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] \[\033[00;32m\]\$(git_branch)\[\033[00m\]# "

Code platform

Oh yes, even sysadmins code for their automation, and with IaC being hot in the market, it’s essential for sysadmins to code as well. Since this article is intended for Windows users, Microsoft Visual Studio Code is an undefeated entry here! It’s a superb code editing tool whose numerous plugins make coding comfortable.

Tweaking VS Code for a PuTTY-like experience

PuTTY is the preferred SSH tool in the Linux world. Its beauty lies in its copy-paste behaviour (select to copy, right-click to paste). The same behaviour can be configured in the VS Code terminal.

Head to the terminal settings by entering the command Terminal: Configure Terminal Settings in the command palette (Ctrl + Shift + P). On the settings screen, set the below options –

MS VS code setting for Putty copy-paster behaviour!
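
The same behaviour can also be set directly in settings.json; to my knowledge, these two settings provide the PuTTY-style select-to-copy and right-click-paste experience:

{
  "terminal.integrated.copyOnSelection": true,
  "terminal.integrated.rightClickBehavior": "paste"
}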

Setting up VS Code to launch from WSL

Since we already configured Git on WSL, it makes sense to run the code . command in WSL directly from the Git directory and have VS Code start on the Windows workstation. For that, you just need to alias the code command to the absolute Windows path of the code.exe file!

If you have installed VS code with default config then the below command in your user/shell profile should do the trick.

alias code='/mnt/c/Users/<username>/AppData/Local/Programs/Microsoft\ VS\ Code/code.exe'

Code linters

There are two ways you can have your code linted locally before you commit it on Git.

  1. Install the respective code linter binaries/packages on WSL. It's Linux!
  2. Install code linters in VS Code if an appropriate plugin is available.

Running docker on WSL without installing Docker Desktop for Windows

With WSL version 2, one can run Docker on WSL without installing Docker Desktop for Windows. The Docker installation inside WSL remains the same as on any other Linux installation.

Once installed, make sure you are running WSL version 2. If not, upgrade to WSL 2.

Convert the current WSL distro to make use of WSL 2 using the command in PowerShell –

> wsl --set-version <distro-name> 2
## Example wsl --set-version Ubuntu-20.04 2

Now, launch WSL and start Docker by invoking the /usr/bin/dockerd binary! You can set an alias for dockerd & to start it quickly in the background.

You can also set up cron so that it starts at boot. Note: it did not work for me in WSL.

@reboot /usr/bin/dockerd &

Or, add the below code to your login profile (e.g. the .bashrc file) so that Docker runs at your login.

# Start dockerd at login only if it is not already running
ps -ef | grep -iq "[d]ockerd"   # the [d] trick stops grep from matching its own process
if [ $? -ne 0 ]; then
  /usr/bin/dockerd &
fi

If you have more tips please let us know in the comments below!

Kubernetes tools

Install a text-based UI tool for managing K8s clusters: K9s. A simple installation of the standalone binary can be done with the below commands –

# wget -qO- https://github.com/derailed/k9s/releases/download/v0.25.18/k9s_Linux_x86_64.tar.gz | tar zxvf -  -C /tmp/
# mv /tmp/k9s /usr/local/bin

You need to set the kubectl context first and then run the k9s command.

How to install Cluster Autoscaler on AWS EKS

A quick rundown on how to install Cluster Autoscaler on AWS EKS.

CA on EKS!

What is Cluster Autoscaler (CA)

Cluster Autoscaler is not a new word in the Kubernetes world. It’s a program that scales out or scales in the Kubernetes cluster as per capacity demands. It is available on Github here.

For scale-out, it looks for any unschedulable pods in the cluster and scales out to make sure they can be scheduled. With default settings, CA checks every 10 seconds, so it detects and acts on a scale-out need within roughly 10 seconds.

For scale-in, it watches node utilization, and any underutilized node is elected for scale-in. By default, the elected node must remain unneeded for 10 minutes before CA terminates it.

CA on AWS EKS

As you now know, CA’s core functionality is spawning new nodes or terminating unneeded ones, so it must have access to the underlying infrastructure to perform these actions.

In AWS EKS, Kubernetes nodes are EC2 or Fargate compute. Hence, a Cluster Autoscaler running on an EKS cluster must have access to the respective service APIs to perform scale-out and scale-in. This is achieved by creating an IAM role with appropriate IAM policies attached to it.

Cluster Autoscaler should run as a Kubernetes deployment in its own namespace (kube-system by default) on the same EKS cluster. Let’s look at the installation.

How to install Cluster Autoscaler on AWS EKS

Creating IAM role

The Autoscaler’s IAM role needs an IAM policy attached to it with the below permissions –

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sts:AssumeRole",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

You will need to use this policy’s ARN in the eksctl command. Also, make sure you have an IAM OIDC provider associated with your EKS cluster. Read more in detail here.

As mentioned above, we need an IAM role in place that Cluster Autoscaler can leverage to create or terminate resources in AWS services like EC2. This can be done manually, but it’s recommended to use the eksctl command for convenience and correctness! It takes care of the trust relationship policy and related conditions while setting up the role. If you do not prefer eksctl, then refer to this document to create it using the AWS CLI or console.

You need to run it from the terminal where AWS CLI is configured.

# eksctl create iamserviceaccount --cluster=<CLUSTER-NAME> --namespace=<NAMESPACE> --name=cluster-autoscaler --attach-policy-arn=<MANAGED-POLICY-ARN> --override-existing-serviceaccounts --region=<CLUSTER-REGION> --approve

where –

  • CLUSTER-NAME: Name of the EKS Cluster
  • NAMESPACE: The namespace under which you plan to run CA (preferably kube-system)
  • CLUSTER-REGION: Region in which EKS Cluster is running
  • MANAGED-POLICY-ARN: IAM policy ARN created for this role
# eksctl create iamserviceaccount --cluster=blog-cluster --namespace=kube-system --name=cluster-autoscaler --attach-policy-arn=arn:aws:iam::xxxxxxxxxx:policy/blog-eks-policy --override-existing-serviceaccounts --region=us-east-1 --approve
2022-01-26 13:45:11 [ℹ]  eksctl version 0.80.0
2022-01-26 13:45:11 [ℹ]  using region us-east-1
2022-01-26 13:45:13 [ℹ]  1 iamserviceaccount (kube-system/cluster-autoscaler) was included (based on the include/exclude rules)
2022-01-26 13:45:13 [!]  metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
2022-01-26 13:45:13 [ℹ]  1 task: {
    2 sequential sub-tasks: {
        create IAM role for serviceaccount "kube-system/cluster-autoscaler",
        create serviceaccount "kube-system/cluster-autoscaler",
    } }
2022-01-26 13:45:13 [ℹ]  building iamserviceaccount stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:14 [ℹ]  deploying stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:14 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:33 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:50 [ℹ]  waiting for CloudFormation stack "eksctl-blog-cluster-addon-iamserviceaccount-kube-system-cluster-autoscaler"
2022-01-26 13:45:52 [ℹ]  created serviceaccount "kube-system/cluster-autoscaler"

The above command prepares a CloudFormation template and deploys it in the same region. You can visit the CloudFormation console to check it.

Installation

If you choose to run CA in a different namespace by defining a custom namespace in the manifest file, replace kube-system with the appropriate namespace name in all the commands below.

Download and prepare your Kubernetes manifest file.

# curl -o cluster-autoscaler-autodiscover.yaml https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
# sed -i 's/<YOUR CLUSTER NAME>/cluster-name/g' cluster-autoscaler-autodiscover.yaml

Replace cluster-name with your EKS cluster name.
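
Note that the autodiscover manifest makes CA find node groups via ASG tags, so your Auto Scaling Groups must carry the standard discovery tags (replace cluster-name with your EKS cluster name):

k8s.io/cluster-autoscaler/enabled        = true
k8s.io/cluster-autoscaler/<cluster-name> = owned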

Apply the manifest to your EKS cluster. Make sure you have the proper context set for your kubectl command so that kubectl is targeted to the expected EKS cluster.

# kubectl apply -f cluster-autoscaler-autodiscover.yaml
serviceaccount/cluster-autoscaler configured
clusterrole.rbac.authorization.k8s.io/cluster-autoscaler created
role.rbac.authorization.k8s.io/cluster-autoscaler created
clusterrolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
rolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
deployment.apps/cluster-autoscaler created

Annotate the cluster-autoscaler service account with the ARN of the IAM role we created in the first step. Replace ROLE-ARN with the IAM role ARN.

# kubectl annotate serviceaccount cluster-autoscaler -n kube-system eks.amazonaws.com/role-arn=<ROLE-ARN>
$ kubectl annotate serviceaccount cluster-autoscaler -n kube-system eks.amazonaws.com/role-arn=arn:aws:iam::xxxxxxxxxx:role/eksctl-blog-cluster-addon-iamserviceaccount-Role1-1X55OI558WHXF --overwrite=true
serviceaccount/cluster-autoscaler annotated

Patch the CA deployment to add the eviction-related annotation –

# kubectl patch deployment cluster-autoscaler -n kube-system -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}}}}}'
deployment.apps/cluster-autoscaler patched

Edit the CA container command to add the below two arguments –

  • --balance-similar-node-groups
  • --skip-nodes-with-system-pods=false
# NEW="        - --balance-similar-node-groups\n        - --skip-nodes-with-system-pods=false"
# kubectl get -n kube-system deployment.apps/cluster-autoscaler -o yaml | awk "/- --node-group-auto-discovery/{print;print \"$NEW\";next}1" > autoscaler-patch.yaml
# kubectl patch deployment.apps/cluster-autoscaler -n kube-system --patch "$(cat autoscaler-patch.yaml)"
deployment.apps/cluster-autoscaler patched

Make sure the CA container image in your deployment definition is the latest one. If not, you can set a new image by running –

# kubectl set image deployment cluster-autoscaler -n kube-system cluster-autoscaler=k8s.gcr.io/autoscaling/cluster-autoscaler:vX.Y.Z

Replace X.Y.Z with the latest version.

$ kubectl set image deployment cluster-autoscaler -n kube-system cluster-autoscaler=k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.1
deployment.apps/cluster-autoscaler image updated

Verification

Cluster Autoscaler installation is now complete. Verify the logs to make sure Cluster Autoscaler is not throwing any errors.

# kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler

Creating Identity provider for AWS EKS

A quick post on creating EKS OIDC provider.

EKS OIDC provider!

We will be creating an OpenID Connect identity provider for the AWS EKS cluster in the IAM service. It establishes trust between the AWS account and the Kubernetes cluster running on EKS. To use IAM roles with service accounts created under the EKS cluster, the cluster must have an OIDC provider associated with it. Hence, it’s important to create this at the beginning of the project, along with the cluster.

Let’s get into steps to create an OIDC provider for your cluster.

First, you need to get the OpenID Connect provider URL from the EKS cluster.

  • Navigate to EKS console
  • Click on Cluster name
  • Select Configuration tab and check under Details
OpenID URL on EKS console.

Now head back to the IAM console

  • Click on Identity providers under Access management on left hand side menu
  • Click on Add provider button
Add provider in IAM
  • Select OpenID Connect
  • Paste the EKS OpenID provider URL in the given field
  • Click on Get thumbprint button
  • Add sts.amazonaws.com in Audience field
  • Click on Add provider button.
IdP thumbprint

Identity provider is created! View its details by clicking on the provider name.

EKS OIDC

If you are using CloudFormation as your IaC tool, then the below resource block can be used to create the OIDC provider for the EKS cluster:

OidcProvider:
    Type: AWS::IAM::OIDCProvider
    Properties: 
      Url: !GetAtt EksCluster.OpenIdConnectIssuerUrl
      ThumbprintList: 
        - 9e99a48a9960b14926bb7f3b02e22da2b0ab7280
      ClientIdList:
        - sts.amazonaws.com

Where –

  • EksCluster is the logical ID of the EKS cluster resource in the same CloudFormation template.
  • 9e99a48a9960b14926bb7f3b02e22da2b0ab7280 is the EKS thumbprint for the us-east-1 region. Refer to this document to get thumbprints.
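
Alternatively, if you use eksctl, the provider can be created and associated with the cluster in a single command:

# eksctl utils associate-iam-oidc-provider --cluster=<CLUSTER-NAME> --approve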

How to configure kubectl for AWS EKS

Steps to configure CLI for running kubectl commands on EKS clusters.

kubectl with EKS!

kubectl is the command-line utility used to interact with Kubernetes clusters. AWS EKS is the AWS-managed Kubernetes service, broadly used for running Kubernetes workloads on AWS Cloud. We will go through the steps to set up the kubectl command to work with an AWS EKS cluster. Without further ado, let’s get into it.

AWS CLI configuration

Install AWS CLI on your workstation and configure it by running –

# aws configure
AWS Access Key ID [None]: AKIAQX3SNXXXXXUVQ
AWS Secret Access Key [None]: tzS/a1sMDxxxxxxxxxxxxxxxxxxxxxx/D
Default region name [us-west-2]: us-east-1
Default output format [json]: json

If you need to switch roles to access your AWS environment, configure your CLI with roles.

Once configured, verify your CLI is working fine and reaching the appropriate AWS account.

# aws sts get-caller-identity
{
    "UserId": "AIDAQX3SNXXXXXXXXXXXX",
    "Account": "xxxxxxxxxx",
    "Arn": "arn:aws:iam::xxxxxxxxxx:user/blog-user"
}

kubectl configuration

Install the kubectl command if you haven’t already. Then update your kubeconfig with the details of the cluster you want to connect to –

# aws eks --region us-west-2 update-kubeconfig --name <CLUSTER-NAME>
# aws eks --region us-east-1 update-kubeconfig --name blog-cluster
Added new context arn:aws:eks:us-east-1:xxxxxxxxxx:cluster/blog-cluster to C:\Users\linux\.kube\config

At this point, your kubeconfig points to the cluster of interest. Any kubectl commands you run will be executed against the cluster mentioned above.

# kubectl get pods --all-namespaces
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE
kube-system   coredns-66cb55d4f4-hk9p5   0/1     Pending   0          6m54s
kube-system   coredns-66cb55d4f4-wmtvf   0/1     Pending   0          6m54s

I have not yet added any nodes to my EKS cluster, hence you can see the pods are in a Pending state.

If you have multiple clusters configured in kubeconfig, you must switch context to the cluster of interest before running kubectl commands. To switch context –

# kubectl config use-context <CONTEXT-NAME>
# kubectl config use-context arn:aws:eks:us-east-1:xxxxxxxxxx:cluster/blog-cluster
Switched to context "arn:aws:eks:us-east-1:xxxxxxxxxx:cluster/blog-cluster".

You can verify all configured contexts by inspecting the ~/.kube/config file or by running kubectl config get-contexts.

Troubleshooting errors

If your IAM user (configured in AWS CLI) is not authorized on the EKS cluster then you will see this error –

# kubectl get pods --all-namespaces
error: You must be logged in to the server (Unauthorized)

Make sure your IAM user is authorized in the EKS cluster. This is done by adding the user’s details under the mapUsers field of the configmap named aws-auth in the kube-system namespace. Initially, only the user who built the cluster can fetch and edit it: by default, AWS grants the IAM user who created the cluster system:masters access in this configmap. Configure that IAM user with kubectl and edit the configmap to add other IAM users to the cluster.

$ kubectl get -n kube-system configmap/aws-auth -o yaml
apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::xxxxxxxxxx:role/blog-eks-role
      username: system:node:{{EC2PrivateDNSName}}
  mapUsers: |
    - userarn: arn:aws:iam::xxxxxxxxxx:user/blog-user
      username: blog-user
      groups:
        - system:masters