AWS CloudFormation IaC Generator!

AWS has recently introduced a Console-to-Code functionality, allowing users to automatically generate Infrastructure as Code (IaC) templates based on actions performed in the EC2 console. Additionally, AWS offers the CloudFormation IaC Generator, a compact tool embedded in the AWS CloudFormation console. This tool facilitates the creation of CloudFormation templates for pre-existing resources within your account. In this post, we will walk through the tool with step-by-step instructions on how to use it.

Exploring AWS CloudFormation IaC Generator

What is the IaC generator?

The AWS CloudFormation IaC generator lets you generate a template for AWS resources that are already provisioned in your account and not being managed by CloudFormation. It makes sense to exclude resources managed by CloudFormation since you already have a template for them!

First, initiate a scan of your account from the IaC generator console to give the tool a comprehensive list of resources not governed by CloudFormation. A few quota limits apply to these scans; they are listed in the AWS documentation.

Upon completion of the scan, the tool displays a list of resources that are not under CloudFormation management. Begin creating the template by selecting these resources, and the tool will then generate the template in either YAML or JSON format.

The powerful IaC generator capabilities of CloudFormation can also be accessed via the AWS CLI, making it a compelling choice for streamlined and efficient automation.
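As a rough illustration of the same scan-then-generate flow outside the console, the sketch below drives it through a CloudFormation client object. It assumes a boto3-style client (for example `boto3.client("cloudformation")`) exposing `start_resource_scan`, `describe_resource_scan`, `create_generated_template`, `describe_generated_template`, and `get_generated_template`. The helper names, the template name, and the RETAIN policy choices are hypothetical and only for illustration.

```python
import time


def _poll(get_status, done_states, poll_seconds):
    """Call get_status() repeatedly until it returns one of done_states."""
    while True:
        status = get_status()
        if status in done_states:
            return status
        time.sleep(poll_seconds)


def generate_template(cfn, template_name, resources, fmt="YAML", poll_seconds=5):
    """Scan the account, then generate a template for the given resources.

    `cfn` is assumed to be a boto3 CloudFormation client. `resources` is a
    list of dicts with ResourceType and ResourceIdentifier keys.
    """
    # 1. Start a resource scan and wait for it to finish.
    scan_id = cfn.start_resource_scan()["ResourceScanId"]
    status = _poll(
        lambda: cfn.describe_resource_scan(ResourceScanId=scan_id)["Status"],
        {"COMPLETE", "FAILED", "EXPIRED"},
        poll_seconds,
    )
    if status != "COMPLETE":
        raise RuntimeError(f"resource scan ended with status {status}")

    # 2. Ask the generator to build a template for the selected resources.
    cfn.create_generated_template(
        GeneratedTemplateName=template_name,
        Resources=resources,
        TemplateConfiguration={
            "DeletionPolicy": "RETAIN",       # illustrative choice
            "UpdateReplacePolicy": "RETAIN",  # illustrative choice
        },
    )

    # 3. Wait for generation to finish, then fetch the template body.
    status = _poll(
        lambda: cfn.describe_generated_template(
            GeneratedTemplateName=template_name
        )["Status"],
        {"COMPLETE", "FAILED"},
        poll_seconds,
    )
    if status != "COMPLETE":
        raise RuntimeError(f"template generation ended with status {status}")
    return cfn.get_generated_template(
        GeneratedTemplateName=template_name, Format=fmt
    )["TemplateBody"]
```

A call such as `generate_template(cfn, "my-template", [{"ResourceType": "AWS::S3::Bucket", "ResourceIdentifier": {"BucketName": "my-bucket"}}])` would mirror the S3 bucket walkthrough that follows; the bucket and template names are placeholders.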

How to use the IaC generator?

IaC generator console
  • If you are using it for the first time, click the Start a new scan button to run the tool’s first resource scan. This scan identifies all the AWS resources in your account that are not managed by CloudFormation.
  • Click the Create template button.
  • In the create wizard, provide the template details.
Template details
  • We are choosing to create a template from scratch. If you want to add a resource to an existing CloudFormation stack, choose ‘Update the template for an existing stack’. Make an appropriate choice for the deletion and update replace policies.
  • On the next screen, choose the resources to be included in the IaC template.
Adding scanned resources
  • If the resource you are looking for is not in the list, you probably need to start a new scan, since your scan inventory may be outdated.
  • Click the Next button and choose related resources, if any.
Related resources
  • The tool gathers the list of dependent or related resources for the resources you chose in the last step. Since we selected an S3 bucket with no external dependency (such as a KMS key), there are no related resources.
  • Click the Next button for the final Review screen of the wizard.
Review
  • Review the selections and click the Create template button.
IaC template is ready!
  • The template will be generated. You can view it in YAML or JSON format, and copy or download it. You can also import it into a CloudFormation stack by clicking the Import to stack button.
  • The last tab, AWS CDK, also provides an AWS CDK application command.

Importing the AWS resources into the CloudFormation stack

It is always advisable to manage AWS resources through IaC templates, as this approach enhances resource management and minimizes cloud clutter. One common challenge in the IaC journey is incorporating manually created existing resources into IaC. The IaC generator proves valuable in this scenario by assisting in the creation of IaC templates for your resources, facilitating a seamless transition.

Using the IaC generator, you can generate an IaC template for existing resources that are not managed by CloudFormation. Once the template is generated, the tool also offers an option to import it into a stack. Select that option (the last screenshot in the section above) and it will start the CloudFormation import stack wizard.

Review all the information on the console, and then proceed to initiate the creation of a new stack that will import the chosen resources. I won’t go into a detailed, step-by-step procedure here, as it follows the standard CloudFormation stack process.

The CloudFormation stack will be created and you can see the selected resource is now imported and managed by CloudFormation.

Import resource completed.

This is a convenient method to scan for resources not governed by CloudFormation and subsequently import them into a CloudFormation stack, thereby fostering the adoption of Infrastructure as Code practices within your account and organization.

Conclusion

The IaC generator from AWS is a commendable initiative aimed at assisting customers in achieving 100% compliance with Infrastructure as Code for their infrastructure. It provides a seamless experience, effortlessly identifying non-IaC resources and smoothly importing them into CloudFormation stacks. Furthermore, it expedites the adoption of IaC by automatically generating code templates for you.

Exploring CloudFormation Git Sync!

In late November 2023, Amazon announced the new CloudFormation Git sync feature. Let’s explore this new feature: how it works, how it impacts the CD patterns of Infrastructure as Code (IaC), and so on.

CloudFormation Git Sync Feature!

What is the CloudFormation Git sync?

The recently announced CloudFormation Git sync feature lets customers deploy and sync CloudFormation IaC code directly from remote Git repositories. This could be a game changer, as it might let customers drop Continuous Deployment tools like Jenkins, GitHub Actions, etc. altogether, along with their maintenance.

In summary, the customer is required to establish a GitHub Connection with AWS and subsequently create a CloudFormation stack using the relevant IaC template repository information. Once the stack is set up, CloudFormation continually monitors the template files in the remote Git repository. Any new commit in the remote repo triggers the automatic deployment of the corresponding changes to AWS.

Prerequisites for CloudFormation Git sync

  • A Git repository with a valid CloudFormation IaC template
  • A GitHub Connector configured for the target AWS account
  • IAM role for Git Sync operations. It should have –
    • Access IAM policy
    • Trust policy

IAM policy

For tighter security control, you can scope the IAM policy down to specific CloudFormation stacks through the Resource element of the SyncToCloudFormation statement.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SyncToCloudFormation",
            "Effect": "Allow",
            "Action": [
                "cloudformation:CreateChangeSet",
                "cloudformation:DeleteChangeSet",
                "cloudformation:DescribeChangeSet",
                "cloudformation:DescribeStackEvents",
                "cloudformation:DescribeStacks",
                "cloudformation:ExecuteChangeSet",
                "cloudformation:GetTemplate",
                "cloudformation:ListChangeSets",
                "cloudformation:ListStacks",
                "cloudformation:ValidateTemplate"
            ],
            "Resource": "*"
        },
        {
            "Sid": "PolicyForManagedRules",
            "Effect": "Allow",
            "Action": [
                "events:PutRule",
                "events:PutTargets"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "events:ManagedBy": [
                        "cloudformation.sync.codeconnections.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Sid": "PolicyForDescribingRule",
            "Effect": "Allow",
            "Action": "events:DescribeRule",
            "Resource": "*"
        }
    ]
}
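The scoping-down idea mentioned above can be sketched as a small transformation over the policy document. This is a minimal sketch under stated assumptions: it assumes `cloudformation:ListStacks` and `cloudformation:ValidateTemplate` do not support resource-level permissions (so they stay on `"*"`), and that the remaining actions accept a stack ARN. Both points should be checked against the service authorization reference. The function name, the extra `SyncAccountWideActions` Sid, and the account ID and stack name in the usage note are hypothetical.

```python
import copy

# Assumption: these actions cannot be restricted to a stack ARN,
# so they are kept in a separate statement with Resource "*".
ACCOUNT_WIDE_ACTIONS = {
    "cloudformation:ListStacks",
    "cloudformation:ValidateTemplate",
}


def scope_sync_statement(policy, stack_arns):
    """Return a copy of `policy` whose SyncToCloudFormation statement
    only applies to the given stack ARNs. The input is not modified."""
    scoped = copy.deepcopy(policy)
    statements = []
    for stmt in scoped["Statement"]:
        if stmt.get("Sid") != "SyncToCloudFormation":
            statements.append(stmt)
            continue
        actions = stmt["Action"]
        stack_actions = [a for a in actions if a not in ACCOUNT_WIDE_ACTIONS]
        wide_actions = [a for a in actions if a in ACCOUNT_WIDE_ACTIONS]
        # Stack-level actions are limited to the listed stacks.
        statements.append({
            "Sid": "SyncToCloudFormation",
            "Effect": "Allow",
            "Action": stack_actions,
            "Resource": list(stack_arns),
        })
        # Account-wide actions keep the wildcard resource.
        if wide_actions:
            statements.append({
                "Sid": "SyncAccountWideActions",
                "Effect": "Allow",
                "Action": wide_actions,
                "Resource": "*",
            })
    scoped["Statement"] = statements
    return scoped
```

For example, passing `["arn:aws:cloudformation:us-east-1:111122223333:stack/s3-stack/*"]` as `stack_arns` would limit change set operations to a stack named `s3-stack` in that placeholder account.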

Trust policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TrustPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudformation.sync.codeconnections.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

How to create a stack using CloudFormation Git Sync?

Let’s look at the step-by-step procedure to deploy a new CloudFormation stack using Git Sync.

Environment setup considered here –

  • An AWS account configured with a GitHub Connection in the Developer Tools Console.
  • A private GitHub repository cfn-git-sync-poc that hosts a valid CloudFormation template at cloudformation/s3-stack.yaml
  • An IAM role cfn-github-role created in the target AWS account with the policies stated above.

Now, it’s time to get our hands dirty!

  • Log in to the AWS CloudFormation console. Click on the Create stack dropdown button and choose With new resources (standard)
  • On the Create stack page, select Template is ready and Sync from Git
Create stack using Git
  • Provide the stack details, such as a stack name of your choice, and select automatic Deployment file creation. If you wish to create the deployment file yourself, refer to the deployment file format details in the AWS documentation.
Stack details
  • Then, specify the Git repository information. Ensure that you have a configured and active GitHub connection to fill in the fields in this segment. Choose the repository containing the Infrastructure as Code (IaC), specify the deployment branch, and designate a Deployment file path where AWS will store the newly committed deployment file. Lastly, furnish the IAM role details that will be utilized for executing Git Sync operations.
Repository details
  • Lastly, enter the file path for the CloudFormation IaC template inside the remote Git repo and define the parameters that are declared in the template. These details will be used to build the Deployment file.
Template and parameter details
  • Click the Next button to move to the Stack options page.
  • Stack options are the same as for any other CloudFormation stack, so we will not discuss them here.
  • After verifying or modifying the stack options, click the Next button.
  • You will now reach the review page, where you can see the Deployment file constructed from the details provided. It will be committed to the remote Git repo at the defined location.
Deployment file!
  • Review the rest of the configurations and click the Submit button.
  • Now, CloudFormation will commit and push the Deployment file to the remote Git repo by raising a Pull Request (PR). A message to that effect is shown on the page. Meanwhile, since the Deployment file is not yet in the remote repo, the Provisioning status is marked as Failed. Once the PR is approved and merged, the Deployment file will be available in the remote repo.
The First deploy using CloudFormation Git Sync
  • Head over to GitHub and you should see a PR raised by Amazon to commit the deployment file into the remote repo.
PR raised by AWS Connector
Deployment file creation by AWS Connector in PR
  • Review, approve, and merge the PR.
  • Once the PR is merged, CloudFormation detects the change and starts provisioning the stack.
Stack provisioning started
  • Once all defined resources are created, stack creation completes and CloudFormation keeps monitoring the Git repository for new changes.
The Stack is deployed successfully!

Across multiple environments such as development, testing, staging, and production, it’s possible to use distinct Stack Deployment files while leveraging a single CloudFormation IaC template file. These Deployment files can be organized into separate folders within the same repository for better segregation. By selecting the appropriate deployment file based on the environment where the CloudFormation stack is being established, the process becomes more streamlined. For example:

cloudformation
├── dev
│   └── deployment.yaml
├── test
│   └── deployment.yaml
├── staging
│   └── deployment.yaml
├── production
│   └── deployment.yaml
└── template.yaml
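To make the layout above concrete, here is a minimal sketch that renders one deployment file per environment, all pointing at the same template. It assumes the Git sync deployment file uses the `template-file-path`, `parameters`, and `tags` keys and that the template path is given relative to the repository root. The `BucketName` parameter, the `Environment` tag, and the function names are hypothetical.

```python
def render_deployment_file(template_path, parameters, tags=None):
    """Render a stack deployment file as simple YAML text.

    Values are written as double-quoted strings, so this helper is only
    suitable for plain values without embedded quotes.
    """
    lines = [f"template-file-path: {template_path}"]
    for section, mapping in (("parameters", parameters), ("tags", tags)):
        if mapping:
            lines.append(f"{section}:")
            lines.extend(f'  {key}: "{value}"' for key, value in mapping.items())
    return "\n".join(lines) + "\n"


def deployment_files(bucket_prefix,
                     environments=("dev", "test", "staging", "production")):
    """Map each environment's deployment file path to its rendered content."""
    return {
        f"cloudformation/{env}/deployment.yaml": render_deployment_file(
            "cloudformation/template.yaml",              # shared template
            {"BucketName": f"{bucket_prefix}-{env}"},    # hypothetical parameter
            {"Environment": env},                        # hypothetical tag
        )
        for env in environments
    }
```

Each stack would then be created with the Deployment file path of its own environment, while the template file stays shared.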

I anticipate that the Git Sync feature will evolve with additional capabilities in the future, potentially prompting customers to reconsider the necessity of separate Continuous Deployment (CD) software.

What do you think about this feature?

What are your thoughts on the CloudFormation Git Sync feature? Could it potentially revolutionize the game? Will it make several Continuous Deployment tools obsolete? It feels like what ArgoCD is for Kubernetes, operating in a remarkably similar manner. It may not yet offer extensive control over commit-wise deployments, but the potential for future enhancements is exciting. This feature could transform the Infrastructure as Code (IaC) landscape for AWS customers, possibly luring some back from alternative IaC platforms. Witnessing its development and use in enterprise production environments is going to be exciting!

How to add a GitHub connection from an AWS account?

In this blog post, we will guide you through a step-by-step process to establish a GitHub connection in an AWS account.

Creating GitHub Connection for AWS

What is a connection?

Firstly, let’s understand the concept of a connection in the AWS world. In AWS, a connection refers to a resource that is used for linking third-party source repositories to various AWS services. AWS provides a range of Developer tools, and when integration is required with third-party source repositories such as GitHub, GitLab, etc., the connection serves as a means to achieve this.

Adding a connection to connect GitHub with AWS

Let’s dive into the step-by-step procedure to add a connection that helps your AWS account to talk with your personal GitHub repositories.

AWS Developer Tools Connection console
  • On the wizard screen, select GitHub and name your connection.
  • Click the Connect to GitHub button.
Create Connection wizard
  • Now, AWS will try to connect to GitHub and access your account. If you are already logged in to GitHub, you should see the authorization screen below. If not, you will need to log in to GitHub first.
Authorize AWS connector for GitHub
  • You can review the permissions being granted to AWS on your account by clicking the Learn more link on this screen.
  • Click on Authorize AWS Connector for GitHub
  • After authorizing the AWS connector, you should be back to the GitHub connection settings page.
  • At this point, AWS requires the details of a GitHub App that will allow it to access your GitHub repositories and make modifications to them.
  • AWS also offers to create a GitHub app on your behalf if it’s not created already. You can use the Install a new app button here to let AWS create the GitHub app in your account.
  • In that case, you need to verify the configuration (repo selection) and then click the Install button.
Installing AWS Connector GitHub App
  • Once the App is created, the GitHub Apps ID is populated in the wizard. If the App already exists, enter its ID manually.
GitHub Apps details for creating a connection
  • Click the Connect button.
  • You should be greeted with a success message with the new connection created!
GitHub Connection is created!

Your GitHub connection is now ready. You can use this connection in compatible AWS services and let those services access your GitHub repositories.
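For readers who script their setup, the connection resource itself can also be created through the API. The sketch below assumes a boto3-style connections client (for example `boto3.client("codeconnections")` or the older `codestar-connections`) exposing `create_connection` and `get_connection`. A connection created this way starts in the PENDING state, and the GitHub authorization handshake described in the steps above still has to be completed in the console. The function names are hypothetical.

```python
def create_github_connection(connections, name):
    """Create a GitHub connection and return (arn, status).

    `connections` is assumed to be a boto3 connections client.
    """
    arn = connections.create_connection(
        ProviderType="GitHub",
        ConnectionName=name,
    )["ConnectionArn"]
    status = connections.get_connection(ConnectionArn=arn)["Connection"][
        "ConnectionStatus"
    ]
    return arn, status


def is_usable(status):
    """Only an AVAILABLE connection can be used by other AWS services."""
    return status == "AVAILABLE"
```

If `is_usable(status)` is false right after creation, that is expected: finish the authorization in the console and check the status again.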

Exploring the Latest AWS Console-to-Code Feature

In November 2023, AWS announced the preview of a new feature, AWS Console-to-Code. Two months later, in this blog, we will explore the feature, learn how to use it, and look at its limitations.

AWS Console-to-Code

What is the AWS Console-to-Code feature?

It’s the latest feature from AWS, available in the EC2 console, that leverages Generative AI to convert actions performed in the AWS EC2 console into IaC (Infrastructure as Code)! It’s a stepping stone towards new IaC creation methods in the world of AWS cloud.

The actions carried out on the AWS console during the current session are monitored by the feature in the background. These recorded actions are then made available to the user to select up to 5 of these actions, along with their preferred language. AWS then utilizes its Generative AI capabilities to automatically generate code that replicates the objectives achieved through manual actions on the console.

It also generates the AWS CLI command alongside the IaC code.

The usefulness of the AWS Console-to-Code feature

With the current list of limitations and the preview stage, this feature might not be a game changer but it does have potential in the future. The AWS Console-to-Code feature will surely help developers and administrators to get the IaC skeleton quickly to start from and speed up the IaC coding with less effort.

This feature simplifies the process of generating AWS CLI commands, eliminating the need to constantly consult documentation and manually construct commands with the correct arguments. As a result, it accelerates automation deliveries with reduced effort.

By the way, there is no additional cost to use Console-to-Code so it doesn’t hurt to employ it for initial IaC drafting!

Limitation of AWS Console-to-Code feature

  • Currently, it’s in the ‘Preview’ phase.
  • Only available in the N. Virginia (us-east-1) region as of today.
  • It can generate IaC code in the listed types and languages only –
    • CDK: Java
    • CDK: Python
    • CDK: TypeScript
    • CloudFormation: JSON
    • CloudFormation: YAML
  • It does not retain data across sessions. The actions that are performed in the current session are made available for Code Generation. Meaning if you refresh the browser page, it resets the action list and starts recording afresh.
  • Up to 5 actions can be selected to generate code.
  • Only actions from the EC2 console are recorded. However, I observed that even a few EC2 console actions, such as Security Group creation or Volume listing, are not being recorded.

How to use the AWS Console-to-Code feature

  • Log in to the EC2 console and select the N. Virginia (us-east-1) region.
  • On the left-hand side menu, ensure you have a Console-to-Code link.
  • Perform some actions in the EC2 console like launching an instance, etc.
  • Navigate to Console-to-Code by clicking on the link in the left-hand side menu.
  • It will present you with a list of recorded actions. Select one or a maximum of 5 actions for which you want to generate code. You can even filter the recorded actions as per their Type:
    • Show read-only: Read-only events like Describe*
    • Show mutating: Events that modified/created/deleted or altered the AWS resources.
  • Click on the drop-down menu and select the type and language for the code.
AWS console-to-code recorded actions
  • It should start generating code.
  • After code generation, you have an option to copy or download it. You can also copy the AWS CLI command on the same page.
Python code generated by AWS Console-to-Code
  • It also provides the generated code’s explanation at the bottom of the code.

Crafting a One-Page resume website without spending a dollar

This post details my side project—a one-page resume website that I constructed without incurring any expenses.

What is a one-page resume website?

A one-page resume website serves as a digital manifestation of your resume, offering a modern alternative to the traditional paper-based format. While PDF resumes remain effective for email sharing, the dynamic nature of resume websites provides an excellent digital representation of your profile. Particularly advantageous in fields like digital, marketing, and art, a resume website becomes an ideal platform to showcase portfolios, offering a level of presentation that is challenging to achieve with traditional paper or PDF formats.

I created my one-page resume website as a side project. And when it’s budget-friendly, why not!

We are talking about: https://resume.kerneltalks.com

The frontend

The one-page resume website is static, meaning its content doesn’t change frequently. Being a single-page site eliminates the need for a content management system, making a basic HTML website the ideal choice – especially if you’re not into web design and don’t have a substantial budget or time commitment.

Since web design isn’t my expertise, I opted for an HTML template as a foundation. If you know HTML, customizing the template with your data is a straightforward task, and your site is ready to be hosted.

I chose a free HTML template, so there’s zero investment made. It’s essential to respect the personal use terms specified by the template providers and include a link back to them on your website.

The backend

Given that it’s a one-page static HTML website, it makes sense to minimize hosting expenses (in terms of money and time). Amazon S3 website hosting seems to be the optimal choice. Uploading files to an S3 bucket, configuring a few settings, and your website is live! You can find the complete procedure here.
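As a rough sketch of the "few settings" involved, the helpers below build the two documents that S3 static website hosting typically needs: a website configuration and a public-read bucket policy. They assume the site is served from the S3 website endpoint, which requires objects to be publicly readable and Block Public Access to permit the policy. The function names and the document names `index.html` and `error.html` are hypothetical.

```python
import json


def website_configuration(index="index.html", error="error.html"):
    """Website configuration in the shape expected by S3 put_bucket_website."""
    return {
        "IndexDocument": {"Suffix": index},
        "ErrorDocument": {"Key": error},
    }


def public_read_policy(bucket):
    """Bucket policy (as a JSON string) allowing anonymous reads of all objects."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    })
```

With a boto3 S3 client `s3`, these would be applied with `s3.put_bucket_website(Bucket=bucket, WebsiteConfiguration=website_configuration())` and `s3.put_bucket_policy(Bucket=bucket, Policy=public_read_policy(bucket))`.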

Since it’s a single HTML page, my website’s size is a mere 4MB, resulting in negligible (I would say zero) S3 storage charges.

Amazon S3 website hosting provides an AWS-branded HTTP endpoint. If you desire a custom domain or HTTPS protocol, integrating CloudFront with S3 is the solution.

I utilized a subdomain on my existing registered domain and set up a CNAME entry for it against CloudFront’s DNS.

If you don’t have a domain, expect a registration investment of around $13 per year. In my case, it was zero. Additionally, CloudFront bills based on the number of requests, so your costs will depend on your website’s traffic. Given the nature of a personal resume website, traffic will be light and the associated costs negligible.

For HTTPS, you can create a free SSL certificate in AWS Certificate Manager and use it with the CloudFront distribution.

The result

The outcome is a serverless, responsive, compact, HTML-based, secure, static, one-page resume website, leveraging the top-tier CDN, Amazon CloudFront, all achieved with no upfront investment!

Architecture

You could make it more secure by adding a WAF (Web Application Firewall) on top of CloudFront though. Since it was a zero dollar project, WAF was not included in the design at that time.

Scaling with AWS PrivateLink

In this article, we’ll discuss the scalability aspects of AWS PrivateLink. We’ll examine how the expansion of the service consumer VPC count impacts AWS PrivateLink implementation and its management. Additionally, we will delve into key considerations for designing a scalable solution using AWS PrivateLink.

Scale with AWS PrivateLink

AWS PrivateLink Primer

AWS PrivateLink provides a method for making your service accessible to other VPCs through a secure, private network connection over the AWS backbone network. This ensures that your data remains within the AWS network, thereby improving security and lowering data transfer expenses compared with using the public internet. The basic architecture of AWS PrivateLink is depicted as follows:

AWS PrivateLink architecture

To set up the connection, you must establish an Endpoint Service within the service provider VPC, using a Network Load Balancer or Gateway Load Balancer. In the service consumer VPC, you create a VPC endpoint that links to this Endpoint Service. Access control is handled by specifying which principals are permitted to connect to the Endpoint Service. Please refer to the AWS documentation for more details.
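The two sides of that setup can be sketched with EC2 client calls. This is a minimal sketch, assuming boto3-style EC2 clients for the provider and consumer accounts and the calls `create_vpc_endpoint_service_configuration`, `modify_vpc_endpoint_service_permissions`, and `create_vpc_endpoint`. The function names and all ARNs and IDs passed in are hypothetical.

```python
def expose_service(provider_ec2, nlb_arn, consumer_principal_arn):
    """Provider side: create an Endpoint Service in front of an NLB and
    allow one consumer principal to connect to it. Returns the service name."""
    config = provider_ec2.create_vpc_endpoint_service_configuration(
        NetworkLoadBalancerArns=[nlb_arn],
        AcceptanceRequired=True,  # provider must accept each connection request
    )["ServiceConfiguration"]
    provider_ec2.modify_vpc_endpoint_service_permissions(
        ServiceId=config["ServiceId"],
        AddAllowedPrincipals=[consumer_principal_arn],
    )
    return config["ServiceName"]


def connect_consumer(consumer_ec2, vpc_id, subnet_ids, security_group_ids,
                     service_name):
    """Consumer side: create an interface VPC endpoint for the service.
    Returns the new endpoint ID."""
    endpoint = consumer_ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=service_name,
        SubnetIds=subnet_ids,
        SecurityGroupIds=security_group_ids,
    )["VpcEndpoint"]
    return endpoint["VpcEndpointId"]
```

Because `AcceptanceRequired` is set to true in this sketch, the provider would still need to accept the pending connection before traffic flows.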

Scalability aspect

Now, let’s discuss the scalability aspect concerning AWS PrivateLink. When we talk about scalability, we’re referring to the expansion of the number of VPCs acting as service consumers. In scenarios where you have critical or shared services hosted within the service provider VPC and made accessible through AWS PrivateLink for consumption by services located in different VPCs, it’s clear that the count of consumer VPCs will keep increasing. Therefore, it becomes essential to take scalability considerations into account.

Various VPC endpoints situated in different consumer VPCs can establish connections with a single endpoint service located in the service provider VPC. Hence, you can think of a high-level architecture as below –

Multiple VPC endpoints to one endpoint service

Furthermore, it’s important to note that AWS PrivateLink can enable communication to endpoints located in different AWS Regions through the use of Inter-Region VPC Peering.

I recommend reading this AWS blog, which outlines an architecture involving PrivateLink and Transit Gateway. This approach has the potential to significantly decrease the number of VPC endpoints, streamline the deployment of VPC endpoints, and offer cost optimization benefits, especially when implementing solutions at scale.

Scaling considerations

While it’s possible to configure many-to-one connectivity using AWS PrivateLink, there are several important factors to keep in mind when considering this type of scaling:

  • Cost and management: As you introduce new consumer VPCs to AWS PrivateLink, you’ll also be adding new VPC endpoints to your infrastructure, which can add to your billing and infrastructure management overhead.
  • AWS PrivateLink quotas: Be sure to take into account AWS PrivateLink quotas, as these define the limits for various aspects of your PrivateLink setup.
  • Network throughput: By default, each VPC endpoint supports up to 10 Gbps of bandwidth per Availability Zone and automatically scales up to 100 Gbps. This is an important consideration for applications that have high network demands when exposed through AWS PrivateLink.
  • LB quotas: Keep Network Load Balancer and Gateway Load Balancer quotas in mind.
  • IP requirements: AWS PrivateLink consumes a certain number of IP addresses for Load Balancers and endpoints from your VPC’s IP address pool. Ensure that your VPCs can accommodate these IP requirements without causing IP address exhaustion.
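To get a feel for how the endpoint count and IP consumption grow, here is a small back-of-the-envelope sketch. It assumes one interface endpoint per consumer VPC, attached to one subnet in each Availability Zone, with each endpoint network interface taking one private IP address. The function name and the example numbers are hypothetical.

```python
def endpoint_footprint(consumer_vpcs, azs_per_vpc):
    """Estimate endpoints and endpoint IPs for a many-to-one PrivateLink setup.

    Assumes one interface endpoint per consumer VPC and one endpoint
    network interface (one private IP) per Availability Zone.
    """
    return {
        "endpoints": consumer_vpcs,
        "endpoint_ips": consumer_vpcs * azs_per_vpc,
    }


# Example: 20 consumer VPCs, each spanning 3 Availability Zones.
footprint = endpoint_footprint(20, 3)
```

Under these assumptions, every new consumer VPC adds one more endpoint to pay for and manage, plus one IP address per Availability Zone it spans.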

Transit Gateway as an alternative?

Let’s look at whether Transit Gateway can be an alternative in a continually expanding VPC environment.

  • If unidirectional traffic is your primary requirement, AWS PrivateLink is the choice.
  • For a cost-efficient solution, AWS PrivateLink is certainly more economical than Transit Gateway.
  • It’s worth noting that Transit Gateway is not suitable when dealing with VPCs that have overlapping CIDRs.
  • In a nutshell, Transit Gateway becomes a viable alternative only when you are designing a highly scalable solution involving a very large number of participating VPCs with non-overlapping CIDRs, and your solution prioritizes simplicity and reduced management overhead over cost considerations.

Understanding the basics of Lambda Function URLs

In this guide, we’ll take you through the fundamental concepts of Lambda Function URLs. We’ll discuss their definition, explore their applications, and address security considerations, providing a comprehensive overview.

What is the Lambda Function URL?

It’s a dedicated, unique, and static URL for your Lambda function, enabling remote invocation of the backend Lambda function over a network call. This straightforward and budget-friendly method simplifies Lambda function invocation, bypassing the need for managing complex front-end infrastructure like API Gateway, Load Balancers, or CloudFront. However, this comes at the expense of advanced features provided by these services.

It follows the format:

https://<url-id>.lambda-url.<region>.on.aws
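As a small illustration of that format, the sketch below splits a function URL into its `url-id` and `region` parts. The accepted character set for the URL ID (lowercase letters and digits) is an assumption, and the function name and the sample URL in the usage note are hypothetical.

```python
import re

# Pattern for https://<url-id>.lambda-url.<region>.on.aws with an optional
# trailing slash. The url-id character set is an assumption.
FUNCTION_URL = re.compile(
    r"^https://(?P<url_id>[a-z0-9]+)\.lambda-url\.(?P<region>[a-z0-9-]+)\.on\.aws/?$"
)


def parse_function_url(url):
    """Return (url_id, region) for a Lambda Function URL, or raise ValueError."""
    match = FUNCTION_URL.match(url)
    if match is None:
        raise ValueError(f"not a Lambda Function URL: {url}")
    return match.group("url_id"), match.group("region")
```

For instance, `parse_function_url("https://abc123example.lambda-url.us-east-1.on.aws/")` yields the pair `("abc123example", "us-east-1")`.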

Why use Lambda Function URLs?

  • Creating them is quite straightforward and simple. The AuthType (security) is the only configuration you need to provide. CORS config is optional.
  • They come at no additional cost.
  • Once configured, they require minimal maintenance.
  • For straightforward use cases, they can replace the need for designing, managing, and incurring the costs of front-end infrastructure, such as API Gateway.
  • They are most appropriate for development scenarios where you can prioritize other aspects of applications/architecture over the complexity of Lambda invocation methods.

When to use Lambda Function URLs?

Lambda Function URLs accelerate the testing and development of an application by letting you focus on the Lambda function itself while the method of invocation takes a back seat.

In production, they’re practical when your design doesn’t necessitate the advanced features provided by alternative invocation methods like API Gateway or Load Balancers, etc.

These URLs are also beneficial when dealing with a limited number of Lambdas, offering a simple, cost-effective, and maintenance-free approach to invocations.

How to secure Lambda Function URLs?

You can manage access to Lambda Function URLs by specifying the AuthType, which offers two configurable options:

  1. AWS_IAM: This allows you to define AWS entities (users or roles) that are granted access to the function URL. You need to ensure a proper resource policy is in place allowing intended entities access to Action: lambda:InvokeFunctionUrl
  2. NONE: Provides public, unauthenticated access. Use this option cautiously, as it allows unrestricted access. When you choose this option, Lambda automatically creates a resource-based policy with Principal: * and Action: lambda:InvokeFunctionUrl and attaches it to the function.

It’s important to remember that Lambda’s resource-based policy is always enforced in conjunction with the selected AuthType. Please read this AWS documentation for more details.

The Lambda resource policy can be configured at Lambda > Configuration > Permissions > Resource-based policy statements.
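The two AuthType options can also be sketched through the Lambda API. This is a minimal sketch, assuming a boto3-style Lambda client exposing `create_function_url_config` and `add_permission`. It adds the resource-based policy statement explicitly in both cases, since the automatic policy creation described above applies to the console flow. The function name and the `FunctionURLAccess` statement ID are hypothetical.

```python
def create_function_url(lambda_client, function_name, auth_type="AWS_IAM",
                        principal=None):
    """Create a Function URL and grant lambda:InvokeFunctionUrl.

    For AuthType NONE the permission is granted to everyone ("*").
    For AWS_IAM, `principal` (an account ID or IAM ARN) is required.
    """
    if auth_type not in ("AWS_IAM", "NONE"):
        raise ValueError("auth_type must be AWS_IAM or NONE")
    if auth_type == "AWS_IAM" and principal is None:
        raise ValueError("AWS_IAM requires an explicit principal")

    url = lambda_client.create_function_url_config(
        FunctionName=function_name,
        AuthType=auth_type,
    )["FunctionUrl"]

    # Resource-based policy statement allowing the URL to be invoked.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId="FunctionURLAccess",
        Action="lambda:InvokeFunctionUrl",
        Principal="*" if auth_type == "NONE" else principal,
        FunctionUrlAuthType=auth_type,
    )
    return url
```

Defaulting to `AWS_IAM` in this sketch is a deliberate choice: public access has to be requested explicitly by passing `auth_type="NONE"`.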

With the basics of Lambda Function URLs in mind, refer to how to create Lambda Function URL and kick-start your journey with them!

VPC Peering vs AWS PrivateLink vs Transit Gateway

In this article, we will compare three different approaches to cross-VPC communication: VPC peering, AWS PrivateLink, and Transit Gateway. We’ll also discuss when to use each one and help you choose the best option. It’s important to note that we won’t dive deep into each implementation; instead, we’ll focus on their advantages, limitations, and ideal usage scenarios.

Peering or PrivateLink or Transit Gateway!

When operating Cloud Native applications, maintaining private and secure communication between applications is crucial. These applications may be distributed across various VPCs, whether within the same account or across different accounts. In such scenarios, we establish cross-VPC communication through the use of VPC peering, AWS PrivateLink, or Transit Gateway.

Let’s look at them one by one.

VPC Peering

It is a networking connection between two VPCs where network traffic can be routed across two VPCs. Read more about VPC peering here. Let’s look at the pros and cons of the VPC peering –

Advantages

  • Relatively straightforward to configure. It’s an invite-accept configuration.
  • Provides a scalable network connection between two VPCs, enabling all resources in one VPC to communicate with resources in the other.
  • A simple, secure, and budget-friendly option.
  • VPC Peering comes at no additional cost; you are only billed for data transfer costs. The data transfer cost for VPC peering within the same Availability Zone (AZ) is completely free.

Limitations

  • Peering VPCs with overlapping CIDRs is not possible.
  • Peering is non-transitive.
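The overlapping-CIDR limitation is easy to check before sending a peering request. The sketch below uses Python’s standard `ipaddress` module; the function name and the sample CIDR ranges are hypothetical.

```python
import ipaddress


def can_peer(cidr_a, cidr_b):
    """Return True when two VPC CIDR blocks do not overlap."""
    network_a = ipaddress.ip_network(cidr_a)
    network_b = ipaddress.ip_network(cidr_b)
    return not network_a.overlaps(network_b)


# Distinct ranges can be peered; a range nested inside another cannot.
distinct = can_peer("10.0.0.0/16", "10.1.0.0/16")
nested = can_peer("10.0.0.0/16", "10.0.128.0/17")
```

Here `distinct` evaluates to true and `nested` to false, since 10.0.128.0/17 lies inside 10.0.0.0/16.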

Ideal usage

  • Individual VPC-to-VPC connections.
  • A situation that demands full network connectivity with another VPC.
  • A use case where a simple and cost-effective solution is expected.
  • This approach is not well-suited for handling a large number of VPCs, since mesh networking between many VPCs using peering adds complexity to the architecture. In such cases, Transit Gateway is the preferred solution.
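The complexity of a peering mesh can be quantified. Because peering is non-transitive, a full mesh of n VPCs needs one connection per pair, that is n(n-1)/2 connections, whereas a hub such as Transit Gateway needs one attachment per VPC. A minimal sketch with hypothetical function names:

```python
def full_mesh_peerings(vpc_count):
    """Peering connections needed so every VPC can reach every other VPC."""
    return vpc_count * (vpc_count - 1) // 2


def hub_attachments(vpc_count):
    """Attachments needed when every VPC connects to a single hub."""
    return vpc_count


# Growth of both approaches for a few fleet sizes.
comparison = {n: (full_mesh_peerings(n), hub_attachments(n)) for n in (3, 10, 50)}
```

The mesh grows quadratically (3, 45, and 1225 connections for 3, 10, and 50 VPCs) while the hub model grows linearly.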

AWS PrivateLink

It’s an AWS service that enables you to access AWS services over a private network connection, rather than over the public internet. Read more about AWS PrivateLink here.

Advantages

  • Selective sharing of services between VPCs. Unlike VPC peering, which opens network-level access between the two VPCs, AWS PrivateLink exposes only specific services across VPCs.
  • This is a secure solution for private connectivity of services across VPCs or on-premises.

Limitations

  • It provides connectivity to specific services (AWS services or services exposed from another VPC), not general network-level connectivity between VPCs. For full VPC-to-VPC connectivity, consider VPC peering or Transit Gateway.
  • The setup process is complex.
  • Exposing your own service requires a load balancer in front of it, typically a Network Load Balancer (NLB), plus VPC endpoints on the consumer side, which introduces additional costs and management overhead.
  • Enabling PrivateLink for existing services requires design adjustments, including the incorporation of the above components into the current architecture.

Ideal usage

  • It can be valuable in hybrid cloud configurations to make services accessible privately between VPCs and on-premises environments.
  • It’s beneficial for accessing AWS’s public services like Amazon DynamoDB and Amazon S3 through AWS’s backbone network, ensuring secure, fast, and reliable connectivity while potentially reducing network costs.
  • It’s applicable for creating isolation by selectively exposing specific services to particular VPCs.

Transit Gateway

AWS Transit Gateway is a service that simplifies and centralizes network routing between your Amazon Virtual Private Clouds (VPCs), on-premises networks, and VPN connections. Read more about Transit Gateway here.

Advantages

  • A central hub that links numerous VPCs, network devices, VPN connections, or an AWS Direct Connect gateway, with transitive routing that simplifies network design.
  • Multicast support facilitates effortless distribution of content and data to various endpoints.
  • Efficiently manage and control large-scale networking via a single, unified service.

Limitations

  • It comes at some additional cost compared to VPC peering.

Ideal usage

  • It is well-suited for hub-and-spoke architectures, designs that involve a significant number of VPCs, transitive routing needs, and global or multi-region network designs.
  • It is designed for scalability and is particularly suitable for continuously expanding environments.
  • It’s valuable for efficiently managing network connectivity among a large number of diverse participants.

Which one should I use?

As we’ve discussed, each of these three networking approaches has its specific areas of focus tailored to particular use cases. Consequently, the choice depends entirely on your unique requirements.

VPC Peering is an excellent choice when you need to connect a limited number of VPCs with minimal cost implications and management overhead.

AWS PrivateLink is the right option when you intend to selectively expose services to other VPCs, although it involves additional costs, extra networking components, and the associated management overhead.

Transit Gateway can serve as an alternative to VPC Peering as you scale to a larger number of VPCs, simplifying network management at the expense of some additional costs. It’s also well-suited for connecting various network entities with anticipated scalability.

How to overprovision the EKS cluster?

In this article, I will guide you step by step through overprovisioning an EKS cluster. In the later section, we will explore how to test that overprovisioning works.

EKS Cluster overprovisioning!

If you want to understand what overprovisioning is, I recommend referring to my previously published article on overprovisioning.

Let’s get started.

Prechecks

Ensure your setup meets the prerequisites below. It should, unless you are running ancient infrastructure 🙂

  • Ensure you are running Kubernetes 1.14 or later, since pod priority and preemption became generally available (stable) in version 1.14.
  • Verify that Cluster Autoscaler’s priority cutoff (the --expendable-pods-priority-cutoff flag) is set to -10. This has been the default since version 1.12.

The manifests provided in this article contain only the bare minimum specification. Modify them to suit your requirements, such as a non-default namespace or custom labels and annotations. How you deploy them is up to you: the simple way is kubectl apply -f manifest.yaml, and the more involved way is via Helm charts, ArgoCD apps, etc.

Defining the PriorityClass 

In Kubernetes, we can set a custom priority for pods using a PriorityClass. To configure overprovisioning, you need a PriorityClass with a value lower than zero, because the default pod priority is zero. This gives the pause pods a lower priority than regular workloads and ensures that they are preempted when the time comes. Keep the value at or above the Cluster Autoscaler cutoff (-10) so that pending pause pods still trigger a scale-out. To deploy this custom PriorityClass on your cluster, use the following simple manifest:

apiVersion: scheduling.k8s.io/v1
description: This priority class is for overprovisioning pods only.
globalDefault: false
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
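
The two constraints on the PriorityClass value (below the default pod priority of 0, and not below the Cluster Autoscaler cutoff of -10 from the prechecks) can be sanity-checked with a small sketch. The function and key names are hypothetical, and it assumes that Cluster Autoscaler ignores pending pods whose priority is below the cutoff.

```python
DEFAULT_POD_PRIORITY = 0   # priority of pods without a PriorityClass (no global default set)
CA_PRIORITY_CUTOFF = -10   # Cluster Autoscaler cutoff from the prechecks above


def check_pause_priority(value: int,
                         default_priority: int = DEFAULT_POD_PRIORITY,
                         ca_cutoff: int = CA_PRIORITY_CUTOFF) -> dict:
    """Report whether a candidate PriorityClass value suits pause pods."""
    return {
        # Regular pods can preempt the pause pods only if the pause pods rank lower.
        "preemptible_by_regular_pods": value < default_priority,
        # Assumption: pods below the cutoff do not trigger a scale-out when pending.
        "triggers_scale_out_when_pending": value >= ca_cutoff,
    }


if __name__ == "__main__":
    for candidate in (-1, -20, 0):
        print(candidate, check_pause_priority(candidate))
```

With the defaults above, -1 satisfies both conditions, -20 would be preemptible but would not bring in new nodes, and 0 would not be preempted by regular pods at all.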

Define Autoscaler strategy

A ConfigMap defines the autoscaler policy for the overprovisioning deployment. The calculation is explained here. Please refer to the manifest below:

apiVersion: v1
data:
  linear: |-
    {
      "coresPerReplica": 1,
      "nodesPerReplica": 1,
      "min": 1,
      "max": 50,
      "preventSinglePointFailure": true,
      "includeUnschedulableNodes": true
    }
kind: ConfigMap
metadata:
  name: overprovisioning-cm

RBAC Config

Next is the RBAC configuration, with three components: ServiceAccount, ClusterRole, and ClusterRoleBinding. These give the autoscaler deployment the access it needs to adjust the size of the pause pod deployment based on the required scaling. Please refer to the manifest:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: overprovisioning-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: overprovisioning-cr
rules:
  - apiGroups:
      - ''
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - replicationcontrollers/scale
    verbs:
      - get
      - update
  - apiGroups:
      - extensions
      - apps
    resources:
      - deployments/scale
      - replicasets/scale
    verbs:
      - get
      - update
  - apiGroups:
      - ''
    resources:
      - configmaps
    verbs:
      - get
      - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: overprovisioning-rb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: overprovisioning-cr
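subjects:
  - kind: ServiceAccount
    name: overprovisioning-sa
    namespace: default  # required for ServiceAccount subjects; must match the namespace of overprovisioning-sa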

Pause pods deployments

Creating pause pods is an easy task. You can use a custom image to set up a healthy pod that acts as a placeholder in the cluster. The size of this pod, that is, its CPU and memory configuration, can be adjusted based on your needs. Make sure to calculate the appropriate size so that the pause pods effectively block cluster resources.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: overprovisioning
  name: overprovisioning
spec:
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
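      containers:
        - image: nginx  # or any custom image
          name: pause
          resources:
            limits:
              cpu: Ym      # placeholder: CPU limit in millicores
              memory: YMi  # placeholder: memory limit in MiB
            requests:
              cpu: Xm      # placeholder: CPU request in millicores
              memory: XMi  # placeholder: memory request in MiB
      priorityClassName: overprovisioning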

Autoscaler deployment

Proceed with the deployment of the autoscaler. Its pod manages the replica count of the pause pod deployment above, based on the linear strategy defined in the ConfigMap. This is what scales the pause pods out or in as the cluster size changes, so the reserved capacity stays proportional to the cluster. Deploy it using the manifest below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-as
spec:
  replicas: 1
  selector:
    matchLabels:
      app: overprovisioning-as
  template:
    metadata:
      labels:
        app: overprovisioning-as
    spec:
      containers:
        - command:
            - /cluster-proportional-autoscaler
            - '--namespace=XYZ'
            - '--configmap=overprovisioning-cm'
            - '--target=deployment/overprovisioning'
            - '--logtostderr=true'
            - '--v=2'
          image: gcr.io/google_containers/cluster-proportional-autoscaler-amd64:{LATEST_RELEASE}
          name: autoscaler
      serviceAccountName: overprovisioning-sa

You now have an overprovisioning mechanism that allows you to allocate more resources than necessary to your cluster. To verify if it’s working correctly, you can perform the below test.

Testing the functionality

To avoid scaling the entire cluster, run the test on a single node. Identify a node that is running pause pods and pin the new test pods to it with node affinity or a nodeSelector.
Do not define a priorityClassName in this test deployment; Kubernetes will automatically assign it a priority of 0. Our pause pods, meanwhile, are configured with a priority of -1, which is lower than regular workload pods and these test pods.
Creating this deployment should trigger the eviction of the pause pods on the designated node to make room for the higher-priority test pods.

The pause pods should be terminated, and the new test pods should swiftly transition into a running state on this designated node. The terminated pause pods will then be re-created in a pending state by their replica set and will look for a place on another node to run.
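
As a starting point for this test, the sketch below builds a throwaway Deployment manifest and prints it as JSON, which kubectl accepts as well as YAML. Everything in it is an assumption for illustration: the name overprovisioning-test, the request sizes, and pinning by the kubernetes.io/hostname label (whose value usually, but not always, equals the node name). No priorityClassName is set, so the pods get the default priority.

```python
import json


def build_test_deployment(node_name: str,
                          replicas: int = 2,
                          cpu_request: str = "500m",
                          memory_request: str = "256Mi") -> dict:
    """Build a test Deployment pinned to one node, with no priority class set."""
    labels = {"app": "overprovisioning-test"}  # hypothetical label
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "overprovisioning-test", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Pin the pods to the node that currently hosts pause pods.
                    "nodeSelector": {"kubernetes.io/hostname": node_name},
                    "containers": [{
                        "name": "test",
                        "image": "nginx",
                        # Size the requests so the pods only fit if pause pods are evicted.
                        "resources": {"requests": {"cpu": cpu_request,
                                                   "memory": memory_request}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # NODE_NAME is a placeholder; replace it with the node you identified.
    print(json.dumps(build_test_deployment("NODE_NAME"), indent=2))
```

Redirect the output to a file, apply it with kubectl apply -f, and watch the pause pods on that node with kubectl get pods -o wide -w.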

Basics of Overprovisioning in EKS Cluster

This article talks about the fundamental concepts of overprovisioning within a Kubernetes Cluster. We will explore the definition of overprovisioning, its necessity, and how to calculate various aspects related to it. So, without further delay, let’s dive right in.

Overprovisioning basics!

Need for Overprovisioning

It’s a methodology for preparing your cluster for future demands from hosted applications to prevent potential bottlenecks.

Let’s consider a scenario in which a Kubernetes-hosted application needs to increase its number of pods (horizontal scaling) beyond the cluster’s available resources. The additional pods end up in a pending state because there are not enough resources on the cluster to schedule them. Even if you are running the Kubernetes Cluster Autoscaler (referred to as CA) on your Elastic Kubernetes Service (EKS) cluster, there is a delay of at least 10 seconds before CA recognizes the need for more capacity and communicates this requirement to the Auto Scaling Group (ASG). There is a further delay while the ASG scales out and launches a new EC2 instance, and while that instance boots, runs its bootstrap scripts, and is marked Ready by Kubernetes. The entire process typically takes a minute or two, during which the application pods remain in a pending state.

To avoid these delays and ensure immediate capacity availability for unscheduled pods, overprovisioning can be employed. This is accomplished through the use of pause pods.

Concept of pause pods

Pause pods are non-essential, low-priority pods that are created to reserve cluster resources, such as CPU, memory, or IP addresses. When critical pods require this reserved capacity, the scheduler evicts these low-priority pause pods, allowing the critical pods to utilize the freed-up resources. But, what happens to these evicted pause pods?

After being evicted, these pause pods are automatically re-created by their respective replica set and initially start in a pending state. At this point, the Cluster Autoscaler (CA) intervenes, as explained earlier, to provide the additional capacity required. Since pause pods do not serve any specific applications, it is acceptable for them to remain in a pending state for a certain period. Once the new capacity becomes available, these pause pods consume it, effectively reserving it for future requirements.

How does scale-in work with Pause pods?

Now that we’ve grasped how pause pods assist in scenarios requiring cluster scale-out, the next question arises: could these pause pods hold onto resources unnecessarily and block your cluster’s scale-in actions? Here’s the scenario: when the Cluster Autoscaler (CA) identifies nodes with light utilization (perhaps containing only pause pods), it evicts these low-priority pause pods as part of the node termination process (a scale-in action). The evicted pods are then re-created in a pending state. By this point, however, the node count has decreased by one, so the cluster-proportional-autoscaler recalculates the required number of pause pods. This number is typically lower, resulting in the termination of the newly pending pause pods.

Pause pod calculations

The pause pod deployment should be scaled by the cluster-proportional-autoscaler (not to be confused with the Horizontal Pod Autoscaler, HPA). Set it to use linear mode by defining the configuration below in the respective ConfigMap:

linear: |-
  {
    "coresPerReplica": 1,
    "nodesPerReplica": 1,
    "min": 1,
    "max": 50,
    "preventSinglePointFailure": true,
    "includeUnschedulableNodes": true
  }

This configuration means:

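  • coresPerReplica: One pause pod for each CPU core in the cluster.
  • nodesPerReplica: One pause pod for each node in the cluster.
  • min: At least one pause pod.
  • max: A maximum of 50 pause pods.
  • preventSinglePointFailure: Keep at least two pause pods when the cluster has more than one node.
  • includeUnschedulableNodes: Count all nodes, including unschedulable (for example, cordoned) ones, when calculating replicas.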

When both coresPerReplica and nodesPerReplica are used, the system calculates both values and selects the greater of the two. Let’s calculate for a cluster with 4 nodes, each using the m7g.xlarge instance type, which has 4 cores per node:

  • 4 nodes, meaning 4 pause pods (one per node).
  • 16 cores, which equates to 16 pause pods (one per core).

So, in this case, the cluster-proportional-autoscaler will spawn a total of 16 pause pods for the cluster.
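
The calculation above can be reproduced with a short sketch. It follows the linear-mode behaviour as described in this article (take the larger of the per-core and per-node results, then clamp to min and max); it is an approximation for planning purposes, not the autoscaler’s actual code, and the function and parameter names are hypothetical.

```python
import math


def linear_replicas(cores: int, nodes: int,
                    cores_per_replica: float, nodes_per_replica: float,
                    min_replicas: int, max_replicas: int,
                    prevent_single_point_failure: bool = False) -> int:
    """Estimate the pause pod count for the linear mode described above."""
    by_cores = math.ceil(cores / cores_per_replica)
    by_nodes = math.ceil(nodes / nodes_per_replica)
    # Assumption: with more than one node, keep at least two replicas.
    if prevent_single_point_failure and nodes > 1:
        by_nodes = max(by_nodes, 2)
    replicas = max(by_cores, by_nodes)      # the greater of the two wins
    replicas = min(replicas, max_replicas)  # apply the "max" bound
    return max(replicas, min_replicas)      # apply the "min" bound


if __name__ == "__main__":
    # 4 nodes of m7g.xlarge, 4 cores each, with the ConfigMap values from this article.
    print(linear_replicas(cores=16, nodes=4,
                          cores_per_replica=1, nodes_per_replica=1,
                          min_replicas=1, max_replicas=50,
                          prevent_single_point_failure=True))
```

Changing coresPerReplica to 4 in the same cluster would drop the per-core result to 4, so the per-node result would become equally decisive.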

Now, let’s explore the process of calculating the CPU request configuration for Pause pods and, as a result, determine the overprovisioned capacity of the cluster.

Let’s say each pause pod is set to request 200 millicores (200m) of CPU; from the point of view of the cluster’s computing resources, that amounts to 20% of a single CPU core’s capacity. Given that we are using one pause pod per CPU core, this effectively results in overprovisioning 20% of the entire cluster’s computational resources.
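
The same arithmetic as a small sketch, using the 16-core example cluster from earlier. It divides by the total core count for simplicity; the share of allocatable CPU would be slightly higher, since nodes reserve some CPU for system components. The function name is hypothetical.

```python
def overprovisioned_cpu_fraction(pause_pods: int,
                                 cpu_request_millicores: int,
                                 total_cores: int) -> float:
    """Fraction of the cluster's total CPU reserved by pause pods."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    reserved_millicores = pause_pods * cpu_request_millicores
    return reserved_millicores / (total_cores * 1000)  # 1 core = 1000 millicores


if __name__ == "__main__":
    # 16 pause pods (one per core), each requesting 200m, on a 16-core cluster.
    fraction = overprovisioned_cpu_fraction(16, 200, 16)
    print(f"{fraction:.0%} of cluster CPU is held in reserve")
```

Working backwards also helps: to keep roughly 10% of the same cluster in reserve with one pause pod per core, each pod would request 100m.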

Depending on the criticality and frequency of spikes in the applications running on the cluster, you can assess the overprovisioning capacity and compute the corresponding configurations for the pause pods.