Streamlining AMI Updates for Worker Nodes in Amazon EKS Clusters

Imagine this scenario: your clusters, the backbone of your infrastructure, are running worker nodes based on an older AMI version. An alarming email from the security team informs you that the AMI in use has serious security vulnerabilities. Addressing this becomes urgent because the vulnerabilities pose a direct threat to the integrity and security of your infrastructure.

Critical security issues like this call for quick action. How can nodes across all clusters be updated quickly?

Scenarios like this are exactly why we have invested heavily in developing the Fleet Plans functionality. This can help you identify and update all of the impacted worker nodes in various clusters smoothly in this situation.

sequenceDiagram
    autonumber
    participant admin as Admin
    participant rafay as Rafay

    rect rgb(191, 223, 255)
    Note over admin,rafay: Upgrades of Insecure AMIs
    admin->>rafay: Identify Impacted EKS Clusters <br> (Input = AMI ID)
    admin->>rafay: Create Fleet Plan <br> (Impacted Clusters)
    admin->>rafay: Execute Fleet Plan
    admin->>rafay: Verify all EKS clusters <br>in fleet are using new AMI
    end

Identifying Impacted Clusters

It is always good practice to have a clear understanding of the current state of your EKS clusters, i.e., which AMIs power the worker nodes across all your AWS accounts.

As a centralized management platform, the Rafay platform provides an extremely convenient API for this. It will give you a list of all the AMIs linked to your clusters along with their nodegroup names.

curl -s -H "X-RAFAY-API-KEYID:$RAFAY_API" "https://console.rafay.dev/edge/v1/projects/<project_id>/dashboard/clusters/nodegroupamis/" | jq
JSON output returned by the above API:
{
    "amis": [
        {
            "ami": "AL2_x86_64",
            "clusters": [
                {
                    "name": "demo-eks-cluster-1",
                    "id": "kz8z9qk",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "ng-f3c86cb1",
                            "id": "mpl5qx2"
                        }
                    ]
                },
                {
                    "name": "demo-eks-cluster-2",
                    "id": "k01ynl2",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "ng-e12df902",
                            "id": "m1wldyk"
                        }
                    ]
                },
                {
                    "name": "demo-eks-cluster-3",
                    "id": "kz8q6gk",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "ng-cc7d39bf",
                            "id": "2ql8z5m"
                        }
                    ]
                }
            ]
        },
        {
            "ami": "ami-0d9781fbca2991de5", <---AMI ID Before Update 
            "clusters": [
                {
                    "name": "demo-eks-cluster-1",
                    "id": "kz8z9qk",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "ami-ng",
                            "id": "2w7q8ek"
                        }
                    ]
                },
                {
                    "name": "demo-eks-cluster-2",
                    "id": "k01ynl2",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "custom-ami-node2",
                            "id": "2lr0jq2"
                        }
                    ]
                },
                {
                    "name": "demo-eks-cluster-3",
                    "id": "kz8q6gk",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "custom-ami-node",
                            "id": "kognyj2"
                        }
                    ]
                }
            ]
        }
    ]
}
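To get a quick inventory from a response of this shape, the nested structure can be flattened into one list of (cluster, nodegroup) pairs per AMI. The following is a minimal Python sketch, not part of the Rafay tooling. The `inventory` helper is a hypothetical name, and the sample data is a trimmed copy of the output above that keeps only the fields the sketch reads.

```python
from collections import defaultdict

# Trimmed copy of the sample response above (two of the three clusters,
# only the fields this sketch reads).
response = {
    "amis": [
        {
            "ami": "AL2_x86_64",
            "clusters": [
                {"name": "demo-eks-cluster-1", "nodegroups": [{"name": "ng-f3c86cb1"}]},
                {"name": "demo-eks-cluster-2", "nodegroups": [{"name": "ng-e12df902"}]},
            ],
        },
        {
            "ami": "ami-0d9781fbca2991de5",
            "clusters": [
                {"name": "demo-eks-cluster-1", "nodegroups": [{"name": "ami-ng"}]},
                {"name": "demo-eks-cluster-2", "nodegroups": [{"name": "custom-ami-node2"}]},
            ],
        },
    ]
}


def inventory(resp):
    """Return {ami: [(cluster_name, nodegroup_name), ...]} (hypothetical helper)."""
    result = defaultdict(list)
    for entry in resp.get("amis", []):
        for cluster in entry.get("clusters", []):
            for ng in cluster.get("nodegroups", []):
                result[entry["ami"]].append((cluster["name"], ng["name"]))
    return dict(result)


if __name__ == "__main__":
    # Print one block per AMI with the cluster/nodegroup pairs that use it.
    for ami, pairs in inventory(response).items():
        print(ami)
        for cluster, nodegroup in pairs:
            print(f"  {cluster} / {nodegroup}")
```

In practice the `response` dictionary would come from parsing the JSON returned by the curl command rather than from a literal.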

Identify Impacted Clusters

If you know the AMI ID that is impacted by the security vulnerability, you can add a filter that returns only the clusters and nodegroups using that AMI ID. For example, suppose you want to find all the clusters and nodegroups using the AMI ID ami-0d9781fbca2991de5.

curl -s -H "X-RAFAY-API-KEYID:$RAFAY_API" "https://console.rafay.dev/edge/v1/projects/2l5l9ek/dashboard/clusters/nodegroupamis/" | jq '.amis[] | select(.ami == "ami-0d9781fbca2991de5") | {ami, clusters: [.clusters[].name], nodegroups: [.clusters[].nodegroups[].name]}'
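Note that the jq filter above returns the cluster names and the nodegroup names as two separate lists, so the pairing between a cluster and its nodegroups is not visible in the result. If you need that pairing, one option is to post-process the response in a script. The sketch below is illustrative only: `impacted` is a hypothetical helper, and the sample data is a trimmed copy of the pre-update response shown earlier.

```python
# Trimmed copy of the pre-update sample response shown earlier.
response = {
    "amis": [
        {
            "ami": "AL2_x86_64",
            "clusters": [
                {"name": "demo-eks-cluster-1", "nodegroups": [{"name": "ng-f3c86cb1"}]},
            ],
        },
        {
            "ami": "ami-0d9781fbca2991de5",
            "clusters": [
                {"name": "demo-eks-cluster-1", "nodegroups": [{"name": "ami-ng"}]},
                {"name": "demo-eks-cluster-2", "nodegroups": [{"name": "custom-ami-node2"}]},
                {"name": "demo-eks-cluster-3", "nodegroups": [{"name": "custom-ami-node"}]},
            ],
        },
    ]
}


def impacted(resp, ami_id):
    """Return {cluster_name: [nodegroup_name, ...]} for nodegroups using ami_id."""
    hits = {}
    for entry in resp.get("amis", []):
        if entry.get("ami") != ami_id:
            continue  # skip AMIs we are not looking for
        for cluster in entry.get("clusters", []):
            names = [ng["name"] for ng in cluster.get("nodegroups", [])]
            if names:
                hits.setdefault(cluster["name"], []).extend(names)
    return hits


if __name__ == "__main__":
    for cluster, nodegroups in impacted(response, "ami-0d9781fbca2991de5").items():
        print(cluster, "->", ", ".join(nodegroups))
```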

The output lists the clusters and nodegroups in a single project that use the given AMI ID. You can call the same API in a loop over every project in your Rafay Org to find the AMI IDs used in other clusters as well.
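A loop over projects could look roughly like the Python sketch below. It reuses the endpoint path and the X-RAFAY-API-KEYID header from the curl examples. Everything else is an assumption made for illustration: the function names are hypothetical, and the list of project IDs has to be supplied by you, since this post does not cover how to enumerate projects.

```python
import json
import urllib.request

BASE_URL = "https://console.rafay.dev"  # host from the first curl example


def fetch_nodegroup_amis(project_id, api_key, base_url=BASE_URL):
    """Call the nodegroupamis endpoint for one project and return parsed JSON."""
    url = f"{base_url}/edge/v1/projects/{project_id}/dashboard/clusters/nodegroupamis/"
    req = urllib.request.Request(url, headers={"X-RAFAY-API-KEYID": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def find_across_projects(project_ids, ami_id, fetch):
    """Return [(project_id, cluster, nodegroup), ...] for every use of ami_id.

    `fetch` is any callable that maps a project ID to a response dict, which
    keeps this function usable without network access (for example in tests).
    """
    rows = []
    for pid in project_ids:
        for entry in fetch(pid).get("amis", []):
            if entry.get("ami") != ami_id:
                continue
            for cluster in entry.get("clusters", []):
                for ng in cluster.get("nodegroups", []):
                    rows.append((pid, cluster["name"], ng["name"]))
    return rows


# Example wiring (performs real HTTP calls, so it is left commented out):
#   import os
#   key = os.environ["RAFAY_API"]
#   rows = find_across_projects(
#       ["2l5l9ek"], "ami-0d9781fbca2991de5",
#       lambda pid: fetch_nodegroup_amis(pid, key),
#   )
```

Passing the fetch function in as a parameter is a design choice for this sketch: it separates the HTTP call from the filtering logic, so the filtering can be checked against canned responses.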


Using Fleet Plans to Update Impacted AMIs

Rafay's Fleet Plan provides an efficient way to manage the lifecycle of hundreds of clusters. For example, it can help you update impacted AMIs for all worker nodes in your clusters. What is really cool is that you can do this for hundreds of worker nodes that are part of different node groups at the same time. It's fast and easy.

Follow along as we walk through this upgrade for our example of vulnerable AMIs in our organization. Since AMI upgrades in an EKS cluster are rolling upgrades, nodes are replaced incrementally, so we do not expect application downtime. Given the serious nature of the vulnerability, we wish to move fast and upgrade everything at once.

Create Fleet Plan

Our fleet plan (shown below) has all the instructions needed to upgrade the nodes in our node groups using a custom AMI. We're focusing on the node groups with the problematic AMI, upgrading them all at once to the latest AMI that has the fixes.

Configuration

Here's the fleet plan configuration in YAML:

kind: FleetPlan
apiVersion: infra.k8smgmt.io/v3
metadata:
  name: customami
  project: demofleet
spec:
  fleet:
    kind: clusters
    labels:
      rafay.dev/k8sVersion: '1.25'
    projects:
      - name: demofleet
  operationWorkflow:
    operations:
      - name: ops1
        action:
          type: patch
          description: update ng using ami
          name: custom-amiupdate
          patchConfig:
            - op: replace
              path: .spec.config.managedNodeGroups[1].ami
              value: ami-02b4071bb2bda80f3

This fleet plan is a simple way to update the AMI to the latest version for clusters based on EKS v1.25. It works by using the fleet label rafay.dev/k8sVersion to identify clusters that are running EKS v1.25. To use this fleet plan, you can either use the Rafay CLI or the Rafay Web Console.

  • Save the fleet plan configuration in a file.
  • Run the following command.
./rctl apply -f <fleet plan config.yaml>

To monitor the progress using the Rafay CLI, run:

./rctl getjobs fleetplan <name>

Once you have executed the fleet plan, you can also monitor its progress. Some illustrative examples are shown below.

Monitor Progress

[Screenshot: fleet plan in progress]

Success

[Screenshot: fleet plan completed successfully]

Once this is complete, you can rerun the same command from the first step to verify that the AMIs for all the node groups have been updated by the fleet plan.

JSON output returned by re-running the API after the update:
{
    "amis": [
        {
            "ami": "AL2_x86_64",
            "clusters": [
                {
                    "name": "demo-eks-cluster-1",
                    "id": "kz8z9qk",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "ng-f3c86cb1",
                            "id": "mpl5qx2"
                        }
                    ]
                },
                {
                    "name": "demo-eks-cluster-2",
                    "id": "k01ynl2",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "ng-e12df902",
                            "id": "m1wldyk"
                        }
                    ]
                },
                {
                    "name": "demo-eks-cluster-3",
                    "id": "kz8q6gk",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "ng-cc7d39bf",
                            "id": "2ql8z5m"
                        }
                    ]
                }
            ]
        },
        {
            "ami": "ami-02b4071bb2bda80f3", <--- Updated AMI
            "clusters": [
                {
                    "name": "demo-eks-cluster-1",
                    "id": "kz8z9qk",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "ami-ng",
                            "id": "2w7q8ek"
                        }
                    ]
                },
                {
                    "name": "demo-eks-cluster-2",
                    "id": "k01ynl2",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "custom-ami-node2",
                            "id": "2lr0jq2"
                        }
                    ]
                },
                {
                    "name": "demo-eks-cluster-3",
                    "id": "kz8q6gk",
                    "project_id": "2l5l9ek",
                    "nodegroups": [
                        {
                            "name": "custom-ami-node",
                            "id": "kognyj2"
                        }
                    ]
                }
            ]
        }
    ]
}
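Rather than comparing the two outputs by eye, the verification can be scripted: confirm that no nodegroup is still reported under the old AMI ID and that the expected nodegroups now appear under the new one. The sketch below is a minimal illustration of that check. The helper names are hypothetical, and the sample data is a trimmed copy of the post-update response above.

```python
OLD_AMI = "ami-0d9781fbca2991de5"  # AMI ID before the update
NEW_AMI = "ami-02b4071bb2bda80f3"  # AMI ID set by the fleet plan

# Trimmed copy of the post-update sample response above.
post_update = {
    "amis": [
        {
            "ami": "AL2_x86_64",
            "clusters": [
                {"name": "demo-eks-cluster-1", "nodegroups": [{"name": "ng-f3c86cb1"}]},
            ],
        },
        {
            "ami": NEW_AMI,
            "clusters": [
                {"name": "demo-eks-cluster-1", "nodegroups": [{"name": "ami-ng"}]},
                {"name": "demo-eks-cluster-2", "nodegroups": [{"name": "custom-ami-node2"}]},
                {"name": "demo-eks-cluster-3", "nodegroups": [{"name": "custom-ami-node"}]},
            ],
        },
    ]
}


def nodegroups_on(resp, ami_id):
    """List the (cluster, nodegroup) pairs that the response attributes to ami_id."""
    return [
        (cluster["name"], ng["name"])
        for entry in resp.get("amis", [])
        if entry.get("ami") == ami_id
        for cluster in entry.get("clusters", [])
        for ng in cluster.get("nodegroups", [])
    ]


def check_rollout(resp, old_ami, new_ami):
    """Return (ok, stragglers, updated).

    ok is True only when nothing is left on old_ami and at least one
    nodegroup is reported on new_ami.
    """
    stragglers = nodegroups_on(resp, old_ami)
    updated = nodegroups_on(resp, new_ami)
    return (not stragglers and bool(updated)), stragglers, updated


if __name__ == "__main__":
    ok, stragglers, updated = check_rollout(post_update, OLD_AMI, NEW_AMI)
    print("rollout complete" if ok else f"still on old AMI: {stragglers}")
```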

Understanding the Fleet Plan

Here is a simplified explanation of what our fleet plan above does:

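  • It identifies all clusters in the demofleet project that carry the label rafay.dev/k8sVersion: '1.25', i.e. clusters running EKS v1.25.
  • It applies a patch operation to each of those clusters that replaces the value at .spec.config.managedNodeGroups[1].ami in the cluster spec.
  • It updates the nodes in the targeted node group of each cluster to the AMI ID we specified.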
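The steps above can be modeled in a few lines of Python to make the semantics concrete. This is only a conceptual sketch and not how Rafay implements Fleet Plans: the cluster dictionaries, the `legacy-cluster` entry, and all function names are hypothetical. It also assumes that the index in the path is zero-based (as in jq) and that a replace fails when the target field does not exist.

```python
import copy
import re


def parse_path(path):
    """Split a path such as '.a.b[1].c' into ['a', 'b', 1, 'c']."""
    tokens = []
    for name, index in re.findall(r"\.([A-Za-z_]\w*)|\[(\d+)\]", path):
        tokens.append(name if name else int(index))
    return tokens


def apply_replace(doc, path, value):
    """Return a copy of doc with the existing value at path replaced."""
    tokens = parse_path(path)
    patched = copy.deepcopy(doc)
    target = patched
    for tok in tokens[:-1]:
        target = target[tok]  # KeyError/IndexError if the path does not exist
    last = tokens[-1]
    if isinstance(target, dict) and last not in target:
        raise KeyError(path)  # assumed 'replace' semantics: field must exist
    target[last] = value
    return patched


def matches(cluster, wanted_labels):
    """True when the cluster carries every wanted label with the same value."""
    labels = cluster.get("labels", {})
    return all(labels.get(k) == v for k, v in wanted_labels.items())


def run_plan(clusters, wanted_labels, path, value):
    """Patch clusters that match the labels; return the others unchanged."""
    return [
        apply_replace(c, path, value) if matches(c, wanted_labels) else c
        for c in clusters
    ]


# Hypothetical cluster specs, loosely modeled on the sample data in this post.
clusters = [
    {
        "name": "demo-eks-cluster-1",
        "labels": {"rafay.dev/k8sVersion": "1.25"},
        "spec": {"config": {"managedNodeGroups": [
            {"name": "ng-f3c86cb1"},
            {"name": "ami-ng", "ami": "ami-0d9781fbca2991de5"},
        ]}},
    },
    {
        "name": "legacy-cluster",  # hypothetical cluster outside the fleet
        "labels": {"rafay.dev/k8sVersion": "1.24"},
        "spec": {"config": {"managedNodeGroups": [
            {"name": "ng-a"},
            {"name": "ng-b", "ami": "ami-0d9781fbca2991de5"},
        ]}},
    },
]

if __name__ == "__main__":
    patched = run_plan(
        clusters,
        {"rafay.dev/k8sVersion": "1.25"},
        ".spec.config.managedNodeGroups[1].ami",
        "ami-02b4071bb2bda80f3",
    )
    for c in patched:
        print(c["name"], c["spec"]["config"]["managedNodeGroups"][1]["ami"])
```

One thing the model highlights: because the path addresses a node group by position, the patch presumably relies on the impacted node group sitting at the same index in every selected cluster, which is worth confirming before executing a plan across a large fleet.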

Benefits of Using a Fleet Plan

  • It is a simple and efficient way to update the AMI for multiple clusters at the same time.
  • It reduces the risk of human error.
  • It provides a centralized view of the update process.

Conclusion

In the fast-changing world of infrastructure management, it is important to stay ahead of vulnerabilities. Rafay's Fleet Plans make it easy to keep your clusters secure by automating the process of updating AMIs across your fleet of clusters. This is especially important for organizations that manage hundreds of clusters.

Learn more about Rafay's Fleet feature here.

  • You can sign up for a free trial if you wish to try this out yourself.
  • Contact us if you prefer to see a demonstration instead.

Important

The Fleet Plans feature is available at no extra charge for all Rafay customers. Please contact your Rafay CSM to have this feature enabled for you in your Org.

Sincere thanks to the readers who spend time reading our product blogs and suggest ideas and topics. Please contact the Rafay Product Team if you would like us to write about other topics.