
Product Blog

Developer Self Service via Cluster Templates

Our recent release update in May adds support for a number of new features and enhancements, and we have written about these in our blogs. This blog is focused on Cluster Templates for GKE, which enable customers to implement Developer Self Service for Kubernetes clusters.

We added support for cluster templates in early 2022, starting with Amazon EKS, followed by Azure AKS and, with this release, Google's GKE. Common use cases for cluster templates are "ephemeral clusters" for lower environments such as:

  • Developer Test Beds
  • QA environments
  • Product support to replicate customer issues

Amazon EKS v1.25 using Rafay

Our recent release update in May to our Preview environment adds support for a number of new features and enhancements. We will write about the other new features in separate blogs. This blog is focused on our turnkey support for Amazon EKS v1.25.

Both new cluster provisioning and in-place upgrades of existing EKS clusters are supported. As with most Kubernetes releases, this version also deprecates and removes a number of features. To ensure there is zero impact to our customers, we have made sure that every feature in the Rafay Kubernetes Operations Platform has been validated on this Kubernetes version.

This release will be promoted from Preview to Production in a few days and will be made available to all customers.

Note that no action is needed on the part of our SaaS customers with the new release. Once the rollout is completed, all they need to do is learn about the new features and determine how and when they would like to use them.

Kubernetes v1.26 for Rafay MKS

Our recent release update in May to our Preview environment adds support for a number of new features and enhancements. We will write about these in separate blogs. This blog is focused on support for Kubernetes v1.26 with Rafay MKS (i.e. upstream Kubernetes for bare metal and VM based environments).

Both new cluster provisioning and in-place upgrades of existing clusters are supported. As with most Kubernetes releases, this version also deprecates and removes a number of features. To ensure there is zero impact to our customers, we have made sure that every feature in the Rafay Kubernetes Operations Platform has been validated on this Kubernetes version. This will be promoted from Preview to Production in a few days and will be made available to all customers.


AI/ML Superpowers for Kubernetes Troubleshooting

In the last two blogs (part 1 and part 2), we discussed the challenges customers face with running AI/ML on Kubernetes and innovative solutions to address these challenges. In this blog, we will flip this on its head and look at how AI/ML can make Kubernetes easier to use and operate.

Solutions for Key Kubernetes Challenges for AI/ML in the Enterprise - Part 2

This is part-2 of our blog series on challenges and solutions for AI/ML in the enterprise. This blog is based on our learnings over the last two years as we worked very closely with our customers that make extensive use of Kubernetes for AI/ML use cases. In part-1, we looked at the following:

  • Why Kubernetes is particularly compelling for AI/ML.
  • Some of the key challenges that organizations will encounter with AI/ML and Kubernetes.

In this part, we will look at some innovative approaches by which organizations can address these challenges.

EKS Anywhere Bare Metal Cluster Management Made Easy with Rafay

Amazon Elastic Kubernetes Service (EKS) is a managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. One of the options available with EKS is EKS Anywhere for Bare Metal environments, which allows you to run Kubernetes on your own hardware. While this provides more control and flexibility to businesses, it also comes with its own set of challenges.

While the benefits of managing EKS Anywhere on Bare Metal are significant, it’s important to note that the process can be challenging and time-consuming, particularly if you lack experience with Kubernetes on Bare Metal infrastructure. This is where Rafay’s Kubernetes Platform for managing EKS Anywhere Bare Metal clusters can come in handy.

In this blog, we'll explore how Rafay’s Platform can address pain points and simplify the management process.


Rafay's Terraform Provider

Terraform is today one of the most popular tools to provision resources on all major cloud platforms, such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure. It uses Infrastructure as Code (IaC) to automate infrastructure provisioning. This blog will discuss the Terraform provider for Rafay.

The Rafay Terraform Provider is a plugin that allows Terraform to manage resources in the Rafay platform. It enables users to automate the creation, configuration, and deletion of Rafay resources such as clusters, projects, policies, and environments. It is available as an open-source project on GitHub, and it can be installed using the standard Terraform plugin installation process.


Key Kubernetes Challenges for AI/ML in the Enterprise - Part 1

This blog is based on our learnings over the last two years as we worked very closely with our customers that make extensive use of Kubernetes for AI/ML.

This is part-1 of a two part series. In part-1, we will

  • Start by looking at why Kubernetes is particularly compelling for AI/ML.
  • Describe some of the key challenges that organizations will encounter with AI/ML and Kubernetes

In part-2, we will look at ways by which organizations can address these challenges.

Announcing our April 2023 (v1.24) Release

A few weeks back in early April 2023, we upgraded our Preview environment to v1.24 of the Rafay Kubernetes Operations Platform. Our sincere thanks to our customers and partners that have been actively testing the new functionality. We have received timely feedback that we have been able to incorporate into our product documentation and into the platform as well.

Today, we upgraded our Production environment to this release. As always, our customers will have seamless access to the new functionality with no interruptions to their applications or clusters. In this blog, I will describe some of the new features that are part of this release.


How Platform Teams can enable developers to use their preferred Kubernetes Tools

Developers sometimes prefer to use tools on their laptops, such as Lens Desktop, to visualize resources and interact with Kubernetes clusters. A desktop app such as Lens can offer developers a better user experience than the kubectl CLI.

In this blog, we will describe how Platform Teams can use Rafay’s Zero-Trust Access service to enable developers to use popular Kubernetes visualization apps to troubleshoot their applications. Watch a video showing how a developer can use Lens Desktop with Rafay's Zero Trust Kubectl Access service to securely and remotely access Kubernetes clusters.

Goldilocks Zone for AKS

In this blog, we will look at the process used by Microsoft Azure to add support for new Kubernetes versions for their "Managed" Azure Kubernetes Service (AKS). We will also look at recommendations for customers on things they need to consider to operate their AKS clusters at scale without issues.

Azure's AKS managed Kubernetes is supported globally in 60+ regions. As one can imagine, it is not practical to update software in all these regions in one fell swoop. The AKS team at Microsoft employs a Safe Deployment Practice (SDP) where new releases are rolled out gradually in phases. This means that at any given time, something new is being rolled out to some region.

Note

The AKS team maintains a Release Tracker that provides visibility to customers that require it.

Considerations for In-Place Upgrades to Amazon EKS v1.24

Recently, AWS added support for Kubernetes v1.24 for their Amazon EKS offering. One significant change with this version is the removal of Dockershim, the component that allowed Docker Engine to serve as the container runtime through the Container Runtime Interface (CRI). Amazon EKS clusters from v1.24 onwards are standardized on "containerd".

New Amazon EKS v1.24 clusters are provisioned with containerd. Watch a brief video showcasing how customers can use Rafay to configure and provision an Amazon EKS v1.24 cluster.

When EKS clusters are upgraded to v1.24, the nodes in the EKS cluster's data plane are seamlessly migrated from "Dockershim" to "containerd".

graph LR
  A[Dockershim] --> B[Containerd];

Although this transition is mostly "behind the scenes" for users, the transition from Dockershim -> Containerd can cause disruptions to deployed applications that may be dependent on Docker. In this blog, we will look at what Rafay has done to protect our customers during an in-place upgrade to EKS v1.24.
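
One commonly cited example of a Docker dependency is a workload that mounts the host's Docker socket. The sketch below is only an illustration of how such a dependency could be spotted in a Pod manifest before an upgrade. It is not the mechanism Rafay uses, and the function name, the manifest, and the list of socket paths are assumptions made for this example.

```python
# Hypothetical pre-upgrade check: flag Pod volumes that mount the Docker socket.
# The paths below are the usual socket locations; adjust them for your nodes.
DOCKER_SOCKET_PATHS = ("/var/run/docker.sock", "/run/docker.sock")


def find_docker_socket_mounts(pod_manifest):
    """Return the names of hostPath volumes that point at the Docker socket."""
    spec = pod_manifest.get("spec", {})
    hits = []
    for volume in spec.get("volumes", []):
        host_path = volume.get("hostPath") or {}
        if host_path.get("path") in DOCKER_SOCKET_PATHS:
            hits.append(volume.get("name", "<unnamed>"))
    return hits


# Illustrative manifest (already parsed into a dict), not taken from a real cluster.
example_pod = {
    "metadata": {"name": "ci-builder"},
    "spec": {
        "volumes": [
            {"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock"}},
            {"name": "cache", "emptyDir": {}},
        ]
    },
}

print(find_docker_socket_mounts(example_pod))
```

A Pod flagged this way expects Docker Engine on the node, so it is worth reviewing before the nodes move to containerd.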

Considerations for Windows Containers on Kubernetes

With increasing adoption of Kubernetes in organizations, we are seeing interest from a number of customers that would like to deploy and operate their "legacy Windows applications" on Kubernetes as well.

In this blog, we have attempted to capture our learnings from working with customers that use the Rafay Kubernetes Operations Platform to deploy and operate Kubernetes clusters with Windows based containerized applications.

Kubernetes Cluster Insights for Platform Teams

Many customers of the Rafay Kubernetes Operations Platform are "Platform Teams". In many cases, the first priority for these platform teams is to "take over and standardize" existing Kubernetes clusters in active use by application teams.

However, one of the challenges they run into with the takeover process is that nobody on the team has complete clarity into what resources already exist on the cluster and for what purpose. Compiling an accurate list manually can be extremely error-prone and time-consuming for both the platform team and the various application teams, resulting in delays in adoption and standardization efforts.

Seamless Landing Zone For Our Docs

Overview

These past few months, we've invested a lot of time as a team working to create a more seamless experience for our customer-facing product documentation. While we built a lot of content covering key functionality of the platform, one of the key questions that would arise from our readers was "How do I know where to start?"

Deploying Backstage in Kubernetes With Enterprise-Grade Governance and Automation

Introduction To Backstage

Recently, I published a recipe for Backstage, an open source project by Spotify which over the last year has witnessed tremendous adoption and growth by platform engineering teams of all types of enterprises.

Some of the key features of Backstage include:

  • an easy-to-use interface for developers
  • extensible plugin ecosystem (e.g. plugins available for GitHub Actions, ArgoCD, AWS, and more)
  • ability to easily build and publish tech documentation
  • native Kubernetes plugin for cloud-native apps
  • ability to compose different developer workflows into an Internal Developer Portal (IDP)

Spinning up cost effective clusters for training sessions

We have been running a number of internal and external (with partners/customers) enablement sessions over the last few weeks to provide "hands-on, labs based training" on some recently introduced capabilities in the Rafay Kubernetes Operations Platform.

Here's what we set up for those enablement sessions:

  • Each attendee was provided with their own Kubernetes cluster
  • We spun up ~25 "ephemeral" Kubernetes clusters on Digital Ocean (for life of the session)
  • We needed the clusters to be provisioned in just a few minutes for the training exercise
  • Each attendee had their own dedicated "Project" in the Rafay Org

A question that we were frequently asked after those enablement sessions was "I would love to run similar sessions with my extended team. How much did it cost to run those clusters?"

Our total spend for ~25 ephemeral clusters on Digital Ocean for these enablement sessions was less than $15. It was no wonder there has been so much interest in this.

We decided that it would help everyone if we shared the automation scripts and the methodology we have been using to provision Digital Ocean clusters and to import them to Rafay's platform here.


Cluster Blueprints and Drift Detection

Around three years back, we noticed many of our customers struggling with enterprise wide standardization of their Kubernetes clusters. Every cluster in their Organization was a snowflake and they were looking for a way to enforce that every cluster had a "baseline set of add-ons". This prompted us to develop Cluster Blueprints which has turned out to be one of the most heavily used features in our platform.

In this blog, we will describe a superpower setting in the cluster blueprints feature that we see customers use heavily for their production clusters to secure against unplanned drift.
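
The idea of a baseline set of add-ons and drift from it can be sketched as a comparison between what a blueprint declares and what is found on a cluster. This is a minimal illustration only. The function name, the data shapes, and the add-on names and versions are hypothetical, and it does not describe how the platform implements drift detection.

```python
def detect_drift(baseline, actual):
    """Compare add-on name -> version maps and report deviations from the baseline.

    'missing' lists baseline add-ons absent from the cluster.
    'changed' lists add-ons whose version differs from the baseline.
    """
    missing = sorted(name for name in baseline if name not in actual)
    changed = sorted(
        name for name in baseline
        if name in actual and actual[name] != baseline[name]
    )
    return {"missing": missing, "changed": changed}


# Hypothetical baseline declared by a blueprint and the state found on a cluster.
baseline_addons = {"ingress-controller": "1.8.0", "log-agent": "2.1.0", "policy-agent": "3.12.0"}
cluster_addons = {"ingress-controller": "1.8.0", "log-agent": "2.0.5"}

print(detect_drift(baseline_addons, cluster_addons))
```

An empty result in both lists means the cluster matches its baseline. Anything else is drift that was either planned (a new blueprint version) or unplanned (a manual change on the cluster).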


Multi-tenancy: Best practices for shared Kubernetes clusters

Some of the key questions that platform teams have to think about very early on in their K8s journey are:

  • How many clusters should I have? What is the right number for my organization?
  • Should I set up dedicated or shared clusters for my application teams?
  • What are the governance controls that need to be in place?

The model that customers are increasingly adopting is to standardize on shared clusters as the default and create a dedicated cluster only when certain considerations are met.

graph LR
  A[Request for compute from Application teams] --> B[Evaluate against list of considerations] --> C[Dedicated or shared clusters];

A few example scenarios for which Platform teams often end up setting up dedicated clusters are:

  • Application has low latency requirements (target SLA/SLO is significantly different from others)
  • Application has specific requirements that are unique to it (e.g. GPU worker nodes, CNI plugin)
  • Based on the type of environment: 'Prod' has dedicated clusters, while 'Dev' and 'Test' environments have shared clusters
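
The evaluation step can be sketched as a small decision function that defaults to a shared cluster and picks a dedicated one only when one of the example scenarios above applies. The class, field names, and rules are assumptions made for illustration. A real platform team would substitute its own list of considerations.

```python
from dataclasses import dataclass


@dataclass
class ComputeRequest:
    """A hypothetical request for compute from an application team."""
    app_name: str
    environment: str                 # e.g. "prod", "dev", "test"
    low_latency: bool = False        # SLA/SLO significantly stricter than others
    special_requirements: tuple = () # e.g. ("gpu",) or ("custom-cni",)


def choose_cluster_model(request):
    """Return ("dedicated" | "shared", reasons). Shared is the default."""
    reasons = []
    if request.low_latency:
        reasons.append("low latency requirements")
    if request.special_requirements:
        reasons.append("unique requirements: " + ", ".join(request.special_requirements))
    if request.environment.lower() == "prod":
        reasons.append("production environment")
    model = "dedicated" if reasons else "shared"
    return model, reasons


print(choose_cluster_model(ComputeRequest("billing-api", "dev")))
print(choose_cluster_model(ComputeRequest("training-job", "test", special_requirements=("gpu",))))
```

Returning the reasons alongside the decision keeps an audit trail of why a dedicated cluster was approved.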

With shared clusters (which is the most cost efficient and therefore the default model in most customer environments), there are certain challenges that platform teams have to solve for around security and operational efficiencies.

Rafay Terraform Provider - Dec 2022 Update

Customers can interface with the Rafay Kubernetes Operations Platform via multiple approaches:

  • Web Console/UI
  • RCTL CLI (with declarative specs)
  • Rafay Terraform Provider
  • Open API

Many of our customers that are standardized on the Infrastructure as Code (IaC) pattern are heavy users of HashiCorp's Terraform. They use Rafay's Terraform Provider for their automation requirements. They specifically use the Rafay Terraform Provider to configure and spin up entire "Kubernetes Operating Environments" for downstream teams in minutes.

As we add new capabilities to the platform, we add support for these in our Terraform Provider as well. On December 18, 2022, we rolled out v1.12 of Rafay's Terraform Provider. In this blog, we will describe one of the interesting enhancements.


Considerations for In-Place Upgrades to Amazon EKS v1.23

Earlier this year, AWS added support for Kubernetes v1.23 for their Amazon EKS offering. One significant change with this version is with the Container Storage Interface (CSI) for working with AWS Elastic Block Store (Amazon EBS) volumes.

Specifically, the updates to the CSI driver require customers to take action to ensure a seamless upgrade process for EKS clusters from previous versions. The CSI was developed in Kubernetes to replace the in-tree driver. With the CSI, there is now a simplified plug-in model that makes it easier for storage providers to decouple their releases from the Kubernetes release cycle.

graph LR
  A[In-Tree Storage Driver] --> B[CSI Plugin for EBS CSI];

In a nutshell, this transition is good for Amazon EKS users because they do not have to upgrade Kubernetes versions for their EKS clusters just to get some additional functionality or bug fixes for EBS storage via the "in-tree driver".
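
One way to see where a cluster stands in this transition is to look at the provisioner named by each StorageClass. The in-tree EBS driver uses `kubernetes.io/aws-ebs`, and the EBS CSI driver uses `ebs.csi.aws.com`. The sketch below classifies StorageClass manifests on that basis. The function name and the sample manifests are assumptions made for illustration, and this is not a description of Rafay's upgrade checks.

```python
IN_TREE_EBS_PROVISIONER = "kubernetes.io/aws-ebs"
EBS_CSI_PROVISIONER = "ebs.csi.aws.com"


def classify_storage_classes(storage_classes):
    """Group StorageClass names by the kind of EBS provisioner they reference."""
    result = {"in_tree": [], "csi": [], "other": []}
    for sc in storage_classes:
        name = sc.get("metadata", {}).get("name", "<unnamed>")
        provisioner = sc.get("provisioner")
        if provisioner == IN_TREE_EBS_PROVISIONER:
            result["in_tree"].append(name)
        elif provisioner == EBS_CSI_PROVISIONER:
            result["csi"].append(name)
        else:
            result["other"].append(name)
    return result


# Illustrative StorageClass manifests (already parsed into dicts).
example_storage_classes = [
    {"metadata": {"name": "gp2"}, "provisioner": "kubernetes.io/aws-ebs"},
    {"metadata": {"name": "gp3-csi"}, "provisioner": "ebs.csi.aws.com"},
    {"metadata": {"name": "shared-files"}, "provisioner": "efs.csi.aws.com"},
]

print(classify_storage_classes(example_storage_classes))
```

StorageClasses listed under `in_tree` are the ones that rely on the EBS CSI driver being installed once the cluster is upgraded to v1.23.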

New User Experience for Product Documentation

We invest a lot of time creating and maintaining our customer-facing product documentation. Over the last few years, as we added significant breadth to our platform, we found that the way the content was presented was overwhelming, especially for new users.

We have been working behind the scenes for a few weeks to present the breadth of capabilities of the platform in our documentation in a format that is "visually easy" for the user to navigate. Today, we launched our refreshed Product Documentation site. We thought it would be fun to memorialize this milestone by writing a brief blog.

Enforce mTLS using Rafay's Managed Service Mesh

Earlier this week, we provided "hands-on, labs based training" for approximately 25 technologists on the recently introduced "Managed Service Mesh" capability in the Rafay Kubernetes Operations Platform.

Here's what we set up for the enablement session:

  • Each attendee was provided with their own Kubernetes cluster.
  • We spun up 25 Kubernetes clusters on Digital Ocean just a few hours before the session.
  • Each attendee had their own dedicated "project" in the "Training" Org



Background

It is now becoming a standard operational requirement for applications to follow a Zero Trust security model. One of the important aspects of this model is the use of mutual TLS (mTLS) to ensure all communication between services is mutually authenticated and strongly encrypted.

Application teams commonly find themselves having to deal with this in the 11th hour. At this point, it is either "too late" to retrofit their application business logic or the legacy containerized application is not capable of being retrofitted. The service mesh's sidecar based enforcement approach is a perfect solution for scenarios like this.
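
To make "mutually authenticated" concrete, the sketch below uses Python's standard `ssl` module to build a server-side TLS context that refuses clients that do not present a certificate. It only illustrates what an application would otherwise have to handle itself. In a service mesh, the sidecar proxy takes on this role so the application code stays unchanged. The function name and parameters are assumptions for this example, and it says nothing about how the Managed Service Mesh is implemented.

```python
import ssl


def make_mtls_server_context(certfile=None, keyfile=None, client_ca_file=None):
    """Build a server-side TLS context that requires a client certificate.

    The file paths are optional here so the sketch runs without real certificates.
    A working server needs all three.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED on a server context is what makes the TLS handshake mutual:
    # the client must present a certificate signed by a trusted CA.
    context.verify_mode = ssl.CERT_REQUIRED
    if client_ca_file:
        context.load_verify_locations(cafile=client_ca_file)
    if certfile:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


context = make_mtls_server_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```

Retrofitting this kind of setup, along with certificate issuance and rotation, into every service is the work that sidecar-based enforcement moves out of the application.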


Takeover Lifecycle Management of Amazon EKS Clusters

We invest a lot of time training our employees, partners and customers on capabilities that are seeing a lot of inbound interest from our customers.

Earlier this week, we provided "hands-on, labs based training" for approximately 30 technologists on a very interesting "capability" in the Rafay Kubernetes Operations Platform.

Background

Many of our customers that use AWS typically already have a few Amazon EKS clusters provisioned and in use before they start using the Rafay Kubernetes platform. They may have provisioned these clusters using Terraform or one of the many alternatives that exist in the market.

As they start using the platform, they naturally stumble onto the "Convert to Managed" option next to their imported EKS clusters and learn about this capability.


Integrated Cost Visibility & Governance for Kubernetes

Last week, we wrapped up "hands-on enablement" on our recently released "Integrated Cost Management" service for approximately 25 technologists. Here's what the team experienced in the 60-minute lab.



1. What does it take to enable cost visibility and management for a fleet of clusters spanning Amazon EKS, Azure AKS, and on-premises clusters in data centers?

With the Rafay Kubernetes Operations Platform, you can do this literally in a "single click/step".