
Product Blog

Solutions for Key Kubernetes Challenges for AI/ML in the Enterprise - Part 2

This is part-2 of our blog series on challenges and solutions for AI/ML in the enterprise. This blog is based on our learnings over the last two years as we worked very closely with our customers that make extensive use of Kubernetes for AI/ML use cases. In part-1, we looked at the following:

  • Why Kubernetes is particularly compelling for AI/ML.
  • Some of the key challenges that organizations will encounter with AI/ML and Kubernetes.

In this part, we will look at some innovative approaches by which organizations can address these challenges.

EKS Anywhere Bare Metal Cluster Management Made Easy with Rafay

Amazon Elastic Kubernetes Service (EKS) is a managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. One of the options available with EKS is EKS Anywhere for Bare Metal environments, which allows you to run Kubernetes on your own hardware. While this provides more control and flexibility to businesses, it also comes with its own set of challenges.

While the benefits of managing EKS Anywhere on Bare Metal are significant, the process can be challenging and time-consuming, particularly if you lack experience with Kubernetes on Bare Metal infrastructure. This is where Rafay’s Kubernetes Platform for managing EKS Anywhere Bare Metal clusters can come in handy.

In this blog, we'll explore how Rafay’s Platform can address pain points and simplify the management process.


Rafay's Terraform Provider

Terraform is one of the most popular tools for provisioning resources on all major cloud platforms, such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure. It follows the Infrastructure as Code (IaC) approach to automate infrastructure provisioning. This blog will discuss the Terraform provider for Rafay.

The Rafay Terraform Provider is a plugin that allows Terraform to manage resources in the Rafay platform. It enables users to automate the creation, configuration, and deletion of Rafay resources such as clusters, projects, policies, and environments. It is available as an open-source project on GitHub, and it can be installed using the standard Terraform plugin installation process.


Key Kubernetes Challenges for AI/ML in the Enterprise - Part 1

This blog is based on our learnings over the last two years as we worked very closely with our customers that make extensive use of Kubernetes for AI/ML.

This is part-1 of a two-part series. In part-1, we will

  • Start by looking at why Kubernetes is particularly compelling for AI/ML.
  • Describe some of the key challenges that organizations will encounter with AI/ML and Kubernetes

In part-2, we will look at ways by which organizations can address these challenges.

Announcing our April 2023 (v1.24) Release

A few weeks back in early April 2023, we upgraded our Preview environment to v1.24 of the Rafay Kubernetes Operations Platform. Our sincere thanks to our customers and partners that have been actively testing the new functionality. We have received timely feedback that we have been able to incorporate into our product documentation and into the platform as well.

Today, we upgraded our Production environment to this release. As always, our customers will have seamless access to the new functionality with no interruptions to their applications or clusters. In this blog, I will describe some of the new features that are part of this release.

How Platform Teams can enable developers to use their preferred Kubernetes Tools

There are cases where developers may prefer to use tools on their laptops, such as Lens Desktop, to visualize resources and interact with Kubernetes clusters. A desktop-based app such as Lens can offer developers a better user experience than the kubectl CLI.

In this blog, we will describe how Platform Teams can use Rafay’s Zero-Trust Access service to enable developers to use popular Kubernetes visualization apps to troubleshoot their applications. Watch a video showing how a developer can use Lens Desktop with Rafay's Zero Trust Kubectl Access service to securely and remotely access Kubernetes clusters.

Goldilocks Zone for AKS

In this blog, we will look at the process used by Microsoft Azure to add support for new Kubernetes versions for their "Managed" Azure Kubernetes Service (AKS). We will also look at recommendations for customers on things they need to consider to operate their AKS clusters at scale without issues.

Azure's managed Kubernetes service, AKS, is supported globally in 60+ regions. As one can imagine, it is not practical to update software in all these regions in one fell swoop. The AKS team at Microsoft employs a Safe Deployment Practice (SDP) where new releases are rolled out gradually in phases. This means that at any given time, something new is being rolled out to some region.

Note

The AKS team maintains a Release Tracker that provides visibility to customers that require it.

Considerations for In-Place Upgrades to Amazon EKS v1.24

Recently, AWS added support for Kubernetes v1.24 for their Amazon EKS offering. One significant change with this version is the removal of Dockershim, the adapter that let the kubelet use Docker Engine through the Container Runtime Interface (CRI). Amazon EKS clusters v1.24 onwards are standardized on "containerd".

New Amazon EKS v1.24 clusters are provisioned with containerd. Watch a brief video showcasing how customers can use Rafay to configure and provision an Amazon EKS v1.24 cluster.

When EKS clusters are upgraded to v1.24, the nodes in the EKS cluster's data plane are seamlessly migrated from "Dockershim" to "containerd".

graph LR
  A[Dockershim] --> B[Containerd];
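
Before an in-place upgrade, it can help to know which nodes are still running on Docker. The following is a minimal sketch, not Rafay's implementation: it assumes you have saved the output of `kubectl get nodes -o json` and reads the `status.nodeInfo.containerRuntimeVersion` field that Kubernetes reports for each node. The function name and the sample node names are hypothetical.

```python
def runtimes_by_node(node_list: dict) -> dict:
    """Map each node name to its container runtime name.

    `node_list` is the parsed JSON from `kubectl get nodes -o json`.
    """
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        node_info = node.get("status", {}).get("nodeInfo", {})
        # The field has the form "<runtime>://<version>",
        # for example "docker://20.10.17" or "containerd://1.6.6".
        version = node_info.get("containerRuntimeVersion", "")
        result[name] = version.split("://", 1)[0] if version else "unknown"
    return result


# Hypothetical sample payload with one node per runtime.
sample = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {"nodeInfo": {"containerRuntimeVersion": "docker://20.10.17"}},
        },
        {
            "metadata": {"name": "node-b"},
            "status": {"nodeInfo": {"containerRuntimeVersion": "containerd://1.6.6"}},
        },
    ]
}

for node_name, runtime in runtimes_by_node(sample).items():
    print(f"{node_name}: {runtime}")
```

Nodes that report `docker` here are the ones whose runtime will change during the upgrade.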

Although this transition happens mostly "behind the scenes" for users, moving from Dockershim to containerd can disrupt deployed applications that depend on Docker. In this blog, we will look at what Rafay has done to protect our customers during an in-place upgrade to EKS v1.24.
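
One common form of Docker dependency is a pod that mounts the Docker socket from the host. The sketch below illustrates that single check and is not a description of how Rafay detects such workloads. It assumes pod manifests are available as parsed dictionaries (for example from `kubectl get pods -A -o json`); the function name, the list of socket paths, and the sample pod are assumptions made for illustration.

```python
# Host paths commonly used for the Docker socket (assumed list, extend as needed).
DOCKER_SOCKET_PATHS = ("/var/run/docker.sock", "/run/docker.sock")


def docker_socket_volumes(pod: dict) -> list:
    """Return the names of hostPath volumes in a pod that point at the Docker socket."""
    hits = []
    for volume in pod.get("spec", {}).get("volumes", []):
        # Volumes of other types (configMap, emptyDir, ...) have no hostPath key.
        host_path = volume.get("hostPath") or {}
        if host_path.get("path") in DOCKER_SOCKET_PATHS:
            hits.append(volume["name"])
    return hits


# Hypothetical pod that mounts the Docker socket alongside an unrelated volume.
sample_pod = {
    "metadata": {"name": "ci-runner"},
    "spec": {
        "volumes": [
            {"name": "dockersock", "hostPath": {"path": "/var/run/docker.sock"}},
            {"name": "scratch", "emptyDir": {}},
        ]
    },
}

print(docker_socket_volumes(sample_pod))
```

A pod flagged this way will find no Docker daemon behind the socket once its node runs containerd, so it is worth reviewing before the upgrade. Other dependencies, such as agents that call the Docker API over the network, are outside what this check covers.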