In the last two blogs (part 1 and part 2), we discussed the challenges customers face with running AI/ML on Kubernetes and innovative solutions to address these challenges. In this blog, we will flip this on its head and look at how AI/ML can make Kubernetes easier to use and operate.
This is part-2 of our blog series on challenges and solutions for AI/ML in the enterprise. This blog is based on our learnings over the last two years as we worked very closely with our customers that make extensive use of Kubernetes for AI/ML use cases. In part-1, we looked at the following:
Why Kubernetes is particularly compelling for AI/ML.
Some of the key challenges that organizations will encounter with AI/ML and Kubernetes.
In this part, we will look at some innovative approaches by which organizations can address these challenges.
Amazon Elastic Kubernetes Service (EKS) is a managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. One of the options available with EKS is EKS Anywhere for Bare Metal environments, which allows you to run Kubernetes on your own hardware. While this provides more control and flexibility to businesses, it also comes with its own set of challenges.
While the benefits of managing EKS Anywhere on Bare Metal are significant, the process can be challenging and time-consuming, particularly if you lack experience with Kubernetes on Bare Metal infrastructure. This is where Rafay’s Kubernetes Platform for managing EKS Anywhere Bare Metal clusters can come in handy.
In this blog, we'll explore how Rafay’s Platform can address pain points and simplify the management process.
Terraform is one of the most popular tools for provisioning resources on all major cloud platforms, such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure. It uses Infrastructure as Code (IaC) to automate infrastructure provisioning. This blog will discuss the Terraform provider for Rafay.
The Rafay Terraform Provider is a plugin that allows Terraform to manage resources in the Rafay platform. It enables users to automate the creation, configuration, and deletion of Rafay resources such as clusters, projects, policies, and environments. It is available as an open-source project on GitHub and can be installed using the standard Terraform plugin installation process.
This blog is based on our learnings over the last two years as we worked very closely with our customers that make extensive use of Kubernetes for AI/ML.
This is part-1 of a two-part series. In part-1, we will:
Start by looking at why Kubernetes is particularly compelling for AI/ML.
Describe some of the key challenges that organizations will encounter with AI/ML and Kubernetes.
In part-2, we will look at ways in which organizations can address these challenges.
A few weeks back in early April 2023, we upgraded our Preview environment to v1.24 of the Rafay Kubernetes Operations Platform. Our sincere thanks to our customers and partners that have been actively testing the new functionality. We have received timely feedback that we have been able to incorporate into our product documentation and into the platform as well.
Today, we upgraded our Production environment to this release. As always, our customers will have seamless access to the new functionality with no interruptions to their applications or clusters. In this blog, I will describe some of the new features that are part of this release.
We just rolled out "enhancements" and "new functionality" from our upcoming v1.24 release to our Preview environment. This release will be promoted to our "Production" environment in a few weeks. Learn more about our Preview Environment and about the enhancements and new functionality in the v1.24 release.
There are cases where developers may prefer to use tools on their laptops, such as Lens Desktop, to visualize resources and interact with Kubernetes clusters. A desktop-based app such as Lens can offer developers a better user experience than the kubectl CLI.
In this blog, we will describe how Platform Teams can use Rafay’s Zero-Trust Access service to enable developers to use popular Kubernetes visualization apps to troubleshoot their applications. Watch a video showing how a developer can use Lens Desktop with Rafay's Zero Trust Kubectl Access service to securely and remotely access Kubernetes clusters.
In this blog, we will look at the process used by Microsoft Azure to add support for new Kubernetes versions for their "Managed" Azure Kubernetes Service (AKS). We will also look at recommendations for customers on things they need to consider to operate their AKS clusters at scale without issues.
Azure's managed Kubernetes service, AKS, is supported globally in 60+ regions. As one can imagine, it is not practical to update software in all these regions in one fell swoop. The AKS team at Microsoft employs a Safe Deployment Practice (SDP) where new releases are rolled out gradually in phases. This means that at any given time, something new is being rolled out to some region.
Note
The AKS team maintains a Release Tracker that provides visibility to customers that require it.
Recently, AWS added support for Kubernetes v1.24 in their Amazon EKS offering. One significant change with this version is the removal of Dockershim, the component that let Kubernetes use Docker Engine as its container runtime through the Container Runtime Interface (CRI). Amazon EKS clusters from v1.24 onwards are standardized on "containerd".
New Amazon EKS v1.24 clusters are provisioned with containerd. Watch a brief video showcasing how customers can use Rafay to configure and provision an Amazon EKS v1.24 cluster.
When EKS clusters are upgraded to v1.24, the nodes in the EKS cluster's data plane are seamlessly migrated from "Dockershim" to "containerd".
graph LR
A[Dockershim] --> B[Containerd];
Although this transition is mostly "behind the scenes" for users, moving from Dockershim to containerd can disrupt deployed applications that depend on Docker. In this blog, we will look at what Rafay has done to protect our customers during an in-place upgrade to EKS v1.24.
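To make the risk concrete, here is a minimal Python sketch of one way to spot workloads and nodes that may be affected before an upgrade. It is not Rafay's implementation. It assumes you have saved the output of `kubectl get pods --all-namespaces -o json` and `kubectl get nodes -o json` and loaded them as dictionaries. The function names and the sample data are hypothetical and only for illustration. The sketch flags pods that mount the Docker socket through a hostPath volume, a common sign of a Docker dependency, and nodes whose reported container runtime is still Docker.

```python
# Paths where the Docker socket is commonly exposed on a node.
DOCKER_SOCKET_PATHS = {"/var/run/docker.sock", "/run/docker.sock"}


def pods_mounting_docker_socket(pod_list):
    """Return (namespace, name) for each pod with a hostPath volume on the Docker socket."""
    flagged = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for volume in pod.get("spec", {}).get("volumes", []):
            host_path = (volume.get("hostPath") or {}).get("path", "")
            if host_path.rstrip("/") in DOCKER_SOCKET_PATHS:
                flagged.append((meta.get("namespace", "default"), meta.get("name", "")))
                break  # one match is enough to flag this pod
    return flagged


def nodes_using_docker(node_list):
    """Return names of nodes whose containerRuntimeVersion starts with docker://."""
    names = []
    for node in node_list.get("items", []):
        node_info = node.get("status", {}).get("nodeInfo", {})
        runtime = node_info.get("containerRuntimeVersion", "")
        if runtime.startswith("docker://"):
            names.append(node.get("metadata", {}).get("name", ""))
    return names


# Hypothetical sample data shaped like `kubectl get pods -o json` output.
sample_pods = {
    "items": [
        {
            "metadata": {"namespace": "ci", "name": "builder"},
            "spec": {
                "volumes": [
                    {"name": "sock", "hostPath": {"path": "/var/run/docker.sock"}}
                ]
            },
        },
        {
            "metadata": {"namespace": "web", "name": "frontend"},
            "spec": {"volumes": [{"name": "cfg", "configMap": {"name": "app"}}]},
        },
    ]
}

print(pods_mounting_docker_socket(sample_pods))  # [('ci', 'builder')]
```

A pod flagged this way is only a candidate for review: mounting the socket suggests, but does not prove, that the application calls the Docker API directly.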