Product Blog

Monitoring Kubernetes Environments using Rafay

Rafay is a Kubernetes management platform that enables platform teams to automate the entire lifecycle of K8s clusters, including provisioning, scaling, upgrading, and monitoring. For companies that are embracing a multi-cloud approach, gaining visibility into clusters and monitoring them effectively typically requires a set of disparate tools. Rafay provides tools and features to centrally manage the cluster estate and track the performance, health, and resource utilization of your Kubernetes clusters and workloads.

Here are some key aspects of Rafay's monitoring solution:


Integration with Prometheus

Rafay has built-in integration with Prometheus, a popular open-source monitoring and alerting toolkit, to collect and store metrics from your Kubernetes clusters. The Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components. It uses Kubernetes custom resources to simplify the deployment and configuration of Prometheus, so users can define and manage monitoring instances as Kubernetes resources.
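To make the "monitoring as Kubernetes resources" idea concrete, here is a minimal sketch that builds a Prometheus Operator ServiceMonitor manifest as a plain Python dictionary and prints it as JSON, which kubectl also accepts. The resource name, namespace, label and port name are hypothetical, and whether a given Prometheus instance picks up this ServiceMonitor depends on how its selectors are configured in your cluster.

```python
import json


def service_monitor(name, namespace, app_label, port_name, interval="30s"):
    """Build a ServiceMonitor manifest as a plain dict.

    All argument values used below are illustrative assumptions,
    not names taken from the Rafay platform.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Scrape every Service carrying this label.
            "selector": {"matchLabels": {"app": app_label}},
            # Scrape the named Service port at the given interval.
            "endpoints": [{"port": port_name, "interval": interval}],
        },
    }


manifest = service_monitor(
    "checkout-metrics", "monitoring", "checkout", "http-metrics"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl, or adapted to YAML if that is what your team keeps in Git.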


Grafana Integration

Rafay integrates with Grafana, another popular open-source dashboard and visualization platform that works seamlessly with Prometheus. Grafana allows you to create custom dashboards with informative, real-time graphs and charts based on Prometheus data, so you can monitor your Kubernetes clusters and applications.


Application Performance Monitoring (APM)

Rafay provides seamless integration with APM tools like New Relic, Splunk Connect, Dynatrace, CloudWatch or Datadog. These tools provide detailed insights into your application's behavior, including transaction tracing, code-level performance, and error tracking.

Alerting and Notifications

Users can view alerts and notifications about cluster health in the Rafay UI. It is also possible to configure alerting rules in Prometheus and integrate with tools like Slack, email, Microsoft Teams, ServiceNow, Opsgenie or PagerDuty, so that you are informed of critical issues and can act on them promptly.
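As an illustration of what an alerting rule can look like, the sketch below builds a Prometheus Operator PrometheusRule manifest in Python. It assumes node_exporter metrics (node_cpu_seconds_total) are being scraped; the rule name, namespace, threshold and severity label are hypothetical. Routing the resulting alert to Slack, PagerDuty and similar tools is handled separately by Alertmanager and is not shown here.

```python
import json


def high_cpu_rule(threshold=0.9, hold="10m"):
    """Build a PrometheusRule that fires when node CPU usage stays high.

    Assumes node_exporter is deployed; names and values are illustrative.
    """
    # CPU usage = 1 - fraction of time spent idle, averaged per instance.
    expr = (
        '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'
        f" > {threshold}"
    )
    rule = {
        "alert": "NodeHighCpu",
        "expr": expr,
        "for": hold,  # condition must hold this long before the alert fires
        "labels": {"severity": "critical"},
        "annotations": {
            "summary": "CPU above threshold on {{ $labels.instance }}"
        },
    }
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": "node-cpu-alerts", "namespace": "monitoring"},
        "spec": {"groups": [{"name": "node.rules", "rules": [rule]}]},
    }


rule_manifest = high_cpu_rule()
print(json.dumps(rule_manifest, indent=2))
```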


Log Management

In addition to collecting metrics and giving users access to audit logs, Rafay can help you manage the logs generated by your Kubernetes workloads. You can integrate Rafay with log management solutions like Elasticsearch, Fluent Bit, Kibana, the EFK stack, Sumo Logic and Splunk for centralized log collection, storage and analysis.

Resource Utilization and Cost Monitoring

As organizations adopt and scale their K8s environments, they often find themselves running blind, struggling to get even a "good enough" understanding of their cost structure. Rafay provides an integrated Cost Management solution to implement chargeback/showback models and carry out right-sizing exercises for cost optimization.
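To show the arithmetic behind a simple showback model, here is a minimal sketch that splits a monthly cluster bill across namespaces in proportion to their CPU requests. This is an illustrative allocation rule only, not a description of how Rafay's Cost Management solution computes costs; the namespace names and figures are made up.

```python
def showback(monthly_cluster_cost, cpu_requests_by_namespace):
    """Split a cluster bill across namespaces by share of CPU requests.

    Proportional-to-CPU-requests allocation is an assumption chosen for
    simplicity; real cost models often also weigh memory, storage and
    idle capacity.
    """
    total_cpu = sum(cpu_requests_by_namespace.values())
    if total_cpu <= 0:
        raise ValueError("total CPU requests must be positive")
    return {
        namespace: round(monthly_cluster_cost * cpu / total_cpu, 2)
        for namespace, cpu in cpu_requests_by_namespace.items()
    }


# Hypothetical cluster: 1000.00 per month, 10 CPU cores requested in total.
costs = showback(1000.0, {"payments": 6.0, "search": 3.0, "batch": 1.0})
print(costs)  # {'payments': 600.0, 'search': 300.0, 'batch': 100.0}
```

In a chargeback model the same numbers would be billed back to each team; in a showback model they are only reported.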



Rafay's Kubernetes monitoring capabilities are designed to provide comprehensive insights into the health, performance and cost of your Kubernetes infrastructure and applications. By leveraging the platform's capabilities and integration with popular monitoring and observability tools, you can ensure that your K8s infrastructure is reliable, scalable and optimized from a cost perspective.

Blog Ideas

Sincere thanks to the readers who spend time reading our product blogs. Please contact the Rafay Product Team if you would like us to write about other topics.

Provision New AKS v1.27 Clusters using Rafay

Azure recently added support for Kubernetes v1.27 for AKS clusters. Customers can now use Rafay to provision new AKS clusters based on Kubernetes v1.27 as well.

This version of AKS became Generally Available (GA) in July 2023 and will reach end of life in July 2024, i.e. a 12-month support runway.

Customers have shared with us that they would like to provision new AKS clusters based on new Kubernetes versions so that they do not have to plan and schedule Kubernetes upgrades for these clusters right away. For the last few releases, we have introduced support for provisioning new clusters on the new Kubernetes version first and then followed up with support for zero-touch in-place upgrades.

Optimizing Amazon's VPC CNI for your EKS Clusters Made Easy with Rafay

Amazon Elastic Kubernetes Service (EKS) is a managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. Kubernetes clusters require a Container Network Interface (CNI) plugin that is responsible for cluster networking. One of the options available with EKS is the Amazon VPC CNI, which allows your Kubernetes Pods to use IP addresses from your VPC's subnets. While this provides more control and flexibility to businesses, it also comes with its own set of challenges.
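One of those challenges is IP address capacity, since every Pod consumes an address from a VPC subnet. The sketch below works through two commonly cited calculations: the usable addresses in a subnet (AWS reserves five addresses per subnet) and the widely documented max-Pods formula for the VPC CNI's default secondary-IP mode. The CIDR and the instance limits (3 network interfaces with 10 IPv4 addresses each, as on an m5.large) are example inputs; check the limits for your own instance types, and note that the formula does not apply when prefix delegation is enabled.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_subnet_ips(cidr):
    """Number of addresses in a VPC subnet that Pods and nodes can use."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def max_pods(enis, ipv4_per_eni):
    """Max Pods per node in the VPC CNI's default secondary-IP mode.

    Each ENI's primary address is not handed to Pods; the "+ 2" accounts
    for Pods that use host networking.
    """
    return enis * (ipv4_per_eni - 1) + 2


print(usable_subnet_ips("10.0.1.0/24"))  # 251
print(max_pods(enis=3, ipv4_per_eni=10))  # 29
```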

While the benefits of managing and customizing the Amazon VPC CNI on EKS are significant, the process can be challenging and time-consuming, particularly if you lack experience with Kubernetes or with Amazon VPC and its resources.

This is where Rafay’s EKS integration can come in handy. In this blog, we'll explore how Rafay’s Platform can address pain points and simplify the management process.

Upgrade Strategies for Your Rafay MKS Cluster

In the past, there was only one way to upgrade your Rafay-provisioned upstream Kubernetes cluster: the worker nodes were upgraded sequentially, one node at a time. For large clusters with hundreds of worker nodes, such upgrades can take a very long time. In this blog, we will describe the optimizations we have incorporated in our August 2023 release to allow users to configure faster upgrades. We now offer two ways to upgrade, and you have the freedom to choose the one that suits you best.
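To get a feel for why one-node-at-a-time upgrades are slow on large clusters, here is a rough back-of-the-envelope sketch. The excerpt above does not name the second strategy, so the sketch simply assumes an alternative in which nodes are upgraded in concurrent batches; the batch size and per-node duration are hypothetical inputs, not Rafay settings.

```python
import math


def upgrade_rounds(worker_nodes, batch_size=1):
    """Number of upgrade rounds; batch_size=1 models a sequential upgrade."""
    if worker_nodes < 0 or batch_size < 1:
        raise ValueError("worker_nodes must be >= 0 and batch_size >= 1")
    return math.ceil(worker_nodes / batch_size)


def estimated_minutes(worker_nodes, minutes_per_node, batch_size=1):
    """Rough wall-clock estimate, assuming nodes in a batch upgrade in parallel."""
    return upgrade_rounds(worker_nodes, batch_size) * minutes_per_node


# Hypothetical cluster: 300 worker nodes, about 5 minutes per node.
print(estimated_minutes(300, 5))                 # 1500 (sequential)
print(estimated_minutes(300, 5, batch_size=10))  # 150 (batches of 10)
```

Larger batches shorten the upgrade window but take more capacity out of service at once, so the right setting depends on how much headroom the cluster has.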

CIS Benchmark for Kubernetes using Rafay

The Center for Internet Security (CIS) benchmark for Kubernetes consists of secure configuration guidelines written specifically for setting up Kubernetes infrastructure. These benchmarks encapsulate best-practice security recommendations for configuring Kubernetes to support a strong security posture. The CIS Kubernetes Benchmark is written for the open-source, upstream Kubernetes distribution and is intended to be as universally applicable across distributions as possible.

In this blog, we describe how our customers perform CIS benchmark scans of their fleet of Kubernetes clusters using Rafay.

HashiCorp's New License

Last week, HashiCorp announced that it would be adopting the Business Source License for future releases of its products. In this blog, we describe whether and how this impacts Rafay customers.

There is no impact to our mutual customers and users due to this recent license change by HashiCorp.

Many of our customers benefit from our native support of HashiCorp product offerings, such as Terraform and Vault, and our strong partnership ensures that they will continue to do so. In this blog, I'll describe these integrations, and provide more detail on the recent licensing change.

Integrated Grep Plugin for the Kubectl Web Shell

In our recent release, we added support for plugins in the web-based kubectl shell that users have access to after they authenticate to their Rafay Org. In this blog, we will describe how we have enhanced the developer experience for users of this feature by providing them with a "grep plugin".

Rafay's zero trust kubectl web shell is one of the most heavily used features of the Rafay platform because it provides authenticated users with secure kubectl access from any device, anywhere. They just need a web browser to log in and perform kubectl operations on their cluster.

Amazon EKS v1.27 Clusters using Rafay

In our recent release, we added support for new EKS cluster provisioning based on Kubernetes v1.27.

Customers have shared with us that they would like to provision new EKS clusters using new Kubernetes versions so that they do not have to plan and schedule Kubernetes upgrades for these clusters right away. For the last few releases, we have introduced support for provisioning new clusters on the new Kubernetes version first and then followed up with support for zero-touch in-place upgrades.

Vector Databases for Generative AI on Kubernetes

Many of our customers use Kubernetes extensively for AI/ML use cases. This is one of the reasons why we have turnkey support for Nvidia GPUs on EKS, AKS, and upstream Kubernetes in on-prem data centers. Recently, several customers have looked at adding support for Generative AI to their applications. Doing so requires a slightly different technology stack.

Traditional relational databases are adept at handling structured data. They do this by storing data in tables. However, AI use cases are focused on handling unstructured data (e.g. images, audio, and text). Data like this is not well suited for storage and retrieval in a tabular format. This critical technology gap with relational databases has opened the door for a new type of database called a Vector Database that can natively store and process vector embeddings. The rapid rise of large-scale generative AI models has further propelled the demand for vector databases.
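To illustrate what "processing vector embeddings" means in practice, here is a minimal sketch of the core operation a vector database performs: finding the stored vectors most similar to a query vector. It uses brute-force cosine similarity over a tiny in-memory dictionary; real vector databases typically rely on approximate nearest-neighbor indexes to do this at scale. The three-dimensional embeddings and item names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, index, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy "embeddings"; real ones come from a model and have hundreds of dimensions.
index = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "invoice pdf": [0.0, 0.2, 0.9],
}

print(nearest([1.0, 0.1, 0.0], index))  # ['cat photo', 'dog photo']
```

The query vector lands closest to the two photo embeddings, which is the kind of similarity lookup that a tabular query over rows and columns does not express naturally.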

In this blog, we will review why vector databases are well suited and critical for AI and Generative AI. We will then look at how you can deploy and operate vector databases on Kubernetes using the Rafay Kubernetes Operations Platform in just one step.