Cluster Blueprints and Drift Detection

About three years ago, we noticed many of our customers struggling with enterprise-wide standardization of their Kubernetes clusters. Every cluster in their Organization was a snowflake, and they were looking for a way to ensure that every cluster had a "baseline set of add-ons". This prompted us to develop Cluster Blueprints, which has turned out to be one of the most heavily used features in our platform.

In this blog, we describe a superpower setting in the Cluster Blueprints feature that we see customers use heavily to protect their production clusters against unplanned drift.


The Drift Problem

Although cluster blueprints solve the "standardization" challenge, it is still possible for users with Cluster Admin privileges to make "accidental" changes to the add-ons associated with a cluster blueprint.

When something like this occurs, the cluster has "drifted" away from the desired state. Unplanned, out-of-band changes can result in significant operational, compliance and security issues. For example, what if the change affected the configuration of a critical security scanner?
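
To make the idea of drift concrete, here is a minimal Python sketch that compares a desired add-on configuration with the live one and reports the differences. It is an illustration only and not how the Rafay operator is implemented; the `find_drift` helper, the `scanner` add-on and its fields are all hypothetical names chosen for this example.

```python
from typing import Any


def find_drift(desired: dict[str, Any], live: dict[str, Any], path: str = "") -> list[str]:
    """Recursively compare desired and live state; return one line per difference."""
    diffs: list[str] = []
    for key in sorted(set(desired) | set(live)):
        here = f"{path}.{key}" if path else key
        if key not in live:
            diffs.append(f"{here}: missing from live state")
        elif key not in desired:
            diffs.append(f"{here}: not in desired state")
        elif isinstance(desired[key], dict) and isinstance(live[key], dict):
            # Descend into nested configuration sections.
            diffs.extend(find_drift(desired[key], live[key], here))
        elif desired[key] != live[key]:
            diffs.append(f"{here}: expected {desired[key]!r}, found {live[key]!r}")
    return diffs


# Hypothetical desired state for a security scanner add-on (assumed fields).
desired = {"scanner": {"image": "scanner:1.4.2", "replicas": 2, "enabled": True}}
# Live state after an out-of-band change scaled the scanner down to zero.
live = {"scanner": {"image": "scanner:1.4.2", "replicas": 0, "enabled": True}}

drift = find_drift(desired, live)
for entry in drift:
    print(entry)  # scanner.replicas: expected 2, found 0
```

An empty result means the cluster matches the desired state; any entry in the list is a candidate for an audit event.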


Drift Detection

Cluster Blueprints in the Rafay Kubernetes Operations platform can be configured to actively monitor for unexpected drift. This monitoring and enforcement is performed by the Rafay Kubernetes Operator deployed on the managed cluster. Customers have two options for response when drift is detected.

Option 1: Notify

Generate an audit event when unplanned drift is detected.

Option 2: Block

Block the unplanned drift and generate an audit event.
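
The difference between the two options can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the platform's actual API: the `DriftAction`, `Decision` and `handle_change` names are hypothetical, and the sketch assumes that in Notify mode the change is let through and only recorded.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DriftAction(Enum):
    NOTIFY = "notify"  # record the drift, do not stop it (assumed behavior)
    BLOCK = "block"    # reject the change and record the attempt


@dataclass
class Decision:
    allowed: bool
    audit_event: Optional[str]


def handle_change(is_planned: bool, action: DriftAction, resource: str) -> Decision:
    """Decide what happens to a change against a drift-protected resource."""
    if is_planned:
        # Planned updates (for example, a blueprint update) pass silently.
        return Decision(allowed=True, audit_event=None)
    event = f"unplanned drift detected on {resource}"
    if action is DriftAction.BLOCK:
        return Decision(allowed=False, audit_event=f"{event} (blocked)")
    return Decision(allowed=True, audit_event=event)


# Hypothetical resource name used only for illustration.
notify = handle_change(False, DriftAction.NOTIFY, "security-scanner")
block = handle_change(False, DriftAction.BLOCK, "security-scanner")
print(notify)
print(block)
```

In both modes an audit event is produced for unplanned changes; only Block mode also rejects the change.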

It is good operational practice to ensure that all updates to production clusters are "planned", "version controlled" and "approved". The diagram below shows an environment where "drift detection based blocking" is used in conjunction with a modern GitOps based pipeline that performs the "allowed/planned update".

sequenceDiagram
  Git Repo->>Git Repo: Pull Request
  Git Repo->>Git Repo: Merge
  Git Repo->>Controller: Webhook 
  Controller->>Cluster: Update Blueprint
  Cluster->>Cluster: Monitor
  Rogue Admin-->>Cluster: Attempts out of band change
  Cluster-->>Controller: Audit Event
  Cluster-->>Rogue Admin: "X" Attempt Blocked "X" 

Here's an example of what the "Rogue Admin" would encounter when they try to delete a "drift protected" resource in the cluster blueprint.

[Image: Blocked Update]


Try It Out

If you are interested in trying this out yourself, sign up for a Free Org/Tenant and use our "Getting Started Guide" for Cluster Blueprints and Drift Detection.

Get Started with Drift Detection


Blog Ideas

Sincere thanks to those who spend time reading our product blogs and provide us with feedback and ideas. Please contact the Rafay Product Team if you would like us to write about specific topics.