
v2.6 - Preview SaaS

24 Apr, 2024

The following features are available to customers and partners in our Preview environment. The section below provides a brief description of the new functionality and enhancements in this release.


Amazon EKS

Upgrade Insights

Upgrade Insights scans a cluster's audit logs for events related to deprecated APIs, helping identify and remediate the affected resources before executing an upgrade. With this release, this information is available directly in the Rafay console, making it easy for cluster administrators to consume it and orchestrate operations from a single place.


Managed Add-ons

Support has been added for the following EKS Managed Add-ons with this release (a brief configuration sketch follows the list).

  • Amazon EFS CSI driver
  • Mountpoint for Amazon S3 CSI Driver
  • CSI snapshot controller
  • Amazon CloudWatch Observability agent

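The fragment below is a minimal sketch of how these add-ons could be declared, assuming the eksctl-style addons section used by the Rafay EKS cluster configuration; add-on names follow the EKS managed add-on naming and the versions shown are illustrative.

addons:
  # Amazon EFS CSI driver
  - name: aws-efs-csi-driver
    version: latest
  # Mountpoint for Amazon S3 CSI driver
  - name: aws-mountpoint-s3-csi-driver
    version: latest
  # CSI snapshot controller
  - name: snapshot-controller
    version: latest
  # Amazon CloudWatch Observability agent
  - name: amazon-cloudwatch-observability
    version: latest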


Azure AKS

Azure Overlay CNI

In this release, we have added support for Azure CNI Overlay in AKS clusters using RCTL and the Terraform v1 API version, with plans to extend support to the UI and other interfaces such as the Terraform v3 API version and SystemSync in a subsequent release. This enhancement improves scalability, alleviates address-exhaustion concerns, and simplifies cluster scaling.

Subset Cluster Config with Azure Overlay

networkProfile:
    dnsServiceIP: 10.0.0.10
    loadBalancerSku: standard
    networkPlugin: azure
    networkPluginMode: overlay
    networkPolicy: calico
    podCidr: 10.244.0.0/16
    serviceCidr: 10.0.0.0/16

Kubernetes 1.29

New AKS clusters can be provisioned based on Kubernetes v1.29.x. Existing clusters managed by the controller can be upgraded in-place to Kubernetes v1.29.x.



Google GKE

Cluster Reservation Affinity

In this release, we have extended support for configuring reservation affinity beyond the UI and Terraform. Users can now use other interfaces such as RCTL and SystemSync to configure reservation affinity, enabling the use of reserved Compute Engine instances in GKE by setting reservation affinity on node pools.

Cluster Config with Reservation Affinity

{
  "apiVersion": "infra.k8smgmt.io/v3",
  "kind": "Cluster",
  "metadata": {
    "name": "my-cluster",
    "project": "defaultproject"
  },
  "spec": {
    "cloudCredentials": "dev",
    "type": "gke",
    "config": {
      "gcpProject": "dev-382813",
      "location": {
        "type": "zonal",
        "config": {
          "zone": "us-central1-a"
        }
      },
      "controlPlaneVersion": "1.27",
      "network": {
        "name": "default",
        "subnetName": "default",
        "access": {
          "type": "public",
          "config": null
        },
        "enableVPCNativetraffic": true,
        "maxPodsPerNode": 110
      },
      "features": {
        "enableComputeEnginePersistentDiskCSIDriver": true
      },
      "nodePools": [
        {
          "name": "default-nodepool",
          "nodeVersion": "1.27",
          "size": 2,
          "machineConfig": {
            "imageType": "COS_CONTAINERD",
            "machineType": "e2-standard-4",
            "bootDiskType": "pd-standard",
            "bootDiskSize": 100,
            "reservationAffinity": {
              "consumeReservationType": "specific",
              "reservationName": "my-reservation"
            }
          },
          "upgradeSettings": {
            "strategy": "SURGE",
            "config": {
              "maxSurge": 1
            }
          }
        }
      ]
    },
    "blueprint": {
      "name": "minimal",
      "version": "latest"
    }
  }
}

Kubernetes 1.29

New GKE clusters can be provisioned based on Kubernetes v1.29.x. Existing clusters managed by the controller can be upgraded in-place to Kubernetes v1.29.x.



Imported/Registered Clusters

Fleet Support

Fleet operations can now be used with imported cluster types, making it easy to update blueprints across imported clusters.


Important

Action types such as Control Plane Upgrade, Node Group And Control Plane Upgrade, Node Groups Upgrade, and Patch are applicable to EKS and AKS cluster types but not to imported cluster types. The Blueprint action type, however, is applicable to EKS, AKS, and imported cluster types.


Clusters

Export Option

An export option is now available to download the list of clusters across the organization/projects with metadata including custom labels, Kubernetes version, active nodes, and project. This helps customers plan operations such as upgrades and coordinate with cluster owners.


Resources Page

A number of improvements have been implemented on the Resources page, including:

  • Addition of a vertical scroller to Cluster resource grids
  • Displaying information related to HPAs in the workloads debug page
  • Displaying Custom Resources associated with CRDs



Network Policy

Installation Profile

Choosing an installation profile as part of the blueprint configuration has been made optional with this release; a 'None' option is now available.


This removes the restriction of installing Cilium in chaining mode through Rafay to enforce network policies. The cluster-wide and namespace-scoped rules/policies framework can now be used for additional scenarios such as the following (a generic NetworkPolicy example follows the list):

  • Cilium is the primary CNI
  • Use of NetworkPolicy CRDs with any CNI that can enforce Network Policies
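
For illustration, here is a generic Kubernetes NetworkPolicy of the kind such rules can deploy; the policy name and namespace are hypothetical.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress    # illustrative name
  namespace: demo               # hypothetical namespace
spec:
  podSelector: {}               # applies to all pods in the namespace
  policyTypes:
    - Ingress                   # no ingress rules listed, so all ingress is denied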

Support for Network Policy Dashboard and Installation Profile is also being deprecated with this release.


Blueprint

Add-ons

In previous releases, an add-on could not be deleted if it was referenced in any blueprint. With this release, add-ons can be deleted if they are only part of blueprint versions that are disabled, letting Platform Admins delete stale or unused add-ons. A check is still in place that prevents deletion of add-ons that are part of any active blueprint version.


Custom Roles for Zero-Trust Access

Workspace Admin Roles

A previous release introduced the ZTKA Custom Access feature that enables customers to define custom RBAC definitions to control the access that users have to the clusters in the organization. An example could be restricting users to read-only access (get, list, watch verbs) for certain resources (e.g. pods, secrets) in a certain namespace.

In order to remove the need for a Platform Admin to create a Role definition file individually for each of the namespaces, a facility has been added to include the label k8smgmt.io/bindingtype: rolebinding in the ClusterRole definition file. This creates RoleBindings on the fly in all the namespaces associated with the Workspace Admin base role.
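
As a minimal sketch, a ClusterRole carrying this label might look like the following; the role name is illustrative and the rules reuse the read-only example above.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ztka-read-only                    # illustrative name
  labels:
    k8smgmt.io/bindingtype: rolebinding   # triggers per-namespace RoleBindings
rules:
  - apiGroups: [""]
    resources: ["pods", "secrets"]        # resources from the example above
    verbs: ["get", "list", "watch"]       # read-only access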

Note

This feature was previously available only for Namespace Admin roles; with this release, it has been extended to Workspace Admin roles.



Cost Management

Control Plane costs

With this release, chargeback reports can be configured to include and distribute control plane costs among the various tenants sharing the cluster.



v2.6 Bug Fixes

Bug ID Description
RC-32872 ZTKA Custom role does not work when base role is Org read only
RC-33175 UI: When selecting the Environment Template User role to assign to the group, all other selected roles get deselected automatically
RC-33698 Namespace Admin role is not able to deploy the workload when the namespace is in terminating state
RC-30728 Upstream K8s: When adding a new node as part of a Day 2 operation, node labels and taints are not being accepted
RC-33356 EKS: Cluster provisioning is in an infinite loop if blueprint sync fails during provisioning via TF interface
RC-33361 Backup and Restore: UI shows old agent name even when a new data agent is deployed in the same cluster
RC-33673 ClusterRolebinding of a Project admin role user gets deleted for an IdP user having multiple group associations and roles including Org READ ONLY role
RC-32592 MKS: Replace ntpd with systemd's timesyncd
RC-33845 EKS: No bootstrap agents found error raised at drain process

Template Loader v2

19 Apr, 2024

A new version of the template loader utility is now available. With this utility, administrators can load multiple templates at the same time into their Rafay Orgs.

All Rafay-supported templates and supporting documentation have been updated with instructions for the new template loader.


v1.1.28 - Terraform Provider

13 Apr, 2024

An updated version of the Terraform provider is now available.

This release includes enhancements to the resources listed below:


Existing Resources

  • rafay_gke_cluster: This resource now supports GPU node pool configuration.

  • rafay_aks_cluster: This resource now supports adding multiple ACR profiles.

  • rafay_aks_cluster_v3: This resource now supports adding multiple ACR profiles.

v1.1.28 Bug Fixes

Bug ID Description
RC-32850 EKS Terraform: Terraform replan with the same spec showing diff for the subnets in cni_params
RC-34077 v3 operations are failing after the migration for AKS taken-over clusters having azurepolicy
RC-33580 Error while applying v3 operations on a cluster created using v1 terraform with auto_scaler_profile, oms_workspace_location, osDiskSizeGb

v2.5 - SaaS

5 Apr, 2024

The section below provides a brief description of the new functionality and enhancements in this release.


Amazon EKS

Enhanced Cloud Error Messaging for Cluster Provisioning

We have enhanced the cluster LCM experience by providing more detailed error messages. In the event of any failures during the process, you will receive real-time feedback from both CloudFormation and cluster events. This improved visibility pinpoints the exact source of the issue, allowing you to troubleshoot and resolve problems more efficiently.

Examples

  • When an invalid policy ARN is configured during the creation of an IAM Service Account, the failure is now displayed with the underlying error

  • If a cluster deletion fails due to a specific reason, that reason is now displayed

Migrating EKS Add-ons to Managed Add-ons

This enhancement streamlines the add-on experience and optimizes workflow efficiency. Here's what you need to know:

  • kube-proxy, coredns, and vpc-cni are now mandated during cluster provisioning
  • For existing clusters, creating add-ons (Day 2) will automatically convert existing self-managed add-ons to managed add-ons
  • During cluster upgrades, any essential self-managed add-ons that are not already converted to managed add-ons will be automatically migrated

Considerations for RCTL and Terraform Users:

If you use RCTL or Terraform, please take note of the following steps:

  • RCTL Users:

    • After this managed add-on migration, the cluster configuration file will include the managed add-on configuration. To ensure that your cluster configuration reflects these changes, please download the latest configuration using either the user interface (UI) or the RCTL tool; a sketch of the resulting add-on section appears after this list
  • Terraform Users:

    • Execute the command terraform apply --refresh-only to update the state, ensuring it reflects the correct state post migration with managed add-on information. This will synchronize the state file.

    • Use terraform show to output the resource file, ensuring it aligns with the new state file with managed add-on information.
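
As a rough sketch, assuming an eksctl-style addons section in the cluster configuration, the downloaded configuration and refreshed state would now carry entries similar to the following (versions illustrative):

addons:
  - name: vpc-cni
    version: latest
  - name: coredns
    version: latest
  - name: kube-proxy
    version: latest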

Important

  • Users must ensure that the DescribeAddonConfiguration action is included in the IAM role policy used with the EKS cluster as self-managed add-ons are migrated. This action is necessary to retrieve add-on configurations and is required to compare and port configurations during the migration.

  • Add-ons will only be updated if set to the latest version; no action will take place if they are pinned to a specific version during a cluster upgrade.


Azure AKS

Stop and Start AKS Cluster

No more switching back and forth! You can now start and stop your AKS clusters directly from within the Rafay platform. This seamless integration simplifies cluster management and helps optimize your cloud spending by pausing idle clusters and minimizing resource consumption through the familiar Rafay Interface.


For more information, please refer to the AKS documentation

Multiple ACR Support

In this release, we have added support for adding multiple Azure Container Registry profiles directly when creating the AKS cluster as part of the cluster configuration. This enhancement offers greater flexibility in configuring multiple container registries, simplifying the customization of your AKS cluster to suit your specific requirements.


Configurable Azure AKS Authentication & Authorization

Instead of switching between consoles, you can now choose between local accounts and Azure AD, for a streamlined setup or enhanced security through centralized identity management. Opt for Azure RBAC to manage access at the Azure resource level, or Kubernetes RBAC for precise control, when configuring the cluster using the Rafay controller.


AKS UI Enhancements

  • You can now create an AKS Node pool in a subnet separate from the cluster's subnet.
  • We have added UI support in AKS to specify a User Assigned Identity and Kubelet Identity for a managed cluster identity.

Google GKE

Private Cluster Firewall Configuration Customization

Use Case Scenario

Previously, users had to navigate to Google Console or use gcloud commands to manually open additional ports on the firewall for their GKE private cluster. This process often led to inconvenience and added complexity during cluster setup, requiring users to navigate between two different consoles.

With this release, we have added a new cluster configuration option directly through the controller. Users can now easily specify the additional ports they need opened as part of the firewall configuration while creating their cluster, streamlining the management of GKE private clusters.


For more information, please refer to the GKE documentation

GPU Support for Node Pools

In this release, users can add GPU-based node pools, enabling them to incorporate GPU resources into their node pools on both day 0 and day 2. Support for GPU-based node pools is available across the UI, RCTL, the Swagger API, and Terraform.

NodePool Configuration

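The fragment below is a hypothetical sketch of a GPU node pool; the accelerators field and its sub-fields mirror GKE's accelerator settings and may not match the exact Rafay spec.

nodePools:
  - name: gpu-nodepool              # illustrative name
    nodeVersion: "1.27"
    size: 1
    machineConfig:
      imageType: COS_CONTAINERD
      machineType: n1-standard-4
      accelerators:                 # hypothetical field, mirroring the GKE API
        - type: nvidia-tesla-t4     # GPU type attached to each node
          count: 1                  # GPUs per node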

For more information, please refer to the GKE documentation



Upstream Kubernetes

Improved preflight checks

We have enhanced the existing preflight checks for provisioning or scaling Rafay-managed upstream Kubernetes clusters to cover the following scenarios.

  • Communication with NTP server: Does the node have connectivity to a NTP server?
  • Time skew over 10 seconds: Is the clock on the node out of sync with NTP?
  • DNS lookup verification: Is the node able to resolve using the configured DNS server?
  • Firewall validation: Does the node have a firewall that will block provisioning?

If any of these checks fail, the installer will abort and exit. Once the requirements are addressed by the administrator, they can attempt provisioning again.

RCTL Users

As part of this release, we have added an enhancement to RCTL: when creating an upstream cluster with RCTL apply, a full summary is now displayed at the end.

Important

Download the latest RCTL to access this functionality.


Policy Management

OPA GateKeeper

Support for OPA Gatekeeper version v3.14.0 has been introduced with this release.


Blueprint

Error Handling and Reporting

In this release, we have added a number of improvements to make it easier to troubleshoot blueprint and add-on failures. This includes correlating Kubernetes events to expose more meaningful error messages from the cluster, making it easier to pinpoint the root cause of an add-on deployment failure.

Examples

Blueprint updates or add-on additions that fail as part of a blueprint are now shown with additional detail, both for add-on failures and for blueprint failures.


Namespace

Exclusions for Namespace Sync

In previous releases, if namespace sync was enabled, all namespaces created outside the controller (out-of-band) were automatically synced back. Now, you can leverage the new "exclude namespaces" feature at the project level to define specific namespaces that should not be synced. This provides granular control, allowing you to exclude namespaces that you do not need to synchronize.

Important

If a namespace is removed from the exclusion list and namespace sync is enabled, the namespace will be synchronized to the controller in the next reconciliation loop.

Existing synchronized namespaces remain unchanged; the exclusion applies only to new namespaces added after this configuration is in place, and any namespaces already synced continue to remain as they are.



Cost Management Enhancement

As part of this release, we have added enhancements that improve your visibility into the costs associated with objects such as clusters, namespaces, and workloads. Here's what you need to know:

  • Added cost links from objects (clusters, namespaces, and workloads) to the Cost Explorer dashboard.
  • Users can now easily access detailed cost information for specific objects by clicking on the provided link, facilitating better cost management and optimization.



v2.5 Bug Fixes

Bug ID Description
RC-28999 EKS: CoreDNS issue with permissions to endpointslices after upgrading the EKS cluster Kubernetes version
RC-30965 Blueprint update failing with network policy enabled as cilium pods are stuck in pending state
RC-29363 EKS: Node Instance Role ARN is not being detected when converting to a managed cluster
RC-33393 Cluster Template: Provisioning of EKS cluster using cluster template failing with error 'unknown field "instanceRolePermissionsBoundary"'
RC-29353 Error while doing rctl apply of the v3 spec for an AKS converted-to-managed cluster
RC-32310 EKS: Not able to update an existing CloudWatch log group retention
RC-32263 UI: Backup Policy must mandate the 'Control Plane Backup Location' and also the 'Volume Backup Location' when so selected
RC-32686 Get Add-on version API using master creds rather than target account
RC-31491 MKS: Cluster nodes fail to provision at times due to silent connection drops