Part 2: Provision

What Will You Do

In this part of the self-paced exercise, you will provision an Azure AKS cluster with a GPU node pool based on a declarative cluster specification.


Step 1: Cluster Spec

  • Open Terminal (on macOS/Linux) or Command Prompt (on Windows) and navigate to the folder where you cloned your fork of the Git repository
  • Navigate to the folder "/getstarted/gpuaks/cluster"

The "aks-gpu.yaml" file contains the declarative specification for our Azure AKS Cluster.

Cluster Details

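Update the following values in the spec file to match your environment. Note that "location" is set in two places in the spec, once for the managed cluster and once for the node pool, and both should be updated.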

  • project: defaultproject
  • cloudprovider: azure-cc
  • location: northcentralus
  • resourceGroupName: Tim-RG

apiVersion: rafay.io/v1alpha1
kind: Cluster
metadata:
  name: demo-gpu-aks
  project: defaultproject
spec:
  blueprint: default-aks
  cloudprovider: azure-cc
  clusterConfig:
    apiVersion: rafay.io/v1alpha1
    kind: aksClusterConfig
    metadata:
      name: demo-gpu-aks
    spec:
      managedCluster:
        apiVersion: "2022-07-01"
        identity:
          type: SystemAssigned
        location: northcentralus
        properties:
          apiServerAccessProfile:
            enablePrivateCluster: true
          dnsPrefix: demo-gpu-aks-dns
          kubernetesVersion: 1.25.6
          networkProfile:
            loadBalancerSku: standard
            networkPlugin: kubenet
        sku:
          name: Basic
          tier: Free
        type: Microsoft.ContainerService/managedClusters
      nodePools:
      - apiVersion: "2022-07-01"
        location: northcentralus
        name: primary
        properties:
          count: 1
          enableAutoScaling: true
          maxCount: 1
          maxPods: 110
          minCount: 1
          mode: System
          orchestratorVersion: 1.25.6
          osType: Linux
          type: VirtualMachineScaleSets
          vmSize: Standard_NC4as_T4_v3
        type: Microsoft.ContainerService/managedClusters/agentPools
      resourceGroupName: Tim-RG
  proxyconfig: {}
  type: aks
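
Before applying the spec, it can help to list the lines that carry the four environment-specific values so that each one can be checked by eye. The following is a minimal sketch that uses only grep; "show_env_values" is a hypothetical helper name chosen for this illustration and is not part of RCTL.

```shell
# show_env_values FILE
# Hypothetical helper: prints every line of FILE that sets one of the four
# environment-specific keys, prefixed with its line number.
show_env_values() {
  grep -nE '^[[:space:]]*(project|cloudprovider|location|resourceGroupName):' "$1"
}

# Example (run from the folder that contains the spec):
#   show_env_values aks-gpu.yaml
```

For the spec shown above, this should list five lines, because "location" appears under both the managed cluster and the node pool.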

Step 2: Provision Cluster

  • On your command line, navigate to the "cluster" subfolder
  • Type the command
rctl apply -f aks-gpu.yaml

If there are no errors, the command returns a task set ID (the "taskset_id" field in the output below) that you can use to check progress and status. Note that this step creates infrastructure in your Azure account and can take approximately 20-30 minutes to complete.

{
  "taskset_id": "x28y6ek",
  "operations": [
    {
      "operation": "ClusterCreation",
      "resource_name": "demo-gpu-aks",
      "status": "PROVISION_TASK_STATUS_PENDING"
    },
    {
      "operation": "NodegroupCreation",
      "resource_name": "primary",
      "status": "PROVISION_TASK_STATUS_PENDING"
    },
    {
      "operation": "BlueprintSync",
      "resource_name": "demo-gpu-aks",
      "status": "PROVISION_TASK_STATUS_PENDING"
    }
  ],
  "comments": "The status of the operations can be fetched using taskset_id",
  "status": "PROVISION_TASKSET_STATUS_PENDING"
}
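
If you want to keep the task set ID for later status checks, it can be pulled out of the JSON response on the command line. The sketch below assumes the response has the shape shown above and that rctl writes only this JSON document to standard output; "extract_taskset_id" is a hypothetical helper name, not an RCTL command.

```shell
# extract_taskset_id
# Hypothetical helper: reads JSON like the sample above on stdin and prints
# the value of the first "taskset_id" field. It uses sed so that no JSON
# tool needs to be installed; it handles this flat field only and is not a
# general JSON parser.
extract_taskset_id() {
  sed -n 's/.*"taskset_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example (assumes rctl prints only the JSON document on stdout):
#   rctl apply -f aks-gpu.yaml | extract_taskset_id
```

The pattern requires a quoted key followed by a colon, so the mention of taskset_id inside the "comments" string is not matched.
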

  • In the web console, navigate to the project in your Org
  • Click on Infrastructure -> Clusters. You should see something like the following

Provisioning in Process

  • Click on the cluster name to monitor progress

Provisioning in Process


Step 3: Verify Cluster

Once provisioning is complete, you should see a healthy cluster in the web console

Provisioned Cluster

  • Click on the kubectl link and type the following command
kubectl get nodes -o wide

You should see something like the following

NAME                              STATUS   ROLES   AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-primary-14718340-vmss000002   Ready    agent   8m38s   v1.25.6   10.224.0.4    <none>        Ubuntu 22.04.2 LTS   5.15.0-1041-azure   containerd://1.7.1+a
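
On clusters with more than one node, scanning the STATUS column by eye becomes error-prone. As a minimal sketch, the output of the command above can be piped through awk to count nodes that are not ready; "count_not_ready" is a hypothetical helper name, and it assumes the default column layout shown above, with STATUS as the second column.

```shell
# count_not_ready
# Hypothetical helper: reads `kubectl get nodes` output (header line
# included) on stdin and prints how many nodes report a STATUS other than
# exactly "Ready". A cordoned node ("Ready,SchedulingDisabled") is counted
# as not ready.
count_not_ready() {
  awk 'NR > 1 && NF > 0 && $2 != "Ready" { n++ } END { print n + 0 }'
}

# Example:
#   kubectl get nodes -o wide | count_not_ready
```

A result of 0 means every listed node reports Ready.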

Recap

Congratulations! At this point, you have successfully configured and provisioned an Azure AKS cluster with a GPU node pool in your account using the RCTL CLI. You are now ready to move on to the next step, where you will create and deploy a custom cluster blueprint that contains the GPU Operator as an addon.