GPU Dashboard
Use the GPU Resource Dashboard once the NVIDIA GPU Operator is deployed. The GPU Operator automates the management of all NVIDIA software components that are required to provision GPUs within Kubernetes. The GPU Resource Dashboard is powered by the DCGM Exporter (dcgmExporter), which is deployed as part of the GPU Operator, and it gives deeper visibility into the resources of your GPU cores.
The GPU Dashboard contains graphs for Current GPU SM Clocks, Current GPU Memory Clocks, GPU Utilization, GPU Memory Copy Utilization, GPU SM Clocks, GPU Memory Clocks, Framebuffer Memory Used, Framebuffer Memory Free, GPU Average Temperature, and GPU Power Tool.
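Because the dashboard is driven by the DCGM Exporter, the same figures can in principle be read from the exporter's Prometheus text output. The sketch below is illustrative only: the mapping from graph names to `DCGM_FI_DEV_*` metric names is an assumption (the source does not state which metric backs each graph), the sample text is made-up data, and `parse_metrics` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re
from collections import defaultdict

# ASSUMED mapping from dashboard graph names to dcgm-exporter metric names.
# The dashboard's real queries may differ.
PANEL_TO_METRIC = {
    "GPU SM Clocks": "DCGM_FI_DEV_SM_CLOCK",
    "GPU Memory Clocks": "DCGM_FI_DEV_MEM_CLOCK",
    "GPU Utilization": "DCGM_FI_DEV_GPU_UTIL",
    "GPU Memory Copy Utilization": "DCGM_FI_DEV_MEM_COPY_UTIL",
    "Framebuffer Memory Used": "DCGM_FI_DEV_FB_USED",
    "Framebuffer Memory Free": "DCGM_FI_DEV_FB_FREE",
    "GPU Average Temperature": "DCGM_FI_DEV_GPU_TEMP",
    "GPU Power Tool": "DCGM_FI_DEV_POWER_USAGE",
}

# One Prometheus sample line: name, optional {labels}, then a numeric value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Group the metrics listed in PANEL_TO_METRIC by the `gpu` label.

    Returns a dict of the form {gpu_index: {metric_name: value}}.
    """
    wanted = set(PANEL_TO_METRIC.values())
    per_gpu = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = SAMPLE_LINE.match(line)
        if not match or match.group("name") not in wanted:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        gpu = labels.get("gpu", "unknown")
        per_gpu[gpu][match.group("name")] = float(match.group("value"))
    return dict(per_gpu)


# Made-up sample in Prometheus text format, for illustration only.
SAMPLE = """\
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-example-0"} 37
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-example-0"} 2048
DCGM_FI_DEV_GPU_TEMP{gpu="1",UUID="GPU-example-1"} 54
"""

if __name__ == "__main__":
    for gpu, metrics in sorted(parse_metrics(SAMPLE).items()):
        print(gpu, metrics)
```

Grouping by the `gpu` label mirrors the per-GPU-core view of the dashboard: each key in the result corresponds to one GPU core, and each value holds the readings that would feed that core's graphs.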
Access the dashboard for a specific GPU core in either of the following ways:
From Cluster Card
Click the GPUs count in the Cluster card of the project to view a list of all GPU cores in the cluster, the node each one belongs to, and the pods using them.
From the GPU core list, click a node's GPU core to view the GPU dashboard for that specific GPU core.
From Node Dashboard
From the node's dashboard, click the GPU tab and select the GPU core from the drop-down list to view the GPU dashboard for that specific GPU core.