How to configure GPU support for Charmed Apache Kyuubi¶
Charmed Apache Kyuubi supports the RAPIDS Accelerator for Apache Spark on K8s. This makes it possible to run queries with hardware acceleration on NVIDIA GPUs.
Prerequisites¶
A Kubernetes cluster up and running with NVIDIA GPU support
Charmed Apache Kyuubi, revision 121 (Apache Spark 3.5) or 122 (Apache Spark 3.4), or higher
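You can check the revision of the deployed charm from the Rev column of the Juju status output. This is an illustrative check; <kyuubi-app> is the same application-name placeholder used in the commands later in this guide:
juju status <kyuubi-app>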
Check that NVIDIA GPU support is properly enabled by searching for a gpu-operator deployment.
On a non-confined MicroK8s cluster, this is done by enabling the gpu add-on.
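For example, on MicroK8s the add-on is enabled with a single command (a sketch, assuming the MicroK8s snap is already installed and the cluster is running):
microk8s enable gpu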
Once the deployment is successful, you should see a new gpu-operator-resources namespace with pods similar to the following:
kubectl get pods -n gpu-operator-resources
NAME READY STATUS
gpu-feature-discovery-l5g5k 1/1 Running
gpu-operator-85776c76f-7jzp2 1/1 Running
gpu-operator-node-feature-discovery-gc-d8f9f89db-77rz9 1/1 Running
gpu-operator-node-feature-discovery-master-79978f78cf-bslkt 1/1 Running
gpu-operator-node-feature-discovery-worker-vvg6w 1/1 Running
nvidia-container-toolkit-daemonset-r5lq2 1/1 Running
nvidia-cuda-validator-76t6r 0/1 Completed
nvidia-dcgm-exporter-s92ln 1/1 Running
nvidia-device-plugin-daemonset-92xnm 1/1 Running
nvidia-operator-validator-d7hhh 1/1 Running
Finally, make sure that the cluster now lists at least one nvidia.com/gpu GPU resource under one node's capacity:
kubectl get node <node-name> -o=jsonpath='{.status.capacity.nvidia\.com/gpu}'
Output example
1
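To check the GPU capacity of all nodes at once instead, a custom-columns query can be used (an illustrative variant; the NAME and GPU column labels are arbitrary):
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu'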
Configuring hardware-accelerated Spark jobs¶
Enable hardware-accelerated Spark jobs with the following configuration option:
juju config <kyuubi-app> gpu-enable=true
Each executor pod will now use one full GPU resource.
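You can verify this once a query engine is running and executor pods have been spawned: each executor should request one nvidia.com/gpu resource. This is a sketch; <kyuubi-namespace> and <executor-pod> are placeholders for your deployment's namespace and an actual executor pod name:
kubectl describe pod <executor-pod> -n <kyuubi-namespace> | grep "nvidia.com/gpu"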
Use the gpu-engine-executors-limit option to set the number of executors a Kyuubi Engine will spawn:
juju config <kyuubi-app> gpu-engine-executors-limit=2
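While an engine is active, you can compare the number of running executor pods against this limit, for example by filtering on the spark-role=executor label that Spark on Kubernetes assigns to executor pods (namespace placeholder as above):
kubectl get pods -n <kyuubi-namespace> -l spark-role=executor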
To get the most out of the hardware, the pod configuration should be adjusted for the workload:
juju config <kyuubi-app> gpu-pinned-memory=4
juju config <kyuubi-app> executor-memory=8
juju config <kyuubi-app> executor-cores=4
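Each of these options can be read back at any time to confirm the value currently in effect, for example (an illustrative check; the same form works for the other keys):
juju config <kyuubi-app> executor-memory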
Checking GPU resource usage¶
If you have shell access to the machine, you can use the NVIDIA System Management Interface CLI:
nvidia-smi
The output should look like the following:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.169 Driver Version: 570.169 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 <first GPU unit> Off | 00000000:01:00.0 On | N/A |
| ....... | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2611 G <path to spark process> XXX MiB |
| ... |
+-----------------------------------------------------------------------------------------+
Under the Processes table, you should see a process for each currently active executor.
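To keep an eye on GPU utilisation while a query is running, nvidia-smi can also refresh its output periodically (here every 5 seconds):
nvidia-smi -l 5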