How to use advanced scheduling

This guide shows how to configure Charmed Apache Spark with Kubernetes mechanisms such as node affinity and tolerations to optimize infrastructure governance and performance.

These mechanisms decouple control plane operations from user-driven workloads, ensuring system services remain stable on cost-effective instances. Spark executor pods can benefit from specialized hardware (high-memory nodes, custom hardware resources such as GPUs, specific architectures) while maintaining the flexibility to scale idle resources to zero.

Prerequisites

This guide assumes that Charmed Apache Spark is deployed on a multi-node Kubernetes cluster.

To configure advanced scheduling, use nodeSelector, affinity rules, and taints. nodeSelector and affinity rules work with Kubernetes labels, while taints are applied directly to nodes.

To apply a label to a node:

kubectl label nodes <node_name> <key>=<value>

To taint a node:

kubectl taint nodes <node_name> <key>=<value>:<effect>

where <effect> is one of NoSchedule, PreferNoSchedule, or NoExecute.

Note

We recommend using the NoSchedule effect rather than NoExecute to avoid disrupting pre-existing workloads.
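For example, to dedicate a node to Spark workloads, you could label and taint it as sketched below. The node name worker-1 and the dedicated=spark key/value pair are illustrative placeholders, not names required by Charmed Apache Spark:

```shell
# Label the node so that pods can target it via nodeSelector/affinity
kubectl label nodes worker-1 dedicated=spark

# Taint it with NoSchedule so that only pods with a matching toleration land there
kubectl taint nodes worker-1 dedicated=spark:NoSchedule
```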

Verify that taints and labels were properly applied to the node:

kubectl describe node <node_name>

This guide shows how to schedule pods on tainted nodes and assign them to specific nodes, ensuring that:

  • Spark Driver and Executor pods are scheduled on the nodes dedicated to running user workloads

  • Charmed Apache Spark components are scheduled on control-plane nodes and/or specific architecture

How to schedule jobs

Charmed Apache Spark can be configured to schedule driver and executor pods on specific nodes. This section details how to set up and configure the advanced scheduling of Spark jobs to mutually segregate control-plane workloads from user workloads and to allocate pods on mixed-architecture clusters.

(Alternative) Define a Pod template

While we recommend using the Namespace Node Affinity Operator for common scenarios, one downside is that it is limited to adding affinities and tolerations. Apache Spark on Kubernetes offers a native way of customising deployments: Pod templates.

Pod templates are more complex and versatile than the Namespace Node Affinity Operator configuration and can be used to schedule pods with different hardware needs or resource quotas. Enabling GPU acceleration is one such case, where we do not want to reserve costly resources for driver pods that do not need them.
This section presents an alternative way to schedule Spark jobs using Pod templates.

The example below is the equivalent of the first namespaces_settings.yaml presented in the previous section, as it applies a nodeSelector and a toleration matching a previously applied taint:

apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    <label_key>: <label_value>
  tolerations:
    - effect: <taint_effect>
      key: <taint_key>
      operator: Equal
      value: <taint_value>

You may save this file under pod_template.yaml and apply it to a Spark job using the spark-client snap:

spark-client.spark-submit \
    --username <service_account> --namespace <namespace> \
    --conf spark.kubernetes.driver.podTemplateFile=pod_template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=pod_template.yaml \
    ...

You may omit one of the two Spark properties above, or point to a different file in each property, to schedule driver and executor pods differently.
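For instance, assuming two hypothetical template files driver_template.yaml and executor_template.yaml containing different scheduling rules, the submission would look like:

```shell
spark-client.spark-submit \
    --username <service_account> --namespace <namespace> \
    --conf spark.kubernetes.driver.podTemplateFile=driver_template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=executor_template.yaml \
    ...
```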

Pod templates can also be used to schedule pods on a specific architecture. The example below schedules the driver and/or executor pods (depending on the Spark property used) only on arm64 nodes.

apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
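Templates can also request custom hardware resources such as GPUs, as in the GPU acceleration case mentioned above. The sketch below is an assumption-laden example: it supposes that GPU nodes carry a hypothetical <gpu_label_key> label and that a device plugin (such as NVIDIA's) exposes the nvidia.com/gpu resource; spark-kubernetes-executor is the default executor container name in Apache Spark on Kubernetes:

```yaml
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    # Hypothetical label identifying GPU-equipped nodes
    <gpu_label_key>: <gpu_label_value>
  containers:
    - name: spark-kubernetes-executor
      resources:
        limits:
          # Requires a GPU device plugin to be installed on the cluster
          nvidia.com/gpu: 1
```

Passing such a file only via spark.kubernetes.executor.podTemplateFile keeps driver pods from reserving a GPU they do not need.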

The Integration Hub charm can be used to enforce Pod template properties on integrated applications by means of charm configuration options.

Note

Please note that the template files must be accessible from the machine where the spark-submit command runs, not from where the pods are actually running.

How to schedule Charmed Apache Spark components

While the two previous sections already take care of segregating control-plane workloads from user-driven workloads, this section details a few strategies to also separate the Charmed Apache Spark components from third-party workloads (neither Charmed Apache Spark nor the Spark jobs) and to take advantage of specific architectures.

Use Juju commands

To target a specific architecture, a Juju constraint can be applied to the Charmed Apache Spark model itself, or to each individual charm. To apply the constraint to the model, run:

juju -m <charmed_spark_juju_model> set-model-constraints arch=arm64

All Juju applications to be deployed on said model will then use the constraint. To apply the constraint to a single charm, run:

juju deploy -m <charmed_spark_juju_model> kyuubi-k8s --trust --channel=3.5/edge --constraints="arch=arm64"

A single Juju model can contain applications deployed over different architectures.
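As an illustration, the two hypothetical deployments below place applications of different architectures in the same model (s3-integrator is used here only as an example of a second charm):

```shell
# Deploy Kyuubi on arm64 nodes
juju deploy -m <charmed_spark_juju_model> kyuubi-k8s --trust --channel=3.5/edge --constraints="arch=arm64"

# Deploy another application on amd64 nodes in the same model
juju deploy -m <charmed_spark_juju_model> s3-integrator --constraints="arch=amd64"
```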

Note

You may check if a charm supports a specific architecture on Charmhub.

Constraint tags may also be used to set affinity/anti-affinity of the charms' pods. Please note that there are no native Juju mechanisms for setting tolerations, so the deployment examples in this section are limited to untainted nodes.

To deploy a charmed operator with a nodeSelector expression similar to what we did in the previous sections for the Spark jobs, run:

juju deploy -m <charmed_spark_juju_model> kyuubi-k8s --constraints "tags=<label_key>=<label_value>"
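You can then verify that the constraint was translated into a node selector on the pod spec. The pod name below assumes a single unit deployed under the default application name kyuubi-k8s:

```shell
kubectl get pod kyuubi-k8s-0 -n <charmed_spark_juju_model> -o yaml | yq '.spec.nodeSelector'
```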

The same mechanism can be used to improve service availability. To deploy three units of the Charmed Apache Kyuubi charm on three distinct nodes of a cluster, run:

export APP_NAME="kyuubi"
juju deploy -m <charmed_spark_juju_model> kyuubi-k8s $APP_NAME -n 3 \
 --constraints="tags=anti-pod.app.kubernetes.io/name=${APP_NAME},anti-pod.topology-key=kubernetes.io/hostname"

You may check with kubectl get pod kyuubi-0 -n <charmed_spark_juju_model> -o yaml that the proper anti-affinity rule was applied to the pod:

...
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - kyuubi
          topologyKey: kubernetes.io/hostname
...

Warning

It is not possible to deploy a single charm on heterogeneous architectures. All units must be deployed on nodes of the same architecture.

Deploy the Namespace Node Affinity Operator

You can use the Namespace Node Affinity Operator to add tolerations to the Charmed Apache Spark components, similar to how we previously did it for the Apache Spark jobs themselves. The targeted nodes must initially be untainted.

Create a new Juju model:

juju add-model <charmed_spark_juju_model>

Then label the newly created namespace with:

kubectl label ns <charmed_spark_juju_model> namespace-node-affinity=enabled

Deploy the Namespace Node Affinity Operator:

juju deploy -m <charmed_spark_juju_model> namespace-node-affinity --trust

Once the charm is up and running, taint the node:

kubectl taint node <node> <taint_key>=<taint_value>:<taint_effect>

In a new settings.yaml file, adapt the configuration below to your Juju model and node taint(s)/affinities:

<charmed_spark_juju_model>: |
  tolerations:
  - key: <taint_key>
    operator: Equal
    value: <taint_value>
    effect: <taint_effect>

Configure the operator to apply the respective toleration to any new charm:

juju config -m <charmed_spark_juju_model> namespace-node-affinity settings_yaml="$(<settings.yaml)"

This is it! Any new Juju application deployment will now get the desired tolerations and affinities:

juju deploy -m <charmed_spark_juju_model> s3-integrator s3

You can check that the configuration is properly applied using:

kubectl get pod s3-0 -n <charmed_spark_juju_model> -o yaml | yq '.spec.tolerations'

Note

Please note that with the setup above, the modeloperator pod (created when the Juju model is added) and the Namespace Node Affinity Operator itself might be deployed on a different node, since they were scheduled before the operator could apply the configuration.