Deploy Charmed Apache Spark
Charmed Apache Spark comes with a bundled set of components that allow you to easily manage Apache Spark workloads on K8s, providing integration with object storage, monitoring and log aggregation.
For an overview of all components and how they relate to each other, see the Components overview.
Prerequisites
Since Charmed Apache Spark is managed by Juju, make sure that:
- you have a Juju client (e.g. via a snap) installed on your local machine
- you are able to connect to a Juju controller
- you have read-write permissions to either an S3-compatible or an Azure object storage
To set up a Juju controller on K8s and the Juju client, you can refer to existing tutorials and documentation for MicroK8s and for AWS EKS. Also refer to the How-to set up environment guide to install and set up an S3-compatible object storage on MicroK8s (MinIO), EKS (AWS S3), or Azure object storages. For other backends or K8s distributions other than MinIO on MicroK8s and S3 on EKS (e.g. Ceph, Charmed Kubernetes, GKE, etc.), please refer to their documentation.
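The client-side prerequisites above can be checked quickly from a terminal. The following is a minimal sketch, not an official check: `have_cmd` is a hypothetical helper introduced here for illustration, and the script only reports what it finds without changing anything.

```shell
# have_cmd NAME: succeeds when NAME resolves to a command on PATH.
# (Hypothetical helper for this sketch, not part of Juju.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd juju; then
    juju version        # version of the installed client
    juju controllers    # controllers this client knows about
    juju whoami         # current controller, model and user
else
    echo "juju client not found; install it first (e.g. via a snap)" >&2
fi
```

If `juju controllers` lists no controller, go through the controller setup tutorials mentioned above before continuing.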
Preparation
The Charmed Apache Spark bundle is deployed using Terraform, so make sure you have a working installation of Terraform 1.8+ on your machine. Both Terraform and OpenTofu can be installed via a snap. Run the following command to install Terraform using snap:
sudo snap install terraform --classic
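To confirm that the installed binary meets the 1.8+ requirement, the version can be compared in the shell. This is a sketch under stated assumptions: `version_ge` is a hypothetical helper that relies on `sort -V` (available in GNU coreutils), and the parsing assumes the first line of `terraform version` has the form `Terraform v<version>`. It does not cover the OpenTofu binary.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v terraform >/dev/null 2>&1; then
    # Strip the "Terraform v" prefix from the first output line.
    tf_version=$(terraform version | head -n 1 | sed 's/^Terraform v//')
    if version_ge "$tf_version" "1.8.0"; then
        echo "Terraform $tf_version satisfies the 1.8+ requirement"
    else
        echo "Terraform $tf_version is older than 1.8.0" >&2
    fi
else
    echo "terraform not found on PATH" >&2
fi
```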
Terraform modules make use of the Terraform Juju provider. More information about the Juju provider can be found in the Terraform documentation.
The Charmed Apache Spark Terraform module is a reusable product module that consists of all charms in the Charmed Apache Spark solution, including the integrations between the charms. To use it, create a new file named main.tf in a local directory with the following Terraform code:
terraform {
  required_version = ">=1.0.0"
  required_providers {
    juju = {
      source  = "juju/juju"
      version = ">=1.0.0"
    }
  }
}

module "spark" {
  source = "git::https://github.com/canonical/spark-k8s-bundle//terraform/products/charmed-spark-3.5?ref=terraform-cc008"

  history_server_image  = "ghcr.io/canonical/charmed-spark:3.5-22.04_stable"
  integration_hub_image = "ghcr.io/canonical/spark-integration-hub:3-22.04_stable"
  kyuubi_image          = "ghcr.io/canonical/charmed-spark-kyuubi:3.5-22.04_stable"
  spark_model_name      = "spark"
  admin_password        = "<kyuubi-admin-password>"
  tls_private_key       = "<base-64-encoded-private-key>"
  kyuubi_config         = {"service-account": "kyuubi-user"}
  storage_backend       = "s3"
  s3_access_key         = "<s3-access-key>"
  s3_secret_key         = "<s3-secret-key>"
  s3_config             = {"endpoint": "<s3-endpoint>", "region": "<s3-region>", "bucket": "<s3-bucket>", "path": "spark-events/"}
}
Caution
The example here assumes Apache Spark 3.5. To use a different Apache Spark version, make sure you use the matching source for the Terraform module and the matching OCI images for history_server_image and kyuubi_image. For instance, to use Apache Spark 4.0, you would need to use the source charmed-spark-4.0 and set history_server_image and kyuubi_image to ghcr.io/canonical/charmed-spark:4.0-22.04_stable and ghcr.io/canonical/charmed-spark-kyuubi:4.0-22.04_stable respectively.
The following table describes the different options:
| key | Description |
|---|---|
| history_server_image | The Apache Spark image to use as workload for Spark History Server charm |
| integration_hub_image | The OCI image to use as workload for Spark Integration Hub charm |
| kyuubi_image | The OCI image to use as workload for Kyuubi K8s charm |
| spark_model_name | The name of the Juju model where the bundle is to be deployed |
| admin_password | The password to set for the Kyuubi admin user |
| tls_private_key | The private key to be used for generating Kyuubi TLS certificates, provided as base64-encoded string |
| kyuubi_config.service-account | The service account which is used by Kyuubi to run Spark jobs |
| storage_backend | The object storage backend to be used. The backends s3 and azure_storage are shown in this guide |
| s3_access_key | The S3 access key ID |
| s3_secret_key | The S3 secret key |
| s3_config.endpoint | The S3 endpoint |
| s3_config.region | The S3 region |
| s3_config.bucket | The name of the S3 bucket to be used |
| s3_config.path | The path inside the S3 bucket to be used |
The following commands generate a new private key and print its base64-encoded value:
openssl genrsa -out private.key 2048
base64 private.key -w0
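The `-w0` flag is specific to GNU `base64`. As a hedged alternative for systems where it is not available, the sketch below removes line wraps with `tr` instead, restricts the key file permissions, and keeps the encoded value in a shell variable. The variable name `TLS_PRIVATE_KEY_B64` is an assumption made for this example, and the final check assumes a `base64` build that accepts `-d` for decoding.

```shell
# Generate a 2048-bit RSA key and make it readable by the current user only.
openssl genrsa -out private.key 2048
chmod 600 private.key

# Encode to a single line; tr strips the line wraps that some base64 builds add.
TLS_PRIVATE_KEY_B64=$(base64 < private.key | tr -d '\n')

# Sanity check: decoding the value should give back the PEM header line.
printf '%s' "$TLS_PRIVATE_KEY_B64" | base64 -d | head -n 1
```

The content of `TLS_PRIVATE_KEY_B64` is the value that goes into the `tls_private_key` option in main.tf.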
To use Azure Storage as the storage backend instead, configure the Azure Storage specific options as follows:
terraform {
  required_version = ">=1.0.0"
  required_providers {
    juju = {
      source  = "juju/juju"
      version = ">=1.0.0"
    }
  }
}

module "spark" {
  source = "git::https://github.com/canonical/spark-k8s-bundle//terraform/products/charmed-spark-3.5?ref=terraform-cc008"

  history_server_image     = "ghcr.io/canonical/charmed-spark:3.5-22.04_stable"
  integration_hub_image    = "ghcr.io/canonical/spark-integration-hub:3-22.04_stable"
  kyuubi_image             = "ghcr.io/canonical/charmed-spark-kyuubi:3.5-22.04_stable"
  spark_model_name         = "spark"
  admin_password           = "<kyuubi-admin-password>"
  tls_private_key          = "<base-64-encoded-private-key>"
  kyuubi_config            = {"service-account": "kyuubi-user"}
  storage_backend          = "azure_storage"
  azure_storage_secret_key = "<azure-storage-secret-key>"
  azure_storage_config     = {"container": "<azure-storage-container>", "storage-account": "<azure-storage-account>", "protocol": "<storage-protocol>", "path": "spark-events/"}
}
The following table describes the Azure Storage specific options:
| key | Description |
|---|---|
| azure_storage_secret_key | The Azure Storage account secret key |
| azure_storage_config.container | The name of the Azure Storage container to be used |
| azure_storage_config.storage-account | The Azure Storage account to be used |
| azure_storage_config.protocol | The connection protocol to be used. Valid values are … |
| azure_storage_config.path | The path inside the Azure Storage container to be used |
The main.tf file created above does not deploy the Canonical Observability Stack (COS), so by default the bundle has no observability enabled.
To enable observability with COS, add a COS module to the same main.tf file and wire it to the spark module, as follows:
terraform {
  required_version = ">=1.0.0"
  required_providers {
    juju = {
      source  = "juju/juju"
      version = ">=1.0.0"
    }
  }
}

resource "juju_model" "cos" {
  name = "cos"
}

module "cos" {
  # the source is pinned to the last commit on branch track/2 that's still compatible with Juju TF < 1.4.0.
  # For more details, see this section in cos-lite docs: https://github.com/canonical/observability-stack/blob/track/2/terraform/cos-lite/README.md#provider--100--140
  source     = "git::https://github.com/canonical/observability-stack//terraform/cos-lite?ref=7448dadb996835c1c0ae1d79d2f435992652d410"
  model_uuid = juju_model.cos.uuid
}

module "spark" {
  source = "git::https://github.com/canonical/spark-k8s-bundle//terraform/products/charmed-spark-3.5?ref=terraform-cc008"

  history_server_image  = "ghcr.io/canonical/charmed-spark:3.5-22.04_stable"
  integration_hub_image = "ghcr.io/canonical/spark-integration-hub:3-22.04_stable"
  kyuubi_image          = "ghcr.io/canonical/charmed-spark-kyuubi:3.5-22.04_stable"
  spark_model_name      = "spark"
  admin_password        = "<kyuubi-admin-password>"
  tls_private_key       = "<base-64-encoded-private-key>"
  kyuubi_config         = {"service-account": "kyuubi-user"}
  storage_backend       = "s3"
  s3_access_key         = "<s3-access-key>"
  s3_secret_key         = "<s3-secret-key>"
  s3_config             = {"endpoint": "<s3-endpoint>", "region": "<s3-region>", "bucket": "<s3-bucket>", "path": "spark-events/"}

  cos_offers = {
    dashboard = module.cos.offers.grafana_dashboards.url
    logging   = module.cos.offers.loki_logging.url
    metrics   = module.cos.offers.prometheus_receive_remote_write.url
  }
}
The added section creates a new Juju model named cos, deploys COS in that model, and passes the grafana_dashboards, loki_logging and prometheus_receive_remote_write offers to the spark module as cos_offers.
For the full list of input configurations supported, and the reference of the Charmed Apache Spark Terraform bundle, please refer to the README corresponding to the product module in spark-k8s-bundle repository.
Deploy
To deploy Charmed Apache Spark using Terraform, first initialize the Terraform modules. To do so, run the following command from the directory that contains the main.tf file you created in the previous step:
terraform init
Once the Terraform modules are initialized, deploy the solution with the following command:
terraform apply
Once prompted with the Terraform plan, verify the plan and type “yes” in the prompt. The deployment will then begin and take some time. Once the deployment completes, verify the deployment with the juju status command as follows:
# for charms in the spark model
juju switch spark
juju status
# for charms in the cos model (if you deployed with COS)
juju status --model cos
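Because the deployment takes a while to settle, it can be convenient to let the status refresh on its own. The wrapper below is a small sketch that assumes a Juju client whose `status` command accepts the `--watch` option; `watch_model` is a hypothetical helper name, and the 5 second default interval is an arbitrary choice.

```shell
# watch_model MODEL [INTERVAL]: refresh `juju status` for MODEL until interrupted.
# Assumes the installed Juju client supports `juju status --watch`.
watch_model() {
    juju status --model "$1" --watch "${2:-5s}"
}
```

For example, `watch_model spark` follows the spark model and `watch_model cos 10s` follows the cos model at a slower rate. Press Ctrl+C to stop watching.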
Once all applications settle to idle, you can test the connection to Kyuubi by following the corresponding section of the Charmed Apache Kyuubi documentation. Note that most of the charms and integrations described in that section are already deployed with the bundle, so you can skip deploying and integrating them.
For more information about Terraform, please refer to the official docs.
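For repeatable runs, the init and apply steps described in this guide can also be chained with a saved plan, so that the plan you reviewed is exactly the one that gets applied. The function below is a sketch rather than a prescribed workflow: `deploy_with_saved_plan` and the default file name `spark.tfplan` are hypothetical names chosen for this example.

```shell
# deploy_with_saved_plan [PLAN_FILE]: init, validate, plan to a file, then apply it.
# The && chain stops at the first step that fails.
deploy_with_saved_plan() {
    plan_file="${1:-spark.tfplan}"
    terraform init &&
    terraform validate &&
    terraform plan -out="$plan_file" &&
    terraform apply "$plan_file"
}
```

Applying a saved plan does not ask for confirmation, so review the plan output before relying on this in automation. A plan file can contain the secrets from main.tf in clear text, so treat it like the configuration itself and delete it after use.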