Components overview
Charmed Apache Spark is composed of foundational software artifacts and a set of Juju operators (charms) that together provide a fully managed Apache Spark platform on Kubernetes. All charms are available individually on Charmhub and can also be deployed together via the Charmed Apache Spark bundle or Terraform modules.
Software artifacts
Three foundational components can be used independently of Juju: spark8t, the Charmed Apache Spark Rock, and the spark-client snap.
spark8t
spark8t is a Python library that extends Apache Spark with tooling to manage Spark jobs and service accounts with hierarchical configuration. It is the foundation shared by the spark-client snap, the OCI images, and the Juju charms.
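To illustrate what hierarchical configuration can mean in practice, the following minimal sketch merges three layers of Spark properties. It does not use the spark8t API; the layer names, the property values, and the rule that more specific layers override more general ones are assumptions made for this example.

```python
from collections import ChainMap

# Hypothetical configuration layers, ordered from most general to most specific.
# The layer names and values are illustrative and are not taken from spark8t.
cluster_defaults = {
    "spark.executor.instances": "2",
    "spark.eventLog.enabled": "false",
}
service_account_conf = {
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "s3a://example-bucket/spark-events",
}
submit_time_overrides = {
    "spark.executor.instances": "4",
}


def resolve(*layers: dict[str, str]) -> dict[str, str]:
    """Merge layers given from most general to most specific.

    ChainMap searches its maps left to right, so the layers are reversed
    to let the most specific one win.
    """
    return dict(ChainMap(*reversed(layers)))


effective = resolve(cluster_defaults, service_account_conf, submit_time_overrides)
for key in sorted(effective):
    print(f"{key}={effective[key]}")
```

With these sample layers, the submit-time value for spark.executor.instances and the service-account value for spark.eventLog.enabled take precedence over the defaults.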
Charmed Apache Spark Rock
The Charmed Apache Spark Rock is an OCI-compliant container image that bundles Apache Spark binaries together with Canonical tooling. It is used as the base image for Spark driver and executor pods on Kubernetes, and as the foundation for the spark-client snap.
spark-client snap
The spark-client snap provides CLI tools for working with Charmed Apache Spark from a workstation or edge node. It communicates with the Kubernetes API to submit jobs and manage service accounts; it does not connect to any Juju charm directly.
| Command | Description |
|---|---|
| spark-submit | Submit Spark applications to a Kubernetes cluster |
| pyspark | Start an interactive PySpark shell |
| service-account-registry | Create, configure, and manage Spark service accounts |
| beeline | JDBC client for connecting to Apache Kyuubi endpoints |
| | Import TLS certificates for encrypted Kyuubi connections |
Juju operators (charms)
Each subsection below groups charms by function. All charms can be deployed individually or together via the bundle.
Core components
The following charms form the foundation of any Charmed Apache Spark deployment, connecting Spark service accounts to external services and providing a UI for completed job logs:
| Charm | Description |
|---|---|
| spark-integration-hub-k8s | Central hub that manages Spark service account configurations and writes them into Kubernetes Secrets. It allows high-level configuration of Spark properties and seamless integration with external services, such as object storage backends and COS deployments. |
| s3-integrator | Supplies S3-compatible object storage credentials (endpoint, bucket, access key) to the Integration Hub and History Server. Supports MinIO, AWS S3, and any S3-compatible backend. |
| azure-storage-integrator | Alternative to |
| spark-history-server-k8s | Exposes a web UI for browsing and analyzing event logs of completed Spark jobs stored in object storage. Receives credentials from |
| traefik-k8s | Kubernetes ingress proxy. Exposes the History Server web UI at a stable URL outside the cluster. |
Apache Kyuubi (SQL / JDBC)
Charmed Apache Kyuubi provides a JDBC/ODBC endpoint for running SQL queries against data in object storage, powered by Apache Spark engines.
| Charm | Description |
|---|---|
| kyuubi-k8s | Provides a JDBC/ODBC endpoint for SQL queries. Integrates with the Integration Hub to obtain Spark service account configuration. Supports horizontal scaling and external metastore. |
| postgresql-k8s (auth-db) | Required authentication database for Kyuubi. Kyuubi will remain blocked without this integration. |
| postgresql-k8s (metastore) | External Hive metastore providing persistent metadata storage. Without it, metadata is stored on pod-local storage and lost on pod restarts. |
| zookeeper-k8s | Required for multi-node Kyuubi deployments. Coordinates distributed Kyuubi instances. |
| self-signed-certificates | Provides TLS certificates to Kyuubi for encrypted JDBC connections. For production environments, use a CA-backed certificates operator instead. |
| data-integrator | Retrieves JDBC credentials (endpoint, username, password, TLS certificate) from Kyuubi via the |
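As a rough sketch of how the JDBC credentials listed above could be turned into a Beeline invocation, the example below builds a HiveServer2-style JDBC URL and the matching Beeline arguments. The JdbcCredentials class, the host name, the port, and the sample credentials are hypothetical. A real TLS connection would usually also need a trust store, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class JdbcCredentials:
    """Hypothetical container for the values a client needs to connect."""
    host: str
    port: int
    username: str
    password: str
    use_tls: bool = False


def jdbc_url(creds: JdbcCredentials) -> str:
    """Build a HiveServer2-style JDBC URL, the scheme Beeline understands."""
    url = f"jdbc:hive2://{creds.host}:{creds.port}/"
    if creds.use_tls:
        # Hive JDBC session variables follow the path, separated by ';'.
        # Trust store settings are left out of this sketch.
        url += ";ssl=true"
    return url


def beeline_args(creds: JdbcCredentials) -> list[str]:
    """Arguments for Beeline: -u URL, -n username, -p password."""
    return ["-u", jdbc_url(creds), "-n", creds.username, "-p", creds.password]


# Sample values only; none of them come from a real deployment.
example = JdbcCredentials("kyuubi.example", 10009, "admin", "secret", use_tls=True)
print(jdbc_url(example))
print(beeline_args(example))
```

Passing a password on the command line exposes it to other local users through the process list, so an interactive prompt or a password file is generally preferable outside of experiments.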
Observability (COS integration)
Charmed Apache Spark integrates natively with the Canonical Observability Stack (COS), which is deployed in a separate Juju model and includes Grafana, Prometheus, Loki, and Alertmanager.
The Tutorial demonstrates a simplified COS setup using only prometheus-pushgateway-k8s and cos-configuration-k8s, integrated directly with the COS model charms. The full bundle overlay additionally deploys grafana-agent-k8s and prometheus-scrape-config-k8s as a cross-model observability bridge:
| Charm | Tutorial | Bundle | Description |
|---|---|---|---|
| prometheus-pushgateway-k8s | Yes | Yes | Accepts metrics pushed by ephemeral Spark jobs (which are too short-lived for pull-based scraping) and exposes them to Prometheus. In the full bundle, integrates with the Integration Hub to automatically configure service accounts with the pushgateway address. |
| cos-configuration-k8s | Yes | Yes | Syncs Grafana dashboard definitions from a git repository into Grafana. Pre-configured in the bundle to use the dashboards from this repository. |
| grafana-agent-k8s | No | Yes | Cross-model bridge that ships metrics, log streams, and dashboard definitions from the Spark model to the COS model via remote-write and Loki push API. |
| prometheus-scrape-config-k8s | No | Yes | Configures the Prometheus scrape interval for the Pushgateway metrics endpoint. |
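The push-based model described for the Pushgateway can be sketched as follows. This is not how Spark itself reports metrics; it only shows how a short-lived process might push a gauge to a Pushgateway over HTTP. The gateway address, job name, and metric name are hypothetical, and the request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_push_request(gateway_url: str, job: str, metrics: dict[str, float]) -> Request:
    """Build (but do not send) a request that pushes gauges to a Pushgateway."""
    # Pushgateway groups pushed metrics under /metrics/job/<job_name>.
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    # The body uses the Prometheus text exposition format.
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    body = ("\n".join(lines) + "\n").encode("utf-8")
    # PUT replaces all metrics previously pushed for this grouping key.
    return Request(url, data=body, method="PUT", headers={"Content-Type": "text/plain"})


request = build_push_request(
    "http://pushgateway.example:9091",  # hypothetical address
    "spark-pi",                         # hypothetical job name
    {"job_duration_seconds": 12.5},     # hypothetical metric
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"), end="")
```

Sending the request with urllib.request.urlopen(request) would store the metric on the gateway, where Prometheus can scrape it after the job has exited.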
Architecture
The following diagram shows how the components relate in a full deployment:
flowchart TB
client["<b>spark-client snap</b><br>spark-submit · pyspark<br>beeline · service-account-registry"]
backend[("<b>Object Storage</b><br>MinIO · AWS S3 · Azure Blob")]
subgraph spark-model["Spark Juju Model"]
direction TB
hub["<b>spark-integration-hub-k8s</b>"]
subgraph integrators["Storage Integrators"]
direction LR
s3int["s3-integrator"]
azint["azure-storage-integrator"]
end
traefik["traefik-k8s"]
hs["spark-history-server-k8s"]
subgraph kyu-grp["Apache Kyuubi"]
direction TB
kyuubi["kyuubi-k8s"]
authdb["postgresql-k8s<br>(auth-db)"]
metadb["postgresql-k8s<br>(metastore)"]
zk["zookeeper-k8s"]
tls["self-signed-certificates"]
di["data-integrator"]
end
subgraph cos-bridge["COS Bridge"]
direction LR
pgw["prometheus-pushgateway-k8s"]
scrape["prometheus-scrape-config-k8s"]
agent["grafana-agent-k8s"]
cosconf["cos-configuration-k8s"]
end
end
subgraph cos-model["COS Juju Model (cos-lite)"]
direction LR
prom["Prometheus"]
graf["Grafana"]
loki_n["Loki"]
end
client -->|"K8s API<br>(spark-submit / pyspark)"| hub
client -->|"JDBC"| kyuubi
backend <-->|credentials| s3int
backend <-->|credentials| azint
s3int -->|s3-credentials| hub
azint -->|azure-storage-credentials| hub
s3int -->|s3-credentials| hs
azint -->|azure-storage-credentials| hs
hub -->|spark-service-account| kyuubi
hub -->|cos| pgw
traefik -->|ingress| hs
kyuubi -->|auth-db| authdb
kyuubi -->|metastore-db| metadb
kyuubi -->|zookeeper| zk
kyuubi -->|certificates| tls
di -->|jdbc| kyuubi
hs -->|metrics · logs · dashboards| agent
pgw --- scrape -->|metrics| agent
cosconf -->|dashboards| agent
agent -->|remote-write| prom
agent -->|push API| loki_n
agent -->|dashboards| graf
The spark-client tools communicate with Kubernetes directly: the Integration Hub writes configuration into Kubernetes Secrets associated with service accounts, and spark-client reads them at job submission time.
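The hand-off through Kubernetes Secrets can be pictured with the sketch below, which decodes the data field of a Secret into spark-submit style --conf arguments. The Secret name and its key layout are assumptions made for illustration; the layout that the Integration Hub actually writes may differ.

```python
import base64

# A trimmed, hypothetical Secret as the Kubernetes API would return it.
# Kubernetes stores Secret values base64-encoded under the "data" field.
secret = {
    "kind": "Secret",
    "metadata": {"name": "example-spark-conf", "namespace": "spark"},
    "data": {
        "spark.eventLog.enabled": base64.b64encode(b"true").decode("ascii"),
        "spark.kubernetes.namespace": base64.b64encode(b"spark").decode("ascii"),
    },
}


def secret_to_conf_args(secret: dict) -> list[str]:
    """Decode a Secret's data into `--conf key=value` arguments."""
    args = []
    for key, encoded in sorted(secret.get("data", {}).items()):
        value = base64.b64decode(encoded).decode("utf-8")
        args += ["--conf", f"{key}={value}"]
    return args


print(secret_to_conf_args(secret))
```

Because the configuration lives in the cluster rather than on the workstation, any client that can read the Secret for a given service account resolves the same properties.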
For a step-by-step walkthrough of setting up these components, see the Tutorial. For supported Kubernetes distributions, see the Requirements page.