Security hardening guide¶
This document provides an overview of security features and guidance for hardening the security of Charmed Apache Spark K8s, including setting up and managing a secure environment.
Environment¶
The environment where applications operate can be divided into two components:
Kubernetes
Juju
Kubernetes¶
Charmed Apache Spark can be deployed on top of several Kubernetes distributions. The following table provides references for the security documentation for the main supported cloud platforms.
| Cloud | Security guide |
|---|---|
| Charmed Kubernetes | |
| AWS EKS | Best Practices for Security, Identity and Compliance; AWS security credentials; Security in EKS |
| Azure AKS | Azure security best practices and patterns; Managed identities for Azure resources; Security in AKS |
Juju¶
Juju is the component responsible for orchestrating the entire lifecycle, from deployment to Day 2 operations, of all applications. Therefore, Juju must be set up securely. For more information, see the Juju security page and the How to harden your deployment guide.
Cloud credentials¶
When configuring the cloud credentials to be used with Juju, ensure that the users have the correct permissions to operate at the required level on the Kubernetes cluster. Juju superusers responsible for bootstrapping and managing controllers require elevated permissions to manage several kinds of resources. For this reason, the K8s user used for bootstrapping and managing the deployments should have full permissions, such as:
create, delete, patch, and list:
namespaces
services
deployments
stateful sets
pods
PVCs
In general, it is common practice to run Juju using the admin role of K8s, to have full permissions on the Kubernetes cluster.
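As an illustration, the permissions listed above could be expressed as a Kubernetes `ClusterRole` along these lines (the role name is hypothetical; as noted above, the built-in admin role is often used instead):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: juju-bootstrap  # hypothetical role name
rules:
  # Core API group resources
  - apiGroups: [""]
    resources: ["namespaces", "services", "pods", "persistentvolumeclaims"]
    verbs: ["create", "delete", "patch", "list"]
  # Workload resources in the "apps" API group
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["create", "delete", "patch", "list"]
```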
Juju users¶
It is very important that Juju users are set up with minimal permissions depending on the scope of their operations. Please refer to the User access levels documentation for more information on the access level and corresponding abilities that the different users can be granted.
Juju user credentials must be stored securely and rotated regularly to limit the chances of unauthorized access due to credentials leakage.
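For example, a minimal sketch of creating a Juju user scoped to a single model (user and model names below are placeholders):

```shell
# Create a new Juju user with no access by default
juju add-user spark-operator

# Grant write access to one model only, instead of controller-wide superuser
juju grant spark-operator write spark-model

# Rotate the user's password regularly
juju change-user-password spark-operator
```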
Applications¶
In the following, we provide guidance on how to harden your deployment using:
Base Images
Apache Spark Security Upgrades
Encryption
Authentication
Monitoring and Auditing
Base images¶
Charmed Apache Spark K8s runs on top of a set of Rockcraft-based images, all based on the same Apache Spark distribution binaries, available on the Apache Spark release page, on top of Ubuntu 22.04. The images that can be found in the Charmed Apache Spark rock images GitHub repo are used as the base images for pods, both for Spark jobs and charms. The following table summarises the relation between each component and its underlying base image.
| Component | Image |
|---|---|
| Spark Job (Driver) | |
| Spark Job (Executor) | |
| Spark History Server | |
| Charmed Apache Kyuubi | |
| Spark Job (Driver) - GPU Support | |
| Spark Job (Executor) - GPU Support | |
| Integration Hub | |
New versions of the Charmed Apache Spark images may be released to provide patching of vulnerabilities (CVEs).
Charmed operator security upgrades¶
Charmed Apache Spark K8s operators, including Spark History server, Charmed Apache Kyuubi, and Integration Hub, install a pinned revision of the Charmed Apache Spark images outlined in the previous table to provide reproducible and secure environments. New versions of Charmed Apache Spark K8s operators may therefore be released to provide patching of vulnerabilities (CVEs). It is important to refresh the charm regularly to make sure the workload is as secure as possible.
Encryption¶
We recommend deploying Charmed Apache Spark K8s with encryption enabled to secure the communication between components, whenever available and supported by the server and client applications. In the following, we provide further information on how to encrypt the various data flows between the different components of the solution:
Client <> Kubernetes API connections
Object storage connections
Apache Kyuubi <> PostgreSQL connection
Apache Kyuubi <> Apache ZooKeeper connection
Spark History Server client connection
Kyuubi Client <> Kyuubi Server connection
Spark jobs communications
Client <> Kubernetes API connections¶
Make sure that the Kubernetes API service is correctly encrypted and exposes the HTTPS protocol. Refer to the documentation above for the main substrates and/or the documentation of your distribution. Ensure that the various components of the solution, e.g. spark-client, pods, etc., are correctly configured with the trusted CA certificate of the K8s cluster.
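For instance, a client-side kubeconfig typically pins the cluster's CA certificate so that connections to the API server are verified over HTTPS (server address, cluster name, and path below are placeholders):

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: spark-cluster                       # placeholder cluster name
    cluster:
      server: https://k8s-api.example.com:6443  # HTTPS endpoint of the API server
      certificate-authority: /path/to/cluster-ca.crt  # trusted cluster CA
```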
Object storage connections¶
Make sure that the object storage service is correctly encrypted and exposes the HTTPS protocol. Refer to the documentation of your object storage backend to make sure this option is supported and enabled. Ensure that the various components of the solution, e.g. spark-client, pods, etc., are correctly configured with the trusted CA certificate used by the object storage service. See the how-to manage certificates guide for more information.
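As an example, when using an S3-compatible backend over HTTPS, the relevant Spark/Hadoop properties can be sketched as follows (endpoint and truststore paths are placeholders; the truststore options are only needed when the backend uses a custom CA):

```properties
# Point the S3A connector at an HTTPS endpoint
spark.hadoop.fs.s3a.endpoint=https://s3.example.com
spark.hadoop.fs.s3a.connection.ssl.enabled=true

# Trust a custom CA by providing a JVM truststore to driver and executors
spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/truststore.jks
spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/truststore.jks
```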
Apache Kyuubi <> PostgreSQL connection¶
Charmed Apache Kyuubi integration with PostgreSQL can be secured by enabling encryption for the PostgreSQL K8s charm. See the PostgreSQL K8s how-to enable TLS guide for more information on how to enable and customize encryption.
Apache Kyuubi <> Apache ZooKeeper connection¶
Charmed Apache Kyuubi integration with Apache ZooKeeper can be secured by enabling encryption for the Apache ZooKeeper K8s charm. See the Apache Kafka K8s how-to enable TLS guide for more information on how to enable and customize encryption for Apache ZooKeeper.
Spark History Server client connection¶
Spark History Server implements encryption terminated at ingress-level. Therefore, internal Kubernetes communication between ingress and Spark History Server is unencrypted. To enable encryption, see the how-to expose Spark History Server user guide.
Kyuubi Client <> Kyuubi Server connection¶
The Apache Kyuubi charm exposes a JDBC-compliant endpoint which can be connected to using JDBC-compliant clients, such as Beeline. Encryption is currently not supported; support is planned for the 25.04 release.
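For reference, connecting to the endpoint with Beeline looks roughly like this (host and credentials are placeholders; 10009 is the conventional Apache Kyuubi port):

```shell
# Connect to the Kyuubi JDBC endpoint with Beeline, authenticating as the admin user
beeline -u "jdbc:hive2://kyuubi.example.com:10009/" -n admin -p 'your-password'
```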
Spark jobs communications¶
To secure the RPC channel used for communication between driver and executors, use the dedicated Apache Spark properties. Refer to the how-to manage Spark service accounts guide for more information on how to customise the Apache Spark service account with additional properties, or the Spark Configuration Management explanation page for more information on how the Spark workload can be further configured.
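A minimal sketch of the relevant Apache Spark properties (see the upstream Apache Spark security documentation for the full list and key-exchange options):

```properties
# Enable authentication between driver and executors
spark.authenticate=true

# Encrypt RPC traffic between driver and executors using AES-based encryption
spark.network.crypto.enabled=true

# Optionally, also encrypt local disk I/O (shuffle and spill files)
spark.io.encryption.enabled=true
```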
Authentication¶
Charmed Apache Spark K8s provides external authentication capabilities for:
Kubernetes API
Spark History Server
Kyuubi JDBC endpoint
Kubernetes API¶
Authentication to the Kubernetes API follows standard implementations, as described in the upstream Kubernetes documentation. Please make sure that the distribution being used supports the authentication used by clients, and that the Kubernetes cluster has been correctly configured.
Generally, client applications store credential information locally in a KUBECONFIG file. Pods created by the charms and the Spark job workloads, on the other hand, receive credentials via shared secrets, mounted at the default location /var/run/secrets/kubernetes.io/serviceaccount/. See the upstream documentation for more information.
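Inside a pod, these mounted credentials can be used to call the API over TLS, for example:

```shell
# Read the mounted service-account token and call the Kubernetes API,
# validating the server against the mounted cluster CA certificate
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer ${TOKEN}" \
  https://kubernetes.default.svc/api
```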
Spark History Server¶
Authentication can be enabled in the Spark History Server, when exposed using Traefik, by leveraging the Oathkeeper integration, which provides a cloud-native Identity & Access Proxy (IAP) and Access Control Decision API able to authenticate, authorize, and mutate incoming HTTP(S) requests, fully integrated with the Canonical Identity Platform.
Refer to the how-to enable authorization in the Spark History Server guide for more information. Regarding permissions, an allow-list of authorised users can be provided using the Spark History Server charm configuration option.
Kyuubi JDBC endpoint¶
Authentication is enabled by default and is mandatory for the JDBC endpoint exposed by the Charmed Apache Kyuubi charm, via its integration with the PostgreSQL charm on the auth-db interface. Currently, only one admin user is enabled, whose password can be set using the system-users config option of the charm.
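Assuming the charm follows the usual Juju user-secret pattern for credentials, setting the admin password could look like the following sketch (secret and application names are placeholders; check the charm documentation for the exact expected secret content):

```shell
# Create a Juju user secret holding the admin password
juju add-secret kyuubi-users admin=changeme

# Grant the Kyuubi application access to the secret
juju grant-secret kyuubi-users kyuubi-k8s

# Point the charm at the secret, using the URI printed by add-secret
juju config kyuubi-k8s system-users=secret:<secret-id>
```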
Monitoring and auditing¶
Charmed Apache Spark provides native integration with the Canonical Observability Stack (COS). To reduce the blast radius of infrastructure disruptions, the general recommendation is to deploy COS and the observed application into separate environments, isolated from one another. Refer to the COS production deployments best practices page for more information.
For more information on how to enable and customise monitoring with COS, see the Monitoring guide.
Additional resources¶
For further information and details on the security and cryptographic specifications used by Charmed Apache Spark, please refer to the Cryptography page.