Posted to dev@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2023/01/23 22:32:37 UTC

Update to Spark Kubernetes docs for secrets

Hi,


In the documentation linked below,


Running Spark on Kubernetes - Spark 3.3.1 Documentation (apache.org)
<https://spark.apache.org/docs/latest/running-on-kubernetes.html>


some of the material seems to be out of date. For example, all the vendors
that I know of now discourage managing secrets in a Kubernetes cluster
through a secrets file. That approach has been superseded by workload
identity. For example, see "Access secrets stored outside GKE clusters
using Workload Identity":
<https://cloud.google.com/kubernetes-engine/docs/tutorials/workload-identity-secrets>
Workload Identity also replaces the need to use metadata concealment: the
sensitive metadata protected by metadata concealment is also protected by
Workload Identity.
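
As a rough sketch of what the Workload Identity setup looks like on GKE,
the commands below bind a Kubernetes service account to a Google service
account. The project, namespace and account names are hypothetical
placeholders, and the script only prints the commands unless DRY_RUN=0 is
set:

```shell
#!/bin/sh
set -eu

# All values below are hypothetical placeholders; substitute your own.
PROJECT_ID="my-project"
NAMESPACE="spark"
KSA_NAME="spark-ksa"   # Kubernetes service account used by the Spark pods
SA_EMAIL="spark-gsa@${PROJECT_ID}.iam.gserviceaccount.com"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# IAM member string that identifies the Kubernetes service account.
MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# 1. Create the Kubernetes service account.
run kubectl create serviceaccount "$KSA_NAME" --namespace "$NAMESPACE"

# 2. Allow it to impersonate the Google service account.
run gcloud iam service-accounts add-iam-policy-binding "$SA_EMAIL" \
  --role roles/iam.workloadIdentityUser \
  --member "$MEMBER"

# 3. Annotate the Kubernetes service account with the Google one.
run kubectl annotate serviceaccount "$KSA_NAME" --namespace "$NAMESPACE" \
  "iam.gke.io/gcp-service-account=${SA_EMAIL}"
```

No JSON key is downloaded or stored on the cluster in this flow, which is
the main difference from the secrets-file approach.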


I think Spark documentation should be updated to refer to the use of
Workload Identity.


For example, with a secrets file we store the following information:


{
  "type": "service_account",
  "project_id": "your project",
  "private_key_id": "abc",
  "private_key": "-----BEGIN PRIVATE KEY-----.....................",
  "client_email": "dockerscanner@<your project>.iam.gserviceaccount.com",
  "client_id": "123",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/dockerscanner%40<your project>.iam.gserviceaccount.com"
}

This file is stored on the K8s cluster, where all nodes have access to it.

# Download the service account JSON key and store it in a Kubernetes
# secret. Your Spark drivers and executors use this secret to authenticate
# with BigQuery:

gcloud iam service-accounts keys create spark-sa.json --iam-account $SA_EMAIL
kubectl create secret generic spark-sa --from-file=spark-sa.json -n spark
cp -f ./spark-sa.json /mnt/secrets

The Spark documentation refers to:

--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
--conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets

which is no longer common practice in the cloud.
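
With Workload Identity, a Spark job would typically point the driver and
executors at a Kubernetes service account rather than mounting a key file.
A minimal sketch, assuming a service account named spark-ksa (a
hypothetical name) in the spark namespace has already been bound to a
Google service account; it only prints the --conf flags to pass to
spark-submit:

```shell
#!/bin/sh
set -eu

NAMESPACE="spark"      # namespace the Spark pods run in
KSA_NAME="spark-ksa"   # hypothetical Kubernetes service account name

# Spark-on-Kubernetes settings that select the namespace and the
# service account for the driver and executor pods.
NAMESPACE_CONF="spark.kubernetes.namespace=${NAMESPACE}"
DRIVER_CONF="spark.kubernetes.authenticate.driver.serviceAccountName=${KSA_NAME}"
EXECUTOR_CONF="spark.kubernetes.authenticate.executor.serviceAccountName=${KSA_NAME}"

# Print one --conf flag per line, ready to paste into a spark-submit call.
printf '  --conf %s\n' "$NAMESPACE_CONF" "$DRIVER_CONF" "$EXECUTOR_CONF"
```

These flags would take the place of the two spark.kubernetes.*.secrets
settings above, so no key file needs to be mounted under /etc/secrets.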

HTH


View my LinkedIn profile:
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.