Posted to user@spark.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2019/02/13 01:47:34 UTC

Spark with Kubernetes connecting to pod id, not address


From: Pat Ferrel <pa...@actionml.com>
Reply: Pat Ferrel <pa...@actionml.com>
Date: February 12, 2019 at 5:40:41 PM
To: user@spark.apache.org <us...@spark.apache.org>
Subject:  Spark with Kubernetes connecting to pod id, not address  

We have a k8s deployment of several services, including Apache Spark. All services appear to be operational. Our application connects to the Spark master to submit a job, using the cluster's k8s DNS service; the master is named `spark-api`, so we use `master=spark://spark-api:7077` and `spark.submit.deployMode=cluster`. We submit the job through the API, not the spark-submit script.
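For reference, the submission settings described above amount to something like the following spark-defaults-style fragment (reconstructed from the description; the exact keys are set programmatically in our launcher code):

```properties
# Standalone master reached via the k8s DNS Service name `spark-api`
spark.master              spark://spark-api:7077
# Run the driver on the cluster rather than in the submitting process
spark.submit.deployMode   cluster
```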

This runs the "driver" and all "executors" on the cluster, and that part seems to work, but some Spark process makes a callback to the launching code in our app. For some reason it tries to connect to `harness-64d97d6d6-4r4d8`, which is the **pod ID**, not the k8s cluster IP or DNS name.

How could this **pod ID** be getting into the system? Spark somehow seems to think it is the address of the service that called it. Needless to say, any connection to the k8s pod ID fails, and so does the job.

Any idea how Spark could think the **pod ID** is an IP address or DNS name? 
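Our working guess: Spark advertises the driver at `spark.driver.host`, which defaults to the local hostname, and inside a k8s pod the hostname is the pod name, so that may be where `harness-64d97d6d6-4r4d8` comes from. If so, overriding it with a resolvable Service name might help; the property names below are from Spark's standard configuration, but the values are assumptions about our cluster (`harness-api` is the k8s Service fronting the launching pod):

```properties
# Advertise a k8s Service DNS name that other pods can actually resolve,
# instead of the default local hostname (the pod name).
spark.driver.host         harness-api
# Bind locally on all interfaces while advertising the Service name.
spark.driver.bindAddress  0.0.0.0
```

The `SPARK_LOCAL_HOSTNAME` environment variable should have a similar effect if set on the launching pod.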

BTW, if we run a small sample job with `master=local` all is well, but the same job executed with the above config tries to connect to the spurious pod ID.

BTW2: the pod launching the Spark job has the k8s DNS name "harness-api"; not sure if this matters.

Thanks in advance