Posted to user@spark.apache.org by "Bode, Meikel, NMA-CFD" <Me...@Bertelsmann.de> on 2021/08/12 11:34:54 UTC

K8S submit client vs. cluster

Hi all,

If we schedule a Spark job on K8s, how are volume mappings handled?

In client mode I would expect that the driver's volumes have to be mapped manually in the pod template, while executor volumes are attached dynamically based on the submit parameters. Right...?

In cluster mode I would expect that volumes for the driver and executors are taken from the submit command and attached to the pods accordingly. Right...?

Any hints appreciated,

Best,
Meikel

Re: K8S submit client vs. cluster

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK, Amazon (EKS) is not much different from Google Kubernetes Engine
(GKE).

To submit a job you need a reasonably powerful compute server. I use a
separate host rather than one of the K8s cluster nodes (I am not aware
whether one can actually submit from those).

Anyway, you submit something like the below:

         spark-submit --verbose \
           --properties-file ${property_file} \
           --master k8s://https://$KUBERNETES_MASTER_IP:443 \
           --deploy-mode cluster \
           --name pytest \
           --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
           --py-files $CODE_DIRECTORY/DSBQ.zip \
           --conf spark.kubernetes.namespace=$NAMESPACE \
           --conf spark.executor.memory=5000m \
           --conf spark.network.timeout=300 \
           --conf spark.executor.instances=3 \
           --conf spark.kubernetes.driver.limit.cores=1 \
           --conf spark.driver.cores=1 \
           --conf spark.executor.cores=1 \
           --conf spark.executor.memory=2000m \
           --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
           --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
           --conf spark.kubernetes.container.image=${IMAGEGCP} \
           --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
           --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
           --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
           --conf spark.sql.execution.arrow.pyspark.enabled="true" \
           $CODE_DIRECTORY/${APPLICATION}

This is a PySpark job and I have told Spark to run it in cluster mode. The
Docker image I built has Spark version 3.1.1 with Java 8. Java 11 would not
work.


However, under the bonnet it is run in client mode:


+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")

+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.64.0.88 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner gs://axial-glow-224522-spark-on-k8s/codes/RandomDataBigQuery.py
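
For context, this trace is consistent with how the Spark container image's
entrypoint appears to start the driver: with --deploy-mode cluster, the
submission client creates a driver pod, and the entrypoint inside that pod
then launches the driver process with --deploy-mode client. The sketch below
is a simplified, hypothetical stand-in for that dispatch (build_driver_cmd
and the fallback bind address are illustrative, not Spark's own code), and it
only prints the command rather than running it:

```shell
# Simplified, hypothetical stand-in for the "driver" branch of an image
# entrypoint. It prints the command it would run instead of executing it.
build_driver_cmd() {
  role=$1
  shift
  case "$role" in
    driver)
      # The driver process is always started in client mode inside its pod;
      # the remaining arguments are passed through unchanged.
      echo "spark-submit --conf spark.driver.bindAddress=${SPARK_DRIVER_BIND_ADDRESS:-10.0.0.1} --deploy-mode client $*"
      ;;
    *)
      echo "unsupported role: $role" >&2
      return 1
      ;;
  esac
}

# Dry run with the arguments seen in the trace above
build_driver_cmd driver --properties-file /opt/spark/conf/spark.properties
```

Read this way, "client" describes how the driver process is launched inside
its own pod; the pod itself would still have been created by the cluster-mode
submission.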


So regardless, it is run in client mode. You can see this behaviour with the
--verbose switch:


 spark-submit --verbose
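
On the original volume question: the Spark on Kubernetes documentation
describes per-role properties of the form
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path and
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.path. A
minimal sketch, assuming a persistentVolumeClaim; the helper volume_confs,
the volume name data, the claim my-claim, the mount path /data and app.py are
all hypothetical. It prints the command as a dry run:

```shell
# Hypothetical helper: print the --conf flags that mount one existing
# PersistentVolumeClaim into both the driver and the executor pods.
volume_confs() {
  name=$1
  claim=$2
  path=$3
  for role in driver executor; do
    prefix="spark.kubernetes.${role}.volumes.persistentVolumeClaim.${name}"
    printf '%s\n' "--conf ${prefix}.options.claimName=${claim}"
    printf '%s\n' "--conf ${prefix}.mount.path=${path}"
  done
}

# Dry run: show the resulting command line instead of submitting anything
echo "spark-submit --deploy-mode cluster $(volume_confs data my-claim /data | tr '\n' ' ')app.py"
```

In client mode Spark does not create the driver pod, so the driver.volumes
settings would presumably have no effect there and the volume has to be
present wherever the driver itself runs (for example via your own pod spec).
The executor settings should apply in both modes.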


HTH


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





RE: K8S submit client vs. cluster

Posted by "Bode, Meikel, NMA-CFD" <Me...@Bertelsmann.de>.
On EKS...


Re: K8S submit client vs. cluster

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK.

As I see it, with PySpark, even if the job is submitted in cluster mode, it
will be converted to client mode anyway.

Are you running this on AWS or GCP?



RE: K8S submit client vs. cluster

Posted by "Bode, Meikel, NMA-CFD" <Me...@Bertelsmann.de>.
Hi Mich,

All PySpark.

Best,
Meikel


Re: K8S submit client vs. cluster

Posted by Mich Talebzadeh <mi...@gmail.com>.
Is this Spark or PySpark?


