Posted to issues@spark.apache.org by "koert kuipers (Jira)" <ji...@apache.org> on 2022/01/23 23:23:00 UTC

[jira] [Comment Edited] (SPARK-31726) Make spark.files available in driver with cluster deploy mode on kubernetes

    [ https://issues.apache.org/jira/browse/SPARK-31726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480756#comment-17480756 ] 

koert kuipers edited comment on SPARK-31726 at 1/23/22, 11:22 PM:
------------------------------------------------------------------

[~beregon87] About --jars: are you seeing that the jars are also not available on the driver, or not added to the classpath, or both?

I ran a simple test where I added a jar from S3 (e.g. --jars s3a://some/jar.jar) and was surprised to find that the driver could not find a class in that jar (on Kubernetes with cluster deploy mode). This would be a more serious bug, given that the description of --jars clearly says it should:
--jars JARS    Comma-separated list of jars to include on the driver and executor classpaths.
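
For reference, such a test could be run roughly as follows; this is a minimal sketch, and the cluster URL, container image, main class, and jar paths are hypothetical placeholders:

  # sketch: submit with an extra jar on k8s in cluster deploy mode
  spark-submit \
    --master k8s://https://my-cluster:6443 \
    --deploy-mode cluster \
    --conf spark.kubernetes.container.image=my-spark-image:latest \
    --jars s3a://some/jar.jar \
    --class com.example.Main \
    s3a://some/app.jar
  # per the help text above, classes in s3a://some/jar.jar should be on the
  # driver classpath; on k8s with cluster deploy mode they were not found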

Now, with --files it's too bad the driver doesn't get the files, but at least it does what it says on the tin (which does not include a promise to get the files to the driver):
--files FILES    Comma-separated list of files to be placed in the working directory of each executor.
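
To illustrate the documented behavior (again a sketch, with placeholder paths): a file shipped with --files lands in each executor's working directory, but not the driver's:

  # sketch: ship a config file with --files on k8s in cluster deploy mode
  spark-submit \
    --master k8s://https://my-cluster:6443 \
    --deploy-mode cluster \
    --files s3a://some/bucket/application.conf \
    --class com.example.Main \
    s3a://some/app.jar
  # each executor can read ./application.conf from its working directory;
  # on k8s with cluster deploy mode the driver does not receive the file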


> Make spark.files available in driver with cluster deploy mode on kubernetes
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-31726
>                 URL: https://issues.apache.org/jira/browse/SPARK-31726
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: koert kuipers
>            Priority: Minor
>
> Currently on YARN with cluster deploy mode, --files makes the files available to the driver and executors and also puts them on the classpath for the driver and executors.
> On k8s with cluster deploy mode, --files makes the files available on the executors, but they are not on the classpath. It does not make the files available on the driver, and they are not on the driver classpath.
> It would be nice if the k8s behavior were consistent with YARN, or at least made the files available on the driver. Once the files are available, there is a simple workaround to get them on the classpath using spark.driver.extraClassPath="./" (see the sketch after this description).
> Background:
> We recently started testing Kubernetes for Spark. Our main platform is YARN, on which we use client deploy mode. Our first experience was that client deploy mode was difficult to use on k8s (we don't launch from inside a pod), so we switched to cluster deploy mode, which seems to behave well on k8s. But then we realized that our programs rely on reading files from the classpath (application.conf, log4j.properties, etc.) that are on the client but are no longer on the driver (since the driver is no longer on the client). An easy fix for this seemed to be to ship the files using --files to make them available on the driver, but we could not get this to work.
>  
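
Assuming --files did make the files available in the driver's working directory, the workaround described above would look roughly like this (cluster URL, image, and file names are placeholders):

  # sketch: ship config files and put the working directory on the
  # driver classpath, assuming --files delivered them to the driver
  spark-submit \
    --master k8s://https://my-cluster:6443 \
    --deploy-mode cluster \
    --conf spark.kubernetes.container.image=my-spark-image:latest \
    --conf spark.driver.extraClassPath="./" \
    --files s3a://some/bucket/application.conf \
    --class com.example.Main \
    s3a://some/app.jar
  # with the working directory on the driver classpath, code that reads
  # application.conf or log4j.properties via the classpath would find them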


