Posted to issues@spark.apache.org by "Xuzhou Yin (Jira)" <ji...@apache.org> on 2020/09/01 07:34:00 UTC

[jira] [Comment Edited] (SPARK-23153) Support application dependencies in submission client's local file system

    [ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188222#comment-17188222 ] 

Xuzhou Yin edited comment on SPARK-23153 at 9/1/20, 7:33 AM:
-------------------------------------------------------------

Hi guys,

I have looked through the pull request for this change, and there is one part I don't quite understand; it would be great if someone could explain it a bit.

At this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths that are not submitter-local (i.e. paths with no scheme or with the file:// scheme). Does that mean it ignores every other kind of path? For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it looks like this logic will upload file:///local/path/3.jar to S3 and then reset spark.jars to only s3://upload/path/3.jar, completely dropping local:///local/path/1.jar and s3://s3/path/2.jar.
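
To make my reading concrete, here is a minimal Scala sketch of the behavior I am describing. This is not the actual Spark source; isSubmitterLocal mirrors the filter as I understand it, and uploadToStorage is a made-up stand-in for the real upload helper:

{code:scala}
import java.net.URI

// Returns true only for submitter-local URIs: no scheme, or the file:// scheme.
def isSubmitterLocal(uri: String): Boolean =
  Option(new URI(uri).getScheme).forall(_ == "file")

// Hypothetical stand-in for the real upload step, which copies a
// submitter-local file to the configured HCFS upload location.
def uploadToStorage(uri: String): String =
  "s3://upload/path/" + uri.substring(uri.lastIndexOf('/') + 1)

val jars = Seq(
  "local:///local/path/1.jar", // filtered out: scheme is "local"
  "s3://s3/path/2.jar",        // filtered out: scheme is "s3"
  "file:///local/path/3.jar")  // kept: scheme is "file"

// As I read the linked code, spark.jars ends up as only the uploaded entries:
val newSparkJars = jars.filter(isSubmitterLocal).map(uploadToStorage).mkString(",")
// newSparkJars == "s3://upload/path/3.jar" -- entries 1 and 2 are gone
{code}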

Is this the expected behavior? If so, what should we do if we want to specify dependencies that live in an HCFS such as S3, or on the driver's local filesystem (i.e. local://), rather than via file://? If this is a bug, is there a Jira issue tracking it?
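
For context, this is how I would expect to declare all three kinds of dependencies in one submission; a hedged sketch using SparkConf, where the bucket and paths are made up:

{code:scala}
import org.apache.spark.SparkConf

// spark.kubernetes.file.upload.path is the HCFS staging directory that
// submitter-local (file://) dependencies get uploaded to.
val conf = new SparkConf()
  .set("spark.kubernetes.file.upload.path", "s3a://my-bucket/spark-upload")
  .set("spark.jars", Seq(
    "local:///opt/app/libs/1.jar", // already present inside the container image
    "s3a://my-bucket/jars/2.jar",  // remote HCFS, expected to be fetched at run time
    "file:///home/me/3.jar"        // submitter-local, expected to be uploaded
  ).mkString(","))
{code}

My question is whether the first two entries survive the resolution step above, or are silently dropped as the filter suggests.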

Thanks a lot!



> Support application dependencies in submission client's local file system
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23153
>                 URL: https://issues.apache.org/jira/browse/SPARK-23153
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Yinan Li
>            Assignee: Stavros Kontopoulos
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if the user has code or dependencies only on the client machine where they run {{spark-submit}}, the current implementation has no way to make those visible to the Spark application running inside the K8S pods that get launched. This limits users to running applications whose code and dependencies are either baked into the Docker images used, or available via some external, globally accessible file system, e.g. HDFS, neither of which is a viable option for many users and environments.


