Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2017/08/29 18:01:03 UTC

[jira] [Updated] (SPARK-21714) SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again

     [ https://issues.apache.org/jira/browse/SPARK-21714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin updated SPARK-21714:
-----------------------------------
    Fix Version/s: 2.2.1

> SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21714
>                 URL: https://issues.apache.org/jira/browse/SPARK-21714
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.2.0
>            Reporter: Thomas Graves
>            Assignee: Saisai Shao
>            Priority: Critical
>             Fix For: 2.2.1, 2.3.0
>
>
> SPARK-10643 added the ability for spark-submit to download remote files in client mode.
> However, in YARN mode this introduced a bug: spark-submit downloads the files for the client, but then the YARN client just reuploads them to HDFS and uses them again. This should not happen when the remote file is already on HDFS. It wastes resources and defeats the distributed cache: if the original object was public it would have been shared by many users, but because we download and reupload it, it becomes private.
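
The behavior the report asks for can be illustrated with a minimal sketch. This is not Spark's code and not the actual fix; the function name, the parameter names, and the CLUSTER_READABLE_SCHEMES set are hypothetical, chosen only to show the decision "do not download in YARN client mode when the file is already on HDFS".

```python
from urllib.parse import urlparse

# Hypothetical: schemes the cluster can read directly. Not taken from Spark.
CLUSTER_READABLE_SCHEMES = {"hdfs"}


def should_download_locally(uri: str, master: str, deploy_mode: str) -> bool:
    """Return True if the submitting client should fetch `uri` to local disk."""
    # A bare path has no scheme; treat it as a local file.
    scheme = urlparse(uri).scheme or "file"
    if scheme in ("file", "local"):
        return False  # already local, nothing to download
    if deploy_mode != "client":
        return False  # the download in the report applies to client mode only
    if master == "yarn" and scheme in CLUSTER_READABLE_SCHEMES:
        # Leave the file where it is so YARN can use the original HDFS path
        # and a public object stays shareable through the distributed cache.
        return False
    return True


if __name__ == "__main__":
    for uri in ("hdfs://nn:8020/libs/dep.jar", "http://example.com/dep.jar"):
        print(uri, should_download_locally(uri, "yarn", "client"))
```

Under these assumptions an HDFS path is left in place in YARN client mode, while a resource on a scheme the cluster may not be able to read (here, http) is still downloaded.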



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
