Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2016/07/29 01:24:20 UTC

[jira] [Updated] (SPARK-16787) SparkContext.addFile() should not fail if called twice with the same file

     [ https://issues.apache.org/jira/browse/SPARK-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-16787:
-------------------------------
    Target Version/s: 2.0.1  (was: 1.6.3, 2.0.1)

> SparkContext.addFile() should not fail if called twice with the same file
> -------------------------------------------------------------------------
>
>                 Key: SPARK-16787
>                 URL: https://issues.apache.org/jira/browse/SPARK-16787
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> The behavior of SparkContext.addFile() changed slightly with the Netty-RPC-based file server, which was introduced in Spark 1.6 (disabled by default) and became the default (and only) file server in Spark 2.0.0.
> Prior to 2.0, calling SparkContext.addFile() twice with the same path would succeed and would cause future tasks to receive an updated copy of the file. This behavior was never explicitly documented, but Spark has behaved this way since very early 1.x versions (some of the relevant lines in Executor.updateDependencies() have existed since 2012).
> In 2.0 (or 1.6 with the Netty file server enabled), the second addFile() call will fail with a requirement error because NettyStreamManager tries to guard against duplicate file registration.
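> For illustration, a minimal reproduction sketch (the file path, app name, and master URL are placeholders):
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
>
> val sc = new SparkContext(
>   new SparkConf().setAppName("addfile-repro").setMaster("local[2]"))
>
> sc.addFile("/tmp/data.txt")  // first registration succeeds
> sc.addFile("/tmp/data.txt")  // pre-2.0: succeeds; tasks later download the refreshed copy
>                              // 2.0 (Netty file server): the require check fails,
>                              // throwing java.lang.IllegalArgumentException
> {code}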
> I believe that this change of behavior was unintentional and propose to remove the {{require}} check so that Spark 2.0 matches 1.x's default behavior.
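> To make the proposal concrete, here is a paraphrased sketch of the guard and its proposed relaxation (not the exact NettyStreamManager source; the {{files}} map and the returned URI format are illustrative):
> {code:scala}
> import java.io.File
> import java.util.concurrent.ConcurrentHashMap
>
> val files = new ConcurrentHashMap[String, File]()
>
> // Current 2.0 behavior: reject any duplicate registration outright.
> def addFile(file: File): String = {
>   require(files.putIfAbsent(file.getName, file) == null,
>     s"File ${file.getName} was already registered.")
>   s"spark://<host>:<port>/files/${file.getName}"
> }
>
> // Proposed behavior (matching 1.x): allow re-registration by replacing
> // the existing entry instead of requiring absence.
> def addFileRelaxed(file: File): String = {
>   files.put(file.getName, file)
>   s"spark://<host>:<port>/files/${file.getName}"
> }
> {code}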
> This problem also affects addJar() in a more subtle way: the fileServer.addJar() call will also fail with an exception, but that exception is logged and ignored due to code added in 2014 to ignore errors caused by missing Spark examples JARs when running in YARN cluster mode (AFAIK).
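> For reference, a paraphrased sketch of that swallow-and-log pattern (assuming SparkContext's internal {{env}} and {{logError}}; not the exact SparkContext.addJar source):
> {code:scala}
> val key: String =
>   try {
>     env.rpcEnv.fileServer.addJar(new java.io.File(path))
>   } catch {
>     // Added in 2014 to tolerate missing Spark examples JARs in YARN
>     // cluster mode; it also silently hides the duplicate-registration
>     // failure from a second addJar() call with the same path.
>     case e: Exception =>
>       logError(s"Failed to add $path to Spark environment", e)
>       null
>   }
> {code}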


