Posted to issues@spark.apache.org by "Devaraj K (JIRA)" <ji...@apache.org> on 2017/09/22 22:18:00 UTC
[jira] [Commented] (SPARK-19417) spark.files.overwrite is ignored
[ https://issues.apache.org/jira/browse/SPARK-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177253#comment-16177253 ]
Devaraj K commented on SPARK-19417:
-----------------------------------
Thanks [~ckanich] for the test case.
{code:title=SparkContext.scala|borderStyle=solid}
def addFile(path: String, recursive: Boolean): Unit = {
  ............
  val timestamp = System.currentTimeMillis
  if (addedFiles.putIfAbsent(key, timestamp).isEmpty) {
    logInfo(s"Added file $path at $key with timestamp $timestamp")
    // Fetch the file locally so that closures which are run on the driver can still use the
    // SparkFiles API to access files.
    Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
      env.securityManager, hadoopConfiguration, timestamp, useCache = false)
    postEnvironmentUpdate()
  }
{code}
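The guard in the snippet above hinges on {{putIfAbsent}}: only the first call for a given key sees an empty result, so any later {{addFile}} for the same key skips the fetch entirely. A minimal sketch of that semantics (using a plain {{TrieMap}} as a stand-in for Spark's internal {{addedFiles}} map; {{addFileOnce}} and the key string are hypothetical names for illustration):

{code:title=PutIfAbsentSketch.scala|borderStyle=solid}
import scala.collection.concurrent.TrieMap

// Stand-in for SparkContext's internal addedFiles map.
val addedFiles = TrieMap.empty[String, Long]

// Mirrors the guard in addFile: returns true only when the key is new,
// i.e. only the first registration would trigger the fetch.
def addFileOnce(key: String): Boolean =
  addedFiles.putIfAbsent(key, System.currentTimeMillis).isEmpty

val first = addFileOnce("spark://host:1234/files/file")  // true: would fetch
val second = addFileOnce("spark://host:1234/files/file") // false: silently skipped
{code}

This is why re-calling {{addFile}} after changing the file on the driver has no effect, regardless of {{spark.files.overwrite}}.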
It does not re-add the file if it has already been added, and this appears to be intentional behavior. Please see the discussion here: https://github.com/apache/spark/pull/14396.
Do you have a real use case for this?
> spark.files.overwrite is ignored
> --------------------------------
>
> Key: SPARK-19417
> URL: https://issues.apache.org/jira/browse/SPARK-19417
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Chris Kanich
>
> I have not been able to get Spark to actually overwrite a file after I have changed it on the driver node, re-called addFile, and then used it on the executors again. Here's a failing test.
> {code}
> test("can overwrite files when spark.files.overwrite is true") {
>   val dir = Utils.createTempDir()
>   val file = new File(dir, "file")
>   try {
>     Files.write("one", file, StandardCharsets.UTF_8)
>     sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[1,1,1024]")
>       .set("spark.files.overwrite", "true"))
>     sc.addFile(file.getAbsolutePath)
>     def getAddedFileContents(): String = {
>       sc.parallelize(Seq(0)).map { _ =>
>         scala.io.Source.fromFile(SparkFiles.get("file")).mkString
>       }.first()
>     }
>     assert(getAddedFileContents() === "one")
>     Files.write("two", file, StandardCharsets.UTF_8)
>     sc.addFile(file.getAbsolutePath)
>     assert(getAddedFileContents() === "two")
>   } finally {
>     Utils.deleteRecursively(dir)
>     sc.stop()
>   }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org