Posted to issues@spark.apache.org by "Kingsley Jones (JIRA)" <ji...@apache.org> on 2019/01/16 10:36:00 UTC

[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

    [ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743838#comment-16743838 ] 

Kingsley Jones commented on SPARK-12216:
----------------------------------------

I had another go at investigating this as it continues to greatly frustrate my deployments.

Firstly, I had no difficulty following the build instructions for Spark from source on Windows. The only "trick" needed was to use the Git Bash shell on Windows to manage the launch of the build process. There is nothing complicated about that, and so I would encourage others to try.

Secondly, the build on Windows 10 worked the first time, even though this was also my first time using Maven. So there is no real reason why development work cannot be done on the Windows platform to investigate this issue (that is what I am doing now).

Thirdly, I re-investigated my comment from April 2018 above... 

The classloader at the child level is:

scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f

On searching the Spark source code, and online, I found that this class loader is actually from the Scala REPL.

It is not actually part of the Spark source tree.
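For anyone who wants to reproduce that observation, a quick way is to walk the classloader chain from inside spark-shell with a sketch like this (the exact chain will vary with the Spark and Scala versions in use):

    // paste into spark-shell: print each classloader from the REPL's
    // context classloader up through its parents
    var cl: ClassLoader = Thread.currentThread.getContextClassLoader
    while (cl != null) {
      println(cl)        // e.g. scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@...
      cl = cl.getParent  // walk up to the parent loader
    }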

Looking at the cruft showing up in the Windows temp directory, the classes that appear seem to be associated with the REPL.

This makes sense, since the REPL will barf with the errors indicated above if you do nothing more than launch a spark-shell and then close it straight away.
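To watch the cruft accumulate, it is enough to list the leftover spark-* directories under the JVM temp dir before and after a shell session. A minimal sketch (assuming the default java.io.tmpdir, which resolves to %TEMP% on a stock Windows setup):

    import java.io.File

    // list leftover spark-* directories under the JVM temp dir
    val tmp = new File(System.getProperty("java.io.tmpdir"))
    tmp.listFiles()
       .filter(f => f.isDirectory && f.getName.startsWith("spark-"))
       .foreach(d => println(s"${d.getName} (${Option(d.list()).map(_.length).getOrElse(0)} entries)"))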

The conclusions I reached:
1) certainly it is possible to hack this on a Windows 10 machine (I am trying the incremental builds via SBT toolchain)

2) the bug does seem to be related to classloader clean-up, but the fault (at least for the REPL) may NOT lie in the Spark source code but in the Scala REPL (or maybe in an interaction with how Spark loads the relevant REPL code?)

3) watching files go in and out of the Windows temp area, the behaviour seems reproducible with high reliability (as commenters above maintain)


Remaining questions on my mind...


Since this issue pops up for pretty much anybody who simply tries Spark on Windows, yet the build from source showed NO problems, I think the runtime issue is probably down to how classes are loaded, combined with the difference in file-locking behaviour between Windows and Linux.
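The file-locking difference itself is easy to demonstrate in isolation, without any Spark code at all. The sketch below opens a handle to a temp file and then tries to delete it: on Linux the delete succeeds even with the stream open, while on Windows File.delete() typically returns false, which is exactly the condition that makes Utils.deleteRecursively throw "Failed to delete":

    import java.io.FileInputStream
    import java.nio.file.Files

    // create a temp file, hold an open handle to it, then try to delete it
    val f = Files.createTempFile("lock-demo-", ".tmp").toFile
    val in = new FileInputStream(f)                 // handle still open at delete time
    println(s"delete while open:  ${f.delete()}")   // false on Windows, true on Linux
    in.close()
    println(s"delete after close: ${f.delete() || !f.exists()}")  // removable once the handle is gone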


The question is how to isolate which part of the codebase is actually producing the classes that cannot be cleaned up. Is it the main Spark source code, or is it the Scala REPL? Since I got immediately discouraged from using Spark in any real evaluation (100% reliable barf-outs on the very first example are discouraging), I never actually ran cluster test jobs to see whether the thing worked outside the REPL. It is quite possible the problem is just the REPL.
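One way to isolate it would be to run the same start/stop cycle outside the REPL, for example with a minimal standalone application along these lines (a sketch only; TempDirCheck is just an illustrative name, assuming local mode and whatever Spark version is on the classpath), and then check whether the shutdown hook still logs the "Failed to delete" error:

    import org.apache.spark.{SparkConf, SparkContext}

    // minimal non-REPL reproduction attempt: start a local SparkContext,
    // do trivial work, stop it, and let the shutdown hooks run on exit.
    // If temp-dir deletion only fails under spark-shell, the REPL
    // classloader is the likely culprit; if it fails here too, it is Spark itself.
    object TempDirCheck {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("TempDirCheck").setMaster("local[2]")
        val sc = new SparkContext(conf)
        println(sc.parallelize(1 to 100).count())  // trivial job to force executor and temp-dir setup
        sc.stop()
      }
    }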


I would suggest it would help to get at least some dialogue on this.


It does not seem very desirable to shut out 50% of global developers on a project which is manifestly *easy* to build on Windows 10 (using Maven + Git), but where the first encounter of any experienced Windows developer leaves the immediate impression that this is an ultra-flaky codebase.


Quite possibly, it is just the REPL and it is actually a Scala code-base issue.


Folks like me will invest time and energy investigating such bugs, but only if we think there is a will to fix issues that are persistent and discourage adoption among folks who very often have few enterprise choices other than Windows. This is not out of religious attachment, but due to dependencies in the data workflow chain that are not easy to fix. In particular, many financial industry developers have to use Windows stacks in some part of the data acquisition process. These are not developers who are inexperienced in cross-platform work, Linux, or Java.


The reason such folks are liable to get discouraged is their prior bad experience with Java in distributed systems that had Windows components, due to the well-known difference in file-locking behaviour between Linux and Windows. When such folks see barf-outs of this kind, they worry that we are back in the EJB hell of earlier times, when systems broke all over the place and elaborate hacks were needed to patch the problems.


Please consider how we can better test and isolate what is really causing this problem.











> Spark failed to delete temp directory 
> --------------------------------------
>
>                 Key: SPARK-12216
>                 URL: https://issues.apache.org/jira/browse/SPARK-12216
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>         Environment: windows 7 64 bit
> Spark 1.5.2
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>            Reporter: stefan
>            Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
>         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>         at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
>         at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)


