Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2016/07/14 17:08:20 UTC

[jira] [Commented] (SPARK-5311) EventLoggingListener throws exception if log directory does not exist

    [ https://issues.apache.org/jira/browse/SPARK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377285#comment-15377285 ] 

Marcelo Vanzin commented on SPARK-5311:
---------------------------------------

Pasting the PR comments here for posterity:

{quote}
JoshRosen commented on Feb 4, 2015
Actually, maybe we don't want to fix this: the old behavior was pretty bad since it might create an entire hierarchy of directories with the default permissions. andrewor14, do you have any thoughts on this?
andrewor14 commented on Feb 9, 2015
ganonp I agree with JoshRosen that we should probably not arbitrarily create the directory for the user if it doesn't exist. The semantics of the spark.eventLog.dir should be that "this is an existing directory and I want to use it to put my event logs". Would you mind closing this PR then?
{quote}
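
For context, the approach being rejected in that PR amounted to auto-creating the log directory when missing, roughly like the following (an illustrative sketch only, not the PR's actual code; {{logBaseDir}} is a hypothetical stand-in for the configured spark.eventLog.dir):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val logBaseDir = "/tmp/spark-events"   // hypothetical stand-in for spark.eventLog.dir
val path = new Path(logBaseDir)
val fs = path.getFileSystem(new Configuration())
if (!fs.exists(path)) {
  // mkdirs creates the entire missing hierarchy with default permissions,
  // which is exactly the side effect JoshRosen flags above.
  fs.mkdirs(path)
}
{code}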

> EventLoggingListener throws exception if log directory does not exist
> ---------------------------------------------------------------------
>
>                 Key: SPARK-5311
>                 URL: https://issues.apache.org/jira/browse/SPARK-5311
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>
> If the log directory does not exist, EventLoggingListener throws an IllegalArgumentException.  Here's a simple reproduction (using the master branch, i.e. 1.3.0):
> {code}
> ./bin/spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=/tmp/nonexistent-dir
> {code}
> where /tmp/nonexistent-dir is a directory that doesn't exist and /tmp exists.  This results in the following exception:
> {code}
> 15/01/18 17:10:44 INFO HttpServer: Starting HTTP Server
> 15/01/18 17:10:44 INFO Utils: Successfully started service 'HTTP file server' on port 62729.
> 15/01/18 17:10:44 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> 15/01/18 17:10:44 INFO Utils: Successfully started service 'SparkUI' on port 4041.
> 15/01/18 17:10:44 INFO SparkUI: Started SparkUI at http://joshs-mbp.att.net:4041
> 15/01/18 17:10:45 INFO Executor: Using REPL class URI: http://192.168.1.248:62726
> 15/01/18 17:10:45 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@joshs-mbp.att.net:62728/user/HeartbeatReceiver
> 15/01/18 17:10:45 INFO NettyBlockTransferService: Server created on 62730
> 15/01/18 17:10:45 INFO BlockManagerMaster: Trying to register BlockManager
> 15/01/18 17:10:45 INFO BlockManagerMasterActor: Registering block manager localhost:62730 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 62730)
> 15/01/18 17:10:45 INFO BlockManagerMaster: Registered BlockManager
> java.lang.IllegalArgumentException: Log directory /tmp/nonexistent-dir does not exist.
> 	at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:90)
> 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:363)
> 	at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:986)
> 	at $iwC$$iwC.<init>(<console>:9)
> 	at $iwC.<init>(<console>:18)
> 	at <init>(<console>:20)
> 	at .<init>(<console>:24)
> 	at .<clinit>(<console>)
> 	at .<init>(<console>:7)
> 	at .<clinit>(<console>)
> 	at $print(<console>)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
> 	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
> 	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
> 	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
> 	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
> 	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
> 	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
> 	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
> 	at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123)
> 	at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122)
> 	at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:270)
> 	at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122)
> 	at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:60)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:945)
> 	at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:147)
> 	at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:60)
> 	at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106)
> 	at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:60)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:962)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
> 	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> 	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
> 	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
> 	at org.apache.spark.repl.Main$.main(Main.scala:31)
> 	at org.apache.spark.repl.Main.main(Main.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> It looks like the directory existence check was introduced in https://github.com/apache/spark/commit/456451911d11cc0b6738f31b1e17869b1fb51c87?diff=unified.  This is a behavior change (a regression) from earlier Spark versions, which created the event log directory if it did not exist.
> I think the intent of this check may have been to handle cases where the event directory path corresponds to an existing file, so maybe we can guard the `!isDirectory` check with an `exists` check first and change the error message to be more specific.
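> A minimal sketch of that guard (illustrative only; the names and surrounding plumbing are assumptions, not the actual patch):
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.Path
>
> val logBaseDir = "/tmp/spark-events"   // hypothetical stand-in for spark.eventLog.dir
> val path = new Path(logBaseDir)
> val fs = path.getFileSystem(new Configuration())
> if (!fs.exists(path)) {
>   throw new IllegalArgumentException(s"Log directory $path does not exist.")
> } else if (!fs.getFileStatus(path).isDirectory) {
>   // The path exists but is a regular file: say so explicitly.
>   throw new IllegalArgumentException(s"$path exists but is not a directory.")
> }
> {code}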


