Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/12/23 12:25:58 UTC

[jira] [Resolved] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

     [ https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-18988.
-------------------------------
    Resolution: Not A Problem

The behavior is intentional: you must ensure that the directory you want Spark to log into already exists. This ensures, for example, that you don't silently log into a location you didn't intend because of a typo.
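
For example, creating the directory once up front is enough, and the event logging listener then starts normally. A minimal sketch using the standard Hadoop FileSystem API, with the path taken from this report:

{code}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Create the event log directory before starting any application that
// points spark.eventLog.dir at it; mkdirs is a no-op if it already exists.
val logDir = "hdfs:///spark-history/eventLog"
val fs = FileSystem.get(new URI(logDir), new Configuration())
fs.mkdirs(new Path(logDir))
{code}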

> Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18988
>                 URL: https://issues.apache.org/jira/browse/SPARK-18988
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.1, 2.1.0
>            Reporter: Chen He
>            Priority: Minor
>
> When "spark.history.fs.logDirectory" is set to hdfs:///spark-history but "spark.eventLog.dir" is set to hdfs:///spark-history/eventLog, it reports the following error:
> ERROR spark.SparkContext: Error initializing SparkContext.
> java.io.FileNotFoundException: File does not exist: hdfs:/spark-history/eventLog
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
> 	at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
> 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:549)
> 	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> 	at com.oracle.test.logs.Main.main(Main.java:13)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
> If the Spark event log directory has to be the same as "spark.history.fs.logDirectory", why have a separate "spark.eventLog.dir" at all? If not, EventLoggingListener.start() should try to create the directory instead of simply throwing an exception:
> {code}
>   def start() {
>     if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
>       throw new IllegalArgumentException(s"Log directory $logBaseDir does not exist.")
>     }
> {code}
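> A sketch of the suggested behavior, reusing the same fileSystem and logBaseDir as in the snippet above, could be:
> {code}
>   def start() {
>     val logPath = new Path(logBaseDir)
>     // Create the base directory when it is missing instead of failing outright.
>     if (!fileSystem.exists(logPath) && !fileSystem.mkdirs(logPath)) {
>       throw new IllegalArgumentException(s"Log directory $logBaseDir could not be created.")
>     }
>     if (!fileSystem.getFileStatus(logPath).isDir) {
>       throw new IllegalArgumentException(s"Log path $logBaseDir is not a directory.")
>     }
>   }
> {code}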
> The current behavior causes confusion; at the same time, the Spark documentation does not make it clear:
> {quote}
> 	Base directory in which Spark events are logged, if spark.eventLog.enabled is true. *Within this base directory* (???you must make sure it already exists???), Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. Users may want to set this to a unified location like an HDFS directory so history files can be read by the history server.
> {quote}
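> In other words, the two settings normally point at the same pre-created HDFS directory, something like the following sketch (paths are examples only):
> {code}
> // spark.eventLog.dir is written by applications; spark.history.fs.logDirectory
> // is read by the history server (normally set in spark-defaults.conf).
> val conf = new org.apache.spark.SparkConf()
>   .set("spark.eventLog.enabled", "true")
>   .set("spark.eventLog.dir", "hdfs:///spark-history")
> {code}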



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org