Posted to issues@spark.apache.org by "Damien Doucet-Girard (JIRA)" <ji...@apache.org> on 2018/12/05 20:06:00 UTC

[jira] [Updated] (SPARK-26284) Spark History server object vs file storage behavior difference

     [ https://issues.apache.org/jira/browse/SPARK-26284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Doucet-Girard updated SPARK-26284:
-----------------------------------------
    Description: 
I am using the Spark history server to view running and completed jobs on Spark with the Kubernetes scheduler backend introduced in 2.3.0. When both {{spark.eventLog.dir}} and {{spark.history.fs.logDirectory}} point to a local file path, I can see both incomplete and completed tasks, and the {{.inprogress}} files are flushed regularly. However, with an {{s3a://}} path, the calls that flush the file ([https://github.com/apache/spark/blob/dd518a196c2d40ae48034b8b0950d1c8045c02ed/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L154]) do not appear to upload the file to S3. As a result, I cannot see incomplete tasks when using an s3a path.

From the behavior I've observed, the file is only uploaded when the task completes (Hadoop 2.7) or when the log file fills the s3a block size set by {{spark.hadoop.fs.s3a.multipart.size}} (Hadoop 3.0.0). Is this intended behavior?
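
For illustration only, the difference described above can be modelled with a small Python sketch. This is not Spark or Hadoop code: the class names, the {{stored}} attribute and the {{part_size}} parameter are all hypothetical, and the object-store model simply encodes the behavior I observed (data leaves the client only when a full part has accumulated or on close), not the actual S3A implementation.

```python
class LocalFileStream:
    """Toy model of a local file: flush() pushes buffered bytes to storage."""

    def __init__(self) -> None:
        self._pending = bytearray()  # bytes written but not yet flushed
        self.stored = bytearray()    # bytes a reader of the file could see

    def write(self, data: bytes) -> None:
        self._pending += data

    def flush(self) -> None:
        self.stored += self._pending
        self._pending.clear()

    def close(self) -> None:
        self.flush()


class ObjectStoreStream:
    """Toy model of the observed s3a behavior (an assumption, not the real code):
    flush() does nothing; bytes are uploaded per full part or on close()."""

    def __init__(self, part_size: int) -> None:
        self._part_size = part_size  # stands in for the multipart size setting
        self._pending = bytearray()
        self.stored = bytearray()    # bytes that have been uploaded

    def write(self, data: bytes) -> None:
        self._pending += data
        # Upload only whole parts; the remainder stays buffered on the client.
        while len(self._pending) >= self._part_size:
            self.stored += self._pending[:self._part_size]
            del self._pending[:self._part_size]

    def flush(self) -> None:
        pass  # modelled as a no-op: nothing is uploaded here

    def close(self) -> None:
        self.stored += self._pending
        self._pending.clear()


def log_events(stream, events):
    """Write and flush each event; return the stored byte count after each flush."""
    sizes = []
    for event in events:
        stream.write(event)
        stream.flush()
        sizes.append(len(stream.stored))
    return sizes


events = [b"event-1\n", b"event-2\n", b"event-3\n"]  # 8 bytes each
local_sizes = log_events(LocalFileStream(), events)
remote_sizes = log_events(ObjectStoreStream(part_size=20), events)
print(local_sizes)   # [8, 16, 24]
print(remote_sizes)  # [0, 0, 20]
```

In this model the local stream grows after every flush, while the object-store stream shows nothing until 20 bytes have accumulated, which matches what I see with the {{.inprogress}} files.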

  was:
I am using the spark history server in order to view running/complete jobs on spark using the kubernetes scheduling backend introduced in 2.3.0. Using a local file path in both `{color:#333333}spark.eventLog.dir{color}` and `spark.history.fs.logDirectory`, I have no issue seeing both incomplete and completed tasks, with `.inprogress` files being flushed regularly. However, when using an `s3a://` path, it seems the calls to flush the file ([https://github.com/apache/spark/blob/dd518a196c2d40ae48034b8b0950d1c8045c02ed/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L154)] don't actually upload the file to s3. Due to this, I am unable to see currently incomplete tasks using an s3a path.

From the behavior I've observed, it only uploads on completion of the task (hadoop 2.7) or upon the log file filling up the block size set for s3a `{color:#6a8759}{color:#333333}spark.hadoop.fs.s3a.multipart.size{color}` {color}(hadoop 3.0.0). Is this intended behavior?


> Spark History server object vs file storage behavior difference
> ---------------------------------------------------------------
>
>                 Key: SPARK-26284
>                 URL: https://issues.apache.org/jira/browse/SPARK-26284
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Damien Doucet-Girard
>            Priority: Minor
>
> I am using the Spark history server to view running and completed jobs on Spark with the Kubernetes scheduler backend introduced in 2.3.0. When both {{spark.eventLog.dir}} and {{spark.history.fs.logDirectory}} point to a local file path, I can see both incomplete and completed tasks, and the {{.inprogress}} files are flushed regularly. However, with an {{s3a://}} path, the calls that flush the file ([https://github.com/apache/spark/blob/dd518a196c2d40ae48034b8b0950d1c8045c02ed/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L154]) do not appear to upload the file to S3. As a result, I cannot see incomplete tasks when using an s3a path.
> From the behavior I've observed, the file is only uploaded when the task completes (Hadoop 2.7) or when the log file fills the s3a block size set by {{spark.hadoop.fs.s3a.multipart.size}} (Hadoop 3.0.0). Is this intended behavior?


