Posted to issues@spark.apache.org by "Miles Crawford (JIRA)" <ji...@apache.org> on 2016/04/12 18:50:25 UTC

[jira] [Commented] (SPARK-14561) History Server does not see new logs in S3

    [ https://issues.apache.org/jira/browse/SPARK-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237525#comment-15237525 ] 

Miles Crawford commented on SPARK-14561:
----------------------------------------

Steve Loughran on the user list says:
{quote}
s3 isn't a real filesystem, and apps writing to it don't have any data written until one of:
 - the output stream is close()'d. This happens at the end of the app
 - the file is set up to be partitioned and a partition size is crossed

Until either of those conditions are met, the history server isn't going to see anything.

If you are going to use s3 as the dest, and you want to see incomplete apps, then you'll need to configure the spark job to have a smaller partition size (64? 128? MB).

If it's completed apps that aren't being seen by the HS, then that's a bug, though if it's against s3 only, it's likely to be something related to directory listings.
{quote}

I agree - and it is only new, completed jobs that aren't showing up. If I restart the history server, it catches up and sees all the jobs.
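
For anyone trying Steve's suggestion, a minimal sketch of the application-side settings might look like the following. The bucket path is a placeholder; {{spark.eventLog.*}} are standard Spark settings, and {{fs.s3a.multipart.size}} (passed through the {{spark.hadoop.}} prefix) is the Hadoop S3A property that controls the upload partition size Steve refers to:

{code}
# spark-defaults.conf on the application side (illustrative values)
spark.eventLog.enabled              true
spark.eventLog.dir                  s3a://my-bucket/spark-logs
# Shrink the S3A multipart ("partition") size so data reaches S3 before
# the output stream is closed, e.g. every 64 MB
spark.hadoop.fs.s3a.multipart.size  67108864
{code}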

> History Server does not see new logs in S3
> ------------------------------------------
>
>                 Key: SPARK-14561
>                 URL: https://issues.apache.org/jira/browse/SPARK-14561
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>            Reporter: Miles Crawford
>
> If you set the Spark history server to use a log directory with an s3a:// URL, everything appears to work fine at first, but new log files written by applications are not picked up by the server.
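
For reference, the history-server side of the setup described above amounts to something like this; these are standard Spark history-server properties, the bucket path is a placeholder, and the update interval shown is only an example:

{code}
# History server configuration (illustrative values)
spark.history.fs.logDirectory       s3a://my-bucket/spark-logs
# How often the server re-scans the log directory for new applications
spark.history.fs.update.interval    10s
{code}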


