Posted to issues@spark.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2018/09/12 00:29:00 UTC

[jira] [Commented] (SPARK-25409) Speed up Spark History at start if there are tens of thousands of applications.

    [ https://issues.apache.org/jira/browse/SPARK-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611409#comment-16611409 ] 

Yuming Wang commented on SPARK-25409:
-------------------------------------

Please create a pull request on GitHub: https://github.com/apache/spark/pulls

> Speed up Spark History at start if there are tens of thousands of applications.
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-25409
>                 URL: https://issues.apache.org/jira/browse/SPARK-25409
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Rong Tang
>            Priority: Major
>         Attachments: SPARK-25409.0001.patch
>
>
> We have a Spark History Server storing 7 days' worth of applications; it usually holds 10K to 20K attempts.
> We found that it can take hours at startup to load/replay the logs in the event-log folder, so newly finished applications may wait several hours before they become visible. I made two improvements:
>  # Since we run Spark on YARN, information about on-going applications is also visible via the Resource Manager, so I introduce a flag spark.history.fs.load.incomplete that controls whether logs for incomplete attempts are loaded.
>  # Incremental loading of applications. As noted above, we have more than 10K applications stored, and loading all of them the first time can take hours, so I introduce a config spark.history.fs.appRefreshNum that sets how many applications to load per batch, giving the server a chance to check for the latest updates between batches.
> Here are the benchmarks I ran. Our system has 1K incomplete applications (they were not cleaned up for some reason, which is a separate issue I need to investigate), and an application's event log can be gigabytes in size.
>  
> Skip loading incomplete attempts.
> | |Load Count|Load incomplete apps|Total attempts|Time Cost|Increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
> |2|All|No|13K|31 minutes|Yes|
>  
>  
> Limit how many attempts are loaded each time.
>  
> | |Load Count|Load incomplete apps|Total attempts|Worst Cost|Increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
> |2|3000|Yes|13K|42 minutes, except the last 1.6K (the last 1.6K attempts took an extremely long 2.5 hours)|No|
>  
>  
> Limit how many attempts are loaded each time, and do not load incomplete jobs.
>  
> | |Load Count|Load incomplete apps|Total attempts|Worst Cost|Avg|Increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes| |Yes|
> |2|3000|No|12K|17 minutes|10 minutes (41 minutes in total)|No|
>  
>  
> | |Load Count|Load incomplete apps|Total attempts|Worst Cost|Avg|Increases with more attempts|
> |1 (current implementation)|All|Yes|20K|1 hour 52 minutes| |Yes|
> |2|3000|No|18.5K|20 minutes|18 minutes (2 hours 18 minutes in total)|No|
>  
>  
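The two proposed settings can be illustrated with a minimal Scala sketch: filter out incomplete attempts (the proposed spark.history.fs.load.incomplete flag) and split the remainder into fixed-size batches (the proposed spark.history.fs.appRefreshNum), so the listing can refresh between batches instead of blocking until every log is replayed. All names here (AttemptLog, loadBatchwise) are illustrative only and do not reflect the actual patch's API.

```scala
// Illustrative stand-in for one attempt's event log entry.
case class AttemptLog(id: String, complete: Boolean)

// Sketch of the proposed loading strategy:
// - loadIncomplete = false models spark.history.fs.load.incomplete = false
//   (skip replaying logs of still-running attempts),
// - batchSize models spark.history.fs.appRefreshNum
//   (how many applications to load per refresh cycle).
def loadBatchwise(attempts: Seq[AttemptLog],
                  batchSize: Int,
                  loadIncomplete: Boolean): Seq[Seq[AttemptLog]] = {
  // Optionally drop incomplete attempts entirely.
  val toLoad = if (loadIncomplete) attempts else attempts.filter(_.complete)
  // Split into batches of at most batchSize; between batches the real
  // server would get a chance to pick up newly finished applications.
  toLoad.grouped(batchSize).toSeq
}
```

Under this sketch, a replay of 13K attempts becomes a series of small batches, which matches the benchmark observation that new applications no longer wait hours behind a single monolithic load.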



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org