Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2019/05/14 17:31:00 UTC

[jira] [Updated] (SPARK-27709) AppStatusListener.cleanupExecutors should remove dead executors in an ordering that makes sense, not a random order

     [ https://issues.apache.org/jira/browse/SPARK-27709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-27709:
-------------------------------
    Description: 
When AppStatusListener removes dead executors in excess of {{spark.ui.retainedDeadExecutors}}, it looks like it does so in an essentially random order:

Based on the [current code|https://github.com/apache/spark/blob/fee695d0cf211e4119c7df7a984708628dc9368a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala#L1112] it looks like we only index based on {{"active"}} but don't perform any secondary indexing or sorting based on the age / ID of the executor.
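For illustration only (this is not the KVStore code, just a standalone sketch): if dead executors end up ordered by something unrelated to their age, such as the executor ID string, the entries picked for deletion bear no relation to when the executors actually died. {{DeadExec}} below is a made-up stand-in for the real executor summary:

{code:scala}
// Hypothetical, simplified model of a dead executor entry (not Spark's ExecutorSummary).
case class DeadExec(id: String, removeTimeMs: Long)

object ArbitraryOrderDemo {
  def main(args: Array[String]): Unit = {
    val dead = Seq(
      DeadExec("2", removeTimeMs = 1000L),   // died first
      DeadExec("10", removeTimeMs = 5000L),  // died last
      DeadExec("7", removeTimeMs = 3000L)
    )

    // Picking victims by ID order (one plausible fallback when there is no
    // age-based index) does not match the order in which the executors died:
    println(dead.sortBy(_.id).map(_.id))           // List(10, 2, 7)
    println(dead.sortBy(_.removeTimeMs).map(_.id)) // List(2, 7, 10)
  }
}
{code}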

Instead, I think it might make sense to remove the oldest executors first, similar to how we order by "completionTime" when cleaning up old stages.
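As a rough sketch of that policy (illustrative only, not a patch against AppStatusListener; {{trimDeadExecutors}} is a made-up helper reusing the {{DeadExec}} model above), the oldest removals would be selected for deletion first:

{code:scala}
// Given all dead executors and a retention limit, return the entries to delete,
// oldest removal time first -- analogous to ordering stages by completionTime.
def trimDeadExecutors(dead: Seq[DeadExec], retained: Int): Seq[DeadExec] = {
  val excess = dead.size - retained
  if (excess <= 0) Nil
  else dead.sortBy(_.removeTimeMs).take(excess)
}
{code}

With {{retained = 2}} and the three executors above, only executor "2" (the earliest death) would be deleted.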

I think we should also consider raising the default of {{spark.ui.retainedDeadExecutors}}: it currently defaults to 100, which seems really low compared to the number of retained tasks / stages / jobs (which collectively take much more space to store). Maybe ~1000 is a safe default?
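In the meantime the limit can already be raised per application; {{spark.ui.retainedDeadExecutors}} is the existing config key, and 1000 below is just the value floated above, not a documented recommendation:

{code:scala}
import org.apache.spark.sql.SparkSession

// Raise the dead-executor retention above the current default of 100.
val spark = SparkSession.builder()
  .appName("retained-dead-executors-example")
  .config("spark.ui.retainedDeadExecutors", "1000")
  .getOrCreate()
{code}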

  was:
When AppStatusListener removes dead executors in excess of {{spark.ui.retainedDeadExecutors}}, it looks like it does so in an essentially random order:

Based on the [current code|https://github.com/apache/spark/blob/fee695d0cf211e4119c7df7a984708628dc9368a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala#L1112] it looks like we only index based on {{"active"}} but don't perform any secondary indexing or sorting based on the age / ID of the executor.

Instead, I think it might make sense to remove the oldest executors first, similar to how we order by "completionTime" when cleaning up old stages.

I think we should also consider making a higher default of {{spark.ui.retainedDeadExecutors}}: it currently defaults to 100 but this seems really low in comparison to the total number of retained tasks / stages / jobs (which collectively take much more space to store). Maybe ~1000 is a safe default?

(As long as we're 


> AppStatusListener.cleanupExecutors should remove dead executors in an ordering that makes sense, not a random order
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27709
>                 URL: https://issues.apache.org/jira/browse/SPARK-27709
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 2.4.0
>            Reporter: Josh Rosen
>            Priority: Minor
>
> When AppStatusListener removes dead executors in excess of {{spark.ui.retainedDeadExecutors}}, it looks like it does so in an essentially random order:
> Based on the [current code|https://github.com/apache/spark/blob/fee695d0cf211e4119c7df7a984708628dc9368a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala#L1112] it looks like we only index based on {{"active"}} but don't perform any secondary indexing or sorting based on the age / ID of the executor.
> Instead, I think it might make sense to remove the oldest executors first, similar to how we order by "completionTime" when cleaning up old stages.
> I think we should also consider making a higher default of {{spark.ui.retainedDeadExecutors}}: it currently defaults to 100 but this seems really low in comparison to the total number of retained tasks / stages / jobs (which collectively take much more space to store). Maybe ~1000 is a safe default?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org