You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/17 17:53:57 UTC

[GitHub] [spark] baohe-zhang opened a new pull request #31871: [SPARK-34779] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

baohe-zhang opened a new pull request #31871:
URL: https://github.com/apache/spark/pull/31871


   ### What changes were proposed in this pull request?
   Allow ExecutoMetricsPoller to keep stage entries in stageTCMP until a heartbeat occurs even if the entries have task count = 0.
   
   ### Why are the changes needed?
   This is a bug fix. 
   
   The current implementation of ExecutoMetricsPoller uses task count in each stage to decide whether to keep a stage entry or not. In the case of the executor only has 1 core, it may have these issues:
   
   1. Peak metrics missing (due to stage entry being removed within a heartbeat interval)
   2. Unnecessary and frequent hashmap entry removal and insertion.
   
   The detailed workflows of how entry removal causes the issue can be found on the SPARK-34779.
   
   This patch avoids peak polling metrics missing and helps ExecutoMetricsPoller report more accurate peak metrics for each active stage.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Manual test this patch by running jobs with custom metrics polling interval and observe the debug logs of ExecutoMetricsPoller.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-801325920






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-801312401


   cc @squito 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-806210213


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41057/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-801325920






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812143190


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31871:
URL: https://github.com/apache/spark/pull/31871#discussion_r599187411



##########
File path: core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
##########
@@ -124,17 +124,11 @@ private[spark] class ExecutorMetricsPoller(
    */
   def onTaskCompletion(taskId: Long, stageId: Int, stageAttemptId: Int): Unit = {
     // Decrement the task count.
-    // Remove the entry from stageTCMP if the task count reaches zero.
 
     def decrementCount(stage: StageKey, countAndPeaks: TCMP): TCMP = {
       val countValue = countAndPeaks.count.decrementAndGet()

Review comment:
       Shall we add `assert(countValue >= 0)` here to ensure the task count doesn't go wrong? And then in `removeIfInactive`, we could use `v.count.get == 0`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-814627595


   Yeah, it makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-810425928


   @wypoon Thanks for pointing it out! I will update the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-805122203


   **[Test build #136419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136419/testReport)** for PR 31871 at commit [`27fef30`](https://github.com/apache/spark/commit/27fef3023f32f233d1028e9686c571bbcab2cc12).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812198273


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41406/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wypoon commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
wypoon commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809805138


   LGTM too. Thanks for fixing the bug!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812323640


   Merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809839967


   **[Test build #136679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136679/testReport)** for PR 31871 at commit [`2751c75`](https://github.com/apache/spark/commit/2751c75f594429a0d90c27d784c030a5fa847eed).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809901030


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136679/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-814579136


   The whole `ExecutorMetricsPoller` was introduced in Spark 3.0 by https://issues.apache.org/jira/browse/SPARK-26329 which was definitely an improvement (even without this PR) on Spark 2.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812164543


   **[Test build #136826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136826/testReport)** for PR 31871 at commit [`2751c75`](https://github.com/apache/spark/commit/2751c75f594429a0d90c27d784c030a5fa847eed).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on a change in pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on a change in pull request #31871:
URL: https://github.com/apache/spark/pull/31871#discussion_r599803715



##########
File path: core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
##########
@@ -124,17 +124,11 @@ private[spark] class ExecutorMetricsPoller(
    */
   def onTaskCompletion(taskId: Long, stageId: Int, stageAttemptId: Int): Unit = {
     // Decrement the task count.
-    // Remove the entry from stageTCMP if the task count reaches zero.
 
     def decrementCount(stage: StageKey, countAndPeaks: TCMP): TCMP = {
       val countValue = countAndPeaks.count.decrementAndGet()

Review comment:
       Yeah that sounds good, I added it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809857723


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41261/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-806220274


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41057/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-806192271


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41057/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-814550486


   Is this a regression introduced in Spark 3.0? If so, I think we should backport the fix to 3.1/3.0 as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812239621


   **[Test build #136826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136826/testReport)** for PR 31871 at commit [`2751c75`](https://github.com/apache/spark/commit/2751c75f594429a0d90c27d784c030a5fa847eed).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809860547


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41261/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-814576070


   @Ngone51 this is not a bugfix but an improvement (see the updated PR description and https://github.com/apache/spark/pull/31871#issuecomment-809805138) and we are not backporting improvements


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812240213


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136826/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-805963297


   **[Test build #136473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136473/testReport)** for PR 31871 at commit [`2751c75`](https://github.com/apache/spark/commit/2751c75f594429a0d90c27d784c030a5fa847eed).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] redsanket commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
redsanket commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-802850828


   lgtm gentle ping @dongjoon-hyun @cloud-fan for reviews


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-814588943


   Ok, I saw `executorUpdates` is totally a new field in `Heartbeat` that introduced in Spark 3.0. So it shouldn't be a regression and don't need to backport @baohe-zhang .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-810653033


   @wypoon I updated the PR description fixed typo in the title.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-806220274


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41057/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809901030


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136679/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812198288


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41406/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812164543


   **[Test build #136826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136826/testReport)** for PR 31871 at commit [`2751c75`](https://github.com/apache/spark/commit/2751c75f594429a0d90c27d784c030a5fa847eed).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809860564


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41261/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809894576


   **[Test build #136679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136679/testReport)** for PR 31871 at commit [`2751c75`](https://github.com/apache/spark/commit/2751c75f594429a0d90c27d784c030a5fa847eed).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809835258


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812198288


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41406/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-805966882


   I give time for others to review and if no issue comes up I'll merge it in the end of next week (assuming it's still passing CI and no review is in progress).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-814576566


   I know it's an improvement based on `ExecutorMetricsPoller`. But, is it a regression compares to Spark 2.4?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812197544


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41406/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #31871:
URL: https://github.com/apache/spark/pull/31871#discussion_r600392925



##########
File path: core/src/test/scala/org/apache/spark/executor/ExecutorMetricsPollerSuite.scala
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.memory.TestMemoryManager
+
+class ExecutorMetricsPollerSuite extends SparkFunSuite {
+
+  test("stage entry shouldn't be removed before a heartbeat occurs") {
+    val testMemoryManager = new TestMemoryManager(new SparkConf())
+    val poller = new ExecutorMetricsPoller(testMemoryManager, 1000, None)
+
+    poller.onTaskStart(0L, 0, 0)
+    // stage (0, 0) has an active task, so it remains on stageTCMP after heartbeat.
+    assert(poller.getExecutorUpdates.size === 1)
+    assert(poller.stageTCMP.size === 1)
+
+    poller.onTaskCompletion(0L, 0, 0)

Review comment:
       Let's test the active tasks count too:
   
   ```suggestion
       assert(poller.stageTCMP.size === 1)
       assert(poller.stageTCMP.get((0, 0)).count.get === 1)
       
       poller.onTaskCompletion(0L, 0, 0)
       assert(poller.stageTCMP.get((0, 0)).count.get === 0)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-805748882


   And please apply the change requested by @Ngone51 too: 
   https://github.com/apache/spark/pull/31871#discussion_r600302467
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #31871:
URL: https://github.com/apache/spark/pull/31871#discussion_r600447394



##########
File path: core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
##########
@@ -176,6 +171,20 @@ private[spark] class ExecutorMetricsPoller(
 
     stageTCMP.replaceAll(getUpdateAndResetPeaks)
 
+    def removeIfInactive(k: StageKey, v: TCMP): TCMP = {
+      if (v.count.get == 0) {
+        logDebug(s"removing (${k._1}, ${k._2}) from stageTCMP")
+        null
+      } else {
+        v
+      }
+    }
+
+    // Remove the entry from stageTCMP if the task count reaches zero.
+    executorUpdates.foreach { case (k, v) =>

Review comment:
       one more super Nit: `v` is not used so 
   either 
   `executorUpdates.keySet.foreach { k =>` 
   or 
   `executorUpdates.foreach { case (k, _) =>`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros closed pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros closed pull request #31871:
URL: https://github.com/apache/spark/pull/31871


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812058160


   @Ngone51 Could you retest this pr? I think the test failures are not related to the changes of this pr. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809860564


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41261/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on a change in pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on a change in pull request #31871:
URL: https://github.com/apache/spark/pull/31871#discussion_r600618146



##########
File path: core/src/test/scala/org/apache/spark/executor/ExecutorMetricsPollerSuite.scala
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.memory.TestMemoryManager
+
+class ExecutorMetricsPollerSuite extends SparkFunSuite {
+
+  test("stage entry shouldn't be removed before a heartbeat occurs") {

Review comment:
       Added.

##########
File path: core/src/test/scala/org/apache/spark/executor/ExecutorMetricsPollerSuite.scala
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.memory.TestMemoryManager
+
+class ExecutorMetricsPollerSuite extends SparkFunSuite {
+
+  test("stage entry shouldn't be removed before a heartbeat occurs") {
+    val testMemoryManager = new TestMemoryManager(new SparkConf())
+    val poller = new ExecutorMetricsPoller(testMemoryManager, 1000, None)
+
+    poller.onTaskStart(0L, 0, 0)
+    // stage (0, 0) has an active task, so it remains on stageTCMP after heartbeat.
+    assert(poller.getExecutorUpdates.size === 1)
+    assert(poller.stageTCMP.size === 1)
+
+    poller.onTaskCompletion(0L, 0, 0)

Review comment:
       Done.

##########
File path: core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
##########
@@ -176,6 +171,20 @@ private[spark] class ExecutorMetricsPoller(
 
     stageTCMP.replaceAll(getUpdateAndResetPeaks)
 
+    def removeIfInactive(k: StageKey, v: TCMP): TCMP = {
+      if (v.count.get == 0) {
+        logDebug(s"removing (${k._1}, ${k._2}) from stageTCMP")
+        null
+      } else {
+        v
+      }
+    }
+
+    // Remove the entry from stageTCMP if the task count reaches zero.
+    executorUpdates.foreach { case (k, v) =>

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wypoon edited a comment on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
wypoon edited a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809805138


   LGTM. I commented in the Jira [SPARK-34779](https://issues.apache.org/jira/browse/SPARK-34779) - I don't think that there is a bug as claimed, i.e., I don't think any metrics that are polled are lost, because task metric peaks are sent at task end, and are aggregated. They do not have to come from the executor metric update in a heartbeat. However, this change is an improvement, as it reduces the frequency of removal and insertion of entries in `stageTCMP`.
   Please update the PR description, as that would become the commit message.
   I suggest changing the "Why are the changes needed" section to:
   
   """
   This is an improvement.
   The current implementation of `ExecutorMetricsPoller` keeps a map, `stageTCMP` of (stageId, stageAttemptId) to (count of running tasks, executor metric peaks). The entry for the stage is removed on task completion if the task count decreases to 0. In the case of an executor with a single core, this leads to unnecessary removal and insertion of entries for a given stage.
   """
   
   (Note: please correct the spelling of ExecutorMetricsPoller in the previous section and in the title as well.)
   
   For the "How was this patch tested" section, you probably want to say that a new unit test is added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31871:
URL: https://github.com/apache/spark/pull/31871#discussion_r600302467



##########
File path: core/src/test/scala/org/apache/spark/executor/ExecutorMetricsPollerSuite.scala
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.memory.TestMemoryManager
+
+class ExecutorMetricsPollerSuite extends SparkFunSuite {
+
+  test("stage entry shouldn't be removed before a heartbeat occurs") {

Review comment:
       nit: `SPARK-34779: stage ...`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809839967


   **[Test build #136679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136679/testReport)** for PR 31871 at commit [`2751c75`](https://github.com/apache/spark/commit/2751c75f594429a0d90c27d784c030a5fa847eed).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-812240213


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136826/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-814557605


   @Ngone51 Yes, this was introduced in spark 3.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-802061173


   Cc @squito @wypoon  Could you review this change? Thanks!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #31871:
URL: https://github.com/apache/spark/pull/31871#discussion_r600447394



##########
File path: core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
##########
@@ -176,6 +171,20 @@ private[spark] class ExecutorMetricsPoller(
 
     stageTCMP.replaceAll(getUpdateAndResetPeaks)
 
+    def removeIfInactive(k: StageKey, v: TCMP): TCMP = {
+      if (v.count.get == 0) {
+        logDebug(s"removing (${k._1}, ${k._2}) from stageTCMP")
+        null
+      } else {
+        v
+      }
+    }
+
+    // Remove the entry from stageTCMP if the task count reaches zero.
+    executorUpdates.foreach { case (k, v) =>

Review comment:
       one more super Nit: `v` is not used so 
   either 
   `executorUpdates.keySet.foreach { case k =>` 
   or 
   `executorUpdates.foreach { case (k, _) =>`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-805639374


   cc @tgravescs @mridulm @attilapiros 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-805967858


   Sounds good, thanks for the review! @Ngone51 @attilapiros 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-814574948


   So, could you help submit a separate PR to backport to 3.1 & 3.0?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] baohe-zhang commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

Posted by GitBox <gi...@apache.org>.
baohe-zhang commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-805108234


   @Ngone51 I added a unit test for the changes. The unit test is to verify the stage entry and its peak metrics will be kept in the current heartbeat interval even if the active task count is down to 0. The goal is to avoid peak metrics missing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org