You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by ankuriitg <gi...@git.apache.org> on 2018/08/23 19:56:42 UTC

[GitHub] spark pull request #22209: [SPARK-24415][Core] HistoryServer does not displa...

GitHub user ankuriitg opened a pull request:

    https://github.com/apache/spark/pull/22209

    [SPARK-24415][Core] HistoryServer does not display metrics from tasks…

    … that
    
    complete after stage failure
    
    ## What changes were proposed in this pull request?
    The problem occurs because stage object is removed from liveStages in
    AppStatusListener onStageCompletion. Because of this any onTaskEnd event
    received after onStageCompletion event do not update stage metrics.
    
    The fix is to retain stage objects in liveStages until all tasks are complete.
    
    ## How was this patch tested?
    
    1. Fixed the reproducible example posted in the JIRA
    2. Added unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ankuriitg/spark ankurgupta/SPARK-24415

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22209
    
----
commit cd1e8564cd04ae6d94857608b87ace3fef975136
Author: ankurgupta <an...@...>
Date:   2018-08-23T18:48:40Z

    [SPARK-24415][Core] HistoryServer does not display metrics from tasks that
    complete after stage failure
    
    The problem occurs because stage object is removed from liveStages in
    AppStatusListener onStageCompletion. Because of this any onTaskEnd event
    received after onStageCompletion event do not update stage metrics.
    
    The fix is to retain stage objects in liveStages until all tasks are complete.
    
    Testing Done:
    1. Fixed the reproducible example posted in the JIRA
    2. Added unit test

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r215344056
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala ---
    @@ -77,7 +77,14 @@ class UISeleniumSuite
         inputStream.foreachRDD { rdd =>
           rdd.foreach(_ => {})
           try {
    -        rdd.foreach(_ => throw new RuntimeException("Oops"))
    +        rdd.foreach { _ =>
    +          import org.apache.spark.TaskContext
    --- End diff --
    
    This should be together with other imports. Will fix during merge.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212688205
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    I looked further into it and I could not find any case where onJobEnd is received before onStageCompleted. So the code that iterates over active stages in onJobEnd most likely handles some unforeseen edge scenario.
    
    I have fixed the bug with updating pool though.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] HistoryServer does not display metri...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r214187951
  
    --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
    @@ -1190,6 +1190,61 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
         assert(appStore.asOption(appStore.lastStageAttempt(3)) === None)
       }
     
    +  test("SPARK-24415: update metrics for tasks that finish late") {
    +    val listener = new AppStatusListener(store, conf, true)
    +
    +    val stage1 = new StageInfo(1, 0, "stage1", 4, Nil, Nil, "details1")
    +    val stage2 = new StageInfo(2, 0, "stage2", 4, Nil, Nil, "details2")
    +
    +    // Start job
    +    listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage1, stage2), null))
    +
    +    // Start 2 stages
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage1, new Properties()))
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage2, new Properties()))
    +
    +    // Start 2 Tasks
    +    val tasks = createTasks(2, Array("1"))
    +    tasks.foreach { task =>
    +      listener.onTaskStart(SparkListenerTaskStart(stage1.stageId, stage1.attemptNumber, task))
    +    }
    +
    +    // Task 1 Finished
    +    time += 1
    +    tasks(0).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType", Success, tasks(0), null))
    +
    +    // Stage 1 Completed
    +    stage1.failureReason = Some("Failed")
    +    listener.onStageCompleted(SparkListenerStageCompleted(stage1))
    +
    +    // Stop job 1
    +    time += 1
    +    listener.onJobEnd(SparkListenerJobEnd(1, time, JobSucceeded))
    +
    +    // Task 2 Killed
    +    time += 1
    +    tasks(1).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType",
    +        TaskKilled(reason = "Killed"), tasks(1), null))
    +
    +    // Ensure killed task metrics are updated
    +    val allStages = store.view(classOf[StageDataWrapper]).reverse().asScala.map(_.info)
    +    val failedStages = allStages.filter(_.status == v1.StageStatus.FAILED)
    +    assert(failedStages.size == 1)
    +    assert(failedStages.head.numKilledTasks == 1)
    +    assert(failedStages.head.numCompleteTasks == 1)
    +
    +    val allJobs = store.view(classOf[JobDataWrapper]).reverse().asScala.map(_.info)
    +    assert(allJobs.size == 1)
    --- End diff --
    
    I was a little puzzled about why this test was working. Turns out that job metrics are updated in `onTaskEnd` based on the jobs that the stage is tracking.
    
    The weirdness if because above, in your test, you're ending the job before the task end event, but the job is still being updated. That's because of the above, and because `LIVE_ENTITY_UPDATE_PERIOD` is `0` (disabled) in the tests.
    
    So in a real app you could still miss the updates to the job metrics, since the `maybeUpdate` call in the `onTaskEnd` handler may skip updating the job, and since it's not tracked anymore, it won't be flushed.
    
    Anyway, this is probably minor and could be a separate fix. Could you file a separate bug to audit remaining event order issues in this code (like the above), and also what happens when events are dropped?
    
    Unless you want to fix that here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    (Feel free to do it if you think it should be there, btw.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] HistoryServer does not display metri...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Just a note the jira I filed for this was actually affecting running jobs as well (https://issues.apache.org/jira/browse/SPARK-24415)
    
    I think we have seen a similar issue on the history server though as well so perhaps its related. Haven't looked at the code here yet.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212705634
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    This can happen when events get dropped ...
    Spark makes best case effort to deliver events in order; but when events get dropped, UI becomes inconsistent. I assume this might be an effort to recover in that case ? Any thoughts @tgravescs, @vanzin ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the issue where stage page agg...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Sorry, but is my biggest pet peeve. Your PR title still only explains the problem. "Fix problem X" does not describe the fix.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213779029
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -506,7 +516,16 @@ private[spark] class AppStatusListener(
           if (killedDelta > 0) {
             stage.killedSummary = killedTasksSummary(event.reason, stage.killedSummary)
           }
    -      maybeUpdate(stage, now)
    +      // Remove stage if there are no active tasks left and stage is already finished
    +      val removeStage =
    +        stage.activeTasks == 0 &&
    +          (v1.StageStatus.COMPLETE.equals(stage.status) ||
    +            v1.StageStatus.FAILED.equals(stage.status))
    +      if (removeStage) {
    +        update(stage, now, last = true)
    --- End diff --
    
    Shouldn't this be removing the stage from the live list too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95444/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by bersprockets <gi...@git.apache.org>.
Github user bersprockets commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213857850
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -506,7 +516,16 @@ private[spark] class AppStatusListener(
           if (killedDelta > 0) {
             stage.killedSummary = killedTasksSummary(event.reason, stage.killedSummary)
           }
    -      maybeUpdate(stage, now)
    +      // Remove stage if there are no active tasks left and stage is already finished
    +      val removeStage =
    +        stage.activeTasks == 0 &&
    +          (v1.StageStatus.COMPLETE.equals(stage.status) ||
    +            v1.StageStatus.FAILED.equals(stage.status))
    +      if (removeStage) {
    +        update(stage, now, last = true)
    --- End diff --
    
    You are correct. I am removing the stage from live list towards the end of this function.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95233/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22209


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95229/testReport)** for PR 22209 at commit [`70678dc`](https://github.com/apache/spark/commit/70678dc957255d459ba0b5c3960f8a6c8767f50d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r214372932
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,20 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          if (v1.StageStatus.PENDING.equals(stage.status)) {
    --- End diff --
    
    In this PR, I am only trying to fix any issues which are caused by out-of-order events leaving the ones caused by dropped events.
    
    So, with that information, onJobEnd event does not do anything for active stages. Active stages are updated in onTaskEnd event or onStageCompleted event.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212773115
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    They would be in the list if the task end event arrives late, right? (Haven't really re-read the code to be sure.)
    
    Unless it's guaranteed that the task end event will arrive before the job end event, unlike the case for the stage end one.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213859644
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,21 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Mark the stage as skipped if in Pending status
    --- End diff --
    
    Removed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95444/testReport)** for PR 22209 at commit [`b164bb1`](https://github.com/apache/spark/commit/b164bb1fee5435269161d7e83f0cf3901ff24402).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] HistoryServer does not display metri...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95183/testReport)** for PR 22209 at commit [`cd1e856`](https://github.com/apache/spark/commit/cd1e8564cd04ae6d94857608b87ace3fef975136).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212773791
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    This code now only handles the scenario when onStageCompleted event is dropped (not received). If we don't want to handle that scenario, then we can remove this part of the code altogether.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the issue where stage page agg...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Thanks for your comment @tgravescs. This fix is indeed for the SparkUI (running jobs) issue. It does not fix it for history server yet.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212471192
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    For already completed stages, we will leave the removal of stage to happen in either onTaskEnd or onStageCompleted event. This ensures that stage metrics are updated even when onJobEnd event is received before onTaskEnd event.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r214411084
  
    --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
    @@ -1190,6 +1190,61 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
         assert(appStore.asOption(appStore.lastStageAttempt(3)) === None)
       }
     
    +  test("SPARK-24415: update metrics for tasks that finish late") {
    +    val listener = new AppStatusListener(store, conf, true)
    +
    +    val stage1 = new StageInfo(1, 0, "stage1", 4, Nil, Nil, "details1")
    +    val stage2 = new StageInfo(2, 0, "stage2", 4, Nil, Nil, "details2")
    +
    +    // Start job
    +    listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage1, stage2), null))
    +
    +    // Start 2 stages
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage1, new Properties()))
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage2, new Properties()))
    +
    +    // Start 2 Tasks
    +    val tasks = createTasks(2, Array("1"))
    +    tasks.foreach { task =>
    +      listener.onTaskStart(SparkListenerTaskStart(stage1.stageId, stage1.attemptNumber, task))
    +    }
    +
    +    // Task 1 Finished
    +    time += 1
    +    tasks(0).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType", Success, tasks(0), null))
    +
    +    // Stage 1 Completed
    +    stage1.failureReason = Some("Failed")
    +    listener.onStageCompleted(SparkListenerStageCompleted(stage1))
    +
    +    // Stop job 1
    +    time += 1
    +    listener.onJobEnd(SparkListenerJobEnd(1, time, JobSucceeded))
    +
    +    // Task 2 Killed
    +    time += 1
    +    tasks(1).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType",
    +        TaskKilled(reason = "Killed"), tasks(1), null))
    +
    +    // Ensure killed task metrics are updated
    +    val allStages = store.view(classOf[StageDataWrapper]).reverse().asScala.map(_.info)
    +    val failedStages = allStages.filter(_.status == v1.StageStatus.FAILED)
    +    assert(failedStages.size == 1)
    +    assert(failedStages.head.numKilledTasks == 1)
    +    assert(failedStages.head.numCompleteTasks == 1)
    +
    +    val allJobs = store.view(classOf[JobDataWrapper]).reverse().asScala.map(_.info)
    +    assert(allJobs.size == 1)
    --- End diff --
    
    > I am only trying to fix any issues which are caused by out-of-order events
    
    Well, the job metrics not being updated is an out-of-order issue. The code in `onTaskEnd` that updates the job is this:
    
    ```
          stage.jobs.foreach { job =>
            ...
            maybeUpdate(job, now)
          }
    ```
    
    That `maybeUpdate` call, with default settings, may not write the job data to the store, and you'll end up with out-of-date info on the UI. This will only happen on live apps, since the SHS flushes everything at the end.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95358/testReport)** for PR 22209 at commit [`76f1801`](https://github.com/apache/spark/commit/76f180181261a2d7adcce27c40bfb9126c094bc5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    would be nice to put into 2.3 as well, I realize we are close to rc but I don't think we should stop backporting fixes since I don't expect 2.4 to be stable for a while.  If we stop for a bit for this rc we should have some way to track to pull back after rc. thoughts?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r215056633
  
    --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
    @@ -1190,6 +1190,61 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
         assert(appStore.asOption(appStore.lastStageAttempt(3)) === None)
       }
     
    +  test("SPARK-24415: update metrics for tasks that finish late") {
    +    val listener = new AppStatusListener(store, conf, true)
    +
    +    val stage1 = new StageInfo(1, 0, "stage1", 4, Nil, Nil, "details1")
    +    val stage2 = new StageInfo(2, 0, "stage2", 4, Nil, Nil, "details2")
    +
    +    // Start job
    +    listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage1, stage2), null))
    +
    +    // Start 2 stages
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage1, new Properties()))
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage2, new Properties()))
    +
    +    // Start 2 Tasks
    +    val tasks = createTasks(2, Array("1"))
    +    tasks.foreach { task =>
    +      listener.onTaskStart(SparkListenerTaskStart(stage1.stageId, stage1.attemptNumber, task))
    +    }
    +
    +    // Task 1 Finished
    +    time += 1
    +    tasks(0).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType", Success, tasks(0), null))
    +
    +    // Stage 1 Completed
    +    stage1.failureReason = Some("Failed")
    +    listener.onStageCompleted(SparkListenerStageCompleted(stage1))
    +
    +    // Stop job 1
    +    time += 1
    +    listener.onJobEnd(SparkListenerJobEnd(1, time, JobSucceeded))
    +
    +    // Task 2 Killed
    +    time += 1
    +    tasks(1).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType",
    +        TaskKilled(reason = "Killed"), tasks(1), null))
    +
    +    // Ensure killed task metrics are updated
    +    val allStages = store.view(classOf[StageDataWrapper]).reverse().asScala.map(_.info)
    +    val failedStages = allStages.filter(_.status == v1.StageStatus.FAILED)
    +    assert(failedStages.size == 1)
    +    assert(failedStages.head.numKilledTasks == 1)
    +    assert(failedStages.head.numCompleteTasks == 1)
    +
    +    val allJobs = store.view(classOf[JobDataWrapper]).reverse().asScala.map(_.info)
    +    assert(allJobs.size == 1)
    --- End diff --
    
    Sorry, I couldn't respond on Friday.
    
    So, the above code has been replaced with "conditionalLiveUpdate(job, now, removeStage)" in my PR. This means that if the taskEnd event is the last event and stage has already completed, the job will be updated instantaneously. If it is the last event but stage has not completed yet, then the onStageCompleted event will update the job instantaneously.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] HistoryServer does not display metri...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    @vanzin @squito 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r215343938
  
    --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
    @@ -1190,6 +1190,61 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
         assert(appStore.asOption(appStore.lastStageAttempt(3)) === None)
       }
     
    +  test("SPARK-24415: update metrics for tasks that finish late") {
    +    val listener = new AppStatusListener(store, conf, true)
    +
    +    val stage1 = new StageInfo(1, 0, "stage1", 4, Nil, Nil, "details1")
    +    val stage2 = new StageInfo(2, 0, "stage2", 4, Nil, Nil, "details2")
    +
    +    // Start job
    +    listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage1, stage2), null))
    +
    +    // Start 2 stages
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage1, new Properties()))
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage2, new Properties()))
    +
    +    // Start 2 Tasks
    +    val tasks = createTasks(2, Array("1"))
    +    tasks.foreach { task =>
    +      listener.onTaskStart(SparkListenerTaskStart(stage1.stageId, stage1.attemptNumber, task))
    +    }
    +
    +    // Task 1 Finished
    +    time += 1
    +    tasks(0).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType", Success, tasks(0), null))
    +
    +    // Stage 1 Completed
    +    stage1.failureReason = Some("Failed")
    +    listener.onStageCompleted(SparkListenerStageCompleted(stage1))
    +
    +    // Stop job 1
    +    time += 1
    +    listener.onJobEnd(SparkListenerJobEnd(1, time, JobSucceeded))
    +
    +    // Task 2 Killed
    +    time += 1
    +    tasks(1).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType",
    +        TaskKilled(reason = "Killed"), tasks(1), null))
    +
    +    // Ensure killed task metrics are updated
    +    val allStages = store.view(classOf[StageDataWrapper]).reverse().asScala.map(_.info)
    +    val failedStages = allStages.filter(_.status == v1.StageStatus.FAILED)
    +    assert(failedStages.size == 1)
    +    assert(failedStages.head.numKilledTasks == 1)
    +    assert(failedStages.head.numCompleteTasks == 1)
    +
    +    val allJobs = store.view(classOf[JobDataWrapper]).reverse().asScala.map(_.info)
    +    assert(allJobs.size == 1)
    --- End diff --
    
    Ah, missed that. Took another look and it should be fine.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the issue where stage page agg...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95218/testReport)** for PR 22209 at commit [`55637c3`](https://github.com/apache/spark/commit/55637c3622c37b374f2c68f9cd546f4f94e6084c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95358/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212468772
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    removal from iterator should always happen


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merging to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] HistoryServer does not display metri...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    PR title should describe the fix, not the problem. The problem is already described on the bug.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95218/testReport)** for PR 22209 at commit [`55637c3`](https://github.com/apache/spark/commit/55637c3622c37b374f2c68f9cd546f4f94e6084c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212705093
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -608,14 +627,20 @@ private[spark] class AppStatusListener(
           }
     
           stage.executorSummaries.values.foreach(update(_, now))
    -      update(stage, now, last = true)
     
           val executorIdsForStage = stage.blackListedExecutors
           executorIdsForStage.foreach { executorId =>
             liveExecutors.get(executorId).foreach { exec =>
               removeBlackListedStageFrom(exec, event.stageInfo.stageId, now)
             }
           }
    +
    +      // Remove stage only if there are no active tasks remaining
    +      val removeStage = stage.activeTasks == 0
    +      update(stage, now, removeStage)
    --- End diff --
    
    `last = removeStage`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95382/testReport)** for PR 22209 at commit [`76f1801`](https://github.com/apache/spark/commit/76f180181261a2d7adcce27c40bfb9126c094bc5).
     * This patch **fails from timeout after a configured wait of \`400m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    I guess it should be ok. Just trying to be conservative and not inadvertently making the branch less stable in the middle of an rc cycle...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212708646
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    In your question, this == this PR?
    
    If so, no, that's not what it's fixing. Task end events can "naturally" arrive after the stage end event in the case of a stage failure, and this code was missing that case.
    
    When event drops occur, a lot of things get out of sync, and this change wouldn't fix that. It perhaps could make it a little worse: if a task end event does not arrive, then maybe with this change the stage will never be actually removed from the live stages map. Not sure how easy it would be to recover from that though, since dropped events could probably cause other sorts of leaks in this class too, but I also feel that's a separate issue.
    
    (Also, hopefully, dropped events for this listener should be less common in 2.3 after the listener bus changes.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213495693
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,22 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    --- End diff --
    
    Done. Updating the stage in onJobEnd event only if it is in Pending status.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] HistoryServer does not display metri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95305/testReport)** for PR 22209 at commit [`0552af0`](https://github.com/apache/spark/commit/0552af0abb484c1b9129a0091b2057e06d5ab4ac).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95389/testReport)** for PR 22209 at commit [`dccbf36`](https://github.com/apache/spark/commit/dccbf36a2041052da7489f301abce3fda3a845ef).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r215345106
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala ---
    @@ -77,7 +77,14 @@ class UISeleniumSuite
         inputStream.foreachRDD { rdd =>
           rdd.foreach(_ => {})
           try {
    -        rdd.foreach(_ => throw new RuntimeException("Oops"))
    +        rdd.foreach { _ =>
    +          import org.apache.spark.TaskContext
    --- End diff --
    
    Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r214381992
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,20 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          if (v1.StageStatus.PENDING.equals(stage.status)) {
    --- End diff --
    
    In the previous behaviour, it would have marked the stages that were ACTIVE as SKIPPED, which now will not happen here anymore.
    It looks like the code in `onStageCompleted` may handle that in the `stage.status = event.stageInfo.failureReason match {` handles that. Is that the case?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95233/testReport)** for PR 22209 at commit [`70678dc`](https://github.com/apache/spark/commit/70678dc957255d459ba0b5c3960f8a6c8767f50d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    any reason not to merge to 2.3?  its a bug in 2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r214320305
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,20 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          if (v1.StageStatus.PENDING.equals(stage.status)) {
    --- End diff --
    
    Is there nothing to be done here if StageStatus is ACTIVE?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95223/testReport)** for PR 22209 at commit [`69497a2`](https://github.com/apache/spark/commit/69497a2272c657145b5147edff6a227082bf9b15).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95229/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213476910
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,22 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    --- End diff --
    
    I think that there are still two things here:
    
    - according to code in DAGScheduler#handleTaskCompletion, it seems like it's possible for a job to be marked finished before all task end events arrive. The job is marked finished as soon as all partitions are computed, so if you have e.g. speculative tasks you may miss things by removing the stage here.
    
    - re-reading the code again, the pool only really needs to be updated in the pending case, since other cases are handled in `onStageCompleted` already.
    
    This will cause that leak that I mentioned before, though, where missing events makes things remains in the live lists forever. But that's already a problem with this code, and we should look at all of those as a separate task.
    
    So I think it's more correct to only do anything for pending stages here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213143804
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,22 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    --- End diff --
    
    So I went back and took a closer look and I think this isn't entirely correct (and wasn't entirely correct before either).
    
    If I remember the semantics correctly, the stage should be skipped if it is part of the job's stages, and is in the pending state when the job finishes.
    
    If it's in the active state, it should not be marked as skipped. If you do that, the update to the skipped tasks (in L358) will most certainly be wrong.
    
    So if the state is still active here, it means some event was missed. The best we can do in that case, I think, is remove it from the live stages list and update the pool data, and that's it.
    
    On a related note, if the "onStageSubmitted" event is missed, the stage will remain in the "pending" state even if tasks start on it. Perhaps that could also be added to the "onTaskStart" handler, just to be sure the stage is marked as active.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] HistoryServer does not display metri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95431/testReport)** for PR 22209 at commit [`dccbf36`](https://github.com/apache/spark/commit/dccbf36a2041052da7489f301abce3fda3a845ef).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95233/testReport)** for PR 22209 at commit [`70678dc`](https://github.com/apache/spark/commit/70678dc957255d459ba0b5c3960f8a6c8767f50d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r214410470
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,20 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          if (v1.StageStatus.PENDING.equals(stage.status)) {
    --- End diff --
    
    > In the previous behaviour, it would have marked the stages that were ACTIVE as SKIPPED
    
    That sounds right, but that was also incorrect behavior. I think we need to take a look at all the "what happens when events are dropped" cases, and that's better done separately. It's much less likely for events to be dropped here nowadays...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95382/testReport)** for PR 22209 at commit [`76f1801`](https://github.com/apache/spark/commit/76f180181261a2d7adcce27c40bfb9126c094bc5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213779487
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -506,7 +516,16 @@ private[spark] class AppStatusListener(
           if (killedDelta > 0) {
             stage.killedSummary = killedTasksSummary(event.reason, stage.killedSummary)
           }
    -      maybeUpdate(stage, now)
    +      // Remove stage if there are no active tasks left and stage is already finished
    --- End diff --
    
    Comment just repeats the code. You may want to explain when this happens, otherwise the comment is unnecessary.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95431/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212498716
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    I think the assumption here is that we will always receive onStageCompleted event before onJobEvent. If that does not occur for some reason, then any active stages are marked as skipped.
    I don't know the scenario when onStageCompleted event is not received before onJobEnd event (or received at all). Let me look further into it. Additionally, I will also fix the bug for updating pool.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213859611
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -506,7 +516,16 @@ private[spark] class AppStatusListener(
           if (killedDelta > 0) {
             stage.killedSummary = killedTasksSummary(event.reason, stage.killedSummary)
           }
    -      maybeUpdate(stage, now)
    +      // Remove stage if there are no active tasks left and stage is already finished
    --- End diff --
    
    Fixed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the issue where stage page agg...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95183/testReport)** for PR 22209 at commit [`cd1e856`](https://github.com/apache/spark/commit/cd1e8564cd04ae6d94857608b87ace3fef975136).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95389/testReport)** for PR 22209 at commit [`dccbf36`](https://github.com/apache/spark/commit/dccbf36a2041052da7489f301abce3fda3a845ef).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    pulled into 2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95223/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213372734
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,22 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    --- End diff --
    
    I changed the commit to mark a stage as skipped only if it is in Pending stage.
    
    I did not update the "onTaskStart" to handle missed "onStageSubmitted" event as that can be a part of a different review, if required.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the issue where stage page agg...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Just checked, this fixes the Spark History Server issue (SPARK-24539) as well, I just needed to restart SHS in order for my changes to take affect.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213143932
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala ---
    @@ -77,7 +77,14 @@ class UISeleniumSuite
         inputStream.foreachRDD { rdd =>
           rdd.foreach(_ => {})
           try {
    -        rdd.foreach(_ => throw new RuntimeException("Oops"))
    +        rdd.foreach(_ => {
    --- End diff --
    
    Since you're touching this: `.foreach { _ =>`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95444/testReport)** for PR 22209 at commit [`b164bb1`](https://github.com/apache/spark/commit/b164bb1fee5435269161d7e83f0cf3901ff24402).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r214384020
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,20 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          if (v1.StageStatus.PENDING.equals(stage.status)) {
    --- End diff --
    
    Yes, that updates the status of a stage. Additionally, completed stages are removed from liveStages list after that or onTaskEnd event.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by ankuriitg <gi...@git.apache.org>.
Github user ankuriitg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r214372050
  
    --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
    @@ -1190,6 +1190,61 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
         assert(appStore.asOption(appStore.lastStageAttempt(3)) === None)
       }
     
    +  test("SPARK-24415: update metrics for tasks that finish late") {
    +    val listener = new AppStatusListener(store, conf, true)
    +
    +    val stage1 = new StageInfo(1, 0, "stage1", 4, Nil, Nil, "details1")
    +    val stage2 = new StageInfo(2, 0, "stage2", 4, Nil, Nil, "details2")
    +
    +    // Start job
    +    listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage1, stage2), null))
    +
    +    // Start 2 stages
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage1, new Properties()))
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage2, new Properties()))
    +
    +    // Start 2 Tasks
    +    val tasks = createTasks(2, Array("1"))
    +    tasks.foreach { task =>
    +      listener.onTaskStart(SparkListenerTaskStart(stage1.stageId, stage1.attemptNumber, task))
    +    }
    +
    +    // Task 1 Finished
    +    time += 1
    +    tasks(0).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType", Success, tasks(0), null))
    +
    +    // Stage 1 Completed
    +    stage1.failureReason = Some("Failed")
    +    listener.onStageCompleted(SparkListenerStageCompleted(stage1))
    +
    +    // Stop job 1
    +    time += 1
    +    listener.onJobEnd(SparkListenerJobEnd(1, time, JobSucceeded))
    +
    +    // Task 2 Killed
    +    time += 1
    +    tasks(1).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType",
    +        TaskKilled(reason = "Killed"), tasks(1), null))
    +
    +    // Ensure killed task metrics are updated
    +    val allStages = store.view(classOf[StageDataWrapper]).reverse().asScala.map(_.info)
    +    val failedStages = allStages.filter(_.status == v1.StageStatus.FAILED)
    +    assert(failedStages.size == 1)
    +    assert(failedStages.head.numKilledTasks == 1)
    +    assert(failedStages.head.numCompleteTasks == 1)
    +
    +    val allJobs = store.view(classOf[JobDataWrapper]).reverse().asScala.map(_.info)
    +    assert(allJobs.size == 1)
    --- End diff --
    
    In this PR, I am only trying to fix any issues which are caused by out-of-order events leaving the ones caused by dropped events
    
    Now given that information, job will always be updated whether task completion event is received last, or the stage completion event or the job completion event (because of the weirdness that you mentioned above). Please let me know if that is not correct.
    
    I will create a separate JIRA which handles dropped events.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95382/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95431/testReport)** for PR 22209 at commit [`dccbf36`](https://github.com/apache/spark/commit/dccbf36a2041052da7489f301abce3fda3a845ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212491921
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    Btw, there is an existing bug that we are not updating pool, etc which we do in onStageCompleted ...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212703732
  
    --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
    @@ -1190,6 +1190,65 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
         assert(appStore.asOption(appStore.lastStageAttempt(3)) === None)
       }
     
    +  test("stage and job metrics should be updated when stage completion event" +
    --- End diff --
    
    "update metrics for tasks that finish late". Same thing, much shorter. Could also add the bug number to the test name.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212772176
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    To clarify, I was referring to 'this' being job end event received before stage end (for a stage which is part of a job).
    
    I was not referring to task end event's (those can come in after stage or job end's).
    
    Thanks for clarifying @vanzin ... given the snippet is not trying to recover from events drop, wondering why "non"-skipped stages would even be in the list : I would expect all of them to be skipped ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] HistoryServer does not display metri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the issue where stage page agg...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212490964
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    I am not sure I follow - if that is the case, why are we doing this for active stages here ? onStageCompleted/onTaskEnd would be fired for active stages as well.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95305/testReport)** for PR 22209 at commit [`0552af0`](https://github.com/apache/spark/commit/0552af0abb484c1b9129a0091b2057e06d5ab4ac).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95223/testReport)** for PR 22209 at commit [`69497a2`](https://github.com/apache/spark/commit/69497a2272c657145b5147edff6a227082bf9b15).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95218/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95229/testReport)** for PR 22209 at commit [`70678dc`](https://github.com/apache/spark/commit/70678dc957255d459ba0b5c3960f8a6c8767f50d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95305/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212704521
  
    --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
    @@ -1190,6 +1190,65 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
         assert(appStore.asOption(appStore.lastStageAttempt(3)) === None)
       }
     
    +  test("stage and job metrics should be updated when stage completion event" +
    +    " is received before task end event") {
    +    val listener = new AppStatusListener(store, conf, true)
    +
    +    val stage1 = new StageInfo(1, 0, "stage1", 4, Nil, Nil, "details1")
    +    val stage2 = new StageInfo(2, 0, "stage2", 4, Nil, Nil, "details2")
    +
    +    // Start job
    +    listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage1, stage2), null))
    +
    +    // Start 2 stages
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage1, new Properties()))
    +    listener.onStageSubmitted(SparkListenerStageSubmitted(stage2, new Properties()))
    +
    +    // Start 2 Tasks
    +    val tasks = createTasks(2, Array("1"))
    +    tasks.foreach { task =>
    +      listener.onTaskStart(SparkListenerTaskStart(stage1.stageId, stage1.attemptNumber, task))
    +    }
    +
    +    // Task 1 Finished
    +    time += 1
    +    tasks(0).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType", Success, tasks(0), null))
    +
    +    // Stage 1 Completed
    +    stage1.failureReason = Some("Failed")
    +    listener.onStageCompleted(SparkListenerStageCompleted(stage1))
    +
    +    // Stop job 1 and stage 2 will become SKIPPED
    +    time += 1
    +    listener.onJobEnd(SparkListenerJobEnd(1, time, JobSucceeded))
    +
    +    // Task 2 Killed
    +    time += 1
    +    tasks(1).markFinished(TaskState.FINISHED, time)
    +    listener.onTaskEnd(
    +      SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType",
    +        TaskKilled(reason = "Killed"), tasks(1), null))
    +
    +    // Ensure killed task metrics are updated
    +    val allStages = store.view(classOf[StageDataWrapper]).reverse().asScala.map(_.info)
    +    assert(allStages.filter(_.status == v1.StageStatus.SKIPPED).size == 1)
    +    val failedStages = allStages.filter(_.status == v1.StageStatus.FAILED)
    +    assert(failedStages.size == 1)
    +    assert(failedStages.head.numKilledTasks == 1)
    +    assert(failedStages.head.numCompleteTasks == 1)
    +
    +    // Ensure killed task metrics are updated
    --- End diff --
    
    This is the same comment as above, either remove this one or update each comment to say where the check it occurring. (I prefer the former.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r213778328
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,21 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Mark the stage as skipped if in Pending status
    --- End diff --
    
    Unnecessary comment. Just repeats the code.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the issue where stage page agg...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95183/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95389/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    **[Test build #95358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95358/testReport)** for PR 22209 at commit [`76f1801`](https://github.com/apache/spark/commit/76f180181261a2d7adcce27c40bfb9126c094bc5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by bersprockets <gi...@git.apache.org>.
Github user bersprockets commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    Looks like test failed due to https://issues.apache.org/jira/browse/SPARK-23622


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

Posted by bersprockets <gi...@git.apache.org>.
Github user bersprockets commented on the issue:

    https://github.com/apache/spark/pull/22209
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org