You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kayousterhout <gi...@git.apache.org> on 2014/07/24 07:01:23 UTC

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

GitHub user kayousterhout opened a pull request:

    https://github.com/apache/spark/pull/1566

    [SPARK-1726] [SPARK-2567] Eliminate zombie stages in UI.

    Due to problems with when we update runningStages (in DAGScheduler.scala)
    and how we decide to send a SparkListenerStageCompleted message to
    SparkListeners, sometimes stages can be shown as "running" in the UI forever
    (even after they have failed).  This issue can manifest when stages are
    resubmitted with 0 tasks, or when the DAGScheduler catches non-serializable
    tasks. The problem also resulted in a (small) memory leak in the DAGScheduler,
    where stages can stay in runningStages forever. This commit fixes
    that problem and adds a unit test.
    
    Thanks @tsudukim for helping to look into this issue!
    
    cc @markhamstra @rxin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kayousterhout/spark-1 dag_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1566
    
----
commit 217d74b45d8d34e6a030c08fc609213a7cf8130e
Author: Kay Ousterhout <ka...@gmail.com>
Date:   2014-07-24T03:17:09Z

    [SPARK-1726] [SPARK-2567] Eliminate zombie stages in UI.
    
    Due to problems with when we update runningStages (in DAGScheduler.scala)
    and how we decide to send a SparkListenerStageCompleted message to
    SparkListeners, somtimes stages can be shown as "running" in the UI forever
    (even after they have failed).  This issue can manifest when stages are
    resubmitted with 0 tasks, or when the DAGScheduler catches non-serializable
    tasks. The problem also resulted in a (small) memory leak in the DAGScheduler,
    where stages can stay in runningStages forever. This commit fixes
    that problem and adds a unit test.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1566#issuecomment-49969799
  
    QA results for PR 1566:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17096/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1566#issuecomment-49967900
  
    QA tests have started for PR 1566. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17096/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1566#issuecomment-50211023
  
    BTW I've merged this only into 1.1 because the patch didn't apply cleanly on 1.0. If you think it's important, we can also add it to 1.0.x, but it doesn't seem like that big of a showstopper.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/1566#issuecomment-49968268
  
    Makes sense.  LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/1566#issuecomment-49968323
  
    Thanks for the quick review @markhamstra !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/1566#issuecomment-50246187
  
    Yeah that seems fine to me -- thanks Matei!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1566#issuecomment-50210960
  
    Looks good to me too. I've merged this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1566


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1566#discussion_r15330105
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -753,11 +752,14 @@ class DAGScheduler(
           null
         }
     
    -    // must be run listener before possible NotSerializableException
    -    // should be "StageSubmitted" first and then "JobEnded"
    -    listenerBus.post(SparkListenerStageSubmitted(stageToInfos(stage), properties))
    -
         if (tasks.size > 0) {
    +      runningStages += stage
    +      // SparkListenerStageSubmitted should be posted before testing whether tasks are
    +      // serializable. If tasks are not serializable, a SparkListenerStageCompleted event
    +      // will be posted, which should always come after a corresponding SparkListenerStageSubmitted
    +      // event.
    +      listenerBus.post(SparkListenerStageSubmitted(stageToInfos(stage), properties))
    --- End diff --
    
    I also moved this event inside the check for tasks.size being > 0 -- because we shouldn't tell the UI/listeners about a stage if it doesn't have any tasks and therefore won't be run.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1566#discussion_r15330087
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -710,7 +710,6 @@ class DAGScheduler(
             if (missing == Nil) {
               logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
               submitMissingTasks(stage, jobId.get)
    -          runningStages += stage
    --- End diff --
    
    So just to clarify what's going on here: prior to my change, we added a stage to runningStages here, after calling submitMissingTasks (so *after* the code I modified below gets executed).  This could lead to a memory leak (if the stage needed to be aborted in submitMissingTasks, due to a NotSerializableException for example, because then it would never be removed from runningStages).  It also meant that the DAGScheduler sent a SparkListenerStageSubmitted event to the UI, but never a SparkListenerStageCompleted (because, on line 1072, we only send a SparkListenerStageCompleted event if the stage is in runningStages).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---