Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2015/04/07 19:48:29 UTC

[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/5397

    [SPARK-6737] Fix memory leak in OutputCommitCoordinator

    This patch fixes a memory leak in the DAGScheduler, which caused us to leak a map entry per submitted stage.  The problem is that the OutputCommitCoordinator needs to be informed when stages end in order to remove entries from its `authorizedCommitters` map, but the DAGScheduler only notified it of stage completion in one of the four code paths that are used to mark stages as completed.
    
    This patch fixes this issue and updates DAGSchedulerSuite's `assertDataStructuresEmpty` assertion to also check the OutputCommitCoordinator data structures.  I've also added a comment at the top of DAGScheduler so that we remember to update this test when adding new data structures.
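
The leak described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: `SketchCommitCoordinator`, `LeakSketch`, and the shape of the map are hypothetical stand-ins, not Spark's actual classes. It only shows why skipping `stageEnd` on some completion paths leaves one map entry behind per affected stage.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the coordinator; not Spark's real class.
class SketchCommitCoordinator {
  // Assumed shape: stage ID -> (partition ID -> task attempt allowed to commit).
  private val authorizedCommitters =
    mutable.Map.empty[Int, mutable.Map[Int, Long]]

  // Called when a stage is submitted: creates the per-stage entry.
  def stageStart(stageId: Int): Unit =
    authorizedCommitters(stageId) = mutable.Map.empty[Int, Long]

  // Called when a stage ends: the only place the entry is removed.
  def stageEnd(stageId: Int): Unit =
    authorizedCommitters.remove(stageId)

  def size: Int = authorizedCommitters.size
}

object LeakSketch {
  // Runs four stages; even-numbered stages succeed, odd-numbered ones fail.
  // When notifyOnFailure is false, only the success path calls stageEnd,
  // which mimics notifying the coordinator on just one completion path.
  def leakedEntries(notifyOnFailure: Boolean): Int = {
    val coordinator = new SketchCommitCoordinator
    for (stageId <- 0 until 4) {
      coordinator.stageStart(stageId)
      val succeeded = stageId % 2 == 0
      if (succeeded || notifyOnFailure) coordinator.stageEnd(stageId)
    }
    coordinator.size
  }
}
```

In this sketch, `LeakSketch.leakedEntries(notifyOnFailure = false)` counts the entries left behind by the failed stages, and the count drops to zero once every path calls `stageEnd`.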

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-6737

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5397
    
----
commit 4ead1dc9410cecb4dcd3230980f0822fa90f436b
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-04-07T17:42:29Z

    Add regression tests for SPARK-6737

commit 789689944c34928616e66500d9d9518ccf3e31dc
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-04-07T17:43:08Z

    Fix SPARK-6737 by informing OutputCommitCoordinator of all stage end events.

commit 3052aeacc3fa4d042aa623a7d85f2fd7e43628fe
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-04-07T17:45:20Z

    Comment update

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90749819
  
    LGTM too.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90758825
  
    Actually, it appears that the merge conflicts with `branch-1.3` are trivial to resolve, so I'll perform a cherry-pick, run the tests locally, then monitor Jenkins for the backport commit.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5397#discussion_r27906516
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -709,9 +713,10 @@ class DAGScheduler(
           // cancelling the stages because if the DAG scheduler is stopped, the entire application
           // is in the process of getting stopped.
           val stageFailedMessage = "Stage cancelled because SparkContext was shut down"
    -      runningStages.foreach { stage =>
    -        stage.latestInfo.stageFailed(stageFailedMessage)
    -        listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
    +      // The `toArray` here is necessary so that we don't iterate over `runningStages` while
    +      // mutating it.
    +      runningStages.toArray.foreach { stage =>
    --- End diff --
    
    You could avoid the comment by doing:
    
        while (!runningStages.isEmpty) markStageAsFinished(runningStages.last)
    
    But either is fine.
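
A minimal sketch of the two options discussed here, under stated assumptions: `DrainSketch` and `finish` are hypothetical, and `finish` merely stands in for a `markStageAsFinished` that removes the stage from the running set. Both variants avoid iterating over a mutable set while it is being modified, the first by iterating over a snapshot and the second by re-reading the set on every pass.

```scala
import scala.collection.mutable

object DrainSketch {
  // Hypothetical stand-in for markStageAsFinished: it mutates the running set.
  private def finish(
      running: mutable.HashSet[Int],
      finished: mutable.Buffer[Int],
      stage: Int): Unit = {
    running -= stage
    finished += stage
  }

  // Option 1: copy the set with toArray, then iterate over the copy.
  def drainWithSnapshot(stages: Seq[Int]): Seq[Int] = {
    val running = mutable.HashSet(stages: _*)
    val finished = mutable.ArrayBuffer.empty[Int]
    running.toArray.foreach(stage => finish(running, finished, stage))
    finished.toSeq
  }

  // Option 2: loop until the set is empty, taking one element per pass.
  def drainWithLoop(stages: Seq[Int]): Seq[Int] = {
    val running = mutable.HashSet(stages: _*)
    val finished = mutable.ArrayBuffer.empty[Int]
    while (running.nonEmpty) finish(running, finished, running.last)
    finished.toSeq
  }
}
```

Both functions finish every stage exactly once. The order in which stages are finished depends on the hash set's iteration order, so callers should not rely on it.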




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5397#discussion_r27903878
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1264,6 +1270,7 @@ class DAGScheduler(
                 try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
                   taskScheduler.cancelTasks(stageId, shouldInterruptThread)
                   stage.latestInfo.stageFailed(failureReason)
    +              outputCommitCoordinator.stageEnd(stage.id)
    --- End diff --
    
    It looks like `handleTaskCompletion` has a nested `markStageAsFinished` method that should do this. That method also removes the stage from `runningStages`, which doesn't appear to happen in all of the paths where we post `SparkListenerStageCompleted`.  Let me take a look at this and see whether there's a safe way to refactor things to use this method.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90711303
  
      [Test build #29806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29806/consoleFull) for   PR 5397 at commit [`af3b02f`](https://github.com/apache/spark/commit/af3b02f746f9e39b03614a05cf07a39fc0494488).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5397




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90691924
  
    LGTM.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5397#discussion_r27903465
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1264,6 +1270,7 @@ class DAGScheduler(
                 try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
                   taskScheduler.cancelTasks(stageId, shouldInterruptThread)
                   stage.latestInfo.stageFailed(failureReason)
    +              outputCommitCoordinator.stageEnd(stage.id)
    --- End diff --
    
    Wouldn't it be better to put these three lines (at least) in a separate method:
    
        private def endStage(stage: Stage, failureReason: String): Unit = {
          stage.latestInfo.stageFailed(failureReason)
          outputCommitCoordinator.stageEnd(stage.id)
          listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
        }
    
    That would make it harder to miss things like this.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90673566
  
      [Test build #29803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29803/consoleFull) for   PR 5397 at commit [`3052aea`](https://github.com/apache/spark/commit/3052aeacc3fa4d042aa623a7d85f2fd7e43628fe).




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90705180
  
      [Test build #29803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29803/consoleFull) for   PR 5397 at commit [`3052aea`](https://github.com/apache/spark/commit/3052aeacc3fa4d042aa623a7d85f2fd7e43628fe).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90688600
  
    @vanzin, I've refactored this code to extract the stage completion code into a new `markStageAsFinished` method.  There's one small change in behavior that I'll comment on inline.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90705192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29803/




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90689047
  
      [Test build #29806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29806/consoleFull) for   PR 5397 at commit [`af3b02f`](https://github.com/apache/spark/commit/af3b02f746f9e39b03614a05cf07a39fc0494488).




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90757996
  
    Thanks for reviewing.  I'm going to merge this into master and open a separate PR to backport to branch-1.3.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5397#discussion_r27905557
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1098,7 +1085,6 @@ class DAGScheduler(
               logInfo(s"Marking $failedStage (${failedStage.name}) as failed " +
                 s"due to a fetch failure from $mapStage (${mapStage.name})")
               markStageAsFinished(failedStage, Some(failureMessage))
    -          runningStages -= failedStage
    --- End diff --
    
    This was redundant, since `markStageAsFinished` removes the stage from `runningStages`.




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5397#issuecomment-90711339
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29806/




[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5397#discussion_r27905686
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1263,8 +1269,7 @@ class DAGScheduler(
               if (runningStages.contains(stage)) {
                 try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
                   taskScheduler.cancelTasks(stageId, shouldInterruptThread)
    -              stage.latestInfo.stageFailed(failureReason)
    -              listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
    +              markStageAsFinished(stage, Some(failureReason))
    --- End diff --
    
    There's a slight change in behavior here: in the old code, we never removed the stage from `runningStages` here even though we posted a `SparkListenerStageCompleted` event.  I think this was probably a bug.  In this updated code, `markStageAsFinished` will remove the stage from `runningStages`.
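
The idea of routing every completion path through a single helper can be sketched as follows. This is a hypothetical miniature, not the DAGScheduler code: `SketchScheduler`, `committerEntries`, `postedEvents`, and `dataStructuresEmpty` are assumptions that stand in for the coordinator's per-stage map, the listener bus, and the test suite's emptiness check.

```scala
import scala.collection.mutable

// Hypothetical miniature scheduler; names echo the discussion, bodies are assumptions.
class SketchScheduler {
  val runningStages = mutable.HashSet.empty[Int]
  // Stands in for the coordinator's per-stage bookkeeping.
  val committerEntries = mutable.HashSet.empty[Int]
  // Stands in for events posted to the listener bus.
  val postedEvents = mutable.ArrayBuffer.empty[String]

  def submitStage(stageId: Int): Unit = {
    runningStages += stageId
    committerEntries += stageId
  }

  // Single exit point: every success, failure, or cancellation path calls this,
  // so no path can forget one of the three cleanup steps.
  def markStageAsFinished(stageId: Int, errorMessage: Option[String] = None): Unit = {
    committerEntries -= stageId
    val outcome = errorMessage.fold("succeeded")(msg => s"failed: $msg")
    postedEvents += s"stage $stageId $outcome"
    runningStages -= stageId
  }

  // Loosely analogous to an "all data structures are empty" test assertion.
  def dataStructuresEmpty: Boolean =
    runningStages.isEmpty && committerEntries.isEmpty
}
```

With this shape, a check such as `dataStructuresEmpty` after all stages finish would catch a completion path that bypasses the helper, since that path would leave an entry behind.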

