Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2015/04/07 19:48:29 UTC
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/5397
[SPARK-6737] Fix memory leak in OutputCommitCoordinator
This patch fixes a memory leak in the DAGScheduler, which caused us to leak a map entry per submitted stage. The problem is that the OutputCommitCoordinator needs to be informed when stages end in order to remove entries from its `authorizedCommitters` map, but the DAGScheduler only notified it of stage completion in one of the four code paths that are used to mark stages as completed.
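To make the failure mode concrete, here is a minimal Scala sketch under stated assumptions. `SimpleCommitCoordinator` and `LeakDemo` are hypothetical names, the map layout (stage id to partition to authorized attempt number) is a simplification, and the four end paths are modeled by `stageId % 4`, with only one of them calling `stageEnd`. It is not Spark's actual `OutputCommitCoordinator`.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for OutputCommitCoordinator: it tracks which
// task attempt (if any) is authorized to commit each partition of each stage.
class SimpleCommitCoordinator {
  // stageId -> (partitionId -> authorized attempt number)
  val authorizedCommitters = mutable.Map[Int, mutable.Map[Int, Int]]()

  def stageStart(stageId: Int): Unit =
    authorizedCommitters(stageId) = mutable.Map[Int, Int]()

  // Must be called on *every* path that ends a stage; otherwise the entry leaks.
  def stageEnd(stageId: Int): Unit =
    authorizedCommitters.remove(stageId)

  // The first attempt to ask about a partition wins; later attempts are denied.
  def canCommit(stageId: Int, partition: Int, attempt: Int): Boolean =
    authorizedCommitters.get(stageId) match {
      case Some(committers) => committers.getOrElseUpdate(partition, attempt) == attempt
      case None             => false // stage already ended (or never started)
    }
}

object LeakDemo {
  // Runs `n` stages. Unless notifyOnAllPaths is set, only stages whose id is a
  // multiple of 4 end through the path that calls stageEnd (an illustrative
  // model of "one of the four code paths" notifying the coordinator).
  def runStages(n: Int, notifyOnAllPaths: Boolean): Int = {
    val coordinator = new SimpleCommitCoordinator
    for (stageId <- 0 until n) {
      coordinator.stageStart(stageId)
      coordinator.canCommit(stageId, partition = 0, attempt = 0)
      if (notifyOnAllPaths || stageId % 4 == 0) coordinator.stageEnd(stageId)
    }
    coordinator.authorizedCommitters.size // entries left behind
  }
}
```

With this model, `LeakDemo.runStages(100, notifyOnAllPaths = false)` leaves 75 entries behind (one per stage that ended through an un-notified path), while `notifyOnAllPaths = true` leaves none.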
This patch fixes this issue and updates DAGSchedulerSuite's `assertDataStructuresEmpty` assertion to also check the OutputCommitCoordinator data structures. I've also added a comment at the top of DAGScheduler so that we remember to update this test when adding new data structures.
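A rough sketch of why extending the emptiness check catches this kind of bug, assuming a hypothetical `MiniScheduler` whose fields only loosely mirror the real DAGScheduler (this is not DAGSchedulerSuite's actual helper). A check that inspects only the scheduler's own stage sets would pass after the buggy cancel path below, but one that also inspects the coordinator's per-stage map reports the leftover entry by name.

```scala
import scala.collection.mutable

// Hypothetical miniature of the state a test like assertDataStructuresEmpty
// inspects; the field names are illustrative only.
class MiniScheduler {
  val runningStages = mutable.HashSet[Int]()
  val waitingStages = mutable.HashSet[Int]()
  val failedStages = mutable.HashSet[Int]()
  // Stands in for the coordinator's per-stage map.
  val authorizedCommitters = mutable.Map[Int, mutable.Map[Int, Int]]()

  def submitStage(stageId: Int): Unit = {
    runningStages += stageId
    authorizedCommitters(stageId) = mutable.Map[Int, Int]()
  }

  // A buggy end path: it forgets to clean up the coordinator's map.
  def cancelStageBuggy(stageId: Int): Unit = runningStages -= stageId
}

object Checks {
  // Returns the names of the non-empty structures, so a failure explains itself.
  def nonEmptyStructures(s: MiniScheduler): Seq[String] =
    Seq(
      "runningStages" -> s.runningStages.isEmpty,
      "waitingStages" -> s.waitingStages.isEmpty,
      "failedStages" -> s.failedStages.isEmpty,
      "authorizedCommitters" -> s.authorizedCommitters.isEmpty
    ).collect { case (name, false) => name }

  def assertDataStructuresEmpty(s: MiniScheduler): Unit = {
    val leftovers = nonEmptyStructures(s)
    assert(leftovers.isEmpty, s"non-empty after job: ${leftovers.mkString(", ")}")
  }
}
```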
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark SPARK-6737
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5397.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5397
----
commit 4ead1dc9410cecb4dcd3230980f0822fa90f436b
Author: Josh Rosen <jo...@databricks.com>
Date: 2015-04-07T17:42:29Z
Add regression tests for SPARK-6737
commit 789689944c34928616e66500d9d9518ccf3e31dc
Author: Josh Rosen <jo...@databricks.com>
Date: 2015-04-07T17:43:08Z
Fix SPARK-6737 by informing OutputCommitCoordinator of all stage end events.
commit 3052aeacc3fa4d042aa623a7d85f2fd7e43628fe
Author: Josh Rosen <jo...@databricks.com>
Date: 2015-04-07T17:45:20Z
Comment update
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90749819
LGTM too.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90758825
Actually, it appears that the merge conflicts with `branch-1.3` are trivial to resolve, so I'll perform a cherry-pick, run the tests locally, then monitor Jenkins for the backport commit.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/5397#discussion_r27906516
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -709,9 +713,10 @@ class DAGScheduler(
// cancelling the stages because if the DAG scheduler is stopped, the entire application
// is in the process of getting stopped.
val stageFailedMessage = "Stage cancelled because SparkContext was shut down"
- runningStages.foreach { stage =>
- stage.latestInfo.stageFailed(stageFailedMessage)
- listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
+ // The `toArray` here is necessary so that we don't iterate over `runningStages` while
+ // mutating it.
+ runningStages.toArray.foreach { stage =>
--- End diff --
You could avoid the comment by doing:
while (!runningStages.isEmpty) markStageAsFinished(runningStages.last)
But either is fine.
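The two approaches can be compared in a small Scala sketch. `DrainDemo` and its local `markFinished` helper are hypothetical stand-ins for `markStageAsFinished`, which removes the stage from `runningStages`. Removing elements from a `mutable.HashSet` while `foreach` is iterating over that same set is not safe (the outcome is unspecified), so both variants avoid it: one iterates over a snapshot copy, the other repeatedly picks an element until the set is empty.

```scala
import scala.collection.mutable

object DrainDemo {
  // Variant 1: iterate over a snapshot, as in the `toArray` version of the patch.
  def drainWithSnapshot(running: mutable.HashSet[Int]): Seq[Int] = {
    val finished = mutable.ArrayBuffer[Int]()
    def markFinished(id: Int): Unit = { running -= id; finished += id }
    // `toArray` copies the elements first, so removing from `running`
    // inside the loop does not disturb the iteration.
    running.toArray.foreach(markFinished)
    finished.toSeq
  }

  // Variant 2: the suggested `while` loop; no comment about mutation needed.
  def drainWithWhileLoop(running: mutable.HashSet[Int]): Seq[Int] = {
    val finished = mutable.ArrayBuffer[Int]()
    def markFinished(id: Int): Unit = { running -= id; finished += id }
    // Each call removes one element, so the loop ends once the set is empty.
    while (running.nonEmpty) markFinished(running.last)
    finished.toSeq
  }
}
```

One trade-off: `last` on a hash set walks the whole set, so the `while` variant is quadratic in the number of running stages, which probably matters little when only a few stages are running.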
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/5397#discussion_r27903878
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1264,6 +1270,7 @@ class DAGScheduler(
try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
taskScheduler.cancelTasks(stageId, shouldInterruptThread)
stage.latestInfo.stageFailed(failureReason)
+ outputCommitCoordinator.stageEnd(stage.id)
--- End diff --
It looks like `handleTaskCompletion` has a nested `markStageAsFinished` method that should already do this. That method also removes the stage from `runningStages`, which doesn't appear to happen in all of the paths where we post `SparkListenerStageCompleted`. Let me take a look and see whether there's a safe way to refactor things to use this method.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90711303
[Test build #29806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29806/consoleFull) for PR 5397 at commit [`af3b02f`](https://github.com/apache/spark/commit/af3b02f746f9e39b03614a05cf07a39fc0494488).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/5397
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90691924
LGTM.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/5397#discussion_r27903465
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1264,6 +1270,7 @@ class DAGScheduler(
try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
taskScheduler.cancelTasks(stageId, shouldInterruptThread)
stage.latestInfo.stageFailed(failureReason)
+ outputCommitCoordinator.stageEnd(stage.id)
--- End diff --
Wouldn't it be better to put these three lines (at least) in a separate method:
private def endStage(stage: Stage, failureReason: String): Unit = {
  stage.latestInfo.stageFailed(failureReason)
  outputCommitCoordinator.stageEnd(stage.id)
  listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
}
That would make it harder to miss things like this.
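The consolidation suggested here, which the author later implemented as a `markStageAsFinished` method, can be sketched as follows. This is a minimal illustration assuming a hypothetical `ConsolidatedScheduler` with illustrative end paths; it is not Spark's DAGScheduler. Every path delegates to one private helper that removes the stage from `runningStages`, drops the coordinator entry, and records a completion event, so a new end path cannot silently skip a cleanup step.

```scala
import scala.collection.mutable

// Hypothetical, heavily simplified scheduler; names mirror the discussion only.
class ConsolidatedScheduler {
  val runningStages = mutable.HashSet[Int]()
  val authorizedCommitters = mutable.Map[Int, mutable.Map[Int, Int]]()
  val postedEvents = mutable.ArrayBuffer[String]() // stands in for the listener bus

  def submit(stageId: Int): Unit = {
    runningStages += stageId
    authorizedCommitters(stageId) = mutable.Map[Int, Int]()
  }

  // The single place that ends a stage, so no caller can forget a step.
  private def markStageAsFinished(stageId: Int, errorMessage: Option[String]): Unit = {
    runningStages -= stageId
    authorizedCommitters.remove(stageId) // plays the role of outputCommitCoordinator.stageEnd
    val outcome = errorMessage.map(msg => s"failed: $msg").getOrElse("succeeded")
    postedEvents += s"StageCompleted($stageId, $outcome)"
  }

  // Four illustrative end paths, all delegating to the same helper.
  def completeSuccessfully(stageId: Int): Unit = markStageAsFinished(stageId, None)
  def failOnFetchFailure(stageId: Int): Unit =
    markStageAsFinished(stageId, Some("fetch failure"))
  def cancel(stageId: Int, reason: String): Unit = markStageAsFinished(stageId, Some(reason))
  def stopAll(): Unit =
    runningStages.toArray.foreach(markStageAsFinished(_, Some("SparkContext was shut down")))
}
```

Running one stage through each path leaves `runningStages` and `authorizedCommitters` empty and records four completion events.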
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90673566
[Test build #29803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29803/consoleFull) for PR 5397 at commit [`3052aea`](https://github.com/apache/spark/commit/3052aeacc3fa4d042aa623a7d85f2fd7e43628fe).
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90705180
[Test build #29803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29803/consoleFull) for PR 5397 at commit [`3052aea`](https://github.com/apache/spark/commit/3052aeacc3fa4d042aa623a7d85f2fd7e43628fe).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90688600
@vanzin, I've refactored this code to extract the stage completion code into a new `markStageAsFinished` method. There's one small change in behavior that I'll comment on inline.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90705192
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29803/
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90689047
[Test build #29806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29806/consoleFull) for PR 5397 at commit [`af3b02f`](https://github.com/apache/spark/commit/af3b02f746f9e39b03614a05cf07a39fc0494488).
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90757996
Thanks for reviewing. I'm going to merge this into master and open a separate PR to backport to branch-1.3.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/5397#discussion_r27905557
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1098,7 +1085,6 @@ class DAGScheduler(
logInfo(s"Marking $failedStage (${failedStage.name}) as failed " +
s"due to a fetch failure from $mapStage (${mapStage.name})")
markStageAsFinished(failedStage, Some(failureMessage))
- runningStages -= failedStage
--- End diff --
This was redundant, since `markStageAsFinished` removes the stage from `runningStages`.
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5397#issuecomment-90711339
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29806/
[GitHub] spark pull request: [SPARK-6737] Fix memory leak in OutputCommitCo...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/5397#discussion_r27905686
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1263,8 +1269,7 @@ class DAGScheduler(
if (runningStages.contains(stage)) {
try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
taskScheduler.cancelTasks(stageId, shouldInterruptThread)
- stage.latestInfo.stageFailed(failureReason)
- listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
+ markStageAsFinished(stage, Some(failureReason))
--- End diff --
There's a slight change in behavior here: in the old code, we never removed the stage from `runningStages` here even though we posted a `SparkListenerStageCompleted` listener event. I think this was probably a bug. In the updated code, `markStageAsFinished` removes the stage from `runningStages`.