You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by we...@apache.org on 2018/02/02 04:21:13 UTC

spark git commit: [SPARK-23306] Fix the oom caused by contention

Repository: spark
Updated Branches:
  refs/heads/master 969eda4a0 -> b3a04283f


[SPARK-23306] Fix the oom caused by contention

## What changes were proposed in this pull request?

here is race condition in TaskMemoryManger, which may cause OOM.

The memory released may be taken by another task because there is a gap between releaseMemory and acquireMemory, e.g., UnifiedMemoryManager, causing the OOM. if the current is the only one that can perform spill. It can happen to BytesToBytesMap, as it only spill required bytes.

Loop on current consumer if it still has memory to release.

## How was this patch tested?

The race contention is hard to reproduce, but the current logic seems causing the issue.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Zhan Zhang <zh...@fb.com>

Closes #20480 from zhzhan/oom.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b3a04283
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b3a04283
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b3a04283

Branch: refs/heads/master
Commit: b3a04283f490020c13b6750de021af734c449c3a
Parents: 969eda4
Author: Zhan Zhang <zh...@fb.com>
Authored: Fri Feb 2 12:21:06 2018 +0800
Committer: Wenchen Fan <we...@databricks.com>
Committed: Fri Feb 2 12:21:06 2018 +0800

----------------------------------------------------------------------
 .../java/org/apache/spark/memory/TaskMemoryManager.java   | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/b3a04283/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
index 632d718..d07faf1 100644
--- a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
+++ b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
@@ -172,10 +172,7 @@ public class TaskMemoryManager {
             currentEntry = sortedConsumers.lastEntry();
           }
           List<MemoryConsumer> cList = currentEntry.getValue();
-          MemoryConsumer c = cList.remove(cList.size() - 1);
-          if (cList.isEmpty()) {
-            sortedConsumers.remove(currentEntry.getKey());
-          }
+          MemoryConsumer c = cList.get(cList.size() - 1);
           try {
             long released = c.spill(required - got, consumer);
             if (released > 0) {
@@ -185,6 +182,11 @@ public class TaskMemoryManager {
               if (got >= required) {
                 break;
               }
+            } else {
+              cList.remove(cList.size() - 1);
+              if (cList.isEmpty()) {
+                sortedConsumers.remove(currentEntry.getKey());
+              }
             }
           } catch (ClosedByInterruptException e) {
             // This called by user to kill a task (e.g: speculative task).


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org