You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by we...@apache.org on 2019/04/04 01:58:31 UTC

[spark] branch master updated: [SPARK-27338][CORE] Fix deadlock in UnsafeExternalSorter.SpillableIterator when locking both UnsafeExternalSorter.SpillableIterator and TaskMemoryManager

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 6c4552c6 [SPARK-27338][CORE] Fix deadlock in UnsafeExternalSorter.SpillableIterator when locking both UnsafeExternalSorter.SpillableIterator and TaskMemoryManager
6c4552c6 is described below

commit 6c4552c65045cfe82ed95212ee7cff684e44288b
Author: Venkata krishnan Sowrirajan <vs...@qubole.com>
AuthorDate: Thu Apr 4 09:58:05 2019 +0800

    [SPARK-27338][CORE] Fix deadlock in UnsafeExternalSorter.SpillableIterator when locking both UnsafeExternalSorter.SpillableIterator and TaskMemoryManager
    
    ## What changes were proposed in this pull request?
    
    In `UnsafeExternalSorter.SpillableIterator#loadNext()` takes lock on the `UnsafeExternalSorter` and calls `freePage` once the `lastPage` is consumed which needs to take a lock on `TaskMemoryManager`. At the same time, there can be another MemoryConsumer using `UnsafeExternalSorter` as part of sorting can try to `allocatePage` needs to get lock on `TaskMemoryManager` which can cause spill to happen which requires lock on `UnsafeExternalSorter` again causing deadlock. This is a classic  [...]
    
    To fix this, we can move the `freePage` call in `loadNext` outside of `Synchronized` block similar to the fix in SPARK-26265
    
    ## How was this patch tested?
    
    Manual tests were being done and will also try to add a test.
    
    Closes #24265 from venkata91/deadlock-sorter.
    
    Authored-by: Venkata krishnan Sowrirajan <vs...@qubole.com>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 .../unsafe/sort/UnsafeExternalSorter.java          | 34 +++++++++++++++-------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index af5a934..2314514 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -573,19 +573,31 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
 
     @Override
     public void loadNext() throws IOException {
-      synchronized (this) {
-        loaded = true;
-        if (nextUpstream != null) {
-          // Just consumed the last record from in memory iterator
-          if (lastPage != null) {
-            freePage(lastPage);
-            lastPage = null;
+      MemoryBlock pageToFree = null;
+      try {
+        synchronized (this) {
+          loaded = true;
+          if (nextUpstream != null) {
+            // Just consumed the last record from in memory iterator
+            if(lastPage != null) {
+              // Do not free the page here, while we are locking `SpillableIterator`. The `freePage`
+              // method locks the `TaskMemoryManager`, and it's a bad idea to lock 2 objects in
+              // sequence. We may hit dead lock if another thread locks `TaskMemoryManager` and
+              // `SpillableIterator` in sequence, which may happen in
+              // `TaskMemoryManager.acquireExecutionMemory`. 
+              pageToFree = lastPage;
+              lastPage = null;
+            }
+            upstream = nextUpstream;
+            nextUpstream = null;
           }
-          upstream = nextUpstream;
-          nextUpstream = null;
+          numRecords--;
+          upstream.loadNext();
+        }
+      } finally {
+        if (pageToFree != null) {
+          freePage(pageToFree);
         }
-        numRecords--;
-        upstream.loadNext();
       }
     }
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org