Posted to issues@spark.apache.org by "Andrew Or (JIRA)" <ji...@apache.org> on 2015/09/18 00:55:04 UTC

[jira] [Created] (SPARK-10677) UnsafeExternalSorter should atomically release and acquire

Andrew Or created SPARK-10677:
---------------------------------

             Summary: UnsafeExternalSorter should atomically release and acquire
                 Key: SPARK-10677
                 URL: https://issues.apache.org/jira/browse/SPARK-10677
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Andrew Or
            Assignee: Andrew Or
            Priority: Blocker


We have code like the following:
{code}
private void acquireNewPage() throws IOException {
  final long memoryAcquired = shuffleMemoryManager.tryToAcquire(pageSizeBytes);
  if (memoryAcquired < pageSizeBytes) {
    shuffleMemoryManager.release(memoryAcquired);
    spill();
    final long memoryAcquiredAfterSpilling = shuffleMemoryManager.tryToAcquire(pageSizeBytes);
    if (memoryAcquiredAfterSpilling != pageSizeBytes) {
      shuffleMemoryManager.release(memoryAcquiredAfterSpilling);
      throw new IOException("Unable to acquire " + pageSizeBytes + " bytes of memory");
    }
  }
...
{code}

Context: In this code we're trying to acquire a new page. If the memory request fails, we spill and try again. If the second memory request also fails, we throw an exception.

Problem: When we spill, we release ALL of the memory we currently hold, only to re-acquire some of it immediately afterwards. Between the release and the re-acquire, other tasks can jump in and take the memory we just freed, starving us.

Solution: We should make the release and acquire atomic where possible. If we know we want exactly one page after the spill, then the spill should release everything except one page's worth of memory, so that page never goes back to the shared pool.
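
To make the proposal concrete, here is a minimal sketch of "spill everything minus a page". It is not the actual Spark change: ToyMemoryPool, ToySorter, spillKeeping and AtomicSpillDemo are hypothetical names invented for illustration, the pool only counts bytes, and the "spill" writes nothing to disk. The point is only that the retained page is never returned to the pool, so no other task gets a chance to claim it.

```java
import java.io.IOException;

// Hypothetical stand-in for a shared memory manager (not Spark's ShuffleMemoryManager).
final class ToyMemoryPool {
  private long free;

  ToyMemoryPool(long capacityBytes) {
    this.free = capacityBytes;
  }

  // Grants up to `bytes` and returns the amount actually granted.
  synchronized long tryToAcquire(long bytes) {
    final long granted = Math.min(bytes, free);
    free -= granted;
    return granted;
  }

  synchronized void release(long bytes) {
    free += bytes;
  }

  synchronized long freeBytes() {
    return free;
  }
}

// Hypothetical sorter that only tracks how many bytes it currently holds.
final class ToySorter {
  private final ToyMemoryPool pool;
  private final long pageSizeBytes;
  private long held = 0L;

  ToySorter(ToyMemoryPool pool, long pageSizeBytes) {
    this.pool = pool;
    this.pageSizeBytes = pageSizeBytes;
  }

  long held() {
    return held;
  }

  // Stand-in for spill(): releases everything except up to `bytesToKeep`.
  // The kept bytes never return to the pool, so other tasks cannot take them.
  private long spillKeeping(long bytesToKeep) {
    final long kept = Math.min(bytesToKeep, held);
    pool.release(held - kept);
    held = kept;
    return kept;
  }

  void acquireNewPage() throws IOException {
    final long acquired = pool.tryToAcquire(pageSizeBytes);
    if (acquired == pageSizeBytes) {
      held += acquired;
      return;
    }
    pool.release(acquired);
    // Keep one page across the spill instead of releasing and re-acquiring it.
    final long kept = spillKeeping(pageSizeBytes);
    // Only needed when we held less than a full page before spilling.
    final long extra = pool.tryToAcquire(pageSizeBytes - kept);
    if (kept + extra != pageSizeBytes) {
      pool.release(extra);
      throw new IOException("Unable to acquire " + pageSizeBytes + " bytes of memory");
    }
    held = kept + extra;
  }
}

final class AtomicSpillDemo {
  public static void main(String[] args) throws IOException {
    final ToyMemoryPool pool = new ToyMemoryPool(300);
    final ToySorter sorter = new ToySorter(pool, 100);
    for (int i = 0; i < 4; i++) {
      sorter.acquireNewPage(); // the fourth call finds the pool empty and spills
    }
    System.out.println("held=" + sorter.held() + " free=" + pool.freeBytes());
  }
}
```

In this toy setup the fourth request finds the pool empty, the sorter frees two of its three pages, and the third page stays reserved throughout. A competing task can take the two freed pages but not the one the sorter needs next. If the sorter held less than a page before spilling, it still has to top up from the pool, which is the "where possible" caveat above.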

I believe this is also the cause of SPARK-10474, where we fail to acquire memory for the pointer array immediately after spilling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)