Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/09/18 02:19:04 UTC
[jira] [Assigned] (SPARK-10677) UnsafeExternalSorter should atomically release and acquire
[ https://issues.apache.org/jira/browse/SPARK-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-10677:
------------------------------------
Assignee: Andrew Or (was: Apache Spark)
> UnsafeExternalSorter should atomically release and acquire
> ----------------------------------------------------------
>
> Key: SPARK-10677
> URL: https://issues.apache.org/jira/browse/SPARK-10677
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Blocker
>
> We have code like the following:
> {code}
> private void acquireNewPage() throws IOException {
>   final long memoryAcquired = shuffleMemoryManager.tryToAcquire(pageSizeBytes);
>   if (memoryAcquired < pageSizeBytes) {
>     shuffleMemoryManager.release(memoryAcquired);
>     spill();
>     final long memoryAcquiredAfterSpilling = shuffleMemoryManager.tryToAcquire(pageSizeBytes);
>     if (memoryAcquiredAfterSpilling != pageSizeBytes) {
>       shuffleMemoryManager.release(memoryAcquiredAfterSpilling);
>       throw new IOException("Unable to acquire " + pageSizeBytes + " bytes of memory");
>     }
>   }
>   ...
> {code}
> Context: in this code we're trying to acquire a new page. If the memory request fails, we spill and try again. If the second memory request still fails, then we throw an exception.
> Problem: When we spill, we release ALL the memory we currently hold, only to re-acquire some of it immediately afterwards. Between the release and the re-acquire, other tasks can jump in and claim the memory we just freed, starving us.
> Solution: Instead, we should make the release and acquire atomic where possible. If we know we want exactly a page after the spill, then spill everything minus a page.
> I believe this is also the cause of SPARK-10474, where we fail to acquire memory for the pointer array immediately after spilling.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org