Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2015/10/13 03:08:17 UTC

[GitHub] spark pull request: [SPARK-10983] Unified memory manager

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/9084

    [SPARK-10983] Unified memory manager

    This patch unifies the memory management of the storage and execution regions so that either side can borrow memory from the other. When memory pressure arises, storage is evicted in favor of execution. To avoid regressions in cases where storage is crucial, we dynamically allocate a fraction of space for storage that execution cannot evict. Several configurations are introduced:
    
    - **spark.memory.fraction (default 0.75)**: fraction of the heap space used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records.
    
    - **spark.memory.storageFraction (default 0.5)**: size of the storage region within the space set aside by `spark.memory.fraction`. Cached data may only be evicted if total storage exceeds this region.
    
    - **spark.memory.useLegacyMode (default false)**: whether to use the memory management that existed in Spark 1.5 and before. This is mainly for backward compatibility.
    
    For a detailed description of the design, see [SPARK-10000](https://issues.apache.org/jira/browse/SPARK-10000).
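
    For a quick sense of how these settings combine, here is a minimal sketch (the 4 GB executor heap is an assumed, illustrative value; the computation just mirrors the fractions described above):

        import org.apache.spark.SparkConf

        // Illustrative only: derive the unified region and storage region sizes from the configs.
        val conf = new SparkConf()
          .set("spark.memory.fraction", "0.75")        // execution + storage share of the heap
          .set("spark.memory.storageFraction", "0.5")  // storage region within that share
          .set("spark.memory.useLegacyMode", "false")  // unified manager (the default)

        val heapBytes = 4L * 1024 * 1024 * 1024        // assumed executor heap of 4 GB
        val unifiedBytes =
          (heapBytes * conf.getDouble("spark.memory.fraction", 0.75)).toLong          // 3 GB
        val storageRegionBytes =
          (unifiedBytes * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong // 1.5 GB
        // Execution may borrow from the storage region; cached blocks are evicted only once
        // storage usage exceeds storageRegionBytes.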

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark unified-memory-manager

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9084.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9084
    
----
commit 6481bc17d66b6045195138de9ac490d479019182
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-09T02:12:13Z

    Add UnifiedMemoryManager skeleton

commit e08e1fe14cc2f137a70a4b73e4f1950b8ef4cd63
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-09T20:29:03Z

    First implementation of UnifiedMemoryManager
    
    As of this commit, acquiring execution memory may cause cached
    blocks to be evicted. There are still a couple of TODOs, notably
    figuring out what the `maxExecutionMemory` and `maxStorageMemory`
    should actually be. They may not be fixed since either side can
    now borrow from the other.

commit 812c9744e9d9047518f9cb596386753d04b5d046
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-09T21:34:39Z

    Use a global lock for all memory manager interactions
    
    Without this, we cannot avoid both deadlocks and race conditions,
    because the individual components (ShuffleMemoryManager,
    MemoryStore and MemoryManager) each hold their own respective
    locks.
    
    This commit allows us to simplify several unintuitive control
    flows that were introduced to avoid acquiring locks in different
    orders. Since we have only one lock now, these code blocks can
    be significantly simplified.
    
    A foreseeable downside is that parallelism is reduced,
    but memory acquisitions and releases don't actually occur that
    frequently, so performance should not suffer noticeably.
    Further investigation of this should follow.

commit cc5f64c50ad7cadda81c550b90ac6894b88fa92a
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-09T22:12:53Z

    Use a dynamic max for execution and storage
    
    Previously, ShuffleMemoryManager would allow each task to acquire
    up to 1/N of the entire storage + execution region. What we want
    is more like 1/N of the space not occupied by storage, since the
    "max" now varies over time.

commit ad8a6c47ff1338724e42cc1ffee229a1e5611e12
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-09T22:35:26Z

    Fix notifyAll() IllegalMonitorStateException
    
    This happened because we were still calling `this.notifyAll()`
    in ShuffleMemoryManager while holding the `memoryManager` lock
    rather than the lock on `this`. Easy fix.

commit 0dc9a9597bbd0c26a68dfdd0ed0a9fb3e9d13a79
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-09T22:51:45Z

    Minor: update a few comments

commit 5a4ffb907614105f311d9c76d907a803487c5207
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-09T22:56:05Z

    Register blocks evicted by execution

commit b519540804902ec3acd21814b94f5d9ec06cb088
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-09T23:02:42Z

    Minor: more comment updates

commit a65799e930e9cd4de6a54ff87d27632080def773
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-12T21:43:08Z

    Add tests for UnifiedMemoryManager + TODOs
    
    Tests are passing in this commit, but there are follow-ups that
    need to be done (see TODOs added in this commit). More tests will
    be added in the future.

commit 6e913a5b9ef0a1758c66756a9e2b2f03ec8699a6
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-12T23:46:56Z

    Clean up test code, resolve TODOs
    
    As of this commit, all *MemoryManagerSuite classes are documented
    and pass their tests.

commit 3eef5a4c69c41ab6502461128d3b6181ab38f56b
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-12T23:51:07Z

    Fix tests
    
    TaskContext was not stubbed correctly in ShuffleMemoryManagerSuite.
    In particular, this patch added code that interacts with the active
    TaskMetrics, but TaskMetrics was not part of the mocked TaskContext
    object.

commit c56600b216f7d04dc3837cd1489fa5c661953de1
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-13T00:59:15Z

    Minor: Add TODOs, fix a few comments

commit 01ff5335b2b6554ed9388b015145b8dfd9bc1977
Author: Andrew Or <an...@databricks.com>
Date:   2015-10-13T00:59:36Z

    Merge branch 'master' of github.com:apache/spark into unified-memory-manager

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147849150
  
    LGTM. I'm going to merge this now. There is some follow-up work in terms of refactoring some old spilling tests, but there are follow-up tasks for those and we'll do them soon.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41825192
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -509,6 +512,11 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
     
     private[spark] object SparkConf extends Logging {
     
    +  // Deprecation message for memory fraction configs used in the old memory management model
    +  private val deprecatedMemoryFractionMessage =
    +    "As of Spark 1.6, execution and storage memory management are unified. " +
    +      "All memory fractions used in the old model are now deprecated and no longer read."
    --- End diff --
    
    The old configurations will still be respected in legacy mode, so this is slightly ambiguous / confusing. Is there an easy way to avoid the warning if the legacy mode configuration is turned on? If not, I suppose we could just expand the deprecation message to mention this corner case, perhaps by just appending a "(unless spark.XX.YY is enabled)" at the end.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147630632
  
      [Test build #43631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43631/consoleFull) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147602003
  
    It looks like the test failures are occurring in the external sorter suites. Some of these tests exercise spilling-related logic (such as ensuring that spill files are cleaned up); in order to induce spilling, these tests configured the legacy memory fractions to be really small. In order to fix these tests, we can either enable legacy mode for those tests only or we can modify the tests to set the new configurations. Ideally these sorter unit tests wouldn't be relying on memory management behavior in order to test spill cleanup, but they were written a long time ago. We could refactor these tests, but I'd prefer a band-aid solution for now.
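
    A minimal sketch of the two "band-aid" options described above, as they might look in a test SparkConf (the tiny fraction values are illustrative, not the ones actually used by the suites):

        import org.apache.spark.SparkConf

        // Option 1: keep the old behavior for these suites by enabling legacy mode and using
        // the legacy shuffle fraction to force spilling.
        val legacyConf = new SparkConf()
          .set("spark.memory.useLegacyMode", "true")
          .set("spark.shuffle.memoryFraction", "0.001")

        // Option 2: stay on the unified manager but shrink the new unified fraction instead.
        val unifiedConf = new SparkConf()
          .set("spark.memory.fraction", "0.001")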




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147812914
  
      [Test build #1888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1888/console) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41827580
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
    @@ -37,15 +37,14 @@ private case class MemoryEntry(value: Any, size: Long, deserialized: Boolean)
     private[spark] class MemoryStore(blockManager: BlockManager, memoryManager: MemoryManager)
       extends BlockStore(blockManager) {
     
    +  // Note: all changes to memory allocations, notably putting blocks, evicting blocks, and
    +  // acquiring or releasing unroll memory, must be synchronized on `memoryManager`!
    +
       private val conf = blockManager.conf
       private val entries = new LinkedHashMap[BlockId, MemoryEntry](32, 0.75f, true)
    -  private val maxMemory = memoryManager.maxStorageMemory
    -
    -  // Ensure only one thread is putting, and if necessary, dropping blocks at any given time
    -  private val accountingLock = new Object
     
       // A mapping from taskAttemptId to amount of memory used for unrolling a block (in bytes)
    -  // All accesses of this map are assumed to have manually synchronized on `accountingLock`
    +  // All accesses of this map are assumed to have manually synchronized on `memoryManager`
    --- End diff --
    
    Maybe it should, but I think it will be largely irrelevant once we have Datasets. I would prefer to keep it in `MemoryStore` for now to minimize the changes (we can always move it later).




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147573704
  
      [Test build #43607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43607/consoleFull) for   PR 9084 at commit [`059ae0d`](https://github.com/apache/spark/commit/059ae0da40e981306a060bc9f7dafd7639263cf0).




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147573522
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147628515
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147571621
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41827358
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    + *
    + * Storage can borrow as much execution memory as is free until execution reclaims its space.
    + * When this happens, cached blocks will be evicted from memory until sufficient borrowed
    + * memory is released to satisfy the execution memory request.
    + *
    + * Similarly, execution can borrow as much storage memory as is free. However, execution
    + * memory is *never* evicted by storage due to the complexities involved in implementing this.
    + * The implication is that attempts to cache blocks may fail if execution has already eaten
    + * up most of the storage space, in which case the new blocks will be evicted directly.
    + */
    +private[spark] class UnifiedMemoryManager(conf: SparkConf, maxMemory: Long) extends MemoryManager {
    +
    +  def this(conf: SparkConf) {
    +    this(conf, UnifiedMemoryManager.getMaxMemory(conf))
    +  }
    +
    +  /**
    +   * Size of the storage region, in bytes.
    +   *
    +   * This region is not statically reserved; execution can borrow from it if necessary.
    +   * Cached blocks can be evicted only if actual storage memory usage exceeds this region.
    +   */
    +  private val storageRegionSize: Long = {
    +    (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong
    +  }
    +
    +  /**
    +   * Total amount of memory, in bytes, not currently occupied by either execution or storage.
    +   */
    +  private def totalFreeMemory: Long = synchronized {
    +    assert(_executionMemoryUsed <= maxMemory)
    +    assert(_storageMemoryUsed <= maxMemory)
    +    assert(_executionMemoryUsed + _storageMemoryUsed <= maxMemory)
    +    maxMemory - _executionMemoryUsed - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for execution, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by storage.
    +   */
    +  override def maxExecutionMemory: Long = synchronized {
    +    maxMemory - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for storage, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by execution.
    +   */
    +  override def maxStorageMemory: Long = synchronized {
    +    maxMemory - _executionMemoryUsed
    +  }
    +
    +  /**
    +   * Acquire N bytes of memory for execution, evicting cached blocks if necessary.
    +   *
    +   * This method evicts blocks only up to the amount of memory borrowed by storage.
    +   * Blocks evicted in the process, if any, are added to `evictedBlocks`.
    +   * @return number of bytes successfully granted (<= N).
    +   */
    +  override def acquireExecutionMemory(
    +      numBytes: Long,
    +      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Long = synchronized {
    +    assert(numBytes >= 0)
    +    val memoryBorrowedByStorage = math.max(0, _storageMemoryUsed - storageRegionSize)
    +    // If there is not enough free memory AND storage has borrowed some execution memory,
    +    // then evict as much memory borrowed by storage as needed to grant this request
    +    val shouldEvictStorage = totalFreeMemory < numBytes && memoryBorrowedByStorage > 0
    +    if (shouldEvictStorage) {
    +      val spaceToEnsure = math.min(numBytes, memoryBorrowedByStorage)
    +      memoryStore.ensureFreeSpace(spaceToEnsure, evictedBlocks)
    --- End diff --
    
    As for whether it's potentially expensive, yes, it may be. However, note that you only do this under memory pressure, meaning your tasks would have otherwise spilled anyway, so in either case we'll end up doing some I/O.
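
    To make this concrete, a small worked example of the quoted eviction logic (the byte values are made up for illustration):

        // Mirrors the acquireExecutionMemory math above with illustrative numbers.
        val maxMemory = 1000L             // total unified region
        val storageRegionSize = 500L      // spark.memory.storageFraction = 0.5
        val executionUsed = 300L
        val storageUsed = 650L            // storage has borrowed 150 bytes beyond its region
        val numBytes = 120L               // incoming execution request

        val totalFree = maxMemory - executionUsed - storageUsed                 // 50
        val borrowedByStorage = math.max(0, storageUsed - storageRegionSize)    // 150
        val shouldEvictStorage = totalFree < numBytes && borrowedByStorage > 0  // true
        val spaceToEnsure = math.min(numBytes, borrowedByStorage)               // 120
        // Up to 120 bytes of cached blocks are evicted (never more than storage has borrowed),
        // and only under memory pressure, when the task would otherwise have spilled anyway.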




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41826520
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
    @@ -365,16 +372,13 @@ private[spark] class MemoryStore(blockManager: BlockManager, memoryManager: Memo
          * for freeing up more space for another block that needs to be put. Only then the actually
          * dropping of blocks (and writing to disk if necessary) can proceed in parallel. */
     
    -    accountingLock.synchronized {
    +    memoryManager.synchronized {
           // Note: if we have previously unrolled this block successfully, then pending unroll
           // memory should be non-zero. This is the amount that we already reserved during the
           // unrolling process. In this case, we can just reuse this space to cache our block.
    -      //
    -      // Note: the StaticMemoryManager counts unroll memory as storage memory. Here, the
    -      // synchronization on `accountingLock` guarantees that the release of unroll memory and
    -      // acquisition of storage memory happens atomically. However, if storage memory is acquired
    -      // outside of MemoryStore or if unroll memory is counted as execution memory, then we will
    -      // have to revisit this assumption. See SPARK-10983 for more context.
    +      // The synchronization on `memoryManager` here guarantees that the release and acquire
    +      // happen atomically. This relies on the assumption that all memory acquisitions are
    +      // synchronized on the same lock.
    --- End diff --
    
    It's nice that this simplification resulted from the single-lock approach. I'm happy to stick with that simple approach until performance data suggests that we should do something more complex.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41825331
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ---
    @@ -72,46 +92,62 @@ private[spark] abstract class MemoryManager {
       def acquireUnrollMemory(
    --- End diff --
    
    Given that `acquireUnrollMemory` appears to act as a synonym for `acquireStorageMemory` in the current implementation, it might be worth adding a brief comment above this method to explain that this extra method exists in order to give us the flexibility to account for unroll memory differently in the future.
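
    A minimal sketch of the kind of comment being suggested; the simplified trait and signatures below are assumptions based on the surrounding diff, not the actual MemoryManager interface:

        import scala.collection.mutable
        import org.apache.spark.storage.{BlockId, BlockStatus}

        trait MemoryManagerSketch {
          def acquireStorageMemory(
              blockId: BlockId,
              numBytes: Long,
              evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean

          /**
           * Acquire `numBytes` of memory to unroll `blockId`.
           *
           * For now this is just a synonym for `acquireStorageMemory`; it is kept as a
           * separate method so that unroll memory can be accounted for differently in the
           * future without having to change callers.
           */
          def acquireUnrollMemory(
              blockId: BlockId,
              numBytes: Long,
              evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = {
            acquireStorageMemory(blockId, numBytes, evictedBlocks)
          }
        }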




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147642599
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147642601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43622/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147810781
  
      [Test build #1890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1890/console) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41833337
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    + *
    + * Storage can borrow as much execution memory as is free until execution reclaims its space.
    + * When this happens, cached blocks will be evicted from memory until sufficient borrowed
    + * memory is released to satisfy the execution memory request.
    + *
    + * Similarly, execution can borrow as much storage memory as is free. However, execution
    + * memory is *never* evicted by storage due to the complexities involved in implementing this.
    + * The implication is that attempts to cache blocks may fail if execution has already eaten
    + * up most of the storage space, in which case the new blocks will be evicted directly.
    + */
    +private[spark] class UnifiedMemoryManager(conf: SparkConf, maxMemory: Long) extends MemoryManager {
    +
    +  def this(conf: SparkConf) {
    +    this(conf, UnifiedMemoryManager.getMaxMemory(conf))
    +  }
    +
    +  /**
    +   * Size of the storage region, in bytes.
    +   *
    +   * This region is not statically reserved; execution can borrow from it if necessary.
    +   * Cached blocks can be evicted only if actual storage memory usage exceeds this region.
    +   */
    +  private val storageRegionSize: Long = {
    +    (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong
    +  }
    +
    +  /**
    +   * Total amount of memory, in bytes, not currently occupied by either execution or storage.
    +   */
    +  private def totalFreeMemory: Long = synchronized {
    +    assert(_executionMemoryUsed <= maxMemory)
    +    assert(_storageMemoryUsed <= maxMemory)
    +    assert(_executionMemoryUsed + _storageMemoryUsed <= maxMemory)
    +    maxMemory - _executionMemoryUsed - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for execution, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by storage.
    +   */
    +  override def maxExecutionMemory: Long = synchronized {
    +    maxMemory - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for storage, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by execution.
    +   */
    +  override def maxStorageMemory: Long = synchronized {
    +    maxMemory - _executionMemoryUsed
    +  }
    +
    +  /**
    +   * Acquire N bytes of memory for execution, evicting cached blocks if necessary.
    +   *
    +   * This method evicts blocks only up to the amount of memory borrowed by storage.
    +   * Blocks evicted in the process, if any, are added to `evictedBlocks`.
    +   * @return number of bytes successfully granted (<= N).
    +   */
    +  override def acquireExecutionMemory(
    +      numBytes: Long,
    +      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Long = synchronized {
    +    assert(numBytes >= 0)
    +    val memoryBorrowedByStorage = math.max(0, _storageMemoryUsed - storageRegionSize)
    +    // If there is not enough free memory AND storage has borrowed some execution memory,
    +    // then evict as much memory borrowed by storage as needed to grant this request
    +    val shouldEvictStorage = totalFreeMemory < numBytes && memoryBorrowedByStorage > 0
    +    if (shouldEvictStorage) {
    +      val spaceToEnsure = math.min(numBytes, memoryBorrowedByStorage)
    +      memoryStore.ensureFreeSpace(spaceToEnsure, evictedBlocks)
    --- End diff --
    
    (for the `ensureFreeSpace` return value, I looked into it a little more but I don't think there's anything useful we can do with it. The problem is that in unrolling it could return `false` but still end up giving you enough space.)




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147768759
  
      [Test build #1888 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1888/consoleFull) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41827375
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    --- End diff --
    
    thanks :+1:




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41826368
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala ---
    @@ -65,7 +66,7 @@ class ShuffleMemoryManager protected (
        * total memory pool (where N is the # of active tasks) before it is forced to spill. This can
        * happen if the number of tasks increases but an older task had a lot of memory already.
        */
    -  def tryToAcquire(numBytes: Long): Long = synchronized {
    +  def tryToAcquire(numBytes: Long): Long = memoryManager.synchronized {
    --- End diff --
    
    This slightly-confusing synchronization on `memoryManager` isn't a huge deal since it's going to be removed by the memory manager unification patch.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147588424
  
      [Test build #43602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43602/console) for   PR 9084 at commit [`01ff533`](https://github.com/apache/spark/commit/01ff5335b2b6554ed9388b015145b8dfd9bc1977).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147775906
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147597282
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43607/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41826094
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    + *
    + * Storage can borrow as much execution memory as is free until execution reclaims its space.
    + * When this happens, cached blocks will be evicted from memory until sufficient borrowed
    + * memory is released to satisfy the execution memory request.
    + *
    + * Similarly, execution can borrow as much storage memory as is free. However, execution
    + * memory is *never* evicted by storage due to the complexities involved in implementing this.
    + * The implication is that attempts to cache blocks may fail if execution has already eaten
    + * up most of the storage space, in which case the new blocks will be evicted directly.
    + */
    +private[spark] class UnifiedMemoryManager(conf: SparkConf, maxMemory: Long) extends MemoryManager {
    +
    +  def this(conf: SparkConf) {
    +    this(conf, UnifiedMemoryManager.getMaxMemory(conf))
    +  }
    +
    +  /**
    +   * Size of the storage region, in bytes.
    +   *
    +   * This region is not statically reserved; execution can borrow from it if necessary.
    +   * Cached blocks can be evicted only if actual storage memory usage exceeds this region.
    +   */
    +  private val storageRegionSize: Long = {
    +    (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong
    +  }
    +
    +  /**
    +   * Total amount of memory, in bytes, not currently occupied by either execution or storage.
    +   */
    +  private def totalFreeMemory: Long = synchronized {
    +    assert(_executionMemoryUsed <= maxMemory)
    +    assert(_storageMemoryUsed <= maxMemory)
    +    assert(_executionMemoryUsed + _storageMemoryUsed <= maxMemory)
    +    maxMemory - _executionMemoryUsed - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for execution, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by storage.
    +   */
    +  override def maxExecutionMemory: Long = synchronized {
    +    maxMemory - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for storage, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by execution.
    +   */
    +  override def maxStorageMemory: Long = synchronized {
    +    maxMemory - _executionMemoryUsed
    +  }
    +
    +  /**
    +   * Acquire N bytes of memory for execution, evicting cached blocks if necessary.
    +   *
    +   * This method evicts blocks only up to the amount of memory borrowed by storage.
    +   * Blocks evicted in the process, if any, are added to `evictedBlocks`.
    +   * @return number of bytes successfully granted (<= N).
    +   */
    +  override def acquireExecutionMemory(
    +      numBytes: Long,
    +      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Long = synchronized {
    +    assert(numBytes >= 0)
    +    val memoryBorrowedByStorage = math.max(0, _storageMemoryUsed - storageRegionSize)
    +    // If there is not enough free memory AND storage has borrowed some execution memory,
    +    // then evict as much memory borrowed by storage as needed to grant this request
    +    val shouldEvictStorage = totalFreeMemory < numBytes && memoryBorrowedByStorage > 0
    +    if (shouldEvictStorage) {
    +      val spaceToEnsure = math.min(numBytes, memoryBorrowedByStorage)
    +      memoryStore.ensureFreeSpace(spaceToEnsure, evictedBlocks)
    --- End diff --
    
    Is it possible that this call could be expensive? Just wondering whether it's safe to be performing inside of the `synchronized` block. Note that we can always optimize the locking later, too.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41826409
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala ---
    @@ -73,15 +74,18 @@ class ShuffleMemoryManager protected (
         // of active tasks, to let other tasks ramp down their memory in calls to tryToAcquire
         if (!taskMemory.contains(taskAttemptId)) {
           taskMemory(taskAttemptId) = 0L
    -      notifyAll()  // Will later cause waiting tasks to wake up and check numTasks again
    +      // This will later cause waiting tasks to wake up and check numTasks again
    +      memoryManager.notifyAll()
         }
     
         // Keep looping until we're either sure that we don't want to grant this request (because this
         // task would have more than 1 / numActiveTasks of the memory) or we have enough free
         // memory to give it (we always let each task get at least 1 / (2 * numActiveTasks)).
    +    // TODO: simplify this to limit each task to its own slot
    --- End diff --
    
    Can you elaborate on what changes this might require? Will we have to effectively port the current cross-task accounting logic into the StaticMemoryManager in order to preserve the existing behavior for legacy mode?




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41828008
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -509,6 +512,11 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
     
     private[spark] object SparkConf extends Logging {
     
    +  // Deprecation message for memory fraction configs used in the old memory management model
    +  private val deprecatedMemoryFractionMessage =
    +    "As of Spark 1.6, execution and storage memory management are unified. " +
    +      "All memory fractions used in the old model are now deprecated and no longer read."
    --- End diff --
    
    I can print a different warning if legacy mode is on
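
    A minimal sketch of what such a mode-dependent message could look like (the helper name and wording are illustrative assumptions, not the merged change):

        import org.apache.spark.SparkConf

        // Pick a deprecation message for an old memory fraction key depending on legacy mode.
        def memoryFractionWarning(conf: SparkConf, key: String): String = {
          if (conf.getBoolean("spark.memory.useLegacyMode", false)) {
            s"$key is still honored because spark.memory.useLegacyMode is enabled."
          } else {
            s"$key is deprecated as of Spark 1.6 and no longer read; " +
              "execution and storage memory management are now unified."
          }
        }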




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9084




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41827532
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala ---
    @@ -73,15 +74,18 @@ class ShuffleMemoryManager protected (
         // of active tasks, to let other tasks ramp down their memory in calls to tryToAcquire
         if (!taskMemory.contains(taskAttemptId)) {
           taskMemory(taskAttemptId) = 0L
    -      notifyAll()  // Will later cause waiting tasks to wake up and check numTasks again
    +      // This will later cause waiting tasks to wake up and check numTasks again
    +      memoryManager.notifyAll()
         }
     
         // Keep looping until we're either sure that we don't want to grant this request (because this
         // task would have more than 1 / numActiveTasks of the memory) or we have enough free
         // memory to give it (we always let each task get at least 1 / (2 * numActiveTasks)).
    +    // TODO: simplify this to limit each task to its own slot
    --- End diff --
    
    No, this is largely orthogonal to the memory management task. Limiting a task to its slot affects both legacy and unified mode. The motivation for doing this is to simplify the logic here, so we shouldn't bend over backwards to try to exactly preserve the legacy behavior.
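
    As a rough illustration of the simplification floated in the TODO (purely a sketch under assumed names, not what the change will necessarily look like):

        // Cap each task at an equal 1/N slot of the execution pool, instead of the current
        // loop that guarantees each task between 1/(2N) and 1/N of the pool.
        def maxGrantForTask(
            poolSize: Long,
            numActiveTasks: Int,
            alreadyHeld: Long,
            requested: Long): Long = {
          val slot = poolSize / math.max(1, numActiveTasks)
          math.max(0L, math.min(requested, slot - alreadyHeld))
        }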




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41827174
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    + *
    + * Storage can borrow as much execution memory as is free until execution reclaims its space.
    + * When this happens, cached blocks will be evicted from memory until sufficient borrowed
    + * memory is released to satisfy the execution memory request.
    + *
    + * Similarly, execution can borrow as much storage memory as is free. However, execution
    + * memory is *never* evicted by storage due to the complexities involved in implementing this.
    + * The implication is that attempts to cache blocks may fail if execution has already eaten
    + * up most of the storage space, in which case the new blocks will be evicted directly.
    + */
    +private[spark] class UnifiedMemoryManager(conf: SparkConf, maxMemory: Long) extends MemoryManager {
    +
    +  def this(conf: SparkConf) {
    +    this(conf, UnifiedMemoryManager.getMaxMemory(conf))
    +  }
    +
    +  /**
    +   * Size of the storage region, in bytes.
    +   *
    +   * This region is not statically reserved; execution can borrow from it if necessary.
    +   * Cached blocks can be evicted only if actual storage memory usage exceeds this region.
    +   */
    +  private val storageRegionSize: Long = {
    +    (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong
    +  }
    +
    +  /**
    +   * Total amount of memory, in bytes, not currently occupied by either execution or storage.
    +   */
    +  private def totalFreeMemory: Long = synchronized {
    +    assert(_executionMemoryUsed <= maxMemory)
    +    assert(_storageMemoryUsed <= maxMemory)
    +    assert(_executionMemoryUsed + _storageMemoryUsed <= maxMemory)
    +    maxMemory - _executionMemoryUsed - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for execution, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by storage.
    +   */
    +  override def maxExecutionMemory: Long = synchronized {
    +    maxMemory - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for storage, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by execution.
    +   */
    +  override def maxStorageMemory: Long = synchronized {
    +    maxMemory - _executionMemoryUsed
    +  }
    +
    +  /**
    +   * Acquire N bytes of memory for execution, evicting cached blocks if necessary.
    +   *
    +   * This method evicts blocks only up to the amount of memory borrowed by storage.
    +   * Blocks evicted in the process, if any, are added to `evictedBlocks`.
    +   * @return number of bytes successfully granted (<= N).
    +   */
    +  override def acquireExecutionMemory(
    +      numBytes: Long,
    +      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Long = synchronized {
    +    assert(numBytes >= 0)
    +    val memoryBorrowedByStorage = math.max(0, _storageMemoryUsed - storageRegionSize)
    +    // If there is not enough free memory AND storage has borrowed some execution memory,
    +    // then evict as much memory borrowed by storage as needed to grant this request
    +    val shouldEvictStorage = totalFreeMemory < numBytes && memoryBorrowedByStorage > 0
    +    if (shouldEvictStorage) {
    +      val spaceToEnsure = math.min(numBytes, memoryBorrowedByStorage)
    +      memoryStore.ensureFreeSpace(spaceToEnsure, evictedBlocks)
    --- End diff --
    
    there's no more accounting lock :)
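
    A toy illustration of the single-lock scheme being referred to (standalone
    classes, not Spark's): the store does its bookkeeping while holding the
    manager's monitor, so there is only one lock and no separate accounting lock
    to order against.

        class ToyMemoryManager(val maxMemory: Long) {
          private var used = 0L
          // Grants at most what is free; reentrant with callers that already hold `this`.
          def acquire(n: Long): Long = synchronized {
            val granted = math.min(n, maxMemory - used)
            used += granted
            granted
          }
          def release(n: Long): Unit = synchronized { used = math.max(0L, used - n) }
        }

        class ToyMemoryStore(manager: ToyMemoryManager) {
          // All store-side accounting locks on `manager`, never on a store-private lock.
          def put(size: Long): Boolean = manager.synchronized {
            manager.acquire(size) == size
          }
        }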




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147775816
  
      [Test build #43646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43646/console) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147612047
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41825937
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    + *
    + * Storage can borrow as much execution memory as is free until execution reclaims its space.
    + * When this happens, cached blocks will be evicted from memory until sufficient borrowed
    + * memory is released to satisfy the execution memory request.
    + *
    + * Similarly, execution can borrow as much storage memory as is free. However, execution
    + * memory is *never* evicted by storage due to the complexities involved in implementing this.
    + * The implication is that attempts to cache blocks may fail if execution has already eaten
    + * up most of the storage space, in which case the new blocks will be evicted directly.
    --- End diff --
    
    "evicted immediately"? I also wonder if this should clarify that the new blocks will not be cached if their storage level is MEMORY_ONLY and will be spilled to disk if their storage level permits that.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147571614
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147768208
  
    retest this please




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147768915
  
      [Test build #1890 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1890/consoleFull) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147613765
  
      [Test build #43622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43622/consoleFull) for   PR 9084 at commit [`24a391c`](https://github.com/apache/spark/commit/24a391cbba36812d555d32153daa5c132b3e608d).




[GitHub] spark pull request #9084: [SPARK-10983] Unified memory manager

Posted by pzz2011 <gi...@git.apache.org>.
Github user pzz2011 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r75595379
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ---
    @@ -40,6 +41,10 @@ private[spark] abstract class MemoryManager {
         _memoryStore
       }
     
    +  // Amount of execution/storage memory in use, accesses must be synchronized on `this`
    +  protected var _executionMemoryUsed: Long = 0
    --- End diff --
    
    why is this declared `protected` here?




[GitHub] spark pull request #9084: [SPARK-10983] Unified memory manager

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r75595837
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ---
    @@ -40,6 +41,10 @@ private[spark] abstract class MemoryManager {
         _memoryStore
       }
     
    +  // Amount of execution/storage memory in use, accesses must be synchronized on `this`
    +  protected var _executionMemoryUsed: Long = 0
    --- End diff --
    
    @pzz2011 you're making several comments on old PRs. Generally people won't see them, and closed PRs aren't the place for discussion anyway. If you can formulate a specific question beyond "why is the code this way?", ask on user@.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147588554
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43602/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147628488
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147574036
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41826285
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    + *
    + * Storage can borrow as much execution memory as is free until execution reclaims its space.
    + * When this happens, cached blocks will be evicted from memory until sufficient borrowed
    + * memory is released to satisfy the execution memory request.
    + *
    + * Similarly, execution can borrow as much storage memory as is free. However, execution
    + * memory is *never* evicted by storage due to the complexities involved in implementing this.
    + * The implication is that attempts to cache blocks may fail if execution has already eaten
    + * up most of the storage space, in which case the new blocks will be evicted directly.
    + */
    +private[spark] class UnifiedMemoryManager(conf: SparkConf, maxMemory: Long) extends MemoryManager {
    +
    +  def this(conf: SparkConf) {
    +    this(conf, UnifiedMemoryManager.getMaxMemory(conf))
    +  }
    +
    +  /**
    +   * Size of the storage region, in bytes.
    +   *
    +   * This region is not statically reserved; execution can borrow from it if necessary.
    +   * Cached blocks can be evicted only if actual storage memory usage exceeds this region.
    +   */
    +  private val storageRegionSize: Long = {
    +    (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong
    +  }
    +
    +  /**
    +   * Total amount of memory, in bytes, not currently occupied by either execution or storage.
    +   */
    +  private def totalFreeMemory: Long = synchronized {
    +    assert(_executionMemoryUsed <= maxMemory)
    +    assert(_storageMemoryUsed <= maxMemory)
    +    assert(_executionMemoryUsed + _storageMemoryUsed <= maxMemory)
    +    maxMemory - _executionMemoryUsed - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for execution, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by storage.
    +   */
    +  override def maxExecutionMemory: Long = synchronized {
    +    maxMemory - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for storage, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by execution.
    +   */
    +  override def maxStorageMemory: Long = synchronized {
    +    maxMemory - _executionMemoryUsed
    +  }
    +
    +  /**
    +   * Acquire N bytes of memory for execution, evicting cached blocks if necessary.
    +   *
    +   * This method evicts blocks only up to the amount of memory borrowed by storage.
    +   * Blocks evicted in the process, if any, are added to `evictedBlocks`.
    +   * @return number of bytes successfully granted (<= N).
    +   */
    +  override def acquireExecutionMemory(
    +      numBytes: Long,
    +      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Long = synchronized {
    +    assert(numBytes >= 0)
    +    val memoryBorrowedByStorage = math.max(0, _storageMemoryUsed - storageRegionSize)
    +    // If there is not enough free memory AND storage has borrowed some execution memory,
    +    // then evict as much memory borrowed by storage as needed to grant this request
    +    val shouldEvictStorage = totalFreeMemory < numBytes && memoryBorrowedByStorage > 0
    +    if (shouldEvictStorage) {
    +      val spaceToEnsure = math.min(numBytes, memoryBorrowedByStorage)
    +      memoryStore.ensureFreeSpace(spaceToEnsure, evictedBlocks)
    --- End diff --
    
    Another possible concern: I wonder whether it's possible for there to be a deadlock between `MemoryManager.this` and `MemoryStore.accountingLock`. For instance, `MemoryStore.reserveUnrollMemoryForThisTask` first synchronizes on the accounting lock and then calls `MemoryManager.acquireUnrollMemory`, while at the same time `MemoryManager.acquireExecutionMemory` calls `MemoryStore.ensureFreeSpace`.
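
    A minimal, self-contained illustration of the lock-ordering hazard described
    here (toy locks, not Spark's classes): each thread takes the two locks in the
    opposite order, so each can end up waiting forever on the lock the other holds.

        val lockA = new Object
        val lockB = new Object

        val t1 = new Thread(new Runnable {
          def run(): Unit = lockA.synchronized { Thread.sleep(10); lockB.synchronized {} }
        })
        val t2 = new Thread(new Runnable {
          def run(): Unit = lockB.synchronized { Thread.sleep(10); lockA.synchronized {} }
        })
        t1.start(); t2.start()  // may deadlock; a single shared lock removes the hazard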




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41825365
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ---
    @@ -72,46 +92,62 @@ private[spark] abstract class MemoryManager {
       def acquireUnrollMemory(
           blockId: BlockId,
           numBytes: Long,
    -      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean
    +      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = synchronized {
    +    acquireStorageMemory(blockId, numBytes, evictedBlocks)
    +  }
     
       /**
        * Release N bytes of execution memory.
        */
    -  def releaseExecutionMemory(numBytes: Long): Unit
    +  def releaseExecutionMemory(numBytes: Long): Unit = synchronized {
    +    if (numBytes > _executionMemoryUsed) {
    +      logWarning(s"Attempted to release $numBytes bytes of execution " +
    +        s"memory when we only have ${_executionMemoryUsed} bytes")
    +      _executionMemoryUsed = 0
    +    } else {
    +      _executionMemoryUsed -= numBytes
    +    }
    +  }
     
       /**
        * Release N bytes of storage memory.
        */
    -  def releaseStorageMemory(numBytes: Long): Unit
    +  def releaseStorageMemory(numBytes: Long): Unit = synchronized {
    +    if (numBytes > _storageMemoryUsed) {
    +      logWarning(s"Attempted to release $numBytes bytes of storage " +
    +        s"memory when we only have ${_storageMemoryUsed} bytes")
    +      _storageMemoryUsed = 0
    +    } else {
    +      _storageMemoryUsed -= numBytes
    +    }
    +  }
     
       /**
        * Release all storage memory acquired.
        */
    -  def releaseStorageMemory(): Unit
    +  def releaseStorageMemory(): Unit = synchronized {
    --- End diff --
    
    Minor nit (which I guess existed in the old code), but what do you think about naming this `releaseAllStorageMemory`?




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147642518
  
      [Test build #43622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43622/console) for   PR 9084 at commit [`24a391c`](https://github.com/apache/spark/commit/24a391cbba36812d555d32153daa5c132b3e608d).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147597211
  
      [Test build #43607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43607/console) for   PR 9084 at commit [`059ae0d`](https://github.com/apache/spark/commit/059ae0da40e981306a060bc9f7dafd7639263cf0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147768685
  
      [Test build #1889 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1889/consoleFull) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147813165
  
      [Test build #1889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1889/console) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41827295
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    + *
    + * Storage can borrow as much execution memory as is free until execution reclaims its space.
    + * When this happens, cached blocks will be evicted from memory until sufficient borrowed
    + * memory is released to satisfy the execution memory request.
    + *
    + * Similarly, execution can borrow as much storage memory as is free. However, execution
    + * memory is *never* evicted by storage due to the complexities involved in implementing this.
    + * The implication is that attempts to cache blocks may fail if execution has already eaten
    + * up most of the storage space, in which case the new blocks will be evicted directly.
    + */
    +private[spark] class UnifiedMemoryManager(conf: SparkConf, maxMemory: Long) extends MemoryManager {
    +
    +  def this(conf: SparkConf) {
    +    this(conf, UnifiedMemoryManager.getMaxMemory(conf))
    +  }
    +
    +  /**
    +   * Size of the storage region, in bytes.
    +   *
    +   * This region is not statically reserved; execution can borrow from it if necessary.
    +   * Cached blocks can be evicted only if actual storage memory usage exceeds this region.
    +   */
    +  private val storageRegionSize: Long = {
    +    (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong
    +  }
    +
    +  /**
    +   * Total amount of memory, in bytes, not currently occupied by either execution or storage.
    +   */
    +  private def totalFreeMemory: Long = synchronized {
    +    assert(_executionMemoryUsed <= maxMemory)
    +    assert(_storageMemoryUsed <= maxMemory)
    +    assert(_executionMemoryUsed + _storageMemoryUsed <= maxMemory)
    +    maxMemory - _executionMemoryUsed - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for execution, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by storage.
    +   */
    +  override def maxExecutionMemory: Long = synchronized {
    +    maxMemory - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for storage, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by execution.
    +   */
    +  override def maxStorageMemory: Long = synchronized {
    +    maxMemory - _executionMemoryUsed
    +  }
    +
    +  /**
    +   * Acquire N bytes of memory for execution, evicting cached blocks if necessary.
    +   *
    +   * This method evicts blocks only up to the amount of memory borrowed by storage.
    +   * Blocks evicted in the process, if any, are added to `evictedBlocks`.
    +   * @return number of bytes successfully granted (<= N).
    +   */
    +  override def acquireExecutionMemory(
    +      numBytes: Long,
    +      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Long = synchronized {
    +    assert(numBytes >= 0)
    +    val memoryBorrowedByStorage = math.max(0, _storageMemoryUsed - storageRegionSize)
    +    // If there is not enough free memory AND storage has borrowed some execution memory,
    +    // then evict as much memory borrowed by storage as needed to grant this request
    +    val shouldEvictStorage = totalFreeMemory < numBytes && memoryBorrowedByStorage > 0
    +    if (shouldEvictStorage) {
    +      val spaceToEnsure = math.min(numBytes, memoryBorrowedByStorage)
    +      memoryStore.ensureFreeSpace(spaceToEnsure, evictedBlocks)
    --- End diff --
    
    Oh, duh. That's what I get for looking at the unified diff.
    
    Looks safe to me now.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41833851
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ---
    @@ -72,46 +92,62 @@ private[spark] abstract class MemoryManager {
       def acquireUnrollMemory(
    --- End diff --
    
    actually, it's more than a synonym. In `StaticMemoryManager` it's required to preserve existing behavior where unrolling doesn't evict all the blocks.
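
    For context, a rough sketch of the legacy behavior being preserved (an
    assumption based on the pre-1.6 model, not the actual StaticMemoryManager
    code): unrolling may only evict cached blocks up to a dedicated slice of
    storage memory (`spark.storage.unrollFraction`, default 0.2), so a single
    unroll cannot flush the entire cache the way a plain storage acquisition could.

        // Hypothetical helper: bytes an unroll request may still evict, given the
        // unroll budget and the unroll memory already in use across tasks.
        def maxBytesUnrollMayEvict(maxStorageMemory: Long,
                                   currentUnrollMemory: Long,
                                   unrollFraction: Double = 0.2): Long = {
          val unrollBudget = (maxStorageMemory * unrollFraction).toLong
          math.max(0L, unrollBudget - currentUnrollMemory)
        }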




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147670790
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147670793
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43631/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147775909
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43646/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147565350
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41825250
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ---
    @@ -40,19 +41,38 @@ private[spark] abstract class MemoryManager {
         _memoryStore
       }
     
    +  // Amount of execution/storage memory in use, accesses must be synchronized on `this`
    +  protected var _executionMemoryUsed: Long = 0
    +  protected var _storageMemoryUsed: Long = 0
    +
       /**
        * Set the [[MemoryStore]] used by this manager to evict cached blocks.
        * This must be set after construction due to initialization ordering constraints.
        */
    -  def setMemoryStore(store: MemoryStore): Unit = {
    +  final def setMemoryStore(store: MemoryStore): Unit = {
         _memoryStore = store
       }
     
       /**
    -   * Acquire N bytes of memory for execution.
    +   * Total available memory for execution, in bytes.
    +   */
    +  def maxExecutionMemory: Long
    +
    +  /**
    +   * Total available memory for storage, in bytes.
    +   */
    +  def maxStorageMemory: Long
    +
    +  // TODO: avoid passing evicted blocks around to simplify method signatures (SPARK-10985)
    --- End diff --
    
    Good call on deferring this to a followup task.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147597279
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147565590
  
      [Test build #43602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43602/consoleFull) for   PR 9084 at commit [`01ff533`](https://github.com/apache/spark/commit/01ff5335b2b6554ed9388b015145b8dfd9bc1977).




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41826446
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
    @@ -37,15 +37,14 @@ private case class MemoryEntry(value: Any, size: Long, deserialized: Boolean)
     private[spark] class MemoryStore(blockManager: BlockManager, memoryManager: MemoryManager)
       extends BlockStore(blockManager) {
     
    +  // Note: all changes to memory allocations, notably putting blocks, evicting blocks, and
    +  // acquiring or releasing unroll memory, must be synchronized on `memoryManager`!
    +
       private val conf = blockManager.conf
       private val entries = new LinkedHashMap[BlockId, MemoryEntry](32, 0.75f, true)
    -  private val maxMemory = memoryManager.maxStorageMemory
    -
    -  // Ensure only one thread is putting, and if necessary, dropping blocks at any given time
    -  private val accountingLock = new Object
     
       // A mapping from taskAttemptId to amount of memory used for unrolling a block (in bytes)
    -  // All accesses of this map are assumed to have manually synchronized on `accountingLock`
    +  // All accesses of this map are assumed to have manually synchronized on `memoryManager`
    --- End diff --
    
    Does this imply that `unrollMemoryMap` should live inside of `memoryManager`?
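
    A hypothetical sketch of the refactor this question hints at (names invented
    here, not the actual code): the per-task unroll accounting map owned by the
    manager and guarded by its lock.

        import scala.collection.mutable

        class SketchMemoryManager {
          // taskAttemptId -> bytes reserved for unrolling, guarded by `this`
          private val unrollMemoryMap = mutable.HashMap[Long, Long]().withDefaultValue(0L)

          def reserveUnrollMemory(taskAttemptId: Long, numBytes: Long): Unit = synchronized {
            unrollMemoryMap(taskAttemptId) += numBytes
          }

          def releaseUnrollMemory(taskAttemptId: Long, numBytes: Long): Unit = synchronized {
            unrollMemoryMap(taskAttemptId) =
              math.max(0L, unrollMemoryMap(taskAttemptId) - numBytes)
          }
        }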




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41826153
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    + *
    + * Storage can borrow as much execution memory as is free until execution reclaims its space.
    + * When this happens, cached blocks will be evicted from memory until sufficient borrowed
    + * memory is released to satisfy the execution memory request.
    + *
    + * Similarly, execution can borrow as much storage memory as is free. However, execution
    + * memory is *never* evicted by storage due to the complexities involved in implementing this.
    + * The implication is that attempts to cache blocks may fail if execution has already eaten
    + * up most of the storage space, in which case the new blocks will be evicted directly.
    + */
    +private[spark] class UnifiedMemoryManager(conf: SparkConf, maxMemory: Long) extends MemoryManager {
    +
    +  def this(conf: SparkConf) {
    +    this(conf, UnifiedMemoryManager.getMaxMemory(conf))
    +  }
    +
    +  /**
    +   * Size of the storage region, in bytes.
    +   *
    +   * This region is not statically reserved; execution can borrow from it if necessary.
    +   * Cached blocks can be evicted only if actual storage memory usage exceeds this region.
    +   */
    +  private val storageRegionSize: Long = {
    +    (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong
    +  }
    +
    +  /**
    +   * Total amount of memory, in bytes, not currently occupied by either execution or storage.
    +   */
    +  private def totalFreeMemory: Long = synchronized {
    +    assert(_executionMemoryUsed <= maxMemory)
    +    assert(_storageMemoryUsed <= maxMemory)
    +    assert(_executionMemoryUsed + _storageMemoryUsed <= maxMemory)
    +    maxMemory - _executionMemoryUsed - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for execution, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by storage.
    +   */
    +  override def maxExecutionMemory: Long = synchronized {
    +    maxMemory - _storageMemoryUsed
    +  }
    +
    +  /**
    +   * Total available memory for storage, in bytes.
    +   * In this model, this is equivalent to the amount of memory not occupied by execution.
    +   */
    +  override def maxStorageMemory: Long = synchronized {
    +    maxMemory - _executionMemoryUsed
    +  }
    +
    +  /**
    +   * Acquire N bytes of memory for execution, evicting cached blocks if necessary.
    +   *
    +   * This method evicts blocks only up to the amount of memory borrowed by storage.
    +   * Blocks evicted in the process, if any, are added to `evictedBlocks`.
    +   * @return number of bytes successfully granted (<= N).
    +   */
    +  override def acquireExecutionMemory(
    +      numBytes: Long,
    +      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Long = synchronized {
    +    assert(numBytes >= 0)
    +    val memoryBorrowedByStorage = math.max(0, _storageMemoryUsed - storageRegionSize)
    +    // If there is not enough free memory AND storage has borrowed some execution memory,
    +    // then evict as much memory borrowed by storage as needed to grant this request
    +    val shouldEvictStorage = totalFreeMemory < numBytes && memoryBorrowedByStorage > 0
    +    if (shouldEvictStorage) {
    +      val spaceToEnsure = math.min(numBytes, memoryBorrowedByStorage)
    +      memoryStore.ensureFreeSpace(spaceToEnsure, evictedBlocks)
    --- End diff --
    
    Also, `ensureFreeSpace` has a return value. Should we check it?
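
    A toy illustration of the point (hypothetical names, not the PR's actual
    signature): if freeing space can fail, the caller should branch on the result
    rather than assume the request succeeded.

        // Hypothetical stand-in for a "make room" call that reports success.
        def ensureFreeSpace(bytesNeeded: Long, freeBytes: Long): Boolean =
          freeBytes >= bytesNeeded

        if (!ensureFreeSpace(1024L, 512L)) {
          // fall back: grant less memory, spill, or fail the request explicitly
        }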




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147611987
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147769449
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147565337
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147573507
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9084#discussion_r41826327
  
    --- Diff: core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.memory
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.storage.{BlockStatus, BlockId}
    +
    +
    +/**
    + * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
    + * either side can borrow memory from the other.
    + *
    + * The region shared between execution and storage is a fraction of the total heap space
    + * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
    + * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
    + * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
    --- End diff --
    
    This might be a really cool place to put an ASCII-art diagram illustrating the breakdown (like a horizontal stacked bar chart). I can help make one.
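
    For reference, a rough sketch of the breakdown under the defaults described above (widths are illustrative only; not part of the patch):

        +----------------------+----------------------+------------------+
        |    storage region    |   execution region   |     reserved     |
        | 0.375 of heap space  | 0.375 of heap space  |  0.25 of heap    |
        +----------------------+----------------------+------------------+
        |<------ spark.memory.fraction = 0.75 ------->|
                               ^
                soft boundary at spark.memory.storageFraction = 0.5;
                either side can borrow from the other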




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147768809
  
      [Test build #1891 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1891/consoleFull) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147670601
  
      [Test build #43631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43631/console) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147769408
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147588551
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147574037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43606/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147817751
  
      [Test build #1891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1891/console) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10983] Unified memory manager

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9084#issuecomment-147770364
  
      [Test build #43646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43646/consoleFull) for   PR 9084 at commit [`face129`](https://github.com/apache/spark/commit/face129d898db2ebcea200ed448ec2e78c288c73).

