Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/09 15:07:40 UTC

[GitHub] [spark] JoshRosen commented on issue #25084: [SPARK-28314][SQL] Use the same MemoryManager when building HashedRelation

URL: https://github.com/apache/spark/pull/25084#issuecomment-509681776
 
 
   It looks like [SPARK-11309](https://issues.apache.org/jira/browse/SPARK-11309) is a ~4-year-old TODO to clean this up.
   
   Chasing through `git blame`, it looks like the `new TaskMemoryManager` was introduced in an even earlier patch: https://github.com/apache/spark/pull/7592#discussion_r35400866
   
   I think this is because broadcast variable memory is not exclusively owned by a single task: 
   
   - We only deserialize a `HashedRelation` when reading a broadcast, so this deserialization path is broadcast-only.
   - In TorrentBroadcast, we use locking to ensure that each executor JVM holds a single copy of the re-assembled broadcast.
   - When a task reads a broadcast variable, we take out a block manager read lock and release the read lock at the end of the task. This ensures that the block manager doesn't evict the reconstructed broadcast while it is still in use by a task (doing so would mess up the memory accounting: after eviction, the block manager would assume that the on-heap broadcast variable was garbage-collectible, but it can't be collected if it's still being used by a task).
   - All memory allocated by a TaskMemoryManager is automatically cleaned up at the end of a task when `Executor` calls `cleanUpAllAllocatedMemory()`. If we use the current task's TaskMemoryManager to allocate the pages for the reconstructed broadcast variable then those pages would be freed once the first task completed, causing use-after-free problems in the other tasks sharing the same broadcast variable.
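   
   To make that last hazard concrete, here is a deliberately simplified toy model of the lifecycle (none of these classes are Spark's; every name is invented for illustration):
   
   ```scala
   // Toy stand-ins for Tungsten pages and a per-task memory manager.
   final class ToyPage {
     private var freed = false
     def free(): Unit = freed = true
     def read(): Long = {
       if (freed) throw new IllegalStateException("use-after-free: page was reclaimed")
       42L
     }
   }
   
   final class ToyTaskMemoryManager {
     private var pages = List.empty[ToyPage]
     def allocatePage(): ToyPage = { val p = new ToyPage; pages ::= p; p }
     // Analogue of cleanUpAllAllocatedMemory(), run by the executor at task end.
     def cleanUpAllAllocatedMemory(): Unit = { pages.foreach(_.free()); pages = Nil }
   }
   
   object BroadcastLifetimeHazard extends App {
     val taskA = new ToyTaskMemoryManager
     val sharedPage = taskA.allocatePage() // task A rebuilds the broadcast
     taskA.cleanUpAllAllocatedMemory()     // task A finishes first
     sharedPage.read()                     // another task sharing the page blows up
   }
   ```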
   
   In the current code, we create a dummy `TaskMemoryManager` for each deserialized broadcast and never call cleanup (instead, letting the GC handle it for us). This is why the old code has a comment about using on-heap memory: if you allocate off-heap memory here then it'll never be freed back to the UnifiedMemoryManager, eventually causing an OOM.
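   
   For reference, the dummy-manager pattern looks roughly like this. This is a paraphrase from memory rather than a verbatim quote of HashedRelation's code, and these are `private[spark]` internals, so the snippet only compiles inside Spark itself:
   
   ```scala
   import org.apache.spark.SparkConf
   import org.apache.spark.memory.{TaskMemoryManager, UnifiedMemoryManager}
   
   // A throwaway manager, unrelated to the running task's TaskMemoryManager.
   // Nothing ever calls cleanUpAllAllocatedMemory() on it, so its (on-heap)
   // pages live until the GC collects the deserialized broadcast itself.
   val dummyManager = new TaskMemoryManager(
     new UnifiedMemoryManager(
       new SparkConf().set("spark.memory.offHeap.enabled", "false"),
       Long.MaxValue,      // maxHeapMemory: effectively unbounded
       Long.MaxValue / 2,  // onHeapStorageRegionSize
       1),                 // numCores
     0)                    // taskAttemptId: a dummy value
   ```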
   
   This illustrates some problems with our current memory abstractions: we don't have a good way to use the Tungsten managed buffers / memory blocks for storage memory. Given this, I think we may need a different short-term fix for the infinite recursion than this patch's current code. Maybe we could pass in a `depth` flag to detect when we're recursing and use that to prevent infinite recursion?
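   
   A very rough sketch of the depth idea (all names invented here, just to show the shape, not the real builder):
   
   ```scala
   // Thread an explicit depth through the build so the code can tell a first
   // attempt from a re-entry; `build` can pick its strategy from the depth.
   def buildOnce[T](build: Int => T, depth: Int = 0): T = {
     try {
       build(depth)
     } catch {
       case _: OutOfMemoryError if depth == 0 =>
         // First failure: re-enter exactly once, at depth 1. A failure at
         // depth 1 fails the guard above, so the OOM escapes rather than
         // recursing forever.
         buildOnce(build, depth + 1)
     }
   }
   ```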
   
   (By the way, the wide scoping of the `catch OutOfMemoryError` feels dodgy to me: this should probably be narrowed down to only catch OOMs when allocating the MemoryBlock's backing `long[]` array, then wrap and re-throw as a `SparkOutOfMemoryError`.)
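   
   Something like the following is what I have in mind (a sketch of the suggestion, not the current code; I'm assuming `SparkOutOfMemoryError`'s plain String constructor here):
   
   ```scala
   import org.apache.spark.memory.SparkOutOfMemoryError
   
   // Catch the OOM only around the long[] allocation itself, rather than
   // wrapping the whole build in one broad `catch OutOfMemoryError`.
   def allocateBackingArray(numWords: Int): Array[Long] = {
     try {
       new Array[Long](numWords)
     } catch {
       case oom: OutOfMemoryError =>
         throw new SparkOutOfMemoryError(
           s"Unable to allocate a MemoryBlock of $numWords words: ${oom.getMessage}")
     }
   }
   ```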
