You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/09 07:23:28 UTC

[GitHub] [spark] JoshRosen commented on pull request #37413: [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver

JoshRosen commented on PR #37413:
URL: https://github.com/apache/spark/pull/37413#issuecomment-1209009482

   > QQ @JoshRosen, @alex-balikov - if the expectation is that the variable can be recreated (if missing) at driver - with that being a remote possibility, do we want to make it a `WeakReference` instead of a `SoftReference` ? Softref's are kept around as much as possible by the jvm (usually until there are possibilities of OOM), unlike Weakref's
   
   Good question. It looks like PR https://github.com/apache/spark/pull/22995 originally proposed to make this into a `WeakReference` but ended up using a `SoftReference` in an attempt to reduce the likelihood that a broadcast variable would be cleared on GC. I'm slightly hesitant to want to change behaviors that could impact user created broadcast variables (although maybe I'm being overly-conservative).
   
   I don't think that changing `SoftReference` to `WeakReference` would negatively impact broadcast hash joins or other SQL operators that use internal broadcast variables: each of those operators retrieves the broadcast value and stores a strong reference to it for the duration of the task, so as long as one task is running the WeakReference here and in the cache added in https://github.com/apache/spark/pull/20183 should still be alive, so the cache will be effective in keeping the broadcast variable value from being duplicated.
   
   I agree that the `WeakReference` could make sense for these internal broadcast variables on the driver, though. It looks like `SoftReference` and `WeakReference` share a common superclass, so we can flag the reference type based on whether this is an internal or external broadcast variable. Here's a patch implementing that idea : https://github.com/JoshRosen/spark/commit/01fd8be179274d3fe298f691264e2e1bd37bca2e
   
   @alex-balikov, @mridulm, WDYT? If that WeakRef change makes sense to you, let's cherry-pick it into this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org