Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/11 22:25:47 UTC

[GitHub] [spark] amoghmargoor edited a comment on issue #15178: [SPARK-17556][SQL] Executor side broadcast for broadcast joins

URL: https://github.com/apache/spark/pull/15178#issuecomment-481952736
 
 
   @viirya Thanks for this diff. 
   We found one issue that I want to point out in case anybody plans to use this patch.
   BroadcastHashJoinExec contains references to broadcast.value that are executed on the driver. That can pull the RDD values being broadcast into the driver's block manager as well. This happens because of the code generation flow. To fix it, we took a shortcut and stopped using the one hash join optimization in codegen that applies when the keys on the build side are unique. I am not sure whether there is a solution that does not require sacrificing that optimization.
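
To make the mechanism concrete, here is a toy model in plain Python. It does not use Spark, and every name in it (`ToyBroadcast`, `plan_with_unique_key_check`, `plan_without_check`) is hypothetical. It only assumes what the comment describes: reading the broadcast value while the plan is being built counts as a fetch on the driver, while reading it inside the task counts as a fetch on the executor.

```python
# Toy model only: no Spark involved, all names are hypothetical.

class ToyBroadcast:
    """Stands in for a broadcast variable whose data is fetched on first access."""

    def __init__(self, rows):
        self._rows = rows
        self.fetched_on = []  # nodes that pulled the data into local storage

    def value(self, node):
        # The first access from a node records that the node fetched the data.
        if node not in self.fetched_on:
            self.fetched_on.append(node)
        return self._rows


def build_table(rows):
    """Group build-side values by key."""
    table = {}
    for k, v in rows:
        table.setdefault(k, []).append(v)
    return table


def plan_with_unique_key_check(bc):
    """Plan-time code inspects the relation to choose a specialised path."""
    relation = bc.value("driver")  # this access happens at plan time
    keys = [k for k, _ in relation]
    keys_unique = len(keys) == len(set(keys))

    def task(probe_keys):
        table = build_table(bc.value("executor"))
        if keys_unique:
            # Specialised path: at most one match per probe key.
            return [(k, table[k][0]) for k in probe_keys if k in table]
        return [(k, v) for k in probe_keys for v in table.get(k, [])]

    return task


def plan_without_check(bc):
    """Plan-time code never touches the relation; the task uses the general path."""

    def task(probe_keys):
        table = build_table(bc.value("executor"))
        return [(k, v) for k in probe_keys for v in table.get(k, [])]

    return task


rows = [(1, "a"), (2, "b")]

eager = ToyBroadcast(rows)
eager_result = plan_with_unique_key_check(eager)([1, 2, 3])

deferred = ToyBroadcast(rows)
deferred_result = plan_without_check(deferred)([1, 2, 3])
```

In this sketch both plans return the same join result, but only the first one records a fetch on the driver. That mirrors the trade-off in the workaround: dropping the unique-key specialisation avoids the plan-time access at the cost of always taking the general path.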
   
