You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/09 03:09:50 UTC

[GitHub] [spark] nvander1 commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] Rewrite ArraysOverlap Join

nvander1 commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] Rewrite ArraysOverlap Join
URL: https://github.com/apache/spark/pull/24563#issuecomment-490726294
 
 
   @viirya 
   
   Oops, thanks for pointing out the missing title! :)
   
   I’ve only used this when the size of the arrays is several orders of magnitude less than the number of records on the largest side of the join. I don’t have any benchmarks to back this up yet (I’ll do some experiments and post the result here).
   
   An assumption is that the number of items in the largest array is several orders of magnitude less than the number of records on either side of the join. This feels similar to how the replication factor used to optimize skew joins is also small.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org