Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/26 05:18:45 UTC

[GitHub] [spark] dilipbiswal opened a new pull request #25258: [SPARK-19712] Move subquery rewrite to beginning of optimizer

dilipbiswal opened a new pull request #25258: [SPARK-19712] Move subquery rewrite to beginning of optimizer
URL: https://github.com/apache/spark/pull/25258
 
 
   ## What changes were proposed in this pull request?
   Currently, predicate subqueries (IN/EXISTS) are converted to joins at the end of the optimizer, in RewritePredicateSubquery. This change moves the rewrite close to the beginning of the optimizer. The original idea was to keep the subquery expressions in Filter form so that they could be pushed down as deep as possible. The disadvantage is that once the subqueries are rewritten into join form, they are not subject to any further optimization. In this change, we convert the subqueries to join form early, in the rewrite phase, so that the resulting joins remain subject to the optimizations that follow.
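
To make the Filter form versus join form distinction concrete, here is a minimal sketch in plain Python. It is not Spark code: the tables, column names, and helper functions are all hypothetical, and rows are modelled as dicts. It only illustrates that an IN predicate subquery selects the same rows as a left semi join, and that NOT EXISTS corresponds to a left anti join. NOT IN is deliberately left out, because its NULL handling needs extra care.

```python
# Toy model, not Spark code: rows are dicts, tables are lists of rows.
# All names below are hypothetical and exist only for this illustration.
orders = [
    {"id": 1, "cust": "a"},
    {"id": 2, "cust": "b"},
    {"id": 3, "cust": "c"},
]
vip = [{"cust": "a"}, {"cust": "c"}, {"cust": "a"}]


def filter_in_subquery(left, right, key):
    """Filter form: SELECT * FROM left WHERE key IN (SELECT key FROM right)."""
    subquery_values = {r[key] for r in right}
    return [l for l in left if l[key] in subquery_values]


def left_semi_join(left, right, key):
    """Join form of IN/EXISTS: keep each left row once if any right row matches."""
    return [l for l in left if any(l[key] == r[key] for r in right)]


def left_anti_join(left, right, key):
    """Join form of NOT EXISTS: keep left rows that have no matching right row."""
    return [l for l in left if not any(l[key] == r[key] for r in right)]


in_result = filter_in_subquery(orders, vip, "cust")
semi_result = left_semi_join(orders, vip, "cust")
anti_result = left_anti_join(orders, vip, "cust")
```

Note that the duplicate `"a"` row in `vip` does not duplicate any output row. That is the property that separates a semi join from an inner join.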
   
   I will combine PullupCorrelatedPredicates and RewritePredicateSubquery in a follow-up PR.
   
   ## How was this patch tested?
   A new test suite, `LeftSemiAntiJoinAndSubqueryEquivalencySuite`, is added to verify that correlated subqueries and queries that explicitly use left semi/anti joins result in the same plan after optimization.
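
The idea behind such an equivalence check can be sketched with a toy plan representation in plain Python. This is an illustration under stated assumptions, not the suite's code and not Catalyst's API: the node types (`Relation`, `Filter`, `Join`, `InSubquery`, `NotExists`) and the `toy_rewrite` function are hypothetical stand-ins. The sketch rewrites a subquery-style plan into join form and compares it structurally with a plan written directly as a left semi or left anti join.

```python
from collections import namedtuple

# Hypothetical plan nodes for illustration only; these are not Catalyst classes.
Relation = namedtuple("Relation", "name")
Filter = namedtuple("Filter", "condition child")
Join = namedtuple("Join", "left right join_type condition")
InSubquery = namedtuple("InSubquery", "column subplan subcolumn")
NotExists = namedtuple("NotExists", "subplan condition")


def toy_rewrite(plan):
    """Rewrite Filter(IN ...) and Filter(NOT EXISTS ...) into semi/anti joins."""
    if isinstance(plan, Filter):
        child = toy_rewrite(plan.child)
        cond = plan.condition
        if isinstance(cond, InSubquery):
            join_cond = (cond.column, "=", cond.subcolumn)
            return Join(child, toy_rewrite(cond.subplan), "LeftSemi", join_cond)
        if isinstance(cond, NotExists):
            return Join(child, toy_rewrite(cond.subplan), "LeftAnti", cond.condition)
        return Filter(cond, child)
    if isinstance(plan, Join):
        return Join(toy_rewrite(plan.left), toy_rewrite(plan.right),
                    plan.join_type, plan.condition)
    return plan  # leaf node such as Relation


# Subquery form: SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2)
subquery_plan = Filter(InSubquery("t1.a", Relation("t2"), "t2.b"), Relation("t1"))
# Explicit form: SELECT * FROM t1 LEFT SEMI JOIN t2 ON t1.a = t2.b
explicit_semi = Join(Relation("t1"), Relation("t2"), "LeftSemi", ("t1.a", "=", "t2.b"))

# Subquery form: SELECT * FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.a = t2.b)
not_exists_plan = Filter(NotExists(Relation("t2"), ("t1.a", "=", "t2.b")), Relation("t1"))
explicit_anti = Join(Relation("t1"), Relation("t2"), "LeftAnti", ("t1.a", "=", "t2.b"))

plans_match = (toy_rewrite(subquery_plan) == explicit_semi
               and toy_rewrite(not_exists_plan) == explicit_anti)
```

In the real suite both sides would go through the full optimizer before being compared. The toy version applies only the one rewrite, which is enough to show the shape of the check.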
