Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/11 10:22:23 UTC

[GitHub] [spark] francis0407 opened a new pull request #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery

URL: https://github.com/apache/spark/pull/24344
 
 
   ## What changes were proposed in this pull request?
   
   This PR optimizes uncorrelated predicate subqueries (`InSubquery`, `Exists`).
   Currently, all predicate subqueries (`InSubquery`, `Exists`) are rewritten as semi-joins or anti-joins. An uncorrelated predicate subquery, however, does not depend on the outer row, so it can be evaluated once as a separate subplan instead of as a join. The idea is to first rewrite every uncorrelated predicate subquery as an `Exists`, then optimize it and evaluate it with a physical subquery plan, as is already done for `ScalarSubquery`.
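The reason an uncorrelated predicate does not need a join can be shown with a small model. The sketch below is illustrative only: `filter_with_semi_join` and `filter_with_constant` are hypothetical helpers, not Spark code, and rows are plain tuples. Because the predicate never references the outer row, the join condition is trivially true, so a nested-loop semi-join returns the same rows as evaluating `EXISTS` once and using the result as a constant filter.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, ...]


def filter_with_semi_join(outer: List[Row], inner: List[Row]) -> List[Row]:
    # Join-style plan (nested-loop semi-join): each outer row is probed against
    # the inner side. The join condition is always true because the subquery is
    # uncorrelated, so every probe gives the same answer.
    return [row for row in outer if any(True for _ in inner)]


def filter_with_constant(outer: List[Row], inner: Iterable[Row]) -> List[Row]:
    # Subquery-style plan: evaluate EXISTS a single time, pulling at most one
    # row, then keep either all outer rows or none of them.
    exists = next(iter(inner), None) is not None
    return list(outer) if exists else []
```

For example, with `outer = [(1,), (2,)]`, both functions return `outer` when the inner side is non-empty and `[]` when it is empty; the second one just avoids re-probing the inner side per outer row.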
   
   This PR adds a new optimizer rule, `RewriteUncorrelatedSubquery`.
   This rule rewrites uncorrelated `PredicateSubquery` expressions as `Exists` (the same approach can also be applied to `ANY`/`SOME`/`ALL`). In addition, the rewritten subquery uses `SELECT 1` and `LIMIT 1` to shrink its result set, since `Exists` only needs to know whether at least one row is returned. An `InSubquery` can be rewritten as an uncorrelated `Exists` only when its left-hand values are literals and the subquery has no outer references. For example:
   ```SQL
   3 in (select b from t) => exists(select 1 from t where b = 3 limit 1)
   ```
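The rewrite in the example can be sanity-checked against any SQL engine. The sketch below uses SQLite (via Python's `sqlite3`) as a stand-in for Spark SQL, with a hypothetical outer table `outer_rows`; it is not part of the PR's tests. It also illustrates a standard SQL caveat worth keeping in mind: as a bare value, `3 IN (...)` evaluates to NULL when no match is found and the subquery returns a NULL, whereas `EXISTS` evaluates to false. The two forms therefore agree where NULL and false are treated alike, such as in a `WHERE` filter.

```python
import sqlite3
from typing import List, Optional, Tuple

IN_FILTER = "SELECT id FROM outer_rows WHERE 3 IN (SELECT b FROM t) ORDER BY id"
EXISTS_FILTER = ("SELECT id FROM outer_rows "
                 "WHERE EXISTS (SELECT 1 FROM t WHERE b = 3 LIMIT 1) ORDER BY id")


def compare(t_values: List[Optional[int]]) -> Tuple[list, list, object, object]:
    """Run the original and rewritten forms against a throwaway in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (b INTEGER)")
    conn.execute("CREATE TABLE outer_rows (id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in t_values])
    conn.executemany("INSERT INTO outer_rows VALUES (?)", [(10,), (20,)])
    # Used as filters, both forms keep or drop the same outer rows.
    in_rows = [r[0] for r in conn.execute(IN_FILTER)]
    exists_rows = [r[0] for r in conn.execute(EXISTS_FILTER)]
    # Used as bare values, they can differ: IN yields NULL (None) when no match
    # is found and the subquery returns a NULL, while EXISTS yields 0.
    in_value = conn.execute("SELECT 3 IN (SELECT b FROM t)").fetchone()[0]
    exists_value = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM t WHERE b = 3 LIMIT 1)").fetchone()[0]
    conn.close()
    return in_rows, exists_rows, in_value, exists_value
```

With `t = [1, 3, None]` both filters keep ids 10 and 20; with `t = [1, None]` both filters drop every row, but the bare `IN` value is NULL while the `EXISTS` value is 0.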
   
   This PR also adds a new class `Exists`, the physical counterpart of the logical `Exists` expression, for use inside a `SparkPlan`.
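As a rough model of what such a physical expression has to do, the sketch below evaluates an uncorrelated subplan at most once, pulls at most one row from it, and reuses the cached boolean for every outer row. `ExistsSubqueryExec` and its `eval` method are hypothetical names chosen for illustration; they are not the class or API added by this PR.

```python
from typing import Callable, Iterator, Optional, Tuple

Row = Tuple[object, ...]


class ExistsSubqueryExec:
    """Hypothetical physical EXISTS: runs an uncorrelated subplan at most once."""

    def __init__(self, subplan: Callable[[], Iterator[Row]]) -> None:
        self._subplan = subplan
        self._result: Optional[bool] = None
        self.executions = 0  # how many times the subplan was actually started

    def eval(self, _row: Row) -> bool:
        # The value does not depend on the outer row, so compute it once and cache it.
        if self._result is None:
            self.executions += 1
            # Pull at most one row, mirroring the `LIMIT 1` added by the rewrite.
            self._result = next(self._subplan(), None) is not None
        return self._result
```

In a real engine the subquery would typically be executed before or alongside the main plan rather than lazily on first use; the lazy cache here is simply the shortest way to show the "evaluate once, read many times" behavior.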
   
   
   ## How was this patch tested?
   
   Unit tests.
   
