Posted to reviews@spark.apache.org by hvanhovell <gi...@git.apache.org> on 2016/04/18 23:22:05 UTC

[GitHub] spark pull request: [SPARK-4226][SQL] Support IN/EXISTS Subqueries

GitHub user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12306#discussion_r60136567
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1447,3 +1450,133 @@ object EmbedSerializerInFilter extends Rule[LogicalPlan] {
           }
       }
     }
    +
    +/**
    + * This rule rewrites predicate sub-queries into left semi/anti joins. The following predicates
    + * are supported:
    + * a. EXISTS/NOT EXISTS will be rewritten as semi/anti join, unresolved conditions in Filter
    + *    will be pulled out as the join conditions.
    + * b. IN/NOT IN will be rewritten as semi/anti join, unresolved conditions in the Filter will
    + *    be pulled out as join conditions, value = selected column will also be used as join
    + *    condition.
    + */
    +object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
    +  /**
    +   * Pull out all correlated predicates from a given sub-query. This method removes the correlated
    +   * predicates from sub-query [[Filter]]s and adds the references of these predicates to
    +   * all intermediate [[Project]] clauses (if they are missing) in order to be able to evaluate the
    +   * predicates in the join condition.
    +   *
    +   * This method returns the rewritten sub-query and the combined (AND) extracted predicate.
    +   */
    +  private def pullOutCorrelatedPredicates(
    +      subquery: LogicalPlan,
    +      query: LogicalPlan): (LogicalPlan, Option[Expression]) = {
    +    val references: Set[Expression] = query.output.toSet
    +    val predicateMap = mutable.Map.empty[LogicalPlan, Seq[Expression]]
    +    val transformed = subquery transformUp {
    --- End diff --
    
    I'll address this in a follow-up.
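
For readers following the quoted Scaladoc, the rewrite it describes can be illustrated with plain Scala collections. This is a rough sketch of semi/anti join semantics only, not Spark's Catalyst implementation: the names `Customer`, `Order`, `leftSemi`, and `leftAnti` are hypothetical, and SQL NULL handling (which matters for NOT IN) is deliberately ignored.

```scala
// Illustrative sketch only: plain Scala collections, not Spark's Catalyst API.
// All names here (Customer, Order, leftSemi, leftAnti) are hypothetical.
final case class Customer(id: Int, name: String)
final case class Order(customerId: Int, amount: Double)

object SemiAntiJoinSketch {
  // Left semi join: keep each left row that has at least one matching right row.
  def leftSemi[L, R](left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[L] =
    left.filter(l => right.exists(r => cond(l, r)))

  // Left anti join: keep each left row that has no matching right row.
  def leftAnti[L, R](left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[L] =
    left.filter(l => !right.exists(r => cond(l, r)))

  val customers: Seq[Customer] =
    Seq(Customer(1, "a"), Customer(2, "b"), Customer(3, "c"))
  val orders: Seq[Order] =
    Seq(Order(1, 10.0), Order(1, 5.0), Order(3, 7.5))

  // WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customerId = c.id)
  // The correlated predicate becomes the semi join condition.
  val withOrders: Seq[Customer] =
    leftSemi(customers, orders)((c, o) => o.customerId == c.id)

  // WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customerId = c.id)
  // Same condition, evaluated as an anti join.
  val withoutOrders: Seq[Customer] =
    leftAnti(customers, orders)((c, o) => o.customerId == c.id)

  // WHERE c.id IN (SELECT o.customerId FROM orders o WHERE o.amount > 6)
  // The uncorrelated filter stays in the sub-query; "value = selected column"
  // becomes the join condition.
  val bigSpenders: Seq[Customer] =
    leftSemi(customers, orders.filter(_.amount > 6.0))((c, o) => c.id == o.customerId)
}
```

Note that a semi join returns each left row at most once, even when several right rows match, which is why it fits EXISTS and IN better than an inner join would.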

