Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2017/08/20 16:18:08 UTC

[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134119888
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,80 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // It is possible that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of the subquery plan's output.
    +        // For example, in the following query plan, the condition of `ListQuery` uses d#3
    +        // from the subquery. Here the size of the subquery's output is 2 (c#2, d#3), while the
    +        // size of the value is 1 (a#0).
    +        // Query:
    +        //   SELECT t1.a FROM t1
    +        //   WHERE
    +        //   t1.a IN (SELECT t2.c
    +        //           FROM t2
    +        //           WHERE t1.b < t2.d);
    +        // Query Plan:
    +        //   Project [a#0]
    +        //   +- Filter a#0 IN (list#4 [(b#1 < d#3)])
    +        //      :  +- Project [c#2, d#3]
    +        //      :     +- LocalRelation <empty>, [c#2, d#3]
    +        //      +- LocalRelation <empty>, [a#0, b#1]
    +        //
    +        // Notice that we should not face this problem during analysis, where we only care
    +        // whether the size of the subquery plan's output matches the size of the value
    +        // expressions; `CheckAnalysis` ensures this with a dedicated check. However,
    +        // optimization rules may change the analyzed plan and produce an unresolved plan
    +        // again. That's why we add this check here.
    +
    +        // Take the subset of output attributes that will not be matched against value
    +        // expressions and are also not used in condition expressions, if any.
    +        val subqueryOutputNotInCondition = sub.output.drop(valExprs.length).filter { attr =>
    --- End diff --
    
    Also, add a TODO here: this check needs an update after we combine the optimizer rules for subquery rewriting.
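To make the intent of the filtered set concrete, here is a minimal, self-contained sketch of the leftover-output computation being reviewed. This is not Spark code: plain strings stand in for Spark's `Attribute` references, and the object and parameter names are hypothetical.

```scala
// Hypothetical, simplified model of the SPARK-21759 check: find subquery
// output attributes that are neither matched positionally against the IN
// value expressions nor referenced by the ListQuery's condition expressions.
object InResolutionSketch {
  def extraOutputNotInCondition(
      subqueryOutput: Seq[String],   // output of the subquery plan
      valExprs: Seq[String],         // value expressions on the IN's left side
      conditionRefs: Set[String]     // attributes used by condition expressions
  ): Seq[String] = {
    // Drop the attributes matched against the value expressions, then keep
    // only those the condition expressions do not use.
    subqueryOutput.drop(valExprs.length).filterNot(conditionRefs.contains)
  }

  def main(args: Array[String]): Unit = {
    // Example from the comment: subquery outputs c#2 and d#3, value is a#0,
    // and the correlated condition (b#1 < d#3) references d#3.
    val leftover = extraOutputNotInCondition(
      subqueryOutput = Seq("c#2", "d#3"),
      valExprs = Seq("a#0"),
      conditionRefs = Set("d#3"))
    // d#3 is consumed by the condition, so nothing is left over here.
    println(leftover.isEmpty)
  }
}
```

In this model an empty result means every extra output attribute is accounted for by the condition expressions, so the `In` expression can still be considered resolved.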


