You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by nongli <gi...@git.apache.org> on 2016/03/17 21:01:40 UTC

[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

GitHub user nongli opened a pull request:

    https://github.com/apache/spark/pull/11792

    [SPARK-13981][SQL] Defer evaluating variables within Filter operator.

    ## What changes were proposed in this pull request?
    
    This improves the Filter codegen to defer loading variables within the generated
    filter code. This is useful since the filter logic naturally short circuits and
    the deferred logic doesn't need to be run.
    
    ## How was this patch tested?
    
    On tpcds q55, this fixes the regression from introducing the IsNotNull predicates.
    
    ```
    TPCDS Snappy:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
    --------------------------------------------------------------------------------
    q55  (master)                            5562 / 5822         20.7          48.3
    q55  (this patch)                        5079 / 5190         22.7          44.1
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nongli/spark spark-13981

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11792
    
----
commit 29e408d61f3557b1c5df343d039ef74d1c6e9ab3
Author: Nong Li <no...@databricks.com>
Date:   2016-03-17T19:55:02Z

    [SPARK-13981][SQL] Defer evaluating variables within Filter operator.
    
    This improves the Filter codegen to defer loading variables within the generated
    filter code. This is useful since the filter logic naturally short circuits and
    the deferred logic doesn't need to be run.
    
    On tpcds q55, this fixes the regression from introducing the IsNotNull predicates.
    
    TPCDS Snappy:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
    --------------------------------------------------------------------------------
    q55  (master)                            5562 / 5822         20.7          48.3
    q55  (this patch)                        5079 / 5190         22.7          44.1

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r57064001
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,23 +114,28 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    val conjuncts = splitConjunctivePredicates(condition)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +    // Generate the code to evaluate each filter.
    +    val generated = conjuncts.map { c =>
    --- End diff --
    
    That defeats the point of this. Then we're referencing all the attributes first before the predicates.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r57605732
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,39 +114,81 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    /**
    +     * Generates code for `c`, using `in` for input attributes and `attrs` for nullability.
    +     */
    +    def genPredicate(c: Expression, in: Seq[ExprCode], attrs: Seq[Attribute]): String = {
    +      val bound = BindReferences.bindReference(c, attrs)
    +      val evaluated = evaluateRequiredVariables(child.output, in, c.references)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +      // Generate the code for the predicate.
    +      val ev = ExpressionCanonicalizer.execute(bound).gen(ctx)
           val nullCheck = if (bound.nullable) {
             s"${ev.isNull} || "
           } else {
             s""
           }
    +
           s"""
    +         |$evaluated
              |${ev.code}
              |if (${nullCheck}!${ev.value}) continue;
            """.stripMargin
    +    }
    +
    +    ctx.currentVars = input
    +
    +    // To generate the predicates we will follow this algorithm.
    +    // For each predicate that is not IsNotNull, we will generate them one by one loading attributes
    +    // as necessary. For each of both attributes, if there is a IsNotNull predicate we will generate
    +    // that check *before* the predicate. After all of these predicates, we will generate the
    +    // remaining IsNotNull checks that were not part of other predicates.
    +    // This has the property of not doing redundant IsNotNull checks and taking better advantage of
    +    // short-circuiting, not loading attributes until they are needed.
    +    // This is very perf sensitive.
    +    // TODO: revisit this. We can consider reodering predicates as well.
    +    val generatedIsNotNullChecks = new Array[Boolean](notNullPreds.length)
    +    val generated = otherPreds.map { c =>
    +      val nullChecks = c.references.map { r =>
    +        val idx = notNullPreds.indexWhere { n => n.asInstanceOf[IsNotNull].child.semanticEquals(r)}
    +        if (idx != -1 && !generatedIsNotNullChecks(idx)) {
    +          // Use the child's output. The nullability is what the child produced.
    +          val code = genPredicate(notNullPreds(idx), input, child.output)
    +          generatedIsNotNullChecks(idx) = true
    +          code
    +        } else {
    +          ""
    +        }
    +      }.mkString("\n").trim
    +
    +      // Here we use *this* operator's output with this output's nullability since we already
    +      // enforced them with the IsNotNull checks above.
    +      s"""
    +         |$nullChecks
    +         |${genPredicate(c, input, output)}
    +       """.stripMargin.trim
    +    }.mkString("\n")
    +
    +    val nullChecks = notNullPreds.zipWithIndex.map { case (c, idx) =>
    +      if (!generatedIsNotNullChecks(idx)) {
    +        genPredicate(c, input, child.output)
    +      } else {
    +        ""
    +      }
         }.mkString("\n")
     
         // Reset the isNull to false for the not-null columns, then the followed operators could
    -    // generate better code (remove dead branches).
    +    // generate better code (remove dead branches).                                              O
    --- End diff --
    
    ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202566071
  
    **[Test build #54342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54342/consoleFull)** for PR 11792 at commit [`1a77012`](https://github.com/apache/spark/commit/1a770129f7f19bbace8548f069fb5bd1ca2a322c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202509546
  
    LGTM, just two minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202567799
  
    **[Test build #2702 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2702/consoleFull)** for PR 11792 at commit [`1a77012`](https://github.com/apache/spark/commit/1a770129f7f19bbace8548f069fb5bd1ca2a322c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-199537232
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53715/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r56590282
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,23 +114,34 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    val conjuncts = splitConjunctivePredicates(condition)
    +    val evaluatedVars = new Array[ExprCode](input.length)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +    // Generate the code to evaluate each filter.
    +    val generated = conjuncts.map { c =>
    +      // Collect the referenced attributes and make sure they are evaluated.
    +      val bound = BindReferences.bindReference(c, output)
    +      val refs = bound.collect {
    +        case e: BoundReference => e
    +      }
    +
    +      val evaluated = refs.map { r =>
    --- End diff --
    
    I think this could be as easy as 
    ```
    val evaluated = evaluateRequiredVariable(child.output, c.references, input)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202566479
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201058647
  
    **[Test build #54107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54107/consoleFull)** for PR 11792 at commit [`5d71fc5`](https://github.com/apache/spark/commit/5d71fc57a9d00289b4a09d78724ea5cf5409a6b2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202531604
  
    **[Test build #54342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54342/consoleFull)** for PR 11792 at commit [`1a77012`](https://github.com/apache/spark/commit/1a770129f7f19bbace8548f069fb5bd1ca2a322c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201058988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54107/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201601013
  
    **[Test build #54228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54228/consoleFull)** for PR 11792 at commit [`8f5dfc9`](https://github.com/apache/spark/commit/8f5dfc939f71ff1399d705d20741bb0a12209fe0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-199537228
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201058978
  
    **[Test build #54107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54107/consoleFull)** for PR 11792 at commit [`5d71fc5`](https://github.com/apache/spark/commit/5d71fc57a9d00289b4a09d78724ea5cf5409a6b2).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ExprCode(var _code: String, var isNull: String, var value: String,`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r57065601
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,23 +114,28 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    val conjuncts = splitConjunctivePredicates(condition)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +    // Generate the code to evaluate each filter.
    +    val generated = conjuncts.map { c =>
    --- End diff --
    
    I can not find where this assumption (IsNotNull come before) is guarateed, if not, we could generate wrong results.
    
    Also, the conjuncts is in the sequence like this: 
    
     IsNotNull(A), IsNotNull(B), A > xx, B > xxx
    
    We should re-order them explicitly, make sue they are in this order:
    
        IsNotNull(A), A > xx, IsNotNull(B), B > xxx



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-198090947
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53463/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202660290
  
    **[Test build #2706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2706/consoleFull)** for PR 11792 at commit [`1a77012`](https://github.com/apache/spark/commit/1a770129f7f19bbace8548f069fb5bd1ca2a322c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202624401
  
    **[Test build #2706 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2706/consoleFull)** for PR 11792 at commit [`1a77012`](https://github.com/apache/spark/commit/1a770129f7f19bbace8548f069fb5bd1ca2a322c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-198059282
  
    **[Test build #53463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53463/consoleFull)** for PR 11792 at commit [`29e408d`](https://github.com/apache/spark/commit/29e408d61f3557b1c5df343d039ef74d1c6e9ab3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r57605200
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,39 +114,81 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    /**
    +     * Generates code for `c`, using `in` for input attributes and `attrs` for nullability.
    +     */
    +    def genPredicate(c: Expression, in: Seq[ExprCode], attrs: Seq[Attribute]): String = {
    +      val bound = BindReferences.bindReference(c, attrs)
    +      val evaluated = evaluateRequiredVariables(child.output, in, c.references)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +      // Generate the code for the predicate.
    +      val ev = ExpressionCanonicalizer.execute(bound).gen(ctx)
           val nullCheck = if (bound.nullable) {
             s"${ev.isNull} || "
           } else {
             s""
           }
    +
           s"""
    +         |$evaluated
              |${ev.code}
              |if (${nullCheck}!${ev.value}) continue;
            """.stripMargin
    +    }
    +
    +    ctx.currentVars = input
    +
    +    // To generate the predicates we will follow this algorithm.
    +    // For each predicate that is not IsNotNull, we will generate them one by one loading attributes
    +    // as necessary. For each of both attributes, if there is a IsNotNull predicate we will generate
    +    // that check *before* the predicate. After all of these predicates, we will generate the
    +    // remaining IsNotNull checks that were not part of other predicates.
    +    // This has the property of not doing redundant IsNotNull checks and taking better advantage of
    +    // short-circuiting, not loading attributes until they are needed.
    +    // This is very perf sensitive.
    +    // TODO: revisit this. We can consider reodering predicates as well.
    +    val generatedIsNotNullChecks = new Array[Boolean](notNullPreds.length)
    +    val generated = otherPreds.map { c =>
    +      val nullChecks = c.references.map { r =>
    +        val idx = notNullPreds.indexWhere { n => n.asInstanceOf[IsNotNull].child.semanticEquals(r)}
    +        if (idx != -1 && !generatedIsNotNullChecks(idx)) {
    +          // Use the child's output. The nullability is what the child produced.
    +          val code = genPredicate(notNullPreds(idx), input, child.output)
    --- End diff --
    
    ```
    generatedIsNotNullChecks(idx) = true
    genPredicate(notNullPreds(idx), input, child.output)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202691066
  
    Thanks - merging in master.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201059224
  
    @davies I did this a different way. Let me know your thoughts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r56590311
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,23 +114,34 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    val conjuncts = splitConjunctivePredicates(condition)
    +    val evaluatedVars = new Array[ExprCode](input.length)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +    // Generate the code to evaluate each filter.
    +    val generated = conjuncts.map { c =>
    +      // Collect the referenced attributes and make sure they are evaluated.
    +      val bound = BindReferences.bindReference(c, output)
    +      val refs = bound.collect {
    +        case e: BoundReference => e
    +      }
    +
    +      val evaluated = refs.map { r =>
    +        evaluatedVars(r.ordinal) = input(r.ordinal)
    +        evaluateVariable(input, r.ordinal)
    +      }.mkString("\n")
    +
    +
    +      // Generate the code for the predicate.
    +      ctx.currentVars = evaluatedVars
    --- End diff --
    
    ctx.currentVars  = input


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-198090942
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r56916217
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,23 +114,28 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    val conjuncts = splitConjunctivePredicates(condition)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +    // Generate the code to evaluate each filter.
    +    val generated = conjuncts.map { c =>
    --- End diff --
    
    It's better to use `notNullAttributes  ++ otherPreds` to make sure that `IsNotNull` always come first.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201551994
  
    **[Test build #54228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54228/consoleFull)** for PR 11792 at commit [`8f5dfc9`](https://github.com/apache/spark/commit/8f5dfc939f71ff1399d705d20741bb0a12209fe0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-198060051
  
    Generated code before:
    ```
          InternalRow scan_row = scan_batch.getRow(scan_batchIdx++);
          /* input[0, int] */
          boolean scan_isNull = scan_row.isNullAt(0);
          int scan_value = scan_isNull ? -1 : (scan_row.getInt(0));
          /* input[1, int] */
          boolean scan_isNull1 = scan_row.isNullAt(1);
          int scan_value1 = scan_isNull1 ? -1 : (scan_row.getInt(1));
          
          if (scan_isNull) continue;
          if (scan_isNull1) continue;
          
          /* (input[0, int] >= 2452245) */
          boolean filter_value = false;
          filter_value = scan_value >= 2452245;
          if (!filter_value) continue;
          
          /* (input[0, int] <= 2452275) */
          boolean filter_value3 = false;
          filter_value3 = scan_value <= 2452275;
          if (!filter_value3) continue;
          
          filter_metricValue.add(1);
    ```
    
    with this patch
    ```
          InternalRow scan_row = scan_batch.getRow(scan_batchIdx++);
        
        /* input[0, int] */
        boolean scan_isNull = scan_row.isNullAt(0);
        int scan_value = scan_isNull ? -1 : (scan_row.getInt(0));
        
        if (!(!(scan_isNull))) continue;
        
        /* (input[0, int] >= 2452245) */
        boolean filter_value2 = false;
        filter_value2 = scan_value >= 2452245;
        if (!filter_value2) continue;
        
        /* (input[0, int] <= 2452275) */
        boolean filter_value5 = false;
        filter_value5 = scan_value <= 2452275;
        if (!filter_value5) continue;
        
        /* input[1, int] */
        boolean scan_isNull1 = scan_row.isNullAt(1);
        int scan_value1 = scan_isNull1 ? -1 : (scan_row.getInt(1));
        
        if (!(!(scan_isNull1))) continue;
        
        filter_metricValue.add(1);
    
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202593045
  
    **[Test build #2702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2702/consoleFull)** for PR 11792 at commit [`1a77012`](https://github.com/apache/spark/commit/1a770129f7f19bbace8548f069fb5bd1ca2a322c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r56592392
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,23 +114,34 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    val conjuncts = splitConjunctivePredicates(condition)
    +    val evaluatedVars = new Array[ExprCode](input.length)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +    // Generate the code to evaluate each filter.
    +    val generated = conjuncts.map { c =>
    +      // Collect the referenced attributes and make sure they are evaluated.
    +      val bound = BindReferences.bindReference(c, output)
    --- End diff --
    
    nwm, it's nice trick to use `output` here, since we already generate the code for BoundReference, the nullablity of it does not matter, but help to simplify the code of expressions.
    
    Could you add a comment for that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201601820
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11792


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-202566481
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54342/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-198090295
  
    **[Test build #53463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53463/consoleFull)** for PR 11792 at commit [`29e408d`](https://github.com/apache/spark/commit/29e408d61f3557b1c5df343d039ef74d1c6e9ab3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-199536766
  
    **[Test build #53715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53715/consoleFull)** for PR 11792 at commit [`dd35e2e`](https://github.com/apache/spark/commit/dd35e2e38a9ffc53300269c2cc618272d3900ed8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-199506893
  
    **[Test build #53715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53715/consoleFull)** for PR 11792 at commit [`dd35e2e`](https://github.com/apache/spark/commit/dd35e2e38a9ffc53300269c2cc618272d3900ed8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201601826
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54228/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11792#discussion_r56590134
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
    @@ -110,23 +114,34 @@ case class Filter(condition: Expression, child: SparkPlan)
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: String): String = {
         val numOutput = metricTerm(ctx, "numOutputRows")
     
    -    // filter out the nulls
    -    val filterOutNull = notNullAttributes.map { a =>
    -      val idx = child.output.indexOf(a)
    -      s"if (${input(idx).isNull}) continue;"
    -    }.mkString("\n")
    +    val conjuncts = splitConjunctivePredicates(condition)
    +    val evaluatedVars = new Array[ExprCode](input.length)
     
    -    ctx.currentVars = input
    -    val predicates = otherPreds.map { e =>
    -      val bound = ExpressionCanonicalizer.execute(
    -        BindReferences.bindReference(e, output))
    -      val ev = bound.gen(ctx)
    +    // Generate the code to evaluate each filter.
    +    val generated = conjuncts.map { c =>
    +      // Collect the referenced attributes and make sure they are evaluated.
    +      val bound = BindReferences.bindReference(c, output)
    --- End diff --
    
    output will have different nullability than child.output, we should use child.output here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11792#issuecomment-201058982
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org