Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2017/10/18 00:23:38 UTC

[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/19523

    [SPARK-22301][SQL] Add rule to Optimizer for In with empty list of va…

    
    ## What changes were proposed in this pull request?
    
    For performance reasons, we should resolve an `In` operation on an empty list as false during the optimization phase, as discussed in #19522.
    
    ## How was this patch tested?
    Added unit tests.
    
    cc @gatorsmile 
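
A minimal sketch of such a rule follows. It uses a hypothetical toy expression tree: `Expr`, `Attr`, `Lit`, `In`, `Not` and `FoldEmptyIn` are illustrative names, not Catalyst's actual classes. Following the later review comment that the optimizer only handles non-nullable values, the sketch folds `v IN ()` to false only when `v` cannot be null. This rests on the assumption that a null `v` makes `In` evaluate to NULL rather than FALSE, which matters once the predicate sits under a `Not`.

```scala
// Toy expression tree: a hypothetical stand-in for Catalyst's Expression hierarchy.
sealed trait Expr { def nullable: Boolean }
case class Attr(name: String, nullable: Boolean) extends Expr
case class Lit(value: Any) extends Expr { def nullable: Boolean = value == null }
case class In(value: Expr, list: Seq[Expr]) extends Expr {
  def nullable: Boolean = value.nullable || list.exists(_.nullable)
}
case class Not(child: Expr) extends Expr { def nullable: Boolean = child.nullable }

object FoldEmptyIn {
  // Rewrite `v IN ()` to FALSE, but only when `v` can never be null.
  // A nullable `v` may evaluate to NULL instead of FALSE, so it is left alone.
  def apply(e: Expr): Expr = e match {
    case In(v, list) if list.isEmpty && !v.nullable => Lit(false)
    // Otherwise recurse into the children and keep the node.
    case In(v, list) => In(apply(v), list.map(x => apply(x)))
    case Not(c)      => Not(apply(c))
    case other       => other
  }
}
```

Skipping the nullable case keeps the rewrite semantics-preserving: `NOT (nullableCol IN ())` stays NULL for null rows instead of turning into TRUE.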

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-22301

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19523
    
----
commit 9aa74dd65d76725415e4eaaf5452a90f62802a8d
Author: Marco Gaido <ma...@gmail.com>
Date:   2017-10-17T21:41:00Z

    [SPARK-22301][SQL] Add rule to Optimizer for In with empty list of values

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    **[Test build #82928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82928/testReport)** for PR 19523 at commit [`50c7af3`](https://github.com/apache/spark/commit/50c7af3d4fb9a23ccf460a1842d7e57a26ca582c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    LGTM


---



[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19523#discussion_r145892306
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -204,6 +204,7 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       override def children: Seq[Expression] = value +: list
       lazy val inSetConvertible = list.forall(_.isInstanceOf[Literal])
    +  lazy val isListEmpty = list.isEmpty
    --- End diff --
    
    Calling list.isEmpty is, by comparison, fast and constant-time. Caching it saves almost nothing, and a lazy val adds overhead of its own.
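
To illustrate the trade-off, here is a sketch on a hypothetical simplified case class (`InList` is not Spark's `In`). Both members return the same answer. The lazy val carries extra per-instance state and a guarded first access, while the plain `def` just delegates to `list.isEmpty`. The cost notes in the comments describe Scala 2 lazy vals, which is what Spark compiled with at the time.

```scala
// Hypothetical simplified case class for illustration; it is not Spark's `In`.
case class InList(value: String, list: Seq[String]) {
  // Cached variant: in Scala 2 a lazy val adds a hidden field, an
  // initialization flag, and a synchronized first access, all to memoize
  // a check that is already cheap.
  lazy val isListEmptyCached: Boolean = list.isEmpty

  // Uncached variant: constant-time for List, Vector and ArrayBuffer,
  // with no extra per-instance state.
  def isListEmpty: Boolean = list.isEmpty
}
```

Memoizing with a lazy val pays off for expensive checks such as `list.forall(_.isInstanceOf[Literal])`, which walks the whole list. A single `isEmpty` call does not benefit in the same way.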


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    Thanks! Merged to master.


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    retest this please


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    **[Test build #82928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82928/testReport)** for PR 19523 at commit [`50c7af3`](https://github.com/apache/spark/commit/50c7af3d4fb9a23ccf460a1842d7e57a26ca582c).


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    **[Test build #82923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82923/testReport)** for PR 19523 at commit [`50c7af3`](https://github.com/apache/spark/commit/50c7af3d4fb9a23ccf460a1842d7e57a26ca582c).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    **[Test build #83011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83011/testReport)** for PR 19523 at commit [`99df613`](https://github.com/apache/spark/commit/99df613b344868190de11499af50405b198706fa).


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    **[Test build #83011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83011/testReport)** for PR 19523 at commit [`99df613`](https://github.com/apache/spark/commit/99df613b344868190de11499af50405b198706fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19523#discussion_r145307826
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -102,7 +102,8 @@ case class InMemoryTableScanExec(
         case IsNull(a: Attribute) => statsFor(a).nullCount > 0
         case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
     
    -    case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
    +    // We rely on the optimizations in org.apache.spark.sql.catalyst.optimizer.OptimizeIn
    +    // to be sure that the list cannot be empty
    --- End diff --
    
    IMHO this comment is not accurate, since in the optimizer we only handle the case where the attribute is not nullable.
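
The sketch below shows why the scan-side case stays useful. `ColumnStats`, `Pred`, `IsNullP`, `InP` and `BatchPruning.mightMatch` are hypothetical names, loosely modeled on the per-batch statistics check in `InMemoryTableScanExec`. Because the optimizer only folds the non-nullable case, a nullable column can still reach the scan with an empty list. The sketch skips such batches on the assumption that an empty `IN` list never makes a top-level filter predicate TRUE: it yields FALSE for non-null values and NULL for null ones, and a filter drops rows in both cases.

```scala
// Hypothetical per-batch column statistics, loosely modeled on the stats
// an in-memory columnar scan keeps for each cached batch.
case class ColumnStats(lower: Int, upper: Int, nullCount: Int, count: Int)

sealed trait Pred
case class IsNullP(col: String) extends Pred
case class InP(col: String, values: Seq[Int]) extends Pred

object BatchPruning {
  // Returns false only when the batch certainly has no matching row, so it
  // is safe to skip; true means "might match, scan it".
  def mightMatch(p: Pred, stats: Map[String, ColumnStats]): Boolean = p match {
    case IsNullP(c) => stats(c).nullCount > 0
    // `c IN ()` is never TRUE, whether or not `c` is nullable, so the whole
    // batch can be skipped without looking at its statistics.
    case InP(_, values) if values.isEmpty => false
    // Otherwise the batch may match if any listed value lies in [lower, upper].
    case InP(c, values) =>
      val s = stats(c)
      values.exists(v => v >= s.lower && v <= s.upper)
  }
}
```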


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    @mgaido91 Could you update the PR title?


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82923/


---



[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19523#discussion_r145340100
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -204,6 +204,7 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       override def children: Seq[Expression] = value +: list
       lazy val inSetConvertible = list.forall(_.isInstanceOf[Literal])
    +  lazy val isListEmpty = list.isEmpty
    --- End diff --
    
    I am using it to be consistent with the current implementation (see the line above).


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83011/


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    ok to test


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19523#discussion_r145306880
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -204,6 +204,7 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       override def children: Seq[Expression] = value +: list
       lazy val inSetConvertible = list.forall(_.isInstanceOf[Literal])
    +  lazy val isListEmpty = list.isEmpty
    --- End diff --
    
    Do we really need this val?


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    **[Test build #82923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82923/testReport)** for PR 19523 at commit [`50c7af3`](https://github.com/apache/spark/commit/50c7af3d4fb9a23ccf460a1842d7e57a26ca582c).


---



[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19523#discussion_r145318964
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -102,7 +102,8 @@ case class InMemoryTableScanExec(
         case IsNull(a: Attribute) => statsFor(a).nullCount > 0
         case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
     
    -    case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
    +    // We rely on the optimizations in org.apache.spark.sql.catalyst.optimizer.OptimizeIn
    +    // to be sure that the list cannot be empty
    --- End diff --
    
    We can remove this line after we merge this PR https://github.com/apache/spark/pull/19522


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19523
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82928/


---



[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19523


---
