Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/04/25 04:00:21 UTC

[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/21147

    [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateInSet produces wrong stats for STRING

    ## What changes were proposed in this pull request?
    `colStat.min` and `colStat.max` are empty for string type. Thus, `evaluateInSet` should not return zero when either `colStat.min` or `colStat.max` is empty.
    
    ## How was this patch tested?
    Added a test case.
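
The behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not Spark's actual `FilterEstimation` code: `ColStat`, `numericSelectivity`, and `stringSelectivity` are hypothetical names, values are modeled as `Double` and `String`, and a uniform distribution over distinct values is assumed.

```scala
object InSetSketch {
  // Hypothetical, simplified stand-in for a column's statistics.
  // min/max are None when no range was collected (e.g. for string columns).
  final case class ColStat(ndv: BigInt, min: Option[Double], max: Option[Double])

  // Uniform-distribution assumption: selectivity is the share of distinct values kept.
  private def ratio(newNdv: BigInt, ndv: BigInt): Double =
    math.min(newNdv.toDouble / ndv.toDouble, 1.0)

  // Numeric-like columns: the [min, max] range is required, so a missing
  // bound (or zero distinct values) yields a selectivity of zero.
  def numericSelectivity(stat: ColStat, inSet: Set[Double]): Double =
    (stat.min, stat.max) match {
      case (Some(lo), Some(hi)) if stat.ndv > 0 =>
        // Keep only the IN-list values that fall inside [min, max].
        val valid = inSet.count(v => v >= lo && v <= hi)
        if (valid == 0) 0.0 else ratio(stat.ndv.min(BigInt(valid)), stat.ndv)
      case _ => 0.0
    }

  // String-like columns: min/max are not consulted, so their absence
  // must not force the estimate to zero.
  def stringSelectivity(stat: ColStat, inSet: Set[String]): Double =
    if (stat.ndv <= 0) 0.0 else ratio(stat.ndv.min(BigInt(inSet.size)), stat.ndv)
}
```

In this model, a string column with four distinct values and no min/max still gets a non-zero estimate for a two-value IN list, whereas a numeric column with missing bounds is estimated at zero.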

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark cached

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21147
    
----
commit 9672f92dde505eada20d8102dcd845a5418d37c8
Author: gatorsmile <ga...@...>
Date:   2018-04-25T03:59:46Z

    fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    **[Test build #89815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89815/testReport)** for PR 21147 at commit [`9672f92`](https://github.com/apache/spark/commit/9672f92dde505eada20d8102dcd845a5418d37c8).


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89815/
    Test FAILed.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89884/
    Test FAILed.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    retest this please


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    cc @cloud-fan @wzhfy 


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    **[Test build #89884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89884/testReport)** for PR 21147 at commit [`9672f92`](https://github.com/apache/spark/commit/9672f92dde505eada20d8102dcd845a5418d37c8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    LGTM


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    **[Test build #89884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89884/testReport)** for PR 21147 at commit [`9672f92`](https://github.com/apache/spark/commit/9672f92dde505eada20d8102dcd845a5418d37c8).


---



[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21147


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Somehow I thought it had passed tests and I have merged it to master... Anyway, this is a pretty safe change and I don't think it will break any tests. Let's see the test result later.


---



[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21147#discussion_r183940087
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -392,13 +392,13 @@ case class FilterEstimation(plan: Filter) extends Logging {
         val dataType = attr.dataType
         var newNdv = ndv
     
    -    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
    -      return Some(0.0)
    -    }
    -
         // use [min, max] to filter the original hSet
         dataType match {
           case _: NumericType | BooleanType | DateType | TimestampType =>
    +        if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
    --- End diff --
    
    I think we always have max/min for integral type? cc @wzhfy 


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2692/
    Test PASSed.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    **[Test build #89815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89815/testReport)** for PR 21147 at commit [`9672f92`](https://github.com/apache/spark/commit/9672f92dde505eada20d8102dcd845a5418d37c8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    The failed `HiveClientSuite` is known to be flaky and should not be related to this PR.


---



[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21147#discussion_r184308017
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -392,13 +392,13 @@ case class FilterEstimation(plan: Filter) extends Logging {
         val dataType = attr.dataType
         var newNdv = ndv
     
    -    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
    -      return Some(0.0)
    -    }
    -
         // use [min, max] to filter the original hSet
         dataType match {
           case _: NumericType | BooleanType | DateType | TimestampType =>
    +        if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
    --- End diff --
    
    min/max could be None when the table is empty


---



[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...

Posted by mshtelma <gi...@git.apache.org>.
Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21147#discussion_r184376159
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -392,13 +392,13 @@ case class FilterEstimation(plan: Filter) extends Logging {
         val dataType = attr.dataType
         var newNdv = ndv
     
    -    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
    -      return Some(0.0)
    -    }
    -
         // use [min, max] to filter the original hSet
         dataType match {
           case _: NumericType | BooleanType | DateType | TimestampType =>
    +        if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
    --- End diff --
    
    min/max can be None if the column contains only null values. This is exactly the case for my query.
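
The two cases raised in this review (an empty table, and a column holding only nulls) can be illustrated with a small sketch. It is not how Spark collects column statistics; `NullOnlyColumnSketch` and `minMax` are hypothetical names, and a nullable integer column is modeled as `Seq[Option[Int]]`.

```scala
object NullOnlyColumnSketch {
  // Compute (min, max) over the non-null values of a column.
  // Both bounds are None when no non-null value exists, which covers
  // an empty table as well as a column that contains only nulls.
  def minMax(column: Seq[Option[Int]]): (Option[Int], Option[Int]) = {
    val nonNull = column.flatten
    if (nonNull.isEmpty) (None, None)
    else (Some(nonNull.min), Some(nonNull.max))
  }
}
```

Under this model, a numeric column can legitimately have no min/max, which is why a guard on empty bounds is still useful in the numeric branch.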


---



[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21147
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2649/
    Test PASSed.


---
