Posted to reviews@spark.apache.org by mshtelma <gi...@git.apache.org> on 2018/04/12 12:07:38 UTC

[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

GitHub user mshtelma opened a pull request:

    https://github.com/apache/spark/pull/21052

    [SPARK-23799] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics

    During evaluation of IN conditions, FilterEstimation.evaluateInSet can throw NumberFormatException and ClassCastException. This happens when the source data frame is backed by a plan over a Hive table whose columns were previously analyzed, and the plan has conditions on those columns that cannot be satisfied, which yields an empty data frame.
    This PR fixes both bugs and adds tests for them.
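
A minimal sketch of one way a zero distinct count can surface as a NumberFormatException rather than an arithmetic error, assuming the selectivity is computed in floating point and later converted to a BigDecimal. The names `NaNSelectivitySketch`, `naiveSelectivity` and `toRowCount` are hypothetical and are not Spark APIs.

```scala
import scala.util.Try

// Hypothetical illustration only; these are not Spark classes or methods.
object NaNSelectivitySketch {
  // Floating-point 0.0 / 0.0 does not throw. It silently yields NaN,
  // and math.min propagates NaN when either argument is NaN.
  def naiveSelectivity(newNdv: BigInt, ndv: BigInt): Double =
    math.min(newNdv.toDouble / ndv.toDouble, 1.0)

  // Converting NaN to BigDecimal throws NumberFormatException,
  // so the failure shows up far away from the division itself.
  def toRowCount(rowCount: BigInt, selectivity: Double): Try[BigDecimal] =
    Try(BigDecimal(rowCount) * BigDecimal(selectivity))
}
```

Under these assumptions, the division itself never fails; the exception only appears once the NaN value reaches `BigDecimal`.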

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mshtelma/spark filter_estimation_evaluateInSet_Bugs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21052.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21052
    
----
commit 297395effef8df279f11545d803b051cc2234c6e
Author: Mykhailo Shtelma <my...@...>
Date:   2018-03-26T13:09:39Z

    During evaluation of IN conditions, if the source table is empty, division by zero can occur. A check was added to fix this.

commit d634ddaec88d0511334ec6c021255094f697b31d
Author: Mykhailo Shtelma <my...@...>
Date:   2018-04-03T15:44:33Z

    Added a test case for the following situation: during evaluation of IN conditions, if the source table is empty, division by zero can occur. A check was added to fix this.

commit 74b6ebdc2cd8a91944cc6159946f560ba7212a6a
Author: Mykhailo Shtelma <my...@...>
Date:   2018-04-12T12:00:55Z

    If an empty dataframe (empty because some conditions in the parent query were not satisfied) is queried with CBO turned on, wrong statistics are used, which leads to a ClassCastException in FilterEstimation.evaluateInSet.

----
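
A minimal sketch of how an unchecked downcast can produce the ClassCastException mentioned in the last commit, assuming the interval factory returns a non-numeric placeholder when min or max is missing (an assumption, not confirmed in this thread). All types and methods below (`Interval`, `NumericInterval`, `EmptyInterval`, `unsafeContains`, `safeContains`) are hypothetical stand-ins, not Spark's classes.

```scala
import scala.util.Try

// Hypothetical illustration only; these are not Spark's interval classes.
object IntervalCastSketch {
  sealed trait Interval
  final case class NumericInterval(min: Double, max: Double) extends Interval {
    def contains(v: Double): Boolean = v >= min && v <= max
  }
  case object EmptyInterval extends Interval

  // Assumed factory behavior: no numeric interval without both bounds.
  def interval(min: Option[Double], max: Option[Double]): Interval =
    (min, max) match {
      case (Some(lo), Some(hi)) => NumericInterval(lo, hi)
      case _                    => EmptyInterval
    }

  // Unchecked downcast: fails at runtime when the bounds are missing.
  def unsafeContains(min: Option[Double], max: Option[Double], v: Double): Try[Boolean] =
    Try(interval(min, max).asInstanceOf[NumericInterval].contains(v))

  // Pattern match: missing bounds simply mean "nothing matches".
  def safeContains(min: Option[Double], max: Option[Double], v: Double): Boolean =
    interval(min, max) match {
      case n: NumericInterval => n.contains(v)
      case EmptyInterval      => false
    }
}
```

The pattern-matching variant is one possible alternative to checking the bounds up front; the thread itself settles on an early check instead.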


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @mshtelma Could you submit a backport PR to Spark 2.3?


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89596/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183219803
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    --- End diff --
    
    Please do not use `explain()`. It prints the plan to the console. You can do this instead:
    ```
    sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')").queryExecution.executedPlan
    ```


---


[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183206908
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    --- End diff --
    
    done


---


[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183206919
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    --- End diff --
    
    done


---


[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183219812
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    --- End diff --
    
    Nit:
    ```Scala
              """
                |SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
                |FROM tbl t1
                |JOIN tbl t2 on t1.id=t2.id
                |WHERE  t1.fld3 IN (-123.23,321.23)
              """.stripMargin)
    ```


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89684/testReport)** for PR 21052 at commit [`8369cbc`](https://github.com/apache/spark/commit/8369cbcd5eab3686c78365e1b1f906a3e8136731).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89339/testReport)** for PR 21052 at commit [`74b6ebd`](https://github.com/apache/spark/commit/74b6ebdc2cd8a91944cc6159946f560ba7212a6a).


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89684/
    Test PASSed.


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @gatorsmile the failed tests are not related to the changes introduced in this PR. Would it make sense to rerun the tests?


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @maropu thank you for the suggestions! I have implemented them and pushed the changes.


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    LGTM except for two minor comments.


---


[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r181418730
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
    @@ -357,6 +357,17 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
           expectedRowCount = 3)
       }
     
    +  test("evaluateInSet with all zeros") {
    +    validateEstimatedStats(
    +      Filter(InSet(attrString, Set(3, 4, 5)),
    +        StatsTestPlan(Seq(attrString), 10,
    +          AttributeMap(Seq(attrString ->
    +            ColumnStat(distinctCount = Some(0), min = Some(0), max = Some(0),
    --- End diff --
    
    done


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89339/
    Test PASSed.


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    ok to test


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89684/testReport)** for PR 21052 at commit [`8369cbc`](https://github.com/apache/spark/commit/8369cbcd5eab3686c78365e1b1f906a3e8136731).


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89521/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a).


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89521/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89656/
    Test PASSed.


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Thanks! Merged to master/2.3


---


[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r181381874
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -395,27 +395,28 @@ case class FilterEstimation(plan: Filter) extends Logging {
         // use [min, max] to filter the original hSet
         dataType match {
           case _: NumericType | BooleanType | DateType | TimestampType =>
    -        val statsInterval =
    -          ValueInterval(colStat.min, colStat.max, dataType).asInstanceOf[NumericValueInterval]
    -        val validQuerySet = hSet.filter { v =>
    -          v != null && statsInterval.contains(Literal(v, dataType))
    -        }
    +        if (colStat.min.isDefined && colStat.max.isDefined) {
    --- End diff --
    
    Check `ndv == 0` at the beginning and return `Some(0.0)`? Then we don't have to make all these changes.
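
A minimal sketch of the early check suggested above, written against a simplified, hypothetical `SimpleColumnStat` rather than Spark's real `ColumnStat`. The method name `estimateInSetSelectivity` and the `Long`-only value type are assumptions made for illustration; the condition itself mirrors the one discussed in this thread (`ndv == 0` or missing min/max).

```scala
// Hypothetical illustration only; not the code merged in this PR.
object InSetGuardSketch {
  // Simplified stand-in for column statistics.
  final case class SimpleColumnStat(
      distinctCount: BigInt,
      min: Option[Long],
      max: Option[Long])

  // Estimates the fraction of rows matching `col IN (inSet)`.
  def estimateInSetSelectivity(stat: SimpleColumnStat, inSet: Set[Long]): Option[Double] = {
    val ndv = stat.distinctCount
    if (ndv == BigInt(0) || stat.min.isEmpty || stat.max.isEmpty) {
      // Empty table or missing bounds: nothing can match, and the
      // division below is never reached.
      Some(0.0)
    } else {
      val (lo, hi) = (stat.min.get, stat.max.get)
      // Keep only the IN-list values that fall inside [min, max].
      val validValues = inSet.filter(v => v >= lo && v <= hi)
      val newNdv = BigInt(validValues.size).min(ndv)
      Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
    }
  }
}
```

Putting the check first keeps the rest of the method unchanged, which is the point of the suggestion: one guard instead of wrapping each branch in its own condition.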


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Let me revert it from Spark 2.3


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r181418993
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -395,27 +395,28 @@ case class FilterEstimation(plan: Filter) extends Logging {
         // use [min, max] to filter the original hSet
         dataType match {
           case _: NumericType | BooleanType | DateType | TimestampType =>
    -        val statsInterval =
    -          ValueInterval(colStat.min, colStat.max, dataType).asInstanceOf[NumericValueInterval]
    -        val validQuerySet = hSet.filter { v =>
    -          v != null && statsInterval.contains(Literal(v, dataType))
    -        }
    +        if (colStat.min.isDefined && colStat.max.isDefined) {
    --- End diff --
    
    Yes, I have removed the larger if and implemented all three checks in one small if.


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89339/testReport)** for PR 21052 at commit [`74b6ebd`](https://github.com/apache/spark/commit/74b6ebdc2cd8a91944cc6159946f560ba7212a6a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89316/
    Test FAILed.


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89675/testReport)** for PR 21052 at commit [`8d21488`](https://github.com/apache/spark/commit/8d2148814e52a2db1e14592c91467013565c310a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89596/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a).


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    cc @wzhfy Please review this. 


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    retest this please


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Regarding the division by zero in EstimationUtils.scala#L166, I was not able to reproduce it here (https://github.com/apache/spark/blob/5cfd5fabcdbd77a806b98a6dd59b02772d2f6dee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala#L166).
    I can add a check there too, to make sure that this never happens.


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89316/testReport)** for PR 21052 at commit [`74b6ebd`](https://github.com/apache/spark/commit/74b6ebdc2cd8a91944cc6159946f560ba7212a6a).


---


[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183826745
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -392,6 +392,10 @@ case class FilterEstimation(plan: Filter) extends Logging {
         val dataType = attr.dataType
         var newNdv = ndv
     
    +    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
    --- End diff --
    
    Yeah, we need to correct it in the next PR


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89675/
    Test PASSed.


---


[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183206916
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    +      }
    +    }
    +
    --- End diff --
    
    done


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183206913
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    +      }
    +    }
    +
    +  }
    +
    --- End diff --
    
    done


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183685527
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -392,6 +392,10 @@ case class FilterEstimation(plan: Filter) extends Logging {
         val dataType = attr.dataType
         var newNdv = ndv
     
    +    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
    --- End diff --
    
    Why does `colStat.min.isEmpty || colStat.max.isEmpty` mean the output is empty? String columns never have min/max statistics.
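
The concern above can be illustrated with a minimal, Spark-free sketch. `ColStatSketch` and `emptyColumnSelectivity` are hypothetical names introduced only for illustration; they are not Spark's `ColumnStat` or the code in this PR. The sketch assumes that a distinct count of zero is the only reliable signal of an empty column, because a missing `min`/`max` can also just mean the column is a string.

```scala
// Hypothetical, simplified stand-in for Spark's ColumnStat (assumption, not the real class).
final case class ColStatSketch(
    distinctCount: Option[BigInt],
    min: Option[Any],
    max: Option[Any])

object InSetGuardSketch {
  /** Returns Some(0.0) only when the statistics prove the column holds no values. */
  def emptyColumnSelectivity(stat: ColStatSketch): Option[Double] =
    stat.distinctCount match {
      // Analyzed and empty: no value in the IN list can match.
      case Some(ndv) if ndv == BigInt(0) => Some(0.0)
      // Unknown or non-empty: fall through to the normal estimation path.
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Empty analyzed column: no distinct values, no min/max.
    val emptyColumn = ColStatSketch(Some(BigInt(0)), None, None)
    // Non-empty string column: distinct values exist, but min/max are still absent.
    val stringColumn = ColStatSketch(Some(BigInt(1000)), None, None)
    println(emptyColumnSelectivity(emptyColumn))
    println(emptyColumnSelectivity(stringColumn))
  }
}
```

In this sketch the string column is not short-circuited to zero, which is the distinction the question is probing.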


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89656/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @gatorsmile I have removed `explain()` and changed the formatting.


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    retest this please


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @gatorsmile Should I create a new PR with these changes for the 2.3 branch? I will do this. Do we need a new JIRA for 2.3, or should I reference the existing one?


---

[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r181378148
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
    @@ -357,6 +357,17 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
           expectedRowCount = 3)
       }
     
    +  test("evaluateInSet with all zeros") {
    +    validateEstimatedStats(
    +      Filter(InSet(attrString, Set(3, 4, 5)),
    +        StatsTestPlan(Seq(attrString), 10,
    +          AttributeMap(Seq(attrString ->
    +            ColumnStat(distinctCount = Some(0), min = Some(0), max = Some(0),
    --- End diff --
    
    Shouldn't `min` and `max` be `None` here?
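
For context on why an all-zero statistics case is worth its own test, here is a minimal sketch of one way a zero distinct count can surface as a `NumberFormatException`. It assumes the estimator divides two `Double` values and wraps the result in a `BigDecimal`; that is an assumption about the code path, not something shown in this thread. `ZeroNdvSketch` and `ratio` are hypothetical names.

```scala
import scala.util.Try

object ZeroNdvSketch {
  // Hypothetical helper: fraction of distinct values that survive the IN filter.
  // On doubles, 0.0 / 0.0 is NaN rather than an ArithmeticException, and
  // BigDecimal cannot represent NaN, so the conversion throws NumberFormatException.
  def ratio(newNdv: BigInt, ndv: BigInt): Try[BigDecimal] =
    Try(BigDecimal(newNdv.toDouble / ndv.toDouble))

  def main(args: Array[String]): Unit = {
    // Non-empty column: the ratio is an ordinary number.
    println(ratio(BigInt(3), BigInt(10)))
    // Empty column: the division yields NaN and the conversion fails.
    println(ratio(BigInt(0), BigInt(0)))
  }
}
```

Checking for a zero distinct count before dividing avoids the NaN entirely, which is why the test fixture should describe a genuinely empty column.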


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @wzhfy @maropu Hi guys, is there anything else I should add to or change in the PR?


---

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89349/
    Test PASSed.


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183220650
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    --- End diff --
    
    done


---

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89349/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a).


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89656/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a).


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @gatorsmile this broke 2.3 compilation. 
    https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.3-compile-maven-hadoop-2.6/638/


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    This time, completely different tests (HiveClientSuites) failed.


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89521/
    Test FAILed.


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183220647
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    --- End diff --
    
    done


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190221
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    +      }
    +    }
    +
    --- End diff --
    
    nit: drop this line


---

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89349/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @mshtelma We usually describe a PR using two sections: `What changes were proposed in this pull request?` and `How was this patch tested?`. I think they should already be in the template shown when you open a PR. Could you please update the PR description based on the template?


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190234
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    +      }
    +    }
    +
    +  }
    +
    --- End diff --
    
    ditto


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190383
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    --- End diff --
    
    nit: you don't need the `spark.` prefix


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21052


---

[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r181418681
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
    @@ -357,6 +357,17 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
           expectedRowCount = 3)
       }
     
    +  test("evaluateInSet with all zeros") {
    +    validateEstimatedStats(
    +      Filter(InSet(attrString, Set(3, 4, 5)),
    +        StatsTestPlan(Seq(attrString), 10,
    --- End diff --
    
    Yes, this makes sense.
    Done.


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    retest this please


---

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89316/testReport)** for PR 21052 at commit [`74b6ebd`](https://github.com/apache/spark/commit/74b6ebdc2cd8a91944cc6159946f560ba7212a6a).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Merged build finished. Test PASSed.


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183206628
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    --- End diff --
    
    @wzhfy suggested calling `explain()` to trigger query optimization, which in turn invokes the `FilterEstimation.evaluateInSet` method.
    I can call `collect()` instead, but I think `explain()` is sufficient for this test.


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    See my PR https://github.com/apache/spark/pull/21147. We need to fix the issue first.


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190136
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    --- End diff --
    
    nit: `withSQLConf(SQLConf.CBO_ENABLED.key -> "true")`
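
As a rough illustration of the nit, the sketch below shows that the tuple form and the arrow form build the same pair, so the change is purely about style and about using a named constant instead of a string literal. `ConfSketch`, `withConf`, `settings`, and `CboEnabledKey` are hypothetical stand-ins written for this example; they are not Spark's `withSQLConf` or `SQLConf`.

```scala
import scala.collection.mutable

object ConfSketch {
  // Hypothetical in-memory settings store standing in for a session's SQL conf.
  val settings: mutable.Map[String, String] = mutable.Map.empty

  // Hypothetical constant playing the role of SQLConf.CBO_ENABLED.key.
  val CboEnabledKey = "spark.sql.cbo.enabled"

  /** Applies the given pairs, runs `body`, then restores the previous values. */
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember what each key held before, so it can be restored afterwards.
    val previous = pairs.map { case (k, _) => k -> settings.get(k) }
    pairs.foreach { case (k, v) => settings(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => settings(k) = old
      case (k, None) => settings.remove(k)
    }
  }

  def main(args: Array[String]): Unit = {
    // ("key", "value") and "key" -> "value" are the same Tuple2.
    val viaTuple = withConf((CboEnabledKey, "true")) { settings(CboEnabledKey) }
    val viaArrow = withConf(CboEnabledKey -> "true") { settings(CboEnabledKey) }
    println(viaTuple == viaArrow)
  }
}
```

The arrow form with a named key reads more clearly and avoids typos in the configuration string.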


---

[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    **[Test build #89675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89675/testReport)** for PR 21052 at commit [`8d21488`](https://github.com/apache/spark/commit/8d2148814e52a2db1e14592c91467013565c310a).


---

[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190432
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    --- End diff --
    
    Why is this `explain()` called?


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89596/
    Test FAILed.


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    retest this please


---


[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r181378031
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
    @@ -357,6 +357,17 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
           expectedRowCount = 3)
       }
     
    +  test("evaluateInSet with all zeros") {
    +    validateEstimatedStats(
    +      Filter(InSet(attrString, Set(3, 4, 5)),
    +        StatsTestPlan(Seq(attrString), 10,
    --- End diff --
    
    Change rowCount from `10` to `0`? That is more reasonable for an empty table.
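
    The empty-table case discussed here can be illustrated with a minimal, self-contained sketch. This is not the code from the PR or from Spark's `FilterEstimation`; the object `InSetSelectivitySketch`, the helper `inSetSelectivity`, and its parameters are hypothetical names chosen for illustration. It assumes the selectivity of an IN filter is estimated as the number of matching distinct values divided by the column's distinct-value count (ndv), and shows the kind of zero check the commit message describes.

```scala
object InSetSelectivitySketch {

  /**
   * Hypothetical helper (an assumption, not Spark's actual API): estimates the
   * fraction of rows kept by `col IN (v1, ..., vk)`, where `matching` of the
   * listed values fall inside the column's known range and the column has
   * `ndv` distinct values according to its statistics.
   */
  def inSetSelectivity(matching: Int, ndv: BigInt): BigDecimal = {
    if (ndv <= 0) {
      // Empty table with analyzed statistics: ndv is 0, and dividing by it
      // would throw in this sketch. No row can match, so return zero instead.
      BigDecimal(0)
    } else {
      // Cap the numerator at ndv so the estimate never exceeds 1.
      val newNdv = ndv.min(BigInt(matching))
      BigDecimal(newNdv) / BigDecimal(ndv)
    }
  }

  def main(args: Array[String]): Unit = {
    // Empty table: the guard is taken and no division happens.
    println(inSetSelectivity(3, BigInt(0)))
    // Non-empty table: 3 matching values out of 10 distinct values.
    println(inSetSelectivity(3, BigInt(10)))
  }
}
```

    With the guard in place, a test plan with rowCount `0` and an ndv of `0` yields a zero estimate rather than an exception.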


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Can one of the admins verify this patch?


---


[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r181380894
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/CBOSuite.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.sql.{QueryTest, SaveMode}
    +import org.apache.spark.sql.test.SharedSparkSession
    +
    +class CBOSuite extends QueryTest with SharedSparkSession {
    +
    +  import testImplicits._
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    --- End diff --
    
    Shall we move it to `StatisticsCollectionSuite`?
    Also, I think a simple EXPLAIN command on an empty table would be enough to cover this case. We can check the plan's stats (e.g. rowCount == 0) after the explain.


---


[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21052
  
    @wzhfy @gatorsmile Could you trigger the tests?


---


[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

Posted by mshtelma <gi...@git.apache.org>.

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r181418832
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/CBOSuite.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.sql.{QueryTest, SaveMode}
    +import org.apache.spark.sql.test.SharedSparkSession
    +
    +class CBOSuite extends QueryTest with SharedSparkSession {
    +
    +  import testImplicits._
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    --- End diff --
    
    Done. I have moved the test to `StatisticsCollectionSuite`.


---
