Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2017/10/15 03:43:09 UTC

[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/19500

    [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test `convertMetastore` properly

    ## What changes were proposed in this pull request?
    
    This PR aims to improve **StatisticsSuite** to test the `convertMetastore` configuration properly. Currently, some test logic in `test statistics of LogicalRelation converted from Hive serde tables` depends on the default configuration. The new test case is shorter and covers both (true/false) cases explicitly.
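
    A minimal sketch of the new test structure, reconstructed from the diff excerpts quoted later in this thread (the assertions after the INSERT follow the original test and the `hasHiveStats` flag visible in the diff, so treat them as an assumption rather than the merged code):

        Seq("orc", "parquet").foreach { format =>
          Seq(true, false).foreach { isConverted =>
            withSQLConf(
              HiveUtils.CONVERT_METASTORE_ORC.key -> s"$isConverted",
              HiveUtils.CONVERT_METASTORE_PARQUET.key -> s"$isConverted") {
              withTable(format) {
                sql(s"CREATE TABLE $format (key STRING, value STRING) STORED AS $format")
                sql(s"INSERT INTO TABLE $format SELECT * FROM src")

                // Assumption: a non-converted (Hive serde) INSERT already records a size
                // in the metastore, so a size is expected before ANALYZE only in that case.
                checkTableStats(format, hasSizeInBytes = !isConverted, expectedRowCounts = None)
                sql(s"ANALYZE TABLE $format COMPUTE STATISTICS")
                checkTableStats(format, hasSizeInBytes = true, expectedRowCounts = Some(500))
              }
            }
          }
        }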
    
    ## How was this patch tested?
    
    Pass the Jenkins with the improved test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-22280

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19500.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19500
    
----
commit 2a0a3f1b3f029c2454a471b33fed7766694fa518
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-10-15T03:38:22Z

    [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test `convertMetastore` properly

----


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    **[Test build #82767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82767/testReport)** for PR 19500 at commit [`2a0a3f1`](https://github.com/apache/spark/commit/2a0a3f1b3f029c2454a471b33fed7766694fa518).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82779/
    Test PASSed.


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82767/
    Test PASSed.


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    LGTM


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19500#discussion_r144749492
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     
       test("test statistics of LogicalRelation converted from Hive serde tables") {
    -    val parquetTable = "parquetTable"
    -    val orcTable = "orcTable"
    -    withTable(parquetTable, orcTable) {
    -      sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
    -      sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
    -      sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
    -      sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
    -
    -      // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
    -      // for robustness
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
    -        checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
    -        checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    -      }
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
    -        // We still can get tableSize from Hive before Analyze
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    +    Seq("orc", "parquet").foreach { format =>
    +      Seq("true", "false").foreach { isConverted =>
    --- End diff --
    
    Thank you for review, @gatorsmile . Sure.


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19500#discussion_r144759077
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     
       test("test statistics of LogicalRelation converted from Hive serde tables") {
    -    val parquetTable = "parquetTable"
    -    val orcTable = "orcTable"
    -    withTable(parquetTable, orcTable) {
    -      sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
    -      sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
    -      sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
    -      sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
    -
    -      // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
    -      // for robustness
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
    -        checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
    -        checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    -      }
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
    -        // We still can get tableSize from Hive before Analyze
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
    --- End diff --
    
    Could you explain why the orc table has a size before the analyze command while the parquet table does not?


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Hi, @gatorsmile .
    Could you review this, too?


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19500#discussion_r144745990
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     
       test("test statistics of LogicalRelation converted from Hive serde tables") {
    -    val parquetTable = "parquetTable"
    -    val orcTable = "orcTable"
    -    withTable(parquetTable, orcTable) {
    -      sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
    -      sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
    -      sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
    -      sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
    -
    -      // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
    -      // for robustness
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
    -        checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
    -        checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    -      }
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
    -        // We still can get tableSize from Hive before Analyze
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    +    Seq("orc", "parquet").foreach { format =>
    +      Seq("true", "false").foreach { isConverted =>
    --- End diff --
    
    We prefer using `Seq(true, false)`.
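
    For example (a small sketch following the diff quoted later in this thread; only the conf value needs a String):

        Seq(true, false).foreach { isConverted =>
          // Keep the loop variable a Boolean; stringify only for the SQL conf value.
          withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> s"$isConverted") {
            // isConverted stays a Boolean for test logic such as `!isConverted`.
          }
        }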


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82809/
    Test PASSed.


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19500


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19500#discussion_r144896264
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     
       test("test statistics of LogicalRelation converted from Hive serde tables") {
    -    val parquetTable = "parquetTable"
    -    val orcTable = "orcTable"
    -    withTable(parquetTable, orcTable) {
    -      sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
    -      sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
    -      sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
    -      sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
    -
    -      // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
    -      // for robustness
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
    -        checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
    -        checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    -      }
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
    -        // We still can get tableSize from Hive before Analyze
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    +    Seq("orc", "parquet").foreach { format =>
    --- End diff --
    
    It is used in `STORED AS $format`, too. :)


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    **[Test build #82809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82809/testReport)** for PR 19500 at commit [`8abac33`](https://github.com/apache/spark/commit/8abac338617014c77bc097f5c6b69aadafb3d410).


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Hi, @gatorsmile .
    Could you review this PR about improving the test case?


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Thank you so much, @gatorsmile and @wzhfy . :D


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    **[Test build #82809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82809/testReport)** for PR 19500 at commit [`8abac33`](https://github.com/apache/spark/commit/8abac338617014c77bc097f5c6b69aadafb3d410).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19500#discussion_r144759141
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     
       test("test statistics of LogicalRelation converted from Hive serde tables") {
    -    val parquetTable = "parquetTable"
    -    val orcTable = "orcTable"
    -    withTable(parquetTable, orcTable) {
    -      sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
    -      sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
    -      sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
    -      sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
    -
    -      // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
    -      // for robustness
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
    -        checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
    -        checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    -      }
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
    -        // We still can get tableSize from Hive before Analyze
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    +    Seq("orc", "parquet").foreach { format =>
    +      Seq(true, false).foreach { isConverted =>
    +        withSQLConf(
    +          HiveUtils.CONVERT_METASTORE_ORC.key -> s"$isConverted",
    +          HiveUtils.CONVERT_METASTORE_PARQUET.key -> s"$isConverted") {
    +          withTable(format) {
    +            sql(s"CREATE TABLE $format (key STRING, value STRING) STORED AS $format")
    +            sql(s"INSERT INTO TABLE $format SELECT * FROM src")
    +
    +            val hasHiveStats = !isConverted
    --- End diff --
    
    We can just inline this val.
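
    A sketch of the suggested inlining (the exact call site isn't shown in this diff excerpt, so the `checkTableStats` line below is an assumption based on the original test):

        // Instead of:
        //   val hasHiveStats = !isConverted
        //   checkTableStats(format, hasSizeInBytes = hasHiveStats, expectedRowCounts = None)
        // inline the expression at its use site:
        checkTableStats(format, hasSizeInBytes = !isConverted, expectedRowCounts = None)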


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    **[Test build #82779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82779/testReport)** for PR 19500 at commit [`934da69`](https://github.com/apache/spark/commit/934da69d4900a2f5eb09c4d88dd9eb1b17cd568e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    **[Test build #82779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82779/testReport)** for PR 19500 at commit [`934da69`](https://github.com/apache/spark/commit/934da69d4900a2f5eb09c4d88dd9eb1b17cd568e).


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    **[Test build #82767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82767/testReport)** for PR 19500 at commit [`2a0a3f1`](https://github.com/apache/spark/commit/2a0a3f1b3f029c2454a471b33fed7766694fa518).


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19500#discussion_r144759023
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     
       test("test statistics of LogicalRelation converted from Hive serde tables") {
    -    val parquetTable = "parquetTable"
    -    val orcTable = "orcTable"
    -    withTable(parquetTable, orcTable) {
    -      sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
    -      sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
    -      sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
    -      sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
    -
    -      // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
    -      // for robustness
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
    -        checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
    -        checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    -      }
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
    -        // We still can get tableSize from Hive before Analyze
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    +    Seq("orc", "parquet").foreach { format =>
    --- End diff --
    
    Maybe "orcTbl" and "parquetTbl" are better?


---



[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19500
  
    Thanks! Merged to master.


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19500#discussion_r144896008
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     
       test("test statistics of LogicalRelation converted from Hive serde tables") {
    -    val parquetTable = "parquetTable"
    -    val orcTable = "orcTable"
    -    withTable(parquetTable, orcTable) {
    -      sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
    -      sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
    -      sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
    -      sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
    -
    -      // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
    -      // for robustness
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
    -        checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
    -        checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    -      }
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
    -        // We still can get tableSize from Hive before Analyze
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
    --- End diff --
    
    This is the old test case from SPARK-17410.
    The original test case wasn't and my new test case didn't.
    It's due to `convertMetastoreXXX`: a Hive (non-converted) INSERT will generate stats.
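
    A hedged illustration, assuming the defaults at the time (Parquet conversion on, ORC conversion off) and the table names from the old test:

        // Old test: both INSERTs ran under the default configuration.
        // Parquet: convertMetastoreParquet defaulted to true, so the INSERT took the
        //          data source path and wrote no Hive stats -> no size before ANALYZE.
        sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
        // ORC: convertMetastoreOrc defaulted to false, so the INSERT took the Hive
        //      serde path, which records totalSize -> a size exists even before ANALYZE.
        sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")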


---



[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19500#discussion_r144896827
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     
       test("test statistics of LogicalRelation converted from Hive serde tables") {
    -    val parquetTable = "parquetTable"
    -    val orcTable = "orcTable"
    -    withTable(parquetTable, orcTable) {
    -      sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
    -      sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
    -      sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
    -      sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
    -
    -      // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
    -      // for robustness
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
    -        checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
    -        checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    -      }
    -      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
    -        // We still can get tableSize from Hive before Analyze
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
    -        sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
    -        checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
    +    Seq("orc", "parquet").foreach { format =>
    +      Seq(true, false).foreach { isConverted =>
    +        withSQLConf(
    +          HiveUtils.CONVERT_METASTORE_ORC.key -> s"$isConverted",
    +          HiveUtils.CONVERT_METASTORE_PARQUET.key -> s"$isConverted") {
    +          withTable(format) {
    +            sql(s"CREATE TABLE $format (key STRING, value STRING) STORED AS $format")
    +            sql(s"INSERT INTO TABLE $format SELECT * FROM src")
    +
    +            val hasHiveStats = !isConverted
    --- End diff --
    
    Yep. Thank you for review, @wzhfy .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org