Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2017/10/15 03:43:09 UTC
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/19500
[SPARK-22280][SQL][TEST] Improve StatisticsSuite to test `convertMetastore` properly
## What changes were proposed in this pull request?
This PR aims to improve **StatisticsSuite** to test the `convertMetastore` configuration properly. Currently, some test logic in `test statistics of LogicalRelation converted from Hive serde tables` depends on the default configuration. The new test case is shorter and covers both (true/false) cases explicitly.
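The shape of the refactor can be illustrated without Spark. The sketch below is a minimal, Spark-free stand-in (the `Case` class and `ConvertMetastoreMatrix` object are hypothetical names, not from the PR): it enumerates the same (format, isConverted) cross product the new test loops over, and encodes the test's expectation that only non-converted tables rely on Hive-side stats (`hasHiveStats = !isConverted`).

```scala
// Spark-free sketch of the test matrix introduced by the PR.
// Every (format, isConverted) combination is enumerated explicitly,
// instead of relying on the session's default configuration.
object ConvertMetastoreMatrix {
  final case class Case(format: String, isConverted: Boolean, hasHiveStats: Boolean)

  val cases: Seq[Case] =
    for {
      format      <- Seq("orc", "parquet")
      isConverted <- Seq(true, false)
    } yield Case(format, isConverted, hasHiveStats = !isConverted)

  def main(args: Array[String]): Unit = {
    assert(cases.size == 4)                  // 2 formats x 2 conversion modes
    assert(cases.count(_.hasHiveStats) == 2) // only the non-converted ones
    cases.foreach(println)
  }
}
```

In the actual suite each `Case` would instead drive `withSQLConf`/`withTable` blocks, but the enumeration logic is the same.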
## How was this patch tested?
Pass the Jenkins with the improved test case.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-22280
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19500.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19500
----
commit 2a0a3f1b3f029c2454a471b33fed7766694fa518
Author: Dongjoon Hyun <do...@apache.org>
Date: 2017-10-15T03:38:22Z
[SPARK-22280][SQL][TEST] Improve StatisticsSuite to test `convertMetastore` properly
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19500
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19500
**[Test build #82767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82767/testReport)** for PR 19500 at commit [`2a0a3f1`](https://github.com/apache/spark/commit/2a0a3f1b3f029c2454a471b33fed7766694fa518).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19500
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82779/
Test PASSed.
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19500
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19500
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82767/
Test PASSed.
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19500
LGTM
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19500#discussion_r144749492
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
test("test statistics of LogicalRelation converted from Hive serde tables") {
- val parquetTable = "parquetTable"
- val orcTable = "orcTable"
- withTable(parquetTable, orcTable) {
- sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
- sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
- sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
- sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
-
- // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
- // for robustness
- withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
- checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
- checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
- }
- withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
- // We still can get tableSize from Hive before Analyze
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
+ Seq("orc", "parquet").foreach { format =>
+ Seq("true", "false").foreach { isConverted =>
--- End diff --
Thank you for review, @gatorsmile . Sure.
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19500#discussion_r144759077
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
test("test statistics of LogicalRelation converted from Hive serde tables") {
- val parquetTable = "parquetTable"
- val orcTable = "orcTable"
- withTable(parquetTable, orcTable) {
- sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
- sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
- sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
- sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
-
- // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
- // for robustness
- withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
- checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
- checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
- }
- withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
- // We still can get tableSize from Hive before Analyze
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
--- End diff --
Could you explain why the orc table has a size before the analyze command while the parquet table does not?
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19500
Hi, @gatorsmile .
Could you review this, too?
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19500#discussion_r144745990
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
test("test statistics of LogicalRelation converted from Hive serde tables") {
- val parquetTable = "parquetTable"
- val orcTable = "orcTable"
- withTable(parquetTable, orcTable) {
- sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
- sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
- sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
- sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
-
- // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
- // for robustness
- withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
- checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
- checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
- }
- withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
- // We still can get tableSize from Hive before Analyze
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
+ Seq("orc", "parquet").foreach { format =>
+ Seq("true", "false").foreach { isConverted =>
--- End diff --
We prefer using `Seq(true, false)`.
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19500
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82809/
Test PASSed.
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19500
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19500#discussion_r144896264
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
test("test statistics of LogicalRelation converted from Hive serde tables") {
- val parquetTable = "parquetTable"
- val orcTable = "orcTable"
- withTable(parquetTable, orcTable) {
- sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
- sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
- sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
- sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
-
- // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
- // for robustness
- withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
- checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
- checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
- }
- withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
- // We still can get tableSize from Hive before Analyze
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
+ Seq("orc", "parquet").foreach { format =>
--- End diff --
It is also used in `STORED AS $format`. :)
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19500
**[Test build #82809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82809/testReport)** for PR 19500 at commit [`8abac33`](https://github.com/apache/spark/commit/8abac338617014c77bc097f5c6b69aadafb3d410).
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19500
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19500
Hi, @gatorsmile .
Could you review this PR improving the test case?
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19500
Thank you so much, @gatorsmile and @wzhfy . :D
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19500
**[Test build #82809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82809/testReport)** for PR 19500 at commit [`8abac33`](https://github.com/apache/spark/commit/8abac338617014c77bc097f5c6b69aadafb3d410).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19500#discussion_r144759141
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
test("test statistics of LogicalRelation converted from Hive serde tables") {
- val parquetTable = "parquetTable"
- val orcTable = "orcTable"
- withTable(parquetTable, orcTable) {
- sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
- sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
- sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
- sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
-
- // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
- // for robustness
- withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
- checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
- checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
- }
- withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
- // We still can get tableSize from Hive before Analyze
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
+ Seq("orc", "parquet").foreach { format =>
+ Seq(true, false).foreach { isConverted =>
+ withSQLConf(
+ HiveUtils.CONVERT_METASTORE_ORC.key -> s"$isConverted",
+ HiveUtils.CONVERT_METASTORE_PARQUET.key -> s"$isConverted") {
+ withTable(format) {
+ sql(s"CREATE TABLE $format (key STRING, value STRING) STORED AS $format")
+ sql(s"INSERT INTO TABLE $format SELECT * FROM src")
+
+ val hasHiveStats = !isConverted
--- End diff --
we can just inline this val
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19500
**[Test build #82779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82779/testReport)** for PR 19500 at commit [`934da69`](https://github.com/apache/spark/commit/934da69d4900a2f5eb09c4d88dd9eb1b17cd568e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19500
**[Test build #82779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82779/testReport)** for PR 19500 at commit [`934da69`](https://github.com/apache/spark/commit/934da69d4900a2f5eb09c4d88dd9eb1b17cd568e).
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19500
**[Test build #82767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82767/testReport)** for PR 19500 at commit [`2a0a3f1`](https://github.com/apache/spark/commit/2a0a3f1b3f029c2454a471b33fed7766694fa518).
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19500#discussion_r144759023
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
test("test statistics of LogicalRelation converted from Hive serde tables") {
- val parquetTable = "parquetTable"
- val orcTable = "orcTable"
- withTable(parquetTable, orcTable) {
- sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
- sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
- sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
- sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
-
- // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
- // for robustness
- withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
- checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
- checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
- }
- withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
- // We still can get tableSize from Hive before Analyze
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
+ Seq("orc", "parquet").foreach { format =>
--- End diff --
Maybe "orcTbl" and "parquetTbl" are better?
---
[GitHub] spark issue #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite to test...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19500
Thanks! Merged to master.
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19500#discussion_r144896008
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
test("test statistics of LogicalRelation converted from Hive serde tables") {
- val parquetTable = "parquetTable"
- val orcTable = "orcTable"
- withTable(parquetTable, orcTable) {
- sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
- sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
- sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
- sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
-
- // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
- // for robustness
- withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
- checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
- checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
- }
- withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
- // We still can get tableSize from Hive before Analyze
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
--- End diff --
This is the old test case from SPARK-17410.
The original test case didn't convert the ORC table at INSERT time, and my new test case doesn't either.
It's due to `convertMetastoreXXX`: when the table is not converted, Hive INSERT will generate the stats.
---
[GitHub] spark pull request #19500: [SPARK-22280][SQL][TEST] Improve StatisticsSuite ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19500#discussion_r144896827
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -937,26 +937,22 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
test("test statistics of LogicalRelation converted from Hive serde tables") {
- val parquetTable = "parquetTable"
- val orcTable = "orcTable"
- withTable(parquetTable, orcTable) {
- sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET")
- sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC")
- sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src")
- sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src")
-
- // the default value for `spark.sql.hive.convertMetastoreParquet` is true, here we just set it
- // for robustness
- withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true") {
- checkTableStats(parquetTable, hasSizeInBytes = false, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS")
- checkTableStats(parquetTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
- }
- withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") {
- // We still can get tableSize from Hive before Analyze
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None)
- sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS")
- checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500))
+ Seq("orc", "parquet").foreach { format =>
+ Seq(true, false).foreach { isConverted =>
+ withSQLConf(
+ HiveUtils.CONVERT_METASTORE_ORC.key -> s"$isConverted",
+ HiveUtils.CONVERT_METASTORE_PARQUET.key -> s"$isConverted") {
+ withTable(format) {
+ sql(s"CREATE TABLE $format (key STRING, value STRING) STORED AS $format")
+ sql(s"INSERT INTO TABLE $format SELECT * FROM src")
+
+ val hasHiveStats = !isConverted
--- End diff --
Yep. Thank you for review, @wzhfy .
---