Posted to reviews@spark.apache.org by wzhfy <gi...@git.apache.org> on 2017/12/09 08:39:44 UTC
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/19932
[SPARK-22745][SQL] read partition stats from Hive
## What changes were proposed in this pull request?
Currently Spark can read table stats (e.g. `totalSize`, `numRows`) from Hive. We can also support reading partition stats from Hive using the same logic.
## How was this patch tested?
Added a new test case and modified an existing test case.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wzhfy/spark read_hive_partition_stats
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19932.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19932
----
commit 48b81b5065808ffeff99142a03cd59bf54a9ea5d
Author: Zhenhua Wang <wa...@huawei.com>
Date: 2017-12-09T08:32:48Z
read partition stats
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19932
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84819/
Test FAILed.
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r155921370
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -413,32 +413,7 @@ private[hive] class HiveClientImpl(
case (key, _) => excludedTableProperties.contains(key)
}
val comment = properties.get("comment")
-
- // Here we are reading statistics from Hive.
- // Note that this statistics could be overridden by Spark's statistics if that's available.
- val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
--- End diff --
This code path has been moved to the method `readHiveStats`.
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19932
**[Test build #84829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84829/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19932
LGTM
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r156546859
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -213,6 +213,27 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
}
+ test("SPARK-22745 - read Hive's statistics for partition") {
+ val tableName = "hive_stats_part_table"
+ withTable(tableName) {
+ sql(s"CREATE TABLE $tableName (key STRING, value STRING) PARTITIONED BY (ds STRING)")
+ sql(s"INSERT INTO TABLE $tableName PARTITION (ds='2017-01-01') SELECT * FROM src")
+ var partition = spark.sessionState.catalog
+ .getPartition(TableIdentifier(tableName), Map("ds" -> "2017-01-01"))
+
+ assert(partition.stats.get.sizeInBytes == 5812)
+ assert(partition.stats.get.rowCount.isEmpty)
+
+ hiveClient
+ .runSqlHive(s"ANALYZE TABLE $tableName PARTITION (ds='2017-01-01') COMPUTE STATISTICS")
+ partition = spark.sessionState.catalog
+ .getPartition(TableIdentifier(tableName), Map("ds" -> "2017-01-01"))
+
+ assert(partition.stats.get.sizeInBytes == 5812)
--- End diff --
`totalSize` already exists after the INSERT INTO command, so `sizeInBytes` doesn't change after the ANALYZE command; only `rowCount` is added.
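To make the before/after behaviour concrete, here is a minimal sketch, not the code from the patch. It assumes the partition parameters carry `totalSize` after the INSERT and gain `numRows` after the ANALYZE; the object name, the helper names, and the row count of 500 are hypothetical and chosen only for illustration.

```scala
object PartitionStatsTimeline {
  // Hypothetical partition parameters; the row count of 500 is illustrative only.
  val afterInsert: Map[String, String] = Map("totalSize" -> "5812")
  val afterAnalyze: Map[String, String] = afterInsert + ("numRows" -> "500")

  // Size comes from `totalSize` when it is present and positive.
  def sizeInBytes(params: Map[String, String]): Option[BigInt] =
    params.get("totalSize").map(BigInt(_)).filter(_ > 0)

  // Row count comes from `numRows` when it is present and positive.
  def rowCount(params: Map[String, String]): Option[BigInt] =
    params.get("numRows").map(BigInt(_)).filter(_ > 0)
}
```

Under these assumptions, `sizeInBytes` gives the same value for both maps, while `rowCount` is empty for `afterInsert` and defined for `afterAnalyze`.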
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19932
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19932
**[Test build #84680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84680/testReport)** for PR 19932 at commit [`48b81b5`](https://github.com/apache/spark/commit/48b81b5065808ffeff99142a03cd59bf54a9ea5d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r155936167
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -353,15 +374,6 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
createPartition("2010-01-02", 11,
"SELECT '1', 'A' from src UNION ALL SELECT '1', 'A' from src")
- sql(s"ANALYZE TABLE $tableName PARTITION (ds='2010-01-01') COMPUTE STATISTICS NOSCAN")
-
- assertPartitionStats("2010-01-01", "10", rowCount = None, sizeInBytes = 2000)
- assertPartitionStats("2010-01-01", "11", rowCount = None, sizeInBytes = 2000)
- assert(queryStats("2010-01-02", "10") === None)
- assert(queryStats("2010-01-02", "11") === None)
--- End diff --
After this change, these checks are no longer correct because we now read Hive stats, so I removed them.
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19932
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19932
**[Test build #84680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84680/testReport)** for PR 19932 at commit [`48b81b5`](https://github.com/apache/spark/commit/48b81b5065808ffeff99142a03cd59bf54a9ea5d).
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r155935430
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -213,6 +213,29 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
}
+ test("SPARK- - read Hive's statistics for partition") {
--- End diff --
SPARK- -> SPARK-22745?
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r156546885
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -1021,8 +998,38 @@ private[hive] object HiveClientImpl {
compressed = apiPartition.getSd.isCompressed,
properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
.map(_.asScala.toMap).orNull),
- parameters =
- if (hp.getParameters() != null) hp.getParameters().asScala.toMap else Map.empty)
+ parameters = properties,
+ stats = readHiveStats(properties))
+ }
+
+ /**
+ * Reads statistics from Hive.
+ * Note that this statistics could be overridden by Spark's statistics if that's available.
+ */
+ private def readHiveStats(properties: Map[String, String]): Option[CatalogStatistics] = {
+ val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
+ val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
+ val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
+ // TODO: check if this estimate is valid for tables after partition pruning.
--- End diff --
good catch, we can remove this
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19932
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84829/
Test PASSed.
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19932
**[Test build #84819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84819/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19932
thanks, merging to master!
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19932
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19932
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84689/
Test PASSed.
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19932
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r156395848
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -1021,8 +998,38 @@ private[hive] object HiveClientImpl {
compressed = apiPartition.getSd.isCompressed,
properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
.map(_.asScala.toMap).orNull),
- parameters =
- if (hp.getParameters() != null) hp.getParameters().asScala.toMap else Map.empty)
+ parameters = properties,
+ stats = readHiveStats(properties))
+ }
+
+ /**
+ * Reads statistics from Hive.
+ * Note that this statistics could be overridden by Spark's statistics if that's available.
+ */
+ private def readHiveStats(properties: Map[String, String]): Option[CatalogStatistics] = {
+ val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
+ val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
+ val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
+ // TODO: check if this estimate is valid for tables after partition pruning.
--- End diff --
do we still need this TODO?
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r156396216
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -413,32 +413,7 @@ private[hive] class HiveClientImpl(
case (key, _) => excludedTableProperties.contains(key)
}
val comment = properties.get("comment")
-
- // Here we are reading statistics from Hive.
- // Note that this statistics could be overridden by Spark's statistics if that's available.
- val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
- val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
- val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
- // TODO: check if this estimate is valid for tables after partition pruning.
- // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be
- // relatively cheap if parameters for the table are populated into the metastore.
- // Currently, only totalSize, rawDataSize, and rowCount are used to build the field `stats`
- // TODO: stats should include all the other two fields (`numFiles` and `numPartitions`).
- // (see StatsSetupConst in Hive)
- val stats =
- // When table is external, `totalSize` is always zero, which will influence join strategy.
- // So when `totalSize` is zero, use `rawDataSize` instead. When `rawDataSize` is also zero,
- // return None.
- // In Hive, when statistics gathering is disabled, `rawDataSize` and `numRows` is always
- // zero after INSERT command. So they are used here only if they are larger than zero.
- if (totalSize.isDefined && totalSize.get > 0L) {
- Some(CatalogStatistics(sizeInBytes = totalSize.get, rowCount = rowCount.filter(_ > 0)))
- } else if (rawDataSize.isDefined && rawDataSize.get > 0) {
- Some(CatalogStatistics(sizeInBytes = rawDataSize.get, rowCount = rowCount.filter(_ > 0)))
- } else {
- // TODO: still fill the rowCount even if sizeInBytes is empty. Might break anything?
- None
- }
+ val hiveStats = readHiveStats(properties)
--- End diff --
nit: we can inline it
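For readers following the diff, the selection logic in the removed lines can be sketched as a standalone function. This is only an illustration under stated assumptions, not the code from the patch: `Stats` is a hypothetical stand-in for Spark's `CatalogStatistics`, and the literal keys `totalSize`, `rawDataSize`, and `numRows` are assumed to be the values behind the `StatsSetupConst` constants.

```scala
// Hypothetical stand-in for Spark's CatalogStatistics, for illustration only.
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

object HiveStatsSketch {
  // Assumed literal keys behind StatsSetupConst.TOTAL_SIZE, RAW_DATA_SIZE and ROW_COUNT.
  val TotalSize = "totalSize"
  val RawDataSize = "rawDataSize"
  val RowCount = "numRows"

  def readStats(properties: Map[String, String]): Option[Stats] = {
    val totalSize = properties.get(TotalSize).map(BigInt(_))
    val rawDataSize = properties.get(RawDataSize).map(BigInt(_))
    // A zero row count is treated as "not collected", so only positive values are kept.
    val rowCount = properties.get(RowCount).map(BigInt(_)).filter(_ > 0)
    // Prefer a positive totalSize, fall back to a positive rawDataSize, otherwise no stats.
    totalSize.filter(_ > 0)
      .orElse(rawDataSize.filter(_ > 0))
      .map(size => Stats(size, rowCount))
  }
}
```

Because the function depends only on a `Map[String, String]`, the same call can serve table properties and partition parameters alike, which appears to be the idea behind sharing the logic.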
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19932
retest this please
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19932
Merged build finished. Test FAILed.
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19932
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84680/
Test PASSed.
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19932
**[Test build #84829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84829/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r155936087
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -213,6 +213,29 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
}
+ test("SPARK- - read Hive's statistics for partition") {
--- End diff --
oh, I forgot it, thanks!
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r156397168
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -213,6 +213,27 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
}
+ test("SPARK-22745 - read Hive's statistics for partition") {
+ val tableName = "hive_stats_part_table"
+ withTable(tableName) {
+ sql(s"CREATE TABLE $tableName (key STRING, value STRING) PARTITIONED BY (ds STRING)")
+ sql(s"INSERT INTO TABLE $tableName PARTITION (ds='2017-01-01') SELECT * FROM src")
+ var partition = spark.sessionState.catalog
+ .getPartition(TableIdentifier(tableName), Map("ds" -> "2017-01-01"))
+
+ assert(partition.stats.get.sizeInBytes == 5812)
+ assert(partition.stats.get.rowCount.isEmpty)
+
+ hiveClient
+ .runSqlHive(s"ANALYZE TABLE $tableName PARTITION (ds='2017-01-01') COMPUTE STATISTICS")
+ partition = spark.sessionState.catalog
+ .getPartition(TableIdentifier(tableName), Map("ds" -> "2017-01-01"))
+
+ assert(partition.stats.get.sizeInBytes == 5812)
--- End diff --
I expected `totalSize` to be picked here and `sizeInBytes` to change. Did I miss something?
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19932
**[Test build #84819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84819/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19932
**[Test build #84689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84689/testReport)** for PR 19932 at commit [`09a7c05`](https://github.com/apache/spark/commit/09a7c0594507ae6f14f3f016fdc407477e320107).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r156396524
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -1011,6 +986,8 @@ private[hive] object HiveClientImpl {
*/
def fromHivePartition(hp: HivePartition): CatalogTablePartition = {
val apiPartition = hp.getTPartition
+ val properties: Map[String, String] =
+ if (hp.getParameters != null) hp.getParameters.asScala.toMap else Map.empty
--- End diff --
nit: if it can't fit on one line, prefer
```
val xxx = if {
...
} else {
...
}
```
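Applied to the `properties` value in the diff, the suggested layout might look like the sketch below. The wrapper object and the `raw` parameter are hypothetical; `raw` stands in for the possibly-null result of `hp.getParameters`.

```scala
import scala.collection.JavaConverters._

object PartitionProperties {
  // `raw` stands in for the possibly-null map returned by hp.getParameters.
  def toScalaMap(raw: java.util.Map[String, String]): Map[String, String] = {
    // Multi-line if/else with braces, as suggested for expressions that don't fit on one line.
    val properties: Map[String, String] = if (raw != null) {
      raw.asScala.toMap
    } else {
      Map.empty
    }
    properties
  }
}
```

An equivalent form that avoids the explicit null check is `Option(raw).map(_.asScala.toMap).getOrElse(Map.empty)`, similar to how the serde parameters are wrapped in `Option` elsewhere in the same method.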
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19932
cc @cloud-fan @gatorsmile
---
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19932
**[Test build #84689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84689/testReport)** for PR 19932 at commit [`09a7c05`](https://github.com/apache/spark/commit/09a7c0594507ae6f14f3f016fdc407477e320107).
---