Posted to reviews@spark.apache.org by wzhfy <gi...@git.apache.org> on 2017/12/09 08:39:44 UTC

[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/19932

    [SPARK-22745][SQL] read partition stats from Hive

    ## What changes were proposed in this pull request?
    
    Currently Spark can read table stats (e.g. `totalSize`, `numRows`) from Hive. We can also support reading partition stats from Hive using the same logic.
    
    ## How was this patch tested?
    
    Added a new test case and modified an existing test case.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark read_hive_partition_stats

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19932.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19932
    
----
commit 48b81b5065808ffeff99142a03cd59bf54a9ea5d
Author: Zhenhua Wang <wa...@huawei.com>
Date:   2017-12-09T08:32:48Z

    read partition stats

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84819/
    Test FAILed.


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r155921370
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -413,32 +413,7 @@ private[hive] class HiveClientImpl(
             case (key, _) => excludedTableProperties.contains(key)
           }
           val comment = properties.get("comment")
    -
    -      // Here we are reading statistics from Hive.
    -      // Note that this statistics could be overridden by Spark's statistics if that's available.
    -      val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
    --- End diff --
    
    This code path has been moved into the method `readHiveStats`.
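
For readers without Spark or Hive on the classpath, the fallback rules in the code being moved can be sketched roughly as below. This is a minimal sketch, not the merged implementation: `HiveStatsSketch` and `ReadHiveStatsSketch` are hypothetical stand-ins for `CatalogStatistics` and `HiveClientImpl`, and the literal property keys are assumed to match the `StatsSetupConst` constants used in the diff.

```scala
// Hypothetical stand-in for Spark's CatalogStatistics; only the two fields used here.
final case class HiveStatsSketch(sizeInBytes: BigInt, rowCount: Option[BigInt])

object ReadHiveStatsSketch {
  // Assumed literal values of StatsSetupConst.TOTAL_SIZE, RAW_DATA_SIZE and ROW_COUNT.
  val TotalSize = "totalSize"
  val RawDataSize = "rawDataSize"
  val RowCount = "numRows"

  def readHiveStats(properties: Map[String, String]): Option[HiveStatsSketch] = {
    val totalSize = properties.get(TotalSize).map(BigInt(_))
    val rawDataSize = properties.get(RawDataSize).map(BigInt(_))
    // A non-positive row count is treated as "unknown" and dropped.
    val rowCount = properties.get(RowCount).map(BigInt(_)).filter(_ > 0)

    // Prefer a positive totalSize, fall back to a positive rawDataSize, otherwise no stats.
    totalSize.filter(_ > 0)
      .orElse(rawDataSize.filter(_ > 0))
      .map(size => HiveStatsSketch(sizeInBytes = size, rowCount = rowCount))
  }
}
```

Because the function takes only a properties map, the same rules can be applied to table parameters and partition parameters alike, which is the point of extracting it.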


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    **[Test build #84829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84829/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    LGTM


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r156546859
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -213,6 +213,27 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
         }
       }
     
    +  test("SPARK-22745 - read Hive's statistics for partition") {
    +    val tableName = "hive_stats_part_table"
    +    withTable(tableName) {
    +      sql(s"CREATE TABLE $tableName (key STRING, value STRING) PARTITIONED BY (ds STRING)")
    +      sql(s"INSERT INTO TABLE $tableName PARTITION (ds='2017-01-01') SELECT * FROM src")
    +      var partition = spark.sessionState.catalog
    +        .getPartition(TableIdentifier(tableName), Map("ds" -> "2017-01-01"))
    +
    +      assert(partition.stats.get.sizeInBytes == 5812)
    +      assert(partition.stats.get.rowCount.isEmpty)
    +
    +      hiveClient
    +        .runSqlHive(s"ANALYZE TABLE $tableName PARTITION (ds='2017-01-01') COMPUTE STATISTICS")
    +      partition = spark.sessionState.catalog
    +        .getPartition(TableIdentifier(tableName), Map("ds" -> "2017-01-01"))
    +
    +      assert(partition.stats.get.sizeInBytes == 5812)
    --- End diff --
    
    `totalSize` already exists after the INSERT INTO command, so `sizeInBytes` doesn't change after the ANALYZE command; only `rowCount` is added.
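
The behaviour described in this comment can be illustrated with two hypothetical parameter maps. This is a sketch under stated assumptions: only the value `5812` for `totalSize` comes from the test in the diff, while the `numRows` values (`0` after INSERT, `500` after ANALYZE) are invented for illustration, as are the object and helper names.

```scala
object PartitionStatsBeforeAfter {
  // Hypothetical partition parameters as Hive might store them.
  // Only totalSize = 5812 is taken from the test; the numRows values are assumptions.
  val afterInsert: Map[String, String] = Map("totalSize" -> "5812", "numRows" -> "0")
  val afterAnalyze: Map[String, String] = Map("totalSize" -> "5812", "numRows" -> "500")

  // The size is read from totalSize and used only when it is positive.
  def sizeInBytes(params: Map[String, String]): Option[BigInt] =
    params.get("totalSize").map(BigInt(_)).filter(_ > 0)

  // A row count is reported only when numRows is present and positive.
  def rowCount(params: Map[String, String]): Option[BigInt] =
    params.get("numRows").map(BigInt(_)).filter(_ > 0)
}
```

Under these assumptions the size is identical for both maps, and the row count goes from empty to defined, which matches the two pairs of assertions in the test.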


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    **[Test build #84680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84680/testReport)** for PR 19932 at commit [`48b81b5`](https://github.com/apache/spark/commit/48b81b5065808ffeff99142a03cd59bf54a9ea5d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r155936167
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -353,15 +374,6 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
           createPartition("2010-01-02", 11,
             "SELECT '1', 'A' from src UNION ALL SELECT '1', 'A' from src")
     
    -      sql(s"ANALYZE TABLE $tableName PARTITION (ds='2010-01-01') COMPUTE STATISTICS NOSCAN")
    -
    -      assertPartitionStats("2010-01-01", "10", rowCount = None, sizeInBytes = 2000)
    -      assertPartitionStats("2010-01-01", "11", rowCount = None, sizeInBytes = 2000)
    -      assert(queryStats("2010-01-02", "10") === None)
    -      assert(queryStats("2010-01-02", "11") === None)
    --- End diff --
    
    After this change, these checks are no longer correct because we now read Hive stats, so I removed them.


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19932


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    **[Test build #84680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84680/testReport)** for PR 19932 at commit [`48b81b5`](https://github.com/apache/spark/commit/48b81b5065808ffeff99142a03cd59bf54a9ea5d).


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r155935430
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -213,6 +213,29 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
         }
       }
     
    +  test("SPARK- - read Hive's statistics for partition") {
    --- End diff --
    
    SPARK- -> SPARK-22745?


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r156546885
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -1021,8 +998,38 @@ private[hive] object HiveClientImpl {
             compressed = apiPartition.getSd.isCompressed,
             properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
               .map(_.asScala.toMap).orNull),
    -        parameters =
    -          if (hp.getParameters() != null) hp.getParameters().asScala.toMap else Map.empty)
    +      parameters = properties,
    +      stats = readHiveStats(properties))
    +  }
    +
    +  /**
    +   * Reads statistics from Hive.
    +   * Note that this statistics could be overridden by Spark's statistics if that's available.
    +   */
    +  private def readHiveStats(properties: Map[String, String]): Option[CatalogStatistics] = {
    +    val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
    +    val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
    +    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
    +    // TODO: check if this estimate is valid for tables after partition pruning.
    --- End diff --
    
    good catch, we can remove this


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84829/
    Test PASSed.


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    **[Test build #84819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84819/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    thanks, merging to master!


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84689/
    Test PASSed.


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r156395848
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -1021,8 +998,38 @@ private[hive] object HiveClientImpl {
             compressed = apiPartition.getSd.isCompressed,
             properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
               .map(_.asScala.toMap).orNull),
    -        parameters =
    -          if (hp.getParameters() != null) hp.getParameters().asScala.toMap else Map.empty)
    +      parameters = properties,
    +      stats = readHiveStats(properties))
    +  }
    +
    +  /**
    +   * Reads statistics from Hive.
    +   * Note that this statistics could be overridden by Spark's statistics if that's available.
    +   */
    +  private def readHiveStats(properties: Map[String, String]): Option[CatalogStatistics] = {
    +    val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
    +    val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
    +    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
    +    // TODO: check if this estimate is valid for tables after partition pruning.
    --- End diff --
    
    do we still need this TODO?


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r156396216
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -413,32 +413,7 @@ private[hive] class HiveClientImpl(
             case (key, _) => excludedTableProperties.contains(key)
           }
           val comment = properties.get("comment")
    -
    -      // Here we are reading statistics from Hive.
    -      // Note that this statistics could be overridden by Spark's statistics if that's available.
    -      val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
    -      val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
    -      val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
    -      // TODO: check if this estimate is valid for tables after partition pruning.
    -      // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be
    -      // relatively cheap if parameters for the table are populated into the metastore.
    -      // Currently, only totalSize, rawDataSize, and rowCount are used to build the field `stats`
    -      // TODO: stats should include all the other two fields (`numFiles` and `numPartitions`).
    -      // (see StatsSetupConst in Hive)
    -      val stats =
    -        // When table is external, `totalSize` is always zero, which will influence join strategy.
    -        // So when `totalSize` is zero, use `rawDataSize` instead. When `rawDataSize` is also zero,
    -        // return None.
    -        // In Hive, when statistics gathering is disabled, `rawDataSize` and `numRows` is always
    -        // zero after INSERT command. So they are used here only if they are larger than zero.
    -        if (totalSize.isDefined && totalSize.get > 0L) {
    -          Some(CatalogStatistics(sizeInBytes = totalSize.get, rowCount = rowCount.filter(_ > 0)))
    -        } else if (rawDataSize.isDefined && rawDataSize.get > 0) {
    -          Some(CatalogStatistics(sizeInBytes = rawDataSize.get, rowCount = rowCount.filter(_ > 0)))
    -        } else {
    -          // TODO: still fill the rowCount even if sizeInBytes is empty. Might break anything?
    -          None
    -        }
    +      val hiveStats = readHiveStats(properties)
    --- End diff --
    
    nit: we can inline it


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    retest this please


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84680/
    Test PASSed.


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    **[Test build #84829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84829/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r155936087
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -213,6 +213,29 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
         }
       }
     
    +  test("SPARK- - read Hive's statistics for partition") {
    --- End diff --
    
    oh, I forgot it, thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r156397168
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -213,6 +213,27 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
         }
       }
     
    +  test("SPARK-22745 - read Hive's statistics for partition") {
    +    val tableName = "hive_stats_part_table"
    +    withTable(tableName) {
    +      sql(s"CREATE TABLE $tableName (key STRING, value STRING) PARTITIONED BY (ds STRING)")
    +      sql(s"INSERT INTO TABLE $tableName PARTITION (ds='2017-01-01') SELECT * FROM src")
    +      var partition = spark.sessionState.catalog
    +        .getPartition(TableIdentifier(tableName), Map("ds" -> "2017-01-01"))
    +
    +      assert(partition.stats.get.sizeInBytes == 5812)
    +      assert(partition.stats.get.rowCount.isEmpty)
    +
    +      hiveClient
    +        .runSqlHive(s"ANALYZE TABLE $tableName PARTITION (ds='2017-01-01') COMPUTE STATISTICS")
    +      partition = spark.sessionState.catalog
    +        .getPartition(TableIdentifier(tableName), Map("ds" -> "2017-01-01"))
    +
    +      assert(partition.stats.get.sizeInBytes == 5812)
    --- End diff --
    
    I expected `totalSize` to be picked here and `sizeInBytes` to change. Did I miss something?


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    **[Test build #84819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84819/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    **[Test build #84689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84689/testReport)** for PR 19932 at commit [`09a7c05`](https://github.com/apache/spark/commit/09a7c0594507ae6f14f3f016fdc407477e320107).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19932#discussion_r156396524
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -1011,6 +986,8 @@ private[hive] object HiveClientImpl {
        */
       def fromHivePartition(hp: HivePartition): CatalogTablePartition = {
         val apiPartition = hp.getTPartition
    +    val properties: Map[String, String] =
    +      if (hp.getParameters != null) hp.getParameters.asScala.toMap else Map.empty
    --- End diff --
    
    nit: if it can't fit on one line, prefer
    ```
    val xxx = if {
      ...
    } else {
      ...
    }
    ```
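
Applied to the `properties` value from the diff, the suggested layout might look like the sketch below. The wrapper object and the `rawParams` argument are hypothetical and stand in for `hp.getParameters`, which the diff treats as possibly null. The `JavaConverters` import is an assumption made so the snippet compiles on its own; the diff itself only shows `.asScala` being used.

```scala
import java.util.{Map => JMap}
import scala.collection.JavaConverters._

object PartitionParamsSketch {
  // `rawParams` is a hypothetical stand-in for `hp.getParameters`, which may be null.
  def toScalaParams(rawParams: JMap[String, String]): Map[String, String] = {
    // Multi-line if/else with braces, as suggested when the expression does not fit on one line.
    val properties: Map[String, String] = if (rawParams != null) {
      rawParams.asScala.toMap
    } else {
      Map.empty
    }
    properties
  }
}
```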


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    cc @cloud-fan @gatorsmile 


---



[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19932
  
    **[Test build #84689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84689/testReport)** for PR 19932 at commit [`09a7c05`](https://github.com/apache/spark/commit/09a7c0594507ae6f14f3f016fdc407477e320107).


---
