Posted to reviews@spark.apache.org by jiangxb1987 <gi...@git.apache.org> on 2017/10/06 11:25:38 UTC

[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/19444

    [SPARK-22214][SQL] Refactor the list hive partitions code

    ## What changes were proposed in this pull request?
    
    In this PR we make a few changes to the code that lists Hive partitions, to make it more extensible.
    The following changes are made:
    1. In `HiveClientImpl.getPartitions()`, call `client.getPartitions` instead of `shim.getAllPartitions` when `spec` is empty;
    2. In `HiveTableScanExec`, we previously always called `listPartitionsByFilter` when the config `metastorePartitionPruning` was enabled, but it is better to call `listPartitions` when `partitionPruningPred` is empty;
    3. We should use `sessionCatalog` instead of `SharedState.externalCatalog` in `HiveTableScanExec`.
    
    ## How was this patch tested?
    
    Tested by existing test cases. Since this is a code refactor, no regression or behavior change is expected.
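
The second change in the description (skip the filter-based listing when there is nothing to filter on) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Catalog`, `InMemoryCatalog`, `Partition`, and `fetchPartitions` are hypothetical stand-ins, and predicates are modelled as plain Scala functions rather than Catalyst expressions.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark classes.
object PartitionListingSketch {
  type TablePartitionSpec = Map[String, String]
  final case class Partition(spec: TablePartitionSpec)

  // A toy catalog exposing the two listing paths named in the description.
  trait Catalog {
    def listPartitions(table: String): Seq[Partition]
    def listPartitionsByFilter(
        table: String,
        predicates: Seq[Partition => Boolean]): Seq[Partition]
  }

  // In-memory implementation that counts how often the filter path is taken.
  final class InMemoryCatalog(parts: Seq[Partition]) extends Catalog {
    var filterCalls = 0

    def listPartitions(table: String): Seq[Partition] = parts

    def listPartitionsByFilter(
        table: String,
        predicates: Seq[Partition => Boolean]): Seq[Partition] = {
      filterCalls += 1
      parts.filter(p => predicates.forall(_(p)))
    }
  }

  // Use the filter-based call only when pruning is enabled AND there is at
  // least one predicate; otherwise fall back to the plain listing call.
  // (Assumption: when pruning is disabled, the caller filters later.)
  def fetchPartitions(
      catalog: Catalog,
      table: String,
      pruningEnabled: Boolean,
      predicates: Seq[Partition => Boolean]): Seq[Partition] =
    if (pruningEnabled && predicates.nonEmpty) {
      catalog.listPartitionsByFilter(table, predicates)
    } else {
      catalog.listPartitions(table)
    }
}
```

With an empty predicate list, `fetchPartitions` never touches `listPartitionsByFilter`, even when pruning is enabled.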

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark hivePartitions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19444.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19444
    
----
commit 8f50c7c47934a8dca662e8e2d5eacbc0b394eaa5
Author: Xingbo Jiang <xi...@databricks.com>
Date:   2017-10-06T11:04:29Z

    refactor list hive partitions.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    cc @gatorsmile @cloud-fan 


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    LGTM


---



[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19444#discussion_r143168926
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -638,12 +638,14 @@ private[hive] class HiveClientImpl(
           table: CatalogTable,
           spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition] = withHiveState {
         val hiveTable = toHiveTable(table, Some(userName))
    -    val parts = spec match {
    -      case None => shim.getAllPartitions(client, hiveTable).map(fromHivePartition)
    --- End diff --
    
    After this change, `HiveShim.getAllPartitions` is only used to support `HiveShim.getPartitionsByFilter` for Hive 0.12; we may consider removing the method completely in the future.
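
One way to read the diff above is that a missing `spec` is treated as an empty partial spec, so a single lookup path serves both cases. The sketch below illustrates that idea only; `getPartitions` here is a hypothetical in-memory stand-in for a metastore lookup, not the Hive client call, and the matching rule is an assumption.

```scala
// Hypothetical illustration; not the HiveClientImpl code.
object EmptySpecSketch {
  type TablePartitionSpec = Map[String, String]

  // Stand-in for a lookup by partial spec: a partition matches when it
  // agrees with every key/value pair in the partial spec. An empty partial
  // spec therefore matches every partition.
  def getPartitions(
      all: Seq[TablePartitionSpec],
      partial: TablePartitionSpec): Seq[TablePartitionSpec] =
    all.filter(p => partial.forall { case (k, v) => p.get(k).contains(v) })

  // Normalize None to an empty spec, then use the single lookup path.
  def listPartitions(
      all: Seq[TablePartitionSpec],
      spec: Option[TablePartitionSpec]): Seq[TablePartitionSpec] = {
    val partSpec = spec.getOrElse(Map.empty[String, String])
    getPartitions(all, partSpec)
  }
}
```

Under this reading, a separate "get all partitions" branch is no longer needed for the `None` case.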


---



[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19444#discussion_r143390226
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -405,6 +405,11 @@ object CatalogTypes {
        * Specifications of a table partition. Mapping column name to column value.
        */
       type TablePartitionSpec = Map[String, String]
    +
    +  /**
    +   * Initialize an empty spec.
    +   */
    +  lazy val emptyTablePartitionSpec: TablePartitionSpec = Map.empty[String, String]
    --- End diff --
    
    We wanted to refer to the val `emptyTablePartitionSpec` as a `TablePartitionSpec`, not a `Map[String, String]`, though the two types are equal.
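
The point about the alias can be seen in a small standalone sketch. The alias definition is copied from the diff; the object name, `named`, and `describe` are hypothetical and exist only for this illustration.

```scala
// Hypothetical illustration of a Scala type alias; not Spark code.
object AliasSketch {
  // Same alias as in the diff: it introduces a name, not a new type.
  type TablePartitionSpec = Map[String, String]

  // A named empty value, ascribed with the alias for readability.
  val named: TablePartitionSpec = Map.empty[String, String]

  // Accepts the alias; any Map[String, String] is accepted as well.
  def describe(spec: TablePartitionSpec): String =
    if (spec.isEmpty) "all partitions"
    else spec.map { case (k, v) => s"$k=$v" }.mkString("/")
}
```

Because the alias and `Map[String, String]` are the same type to the compiler, passing an inline `Map.empty` to `describe` behaves exactly like passing `named`; the named val only documents intent.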


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    **[Test build #82519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82519/testReport)** for PR 19444 at commit [`1e119ea`](https://github.com/apache/spark/commit/1e119ea7ae414e27ad0832cee347a20d24f1c0cb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    **[Test build #82509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82509/testReport)** for PR 19444 at commit [`8f50c7c`](https://github.com/apache/spark/commit/8f50c7c47934a8dca662e8e2d5eacbc0b394eaa5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19444#discussion_r143386555
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -405,6 +405,11 @@ object CatalogTypes {
        * Specifications of a table partition. Mapping column name to column value.
        */
       type TablePartitionSpec = Map[String, String]
    +
    +  /**
    +   * Initialize an empty spec.
    +   */
    +  lazy val emptyTablePartitionSpec: TablePartitionSpec = Map.empty[String, String]
    --- End diff --
    
    `Map.empty` is already an object; I think we can just inline it.


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82509/


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82519/


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    **[Test build #82519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82519/testReport)** for PR 19444 at commit [`1e119ea`](https://github.com/apache/spark/commit/1e119ea7ae414e27ad0832cee347a20d24f1c0cb).


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    LGTM except for a minor comment.


---



[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19444


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    **[Test build #82509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82509/testReport)** for PR 19444 at commit [`8f50c7c`](https://github.com/apache/spark/commit/8f50c7c47934a8dca662e8e2d5eacbc0b394eaa5).


---



[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19444
  
    Thanks! Merged to master.


---



[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19444#discussion_r143235124
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -638,12 +638,14 @@ private[hive] class HiveClientImpl(
           table: CatalogTable,
           spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition] = withHiveState {
         val hiveTable = toHiveTable(table, Some(userName))
    -    val parts = spec match {
    -      case None => shim.getAllPartitions(client, hiveTable).map(fromHivePartition)
    +    val partialPartSpec = spec match {
    --- End diff --
    
    -> `partSpec`


---
