Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/07/18 15:07:49 UTC

[GitHub] spark pull request #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/18671

    [SPARK-21457][SQL] ExternalCatalog.listPartitions should correctly handle partition values with dot

    ## What changes were proposed in this pull request?
    
    When we list partitions from the Hive metastore with a partial partition spec, we expect exact matching on the partition values. However, Hive treats a dot in a partition value specially, as a wildcard that matches any single character, so it may return more partitions than requested. We should apply an extra filter to drop the unexpected partitions.
    
    ## How was this patch tested?
    
    A new regression test.
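
A minimal Scala sketch of the over-matching described above, under a simplified model that is an assumption here and not Hive's actual code: partitions are identified by names like `a=1.1/b=x`, and the metastore is assumed to match a partial spec by building a regular expression from the raw partition values, using `.*` for columns the spec leaves out. Because the dot in `1.1` is not escaped, it matches any single character, so `a=101` comes back too. `DotWildcardDemo`, `naivePattern` and `naiveMatches` are hypothetical names used only for illustration.

```scala
// Hypothetical model of the over-matching; not Hive's or Spark's actual code.
object DotWildcardDemo {
  // Assumption: a partial spec is turned into a regex over the partition name,
  // using each value verbatim and ".*" for columns the spec leaves out.
  def naivePattern(partCols: Seq[String], partial: Map[String, String]): String =
    partCols.map(c => s"${c}=${partial.getOrElse(c, ".*")}").mkString("/")

  // String.matches requires the whole name to match the pattern, but a '.'
  // copied verbatim from a partition value still acts as a regex wildcard.
  def naiveMatches(partCols: Seq[String], partial: Map[String, String], name: String): Boolean =
    name.matches(naivePattern(partCols, partial))
}
```

For example, filtering the names `a=1.1/b=x`, `a=101/b=x` and `a=2/b=y` with `DotWildcardDemo.naiveMatches(Seq("a", "b"), Map("a" -> "1.1"), _)` keeps the first two, because the pattern `a=1.1/b=.*` lets the dot match the `0` in `101`.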

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark hive

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18671.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18671
    
----
commit 43c564c3783a8684d8f51902d32c5297df736219
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-07-18T14:57:25Z

    ExternalCatalog.listPartitions should correctly handle partition values with dot

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18671
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79709/


[GitHub] spark issue #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions should...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18671
  
    **[Test build #79709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79709/testReport)** for PR 18671 at commit [`43c564c`](https://github.com/apache/spark/commit/43c564c3783a8684d8f51902d32c5297df736219).


[GitHub] spark issue #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions should...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18671
  
    Thanks! Merging to master/2.2


[GitHub] spark issue #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions should...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18671
  
    cc @gatorsmile 


[GitHub] spark issue #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions should...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18671
  
    **[Test build #79709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79709/testReport)** for PR 18671 at commit [`43c564c`](https://github.com/apache/spark/commit/43c564c3783a8684d8f51902d32c5297df736219).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18671
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18671


[GitHub] spark pull request #18671: [SPARK-21457][SQL] ExternalCatalog.listPartitions...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18671#discussion_r128005142
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -1088,9 +1088,19 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           table: String,
           partialSpec: Option[TablePartitionSpec] = None): Seq[CatalogTablePartition] = withClient {
         val partColNameMap = buildLowerCasePartColNameMap(getTable(db, table))
    -    client.getPartitions(db, table, partialSpec.map(lowerCasePartitionSpec)).map { part =>
    +    val res = client.getPartitions(db, table, partialSpec.map(lowerCasePartitionSpec)).map { part =>
           part.copy(spec = restorePartitionSpec(part.spec, partColNameMap))
         }
    +
    +    partialSpec match {
    +      // This might be a bug of Hive: When the partition value inside the partial partition spec
    +      // contains dot, and we ask Hive to list partitions w.r.t. the partial partition spec, Hive
    +      // treats dot as matching any single character and may return more partitions than we
    +      // expected. Here we do an extra filter to drop unexpected partitions.
    +      case Some(spec) if spec.exists(_._2.contains(".")) =>
    --- End diff --
    
    I tried other special characters, but it seems dot is the only one that causes the problem.
    
    And it seems Hive does intend to do exact matching: https://github.com/apache/hive/blob/release-1.2.1/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1529-L1535
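
The diff excerpt above stops at the guard `case Some(spec) if spec.exists(_._2.contains("."))`. The following is a hypothetical, self-contained sketch of what such a post-filter could look like, not the merged patch. `Part` is a stand-in for `CatalogTablePartition` that models only its `spec` field, `TablePartitionSpec` is aliased to `Map[String, String]` as in Spark's catalog types, and partition column names are assumed to be normalized to the same case on both sides before comparison.

```scala
// Hypothetical sketch of the extra filter; not the code merged in this PR.
object PartialSpecFilter {
  // Mirrors Spark's alias: a partition spec maps column names to values.
  type TablePartitionSpec = Map[String, String]

  // Hypothetical stand-in for CatalogTablePartition; only `spec` is modeled.
  final case class Part(spec: TablePartitionSpec)

  def postFilter(res: Seq[Part], partialSpec: Option[TablePartitionSpec]): Seq[Part] =
    partialSpec match {
      // Only run the extra pass when some requested value contains a dot,
      // since that is the case where the metastore may over-match.
      case Some(spec) if spec.exists(_._2.contains(".")) =>
        // Keep a partition only if every column named in the partial spec
        // has exactly the requested value (assumes keys share the same case).
        res.filter(p => spec.forall { case (k, v) => p.spec.get(k).contains(v) })
      // No partial spec, or no dot in any value: trust the metastore result.
      case _ => res
    }
}
```

Keeping the guard means the common case, with no dot in any value, skips the extra pass entirely; the filter itself only compares the columns that the partial spec names.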

