You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/02/07 17:37:34 UTC

[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/16837

    [SPARK-19359][SQL] renaming partition should not leave useless directories

    ## What changes were proposed in this pull request?
    
    when we rename a nested path, different file systems have different behaviors. e.g. in jenkins, renaming `a=1/b=2` to `A=2/B=2` will success, but leave an empty directory `a=1`. in mac os, the renaming will fail and result to `a=1/B=2`.
    
    This PR rename the partition path recursively from the first partition column, to be most compatible with different file systems.
    
    ## How was this patch tested?
    
    new regression test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark partition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16837.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16837
    
----
commit 47254448c60cbf92921a8140cf0f513a3329b0ff
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-02-07T17:28:23Z

    renaming partition should not leave useless directories

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16837#discussion_r99999328
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -892,21 +892,59 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
         if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
           val tablePath = new Path(tableMeta.location)
    +      val fs = tablePath.getFileSystem(hadoopConf)
           val newParts = newSpecs.map { spec =>
    +        val rightPath = renamePartitionDirectory(fs, tablePath, partitionColumnNames, spec)
             val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
    -        val wrongPath = new Path(partition.location)
    -        val rightPath = ExternalCatalogUtils.generatePartitionPath(
    -          spec, partitionColumnNames, tablePath)
    +        partition.copy(storage = partition.storage.copy(locationUri = Some(rightPath.toString)))
    +      }
    +      alterPartitions(db, table, newParts)
    +    }
    +  }
    +
    +  /**
    +   * Rename the partition directory w.r.t. the actual partition columns.
    +   *
    +   * It will recursively rename the partition directory from the first partition column, to be most
    +   * compatible with different file systems. e.g. in some file systems, renaming `a=1/b=2` to
    +   * `A=1/B=2` will result to `a=1/B=2`, while in some other file systems, the renaming works, but
    +   * will leave an empty directory `a=1`.
    +   */
    +  private def renamePartitionDirectory(
    +      fs: FileSystem,
    +      tablePath: Path,
    +      partCols: Seq[String],
    +      newSpec: TablePartitionSpec): Path = {
    +    import ExternalCatalogUtils.getPartitionPathString
    +
    +    var expectedPartitionPath = tablePath
    +    var actualPartitionPath = tablePath
    +    partCols.foreach { col =>
    +      val partitionValue = newSpec(col)
    +      val expectedPartitionPathString = getPartitionPathString(col, partitionValue)
    +      val actualPartitionPathString = getPartitionPathString(col.toLowerCase, partitionValue)
    +
    +      expectedPartitionPath = new Path(expectedPartitionPath, expectedPartitionPathString)
    +      actualPartitionPath = new Path(actualPartitionPath, actualPartitionPathString)
    --- End diff --
    
    oh yea, we should always rename directory within one nested level.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72617/testReport)** for PR 16837 at commit [`2c4d3c7`](https://github.com/apache/spark/commit/2c4d3c71e971bd039936c43143718ba7091d6113).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16837#discussion_r100008693
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala ---
    @@ -481,4 +484,30 @@ class PartitionProviderCompatibilitySuite
           assert(spark.sql("show partitions test").count() == 5)
         }
       }
    +
    +  test("SPARK-19359: renaming partition should not leave useless directories") {
    +    withTable("t", "t1") {
    +      Seq((1, 2, 3)).toDF("id", "A", "B").write.partitionBy("A", "B").saveAsTable("t")
    +      spark.sql("alter table t partition(A=2, B=3) rename to partition(A=4, B=5)")
    +
    +      var table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
    +      var tablePath = new Path(table.location)
    +      val fs = tablePath.getFileSystem(spark.sessionState.newHadoopConf())
    +      // the `A=2` directory is still there, we follow this behavior from hive.
    +      assert(fs.listStatus(tablePath)
    +        .filterNot(_.getPath.toString.contains("A=2")).count(_.isDirectory) == 1)
    +      assert(fs.listStatus(new Path(tablePath, "A=4")).count(_.isDirectory) == 1)
    +
    +
    +      Seq((1, 2, 3, 4)).toDF("id", "A", "b", "C").write.partitionBy("A", "b", "C").saveAsTable("t1")
    +      spark.sql("alter table t1 partition(A=2, b=3, C=4) rename to partition(A=4, b=5, C=6)")
    +      table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1"))
    +      tablePath = new Path(table.location)
    +      // the `A=2` directory is still there, we follow this behavior from hive.
    +      assert(fs.listStatus(tablePath)
    +        .filterNot(_.getPath.toString.contains("A=2")).count(_.isDirectory) == 1)
    --- End diff --
    
    If going to check `A=2` directory exist, I think here is `filter`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16837


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72555/testReport)** for PR 16837 at commit [`d8f2ba5`](https://github.com/apache/spark/commit/d8f2ba5eac229d906e2d69117c97006a7d372716).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72578/testReport)** for PR 16837 at commit [`329886e`](https://github.com/apache/spark/commit/329886e54d3a70e2314d67b6b6060fc33cef9b8d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72547/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72524/testReport)** for PR 16837 at commit [`953cb9e`](https://github.com/apache/spark/commit/953cb9eec3894930f91473dcb12dfc1843d2f379).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72578/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72523/testReport)** for PR 16837 at commit [`4725444`](https://github.com/apache/spark/commit/47254448c60cbf92921a8140cf0f513a3329b0ff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    no it's only related to a hack in `HiveExternalCatalog`, I have updated the description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16837#discussion_r99998413
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -892,21 +892,59 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
         if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
           val tablePath = new Path(tableMeta.location)
    +      val fs = tablePath.getFileSystem(hadoopConf)
           val newParts = newSpecs.map { spec =>
    +        val rightPath = renamePartitionDirectory(fs, tablePath, partitionColumnNames, spec)
             val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
    -        val wrongPath = new Path(partition.location)
    -        val rightPath = ExternalCatalogUtils.generatePartitionPath(
    -          spec, partitionColumnNames, tablePath)
    +        partition.copy(storage = partition.storage.copy(locationUri = Some(rightPath.toString)))
    +      }
    +      alterPartitions(db, table, newParts)
    +    }
    +  }
    +
    +  /**
    +   * Rename the partition directory w.r.t. the actual partition columns.
    +   *
    +   * It will recursively rename the partition directory from the first partition column, to be most
    +   * compatible with different file systems. e.g. in some file systems, renaming `a=1/b=2` to
    +   * `A=1/B=2` will result to `a=1/B=2`, while in some other file systems, the renaming works, but
    +   * will leave an empty directory `a=1`.
    +   */
    +  private def renamePartitionDirectory(
    +      fs: FileSystem,
    +      tablePath: Path,
    +      partCols: Seq[String],
    +      newSpec: TablePartitionSpec): Path = {
    +    import ExternalCatalogUtils.getPartitionPathString
    +
    +    var expectedPartitionPath = tablePath
    +    var actualPartitionPath = tablePath
    +    partCols.foreach { col =>
    +      val partitionValue = newSpec(col)
    +      val expectedPartitionPathString = getPartitionPathString(col, partitionValue)
    +      val actualPartitionPathString = getPartitionPathString(col.toLowerCase, partitionValue)
    +
    +      expectedPartitionPath = new Path(expectedPartitionPath, expectedPartitionPathString)
    +      actualPartitionPath = new Path(actualPartitionPath, actualPartitionPathString)
    --- End diff --
    
    Since we rename previous `actualPartitionPath` to `expectedPartitionPath` in last run, `actualPartitionPath` doesn't exist anymore.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16837#discussion_r100174684
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -892,21 +892,58 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
         if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
           val tablePath = new Path(tableMeta.location)
    +      val fs = tablePath.getFileSystem(hadoopConf)
           val newParts = newSpecs.map { spec =>
    +        val rightPath = renamePartitionDirectory(fs, tablePath, partitionColumnNames, spec)
             val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
    -        val wrongPath = new Path(partition.location)
    -        val rightPath = ExternalCatalogUtils.generatePartitionPath(
    -          spec, partitionColumnNames, tablePath)
    +        partition.copy(storage = partition.storage.copy(locationUri = Some(rightPath.toString)))
    +      }
    +      alterPartitions(db, table, newParts)
    +    }
    +  }
    +
    +  /**
    +   * Rename the partition directory w.r.t. the actual partition columns.
    +   *
    +   * It will recursively rename the partition directory from the first partition column, to be most
    +   * compatible with different file systems. e.g. in some file systems, renaming `a=1/b=2` to
    +   * `A=1/B=2` will result to `a=1/B=2`, while in some other file systems, the renaming works, but
    +   * will leave an empty directory `a=1`.
    +   */
    +  private def renamePartitionDirectory(
    +      fs: FileSystem,
    +      tablePath: Path,
    +      partCols: Seq[String],
    +      newSpec: TablePartitionSpec): Path = {
    +    import ExternalCatalogUtils.getPartitionPathString
    +
    +    var totalPath = tablePath
    --- End diff --
    
    How about `currFullPath` or `fullPath`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Does this change not require changing the other external catalog?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    LGTM,it is a better and safer way to process the useless dirs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16837#discussion_r99981498
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -892,21 +892,61 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
         if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
           val tablePath = new Path(tableMeta.location)
    +      val fs = tablePath.getFileSystem(hadoopConf)
           val newParts = newSpecs.map { spec =>
    +        val rightPath = renamePartitionDirectory(fs, tablePath, partitionColumnNames, spec)
             val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
    -        val wrongPath = new Path(partition.location)
    -        val rightPath = ExternalCatalogUtils.generatePartitionPath(
    -          spec, partitionColumnNames, tablePath)
    +        partition.copy(storage = partition.storage.copy(locationUri = Some(rightPath.toString)))
    +      }
    +      alterPartitions(db, table, newParts)
    +    }
    +  }
    +
    +  /**
    +   * Rename the partition directory w.r.t. the actual partition columns.
    +   *
    +   * It will recursively rename the partition directory from the first partition column, to be most
    +   * compatible with different file systems. e.g. in some file systems, renaming `a=1/b=2` to
    +   * `A=1/B=2` will result to `a=1/B=2`, while in some other file systems, the renaming works, but
    +   * will leave an empty directory `a=1`.
    +   */
    +  private def renamePartitionDirectory(
    +      fs: FileSystem,
    +      tablePath: Path,
    +      partCols: Seq[String],
    +      newSpec: TablePartitionSpec): Path = {
    +    import ExternalCatalogUtils._
    +
    +    var totalPath = tablePath
    +    partCols.foreach { col =>
    +      val partitionValue = newSpec(col)
    +      val partitionString = if (partitionValue == null) {
    +        DEFAULT_PARTITION_NAME
    +      } else {
    +        escapePathName(partitionValue)
    +      }
    +
    +      val nextPartPath = new Path(totalPath, escapePathName(col) + "=" + partitionString)
    +      if (fs.exists(nextPartPath)) {
    +        // It is possible that some parental partition directories already exist or doesn't need to
    +        // be renamed. e.g. the partition columns are `a` and `B`, then we don't need to rename
    +        // `/table_path/a=1`. Or we already have a partition directory `A=1/B=2`, and we rename
    +        // another partition to `A=1/B=3`, then we will have `A=1/B=2` and `a=1/b=3`, and we should
    +        // just move `a=1/b=3` into `A=1` with new name `B=3`.
    +      } else {
    +        val wrongPartPath = new Path(
    --- End diff --
    
    good catch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72555/testReport)** for PR 16837 at commit [`d8f2ba5`](https://github.com/apache/spark/commit/d8f2ba5eac229d906e2d69117c97006a7d372716).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72547/testReport)** for PR 16837 at commit [`c1a7f1d`](https://github.com/apache/spark/commit/c1a7f1ddc5fbb0c3a0ab7da9eb77347812c422f7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72524/testReport)** for PR 16837 at commit [`953cb9e`](https://github.com/apache/spark/commit/953cb9eec3894930f91473dcb12dfc1843d2f379).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72617/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72524/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72566/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16837#discussion_r99980560
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -892,21 +892,61 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
         if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
           val tablePath = new Path(tableMeta.location)
    +      val fs = tablePath.getFileSystem(hadoopConf)
           val newParts = newSpecs.map { spec =>
    +        val rightPath = renamePartitionDirectory(fs, tablePath, partitionColumnNames, spec)
             val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
    -        val wrongPath = new Path(partition.location)
    -        val rightPath = ExternalCatalogUtils.generatePartitionPath(
    -          spec, partitionColumnNames, tablePath)
    +        partition.copy(storage = partition.storage.copy(locationUri = Some(rightPath.toString)))
    +      }
    +      alterPartitions(db, table, newParts)
    +    }
    +  }
    +
    +  /**
    +   * Rename the partition directory w.r.t. the actual partition columns.
    +   *
    +   * It will recursively rename the partition directory from the first partition column, to be most
    +   * compatible with different file systems. e.g. in some file systems, renaming `a=1/b=2` to
    +   * `A=1/B=2` will result to `a=1/B=2`, while in some other file systems, the renaming works, but
    +   * will leave an empty directory `a=1`.
    +   */
    +  private def renamePartitionDirectory(
    +      fs: FileSystem,
    +      tablePath: Path,
    +      partCols: Seq[String],
    +      newSpec: TablePartitionSpec): Path = {
    +    import ExternalCatalogUtils._
    +
    +    var totalPath = tablePath
    +    partCols.foreach { col =>
    +      val partitionValue = newSpec(col)
    +      val partitionString = if (partitionValue == null) {
    +        DEFAULT_PARTITION_NAME
    +      } else {
    +        escapePathName(partitionValue)
    +      }
    +
    +      val nextPartPath = new Path(totalPath, escapePathName(col) + "=" + partitionString)
    +      if (fs.exists(nextPartPath)) {
    +        // It is possible that some parental partition directories already exist or doesn't need to
    +        // be renamed. e.g. the partition columns are `a` and `B`, then we don't need to rename
    +        // `/table_path/a=1`. Or we already have a partition directory `A=1/B=2`, and we rename
    +        // another partition to `A=1/B=3`, then we will have `A=1/B=2` and `a=1/b=3`, and we should
    +        // just move `a=1/b=3` into `A=1` with new name `B=3`.
    +      } else {
    +        val wrongPartPath = new Path(
    --- End diff --
    
    The basepath of `wrongPartPath` needs to update with the column in previous iteration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72566/testReport)** for PR 16837 at commit [`329886e`](https://github.com/apache/spark/commit/329886e54d3a70e2314d67b6b6060fc33cef9b8d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72555/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    LGTM except one comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72617/testReport)** for PR 16837 at commit [`2c4d3c7`](https://github.com/apache/spark/commit/2c4d3c71e971bd039936c43143718ba7091d6113).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72578/testReport)** for PR 16837 at commit [`329886e`](https://github.com/apache/spark/commit/329886e54d3a70e2314d67b6b6060fc33cef9b8d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72523/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    cc @gatorsmile @windpiger @viirya 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72523/testReport)** for PR 16837 at commit [`4725444`](https://github.com/apache/spark/commit/47254448c60cbf92921a8140cf0f513a3329b0ff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16837#discussion_r100038960
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala ---
    @@ -481,4 +484,30 @@ class PartitionProviderCompatibilitySuite
           assert(spark.sql("show partitions test").count() == 5)
         }
       }
    +
    +  test("SPARK-19359: renaming partition should not leave useless directories") {
    +    withTable("t", "t1") {
    +      Seq((1, 2, 3)).toDF("id", "A", "B").write.partitionBy("A", "B").saveAsTable("t")
    +      spark.sql("alter table t partition(A=2, B=3) rename to partition(A=4, B=5)")
    +
    +      var table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
    +      var tablePath = new Path(table.location)
    +      val fs = tablePath.getFileSystem(spark.sessionState.newHadoopConf())
    +      // the `A=2` directory is still there, we follow this behavior from hive.
    +      assert(fs.listStatus(tablePath)
    +        .filterNot(_.getPath.toString.contains("A=2")).count(_.isDirectory) == 1)
    +      assert(fs.listStatus(new Path(tablePath, "A=4")).count(_.isDirectory) == 1)
    +
    +
    +      Seq((1, 2, 3, 4)).toDF("id", "A", "b", "C").write.partitionBy("A", "b", "C").saveAsTable("t1")
    +      spark.sql("alter table t1 partition(A=2, b=3, C=4) rename to partition(A=4, b=5, C=6)")
    +      table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1"))
    +      tablePath = new Path(table.location)
    +      // the `A=2` directory is still there, we follow this behavior from hive.
    +      assert(fs.listStatus(tablePath)
    +        .filterNot(_.getPath.toString.contains("A=2")).count(_.isDirectory) == 1)
    --- End diff --
    
    I wanna check the number of partition directories except `A=2`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16837
  
    **[Test build #72547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72547/testReport)** for PR 16837 at commit [`c1a7f1d`](https://github.com/apache/spark/commit/c1a7f1ddc5fbb0c3a0ab7da9eb77347812c422f7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org