You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by windpiger <gi...@git.apache.org> on 2017/01/25 07:51:25 UTC

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

GitHub user windpiger opened a pull request:

    https://github.com/apache/spark/pull/16700

    [SPARK-19359][SQL]clear useless path after rename a partition with upper-case by HiveExternalCatalog

    ## What changes were proposed in this pull request?
    
    Hive metastore is not case preserving and keep partition columns with lower case names. 
    
    If SparkSQL create a table with upper-case partion name use HiveExternalCatalog, when we rename partition, it first call the HiveClient to renamePartition, which will create a new lower case partition path, then SparkSql rename the lower case path to the upper-case.
    
    while if the renamed partition contains more than one depth partition ,e.g. A=1/B=2, hive renamePartition change to a=1/b=2, then SparkSql rename it to A=1/B=2, but the a=1 still exists in the filesystem, we should also delete it.
    
    ## How was this patch tested?
    unit test added


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark clearUselessPathAfterRenamPartition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16700
    
----
commit 6a8efddfb44f3f72c15ff6c20cd6ce341bec6da7
Author: windpiger <so...@outlook.com>
Date:   2017-01-25T07:44:20Z

    [SPARK-19359][SQL]clear useless path after rename a partition with upper-case in HiveExternalCatalog

commit 878d45eb81ad9b1b5486689a527c0c2cdfe81b81
Author: windpiger <so...@outlook.com>
Date:   2017-01-25T07:51:07Z

    reset a tc

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72026/testReport)** for PR 16700 at commit [`7aba059`](https://github.com/apache/spark/commit/7aba0592b78a3bccd44632a24ad83e6ef520478c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98134323
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * The partition path created by Hive is in lowercase, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * returns the extra lowercase path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 is changed to /path/A=4/B=5/C=6, this function returns is
    --- End diff --
    
    `returns is` -> `returns`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #71978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71978/testReport)** for PR 16700 at commit [`878d45e`](https://github.com/apache/spark/commit/878d45eb81ad9b1b5486689a527c0c2cdfe81b81).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r97933674
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -120,6 +120,17 @@ object ExternalCatalogUtils {
           new Path(totalPath, nextPartPath)
         }
       }
    +
    +  def getExtraPartPathCreatedByHive(
    +                             lowerCaseSpec: TablePartitionSpec,
    +                             partitionColumnNames: Seq[String],
    +                             tablePath: Path): Path = {
    --- End diff --
    
    Please fix the indent issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98317719
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
    --- End diff --
    
    `give` -> `given`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72025/testReport)** for PR 16700 at commit [`12acdc6`](https://github.com/apache/spark/commit/12acdc6096def8c9c21e007b2d22658dbe33b364).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98055435
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * return the extra lower-case path created by Hive, and then we can delete it.
    --- End diff --
    
    `return` -> `returns`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98057316
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +918,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // if the newSpec contains more than one depth partitoin, FileSystem.rename just delete
    +          // only one path(wrongPath), we should check if wrongPath's parents need to be deleted.
    --- End diff --
    
    `only one path(wrongPath)` -> `the leaf (i.e., wrongPath)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72017/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72062/testReport)** for PR 16700 at commit [`0136388`](https://github.com/apache/spark/commit/01363887d4b755c14da1e14915fab124f6550e6d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r97938408
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -120,6 +120,17 @@ object ExternalCatalogUtils {
           new Path(totalPath, nextPartPath)
         }
       }
    +
    +  def getExtraPartPathCreatedByHive(
    --- End diff --
    
    Happy Chinese New Year


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98325213
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
    +          // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
    +          // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
    +          // is still exists, which we also need to delete
    +          val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
    --- End diff --
    
    `client.renamePartitions` is called at the beginning of `renamePartitions` for all specs at once. It creates the directory `a=1` and `a=1/b=2` and `a=1/b=3`.
    
    When you iterates specs and rename the directories with FileSystem.rename, in the first iteration, `a=1/b=2` is renamed, and `a=1` is deleted in this change, then `a=1/b=3` will be deleted too. So in next iteration, the renaming of `a=1/b=3` to `A=1/B=3` will fail.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72062/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98327523
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    --- End diff --
    
    Found an issue here... When we call `rename`, not all the file systems have the same behaviors. For example, on mac OS, when we doing this `.../tbl/a=5/b=6` -> `.../tbl/A=5/B=6` . The result is `.../tbl/a=5/B=6`. Thus, it is not recursive. However, the file system used in Jenkin does not have such an issue. You can hit this issue if you are using macOS. Thus, this fix causes an regression, but the bug is not in your fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    cc @gatorsmile @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72069/testReport)** for PR 16700 at commit [`de4c409`](https://github.com/apache/spark/commit/de4c409a9fbbe1f7fbed2d750f9d287e27470d3e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98056949
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +918,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // if the newSpec contains more than one depth partitoin, FileSystem.rename just delete
    --- End diff --
    
    `partitoin` -> `partition`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r99856024
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
    +          // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
    +          // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
    +          // is still exists, which we also need to delete
    +          val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
    --- End diff --
    
    this can be worse. If we already have a partition `A=1/B=2`, and we rename some other partition to `A=1/B=3`, then we will have `A=1/B=2` and `a=1/b=3`, and we have a lot of work to do, instead of just a renaming.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72017/testReport)** for PR 16700 at commit [`40efce2`](https://github.com/apache/spark/commit/40efce2a607908ff06cce85c5e782ed3b1320546).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r97933750
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -120,6 +120,17 @@ object ExternalCatalogUtils {
           new Path(totalPath, nextPartPath)
         }
       }
    +
    +  def getExtraPartPathCreatedByHive(
    --- End diff --
    
    A general suggestion. When the function names become not self-descriptive. Write the function comments. 
    
    For example, here, please add comments for `generatePartitionPath` and `getExtraPartPathCreatedByHive`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r97938358
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -120,6 +120,17 @@ object ExternalCatalogUtils {
           new Path(totalPath, nextPartPath)
         }
       }
    +
    +  def getExtraPartPathCreatedByHive(
    --- End diff --
    
    ok ,thanks very much~ BTW, Happy Chinese New Year~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98131097
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +918,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // if the newSpec contains more than one depth partitoin, FileSystem.rename just delete
    +          // only one path(wrongPath), we should check if wrongPath's parents need to be deleted.
    --- End diff --
    
    Thanks a lot! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98317808
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
    +          // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
    --- End diff --
    
    `, then` -> `. Then`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71978/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72066/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    oh, sorry I see your comments just now...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72020/testReport)** for PR 16700 at commit [`40efce2`](https://github.com/apache/spark/commit/40efce2a607908ff06cce85c5e782ed3b1320546).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72025 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72025/testReport)** for PR 16700 at commit [`12acdc6`](https://github.com/apache/spark/commit/12acdc6096def8c9c21e007b2d22658dbe33b364).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #71987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71987/testReport)** for PR 16700 at commit [`5403595`](https://github.com/apache/spark/commit/54035956b2b98301c6635923c0ffc88bc88f979c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98134168
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +918,22 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // for example:
    +          // newSpec is 'A=1/B=2', after renamePartitions by Hive, the location path in FileSystem
    +          // is changed to 'a=1/b=2', which is wrongPath, then we renamed to 'A=1/B=2', and
    +          // 'a=1/b=2' in FileSystem is deleted, while 'a=1' is already exists,
    +          // which should also be deleted
    --- End diff --
    
    How about?
    > For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions, the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath. Then, although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted but 'a=1' still exists. We also need to delete the useless directory.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r97977055
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -120,6 +120,24 @@ object ExternalCatalogUtils {
           new Path(totalPath, nextPartPath)
         }
       }
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * return the extra lower-case path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 rename to /path/A=4/B=5/C=6, the extra path returned is
    +   * /path/a=4, which also include all its' child path, such as /path/a=4/b=2
    +   */
    +  def getExtraPartPathCreatedByHive(
    --- End diff --
    
    this is a hive only feature, why do we put it in `ExternalCatalogUtils`? How about the `object HiveExternalCatalog`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72020/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98130978
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * return the extra lower-case path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 rename to /path/A=4/B=5/C=6, the extra path returned is
    +   * /path/a=4, which also include all its' child path, such as /path/a=4/b=2
    +   */
    +  def getExtraPartPathCreatedByHive(
    +                                     lowerCaseSpec: TablePartitionSpec,
    +                                     partitionColumnNames: Seq[String],
    +                                     tablePath: Path): Path = {
    --- End diff --
    
    oh...sorry...thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98322377
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
    +          // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
    +          // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
    +          // is still exists, which we also need to delete
    +          val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
    --- End diff --
    
    Hmmm, could it possibly have multiple specs sharing the same parent directory, e.g., 'A=1/B=2', 'A=1/B=3', ...?
    
    If so, when you delete the path 'a=1' here, in processing the next spec 'A=1/B=3', I think the rename will fail.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98055321
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * return the extra lower-case path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 rename to /path/A=4/B=5/C=6, the extra path returned is
    +   * /path/a=4, which also include all its' child path, such as /path/a=4/b=2
    +   */
    +  def getExtraPartPathCreatedByHive(
    +                                     lowerCaseSpec: TablePartitionSpec,
    +                                     partitionColumnNames: Seq[String],
    +                                     tablePath: Path): Path = {
    --- End diff --
    
    indent issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98317671
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * The partition path created by Hive is in lowercase, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * returns the extra lowercase path created by Hive, and then we can delete it.
    --- End diff --
    
    Nit: all of them are commas. You need to use periods. : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98317696
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * The partition path created by Hive is in lowercase, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * returns the extra lowercase path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 is changed to /path/A=4/B=5/C=6, this function returns
    --- End diff --
    
    The same issue here. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98317878
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
    +          // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
    +          // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
    --- End diff --
    
    Either `although` or `but` needs to be deleted. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72069/testReport)** for PR 16700 at commit [`de4c409`](https://github.com/apache/spark/commit/de4c409a9fbbe1f7fbed2d750f9d287e27470d3e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98056076
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * return the extra lower-case path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 rename to /path/A=4/B=5/C=6, the extra path returned is
    --- End diff --
    
    `the extra path returned is` -> `this function returns`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98056351
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * return the extra lower-case path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 rename to /path/A=4/B=5/C=6, the extra path returned is
    +   * /path/a=4, which also include all its' child path, such as /path/a=4/b=2
    --- End diff --
    
    `/path/a=4, which also include all its' child path, such as /path/a=4/b=2` -> `/path/a=4`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    thanks, what about `getExtraPartPathCreatedByHive `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98133500
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +918,22 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // for example:
    +          // newSpec is 'A=1/B=2', after renamePartitions by Hive, the location path in FileSystem
    +          // is changed to 'a=1/b=2', which is wrongPath, then we renamed to 'A=1/B=2', and
    +          // 'a=1/b=2' in FileSystem is deleted, while 'a=1' is already exists,
    +          // which should also be deleted
    +          val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
    +            lowerCasePartitionSpec(spec),
    --- End diff --
    
    The last comment: I prefer to calling `lowerCasePartitionSpec ` in the func `getExtraPartPathCreatedByHive`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98056608
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    --- End diff --
    
    `partition path` -> `The partition path`
    `is lower-case` -> `is in lowercase`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72026/testReport)** for PR 16700 at commit [`7aba059`](https://github.com/apache/spark/commit/7aba0592b78a3bccd44632a24ad83e6ef520478c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Just revert it. Let us wait for the decision how we plan to deal with the file renaming. My major concern is the errors in file renaming could cause the data loss, unless we can introduce a robust rollback solution. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    LGTM except three comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #71987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71987/testReport)** for PR 16700 at commit [`5403595`](https://github.com/apache/spark/commit/54035956b2b98301c6635923c0ffc88bc88f979c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #71976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71976/testReport)** for PR 16700 at commit [`878d45e`](https://github.com/apache/spark/commit/878d45eb81ad9b1b5486689a527c0c2cdfe81b81).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    What happens if the partitioning columns have more than two columns?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71976/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72066/testReport)** for PR 16700 at commit [`de4c409`](https://github.com/apache/spark/commit/de4c409a9fbbe1f7fbed2d750f9d287e27470d3e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    the example showed A/B are two partition columns


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #72020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72020/testReport)** for PR 16700 at commit [`40efce2`](https://github.com/apache/spark/commit/40efce2a607908ff06cce85c5e782ed3b1320546).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98055867
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
       }
     
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * return the extra lower-case path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 rename to /path/A=4/B=5/C=6, the extra path returned is
    --- End diff --
    
    `rename to` -> `is changed to`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98057702
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +918,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // if the newSpec contains more than one depth partitoin, FileSystem.rename just delete
    +          // only one path(wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // for example:
    +          // newSpec is 'A=1/B=2', after renamePartitions by Hive, the location path in FileSystem
    +          // changed to 'a=1/b=2', which is wrongPath, then we renamed to 'A=1/B=2', and 'a=1/b=2'
    +          // in FileSystem is deleted, while 'a=1' is already exists, which should also be deleted
    +          val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
    +            lowerCasePartitionSpec(spec),
    +            partitionColumnNames,
    +            tablePath)
    --- End diff --
    
    This can be moved into the `if`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71987/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98325110
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
    +          // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
    +          // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
    +          // is still exists, which we also need to delete
    +          val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
    --- End diff --
    
    The path `a=1` was created when you call `client.renamePartitions`, right? Based on my understanding, when you rename `A=1/B=3`, Hive will create the directory `a=1` and `a=1/b=3`. Thus, the rename will not fail. Have you made a try?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    **[Test build #71978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71978/testReport)** for PR 16700 at commit [`878d45e`](https://github.com/apache/spark/commit/878d45eb81ad9b1b5486689a527c0c2cdfe81b81).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72069/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72026/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Yeah, if we having three columns, does your solution resolve all the issues?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r98325754
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
               spec, partitionColumnNames, tablePath)
             try {
               tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
    +
    +          // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
    +          // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
    +          // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
    +          // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
    +          // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
    +          // is still exists, which we also need to delete
    +          val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
    --- End diff --
    
    So far, the partition rename DDL we support is for a single pair of partition spec. That is, `ALTER TABLE table PARTITION spec1 RENAME TO PARTITION spec2`. This is not an issue for end users.
    
    Thus, your concern looks reasonable, but I think we should not support the multi-partition renaming in the SessionCatalog and ExternalCatalog. It just makes the code more complex for error handling. Let me remove it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16700#discussion_r97980819
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -120,6 +120,24 @@ object ExternalCatalogUtils {
           new Path(totalPath, nextPartPath)
         }
       }
    +
    +  /**
    +   * partition path created by Hive is lower-case, while Spark SQL will
    +   * rename it with the partition name in partitionColumnNames, and this function
    +   * return the extra lower-case path created by Hive, and then we can delete it.
    +   * e.g. /path/A=1/B=2/C=3 rename to /path/A=4/B=5/C=6, the extra path returned is
    +   * /path/a=4, which also include all its' child path, such as /path/a=4/b=2
    +   */
    +  def getExtraPartPathCreatedByHive(
    --- End diff --
    
    thanks \uff01I will move it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by windpiger <gi...@git.apache.org>.

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    1.
    renamePartition:
    A=1/B=2/C=3 -> A=4/B=5/C=6
    path created by Hive after renamePartition:
    /path/a=4/b=5/c=6
    and SparkSQL rename it /path/A=4/B=5/C=6, and this pr will delete `/path/a=4`.
    
    2.
    renamePartition:
    a=1/B=2/C=3 -> a=4/B=5/C=6
    path created by Hive after renamePartition:
    /path/a=4/b=5/c=6
    and SparkSQL rename it /path/a=4/B=5/C=6, and this pr will delete `/path/a=4/b=5`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Thanks! Merging it to master. 
    
    You can fix the minor comments in your other PRs. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16700
  
    Just a general point that it seems odd to have a method whose name suggests it finds useless partition dirs. Can this be done more simply - is it not just a question of one extra delete somewhere?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16700


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org