
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/16 16:14:32 UTC

[GitHub] [spark] xkrogen opened a new pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

xkrogen opened a new pull request #32207:
URL: https://github.com/apache/spark/pull/32207

### What changes were proposed in this pull request?
Clean up code in `HadoopMapReduceCommitProtocol#commitJob` to avoid renames that will always fail (usually silently).

### Why are the changes needed?
The renames in this block will always fail under `dynamicPartitionOverwrite == true`:
https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L191-L218

We have the following sequence of events:
1. The first block deletes all parent directories of `filesToMove.values`
2. The for-loop block attempts to rename all `filesToMove.keys` to `filesToMove.values`
3. The third block does directory-level renames to place files into their final locations

All renames in the for-loop will always fail, since all parent directories of `filesToMove.values` were just deleted. Under a normal HDFS scenario, the contract of `fs.rename` is to return `false` in such a failure scenario, as opposed to throwing an exception. This allows dynamic partition overwrite to work, albeit with a bunch of failed renames in the middle. Really, we should only run the for-loop renames in the `dynamicPartitionOverwrite == false` case, and consolidate the two if-blocks for the `true` case.
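
The delete-then-rename sequence can be reproduced in miniature without Hadoop. The sketch below is only an analogy: it uses `java.io.File.renameTo`, which, like the `fs.rename` contract described above, signals failure by returning `false` rather than throwing. The object name `RenameAfterDelete`, the helper names, and all paths are made up for illustration; this is not the code in `HadoopMapReduceCommitProtocol`.

```scala
import java.io.File
import java.nio.file.Files

object RenameAfterDelete {
  /** Recursively deletes a file or directory tree. */
  def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  /** Runs "delete parents, then rename" and returns the result of each rename. */
  def run(): List[Boolean] = {
    val root = Files.createTempDirectory("rename-demo").toFile
    val staging = new File(root, "staging")
    staging.mkdirs()

    // Hypothetical stand-in for filesToMove: staged file -> final location.
    val filesToMove: Map[File, File] = List("a", "b").map { name =>
      val src = new File(staging, s"part-$name")
      src.createNewFile()
      val dstDir = new File(root, s"custom/$name")
      dstDir.mkdirs()
      src -> new File(dstDir, s"part-$name")
    }.toMap

    // Step 1: delete the parent directory of every destination.
    val parents: Set[File] = filesToMove.values.map(_.getParentFile).toSet
    parents.foreach(deleteRecursively)

    // Step 2: attempt the renames; every destination's parent is now gone.
    val results = filesToMove.toList.map { case (src, dst) => src.renameTo(dst) }

    deleteRecursively(root)
    results
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Every rename in step 2 reports failure, yet nothing is thrown, which is why a caller that ignores the return value never notices.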

### Does this PR introduce _any_ user-facing change?
In almost all cases, no. However, if you happen to use a `FileSystem` implementation that throws an exception on this kind of `fs.rename` call, `dynamicPartitionOverwrite` is unusable prior to this PR and starts working after it.
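
To illustrate the difference between a rename that returns `false` and one that throws, here is a minimal sketch using `java.nio.file.Files.move` as a stand-in for a throwing `FileSystem`. It is an analogy only: no Hadoop class is involved, and the object name `ThrowingRename` and the paths are hypothetical.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}
import scala.util.{Failure, Success, Try}

object ThrowingRename {
  /** Moves a file into a directory that does not exist; true if an IOException was raised. */
  def moveIntoMissingParentThrows(): Boolean = {
    val root: Path = Files.createTempDirectory("throwing-rename")
    val src: Path = Files.createFile(root.resolve("part-0"))
    // The destination's parent directory is never created.
    val dst: Path = root.resolve("deleted-parent").resolve("part-0")

    val threw = Try(Files.move(src, dst)) match {
      case Failure(_: IOException) => true   // failure surfaces as an exception
      case Failure(other)          => throw other
      case Success(_)              => false
    }

    // Best-effort cleanup of the temporary files.
    Try(Files.deleteIfExists(src))
    Try(Files.deleteIfExists(root))
    threw
  }

  def main(args: Array[String]): Unit =
    println(moveIntoMissingParentThrows())
}
```

With an API of this style, a commit sequence that renames into just-deleted directories would abort on the first rename instead of continuing past it.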

### How was this patch tested?
Did not add/modify tests. Didn't see test cases for this file. Open to suggestions on where/how to add such tests.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] xkrogen commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

xkrogen commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821680010


   I took a look at the failing test:
   ```
   org.apache.spark.sql.sources.InsertSuite.SPARK-20236: dynamic partition overwrite with customer partition path
   
   == Results ==
   !== Correct Answer - 3 ==   == Spark Answer - 2 ==
   !struct<>                   struct<i:int,part1:int,part2:int>
    [2,2,2]                    [2,2,2]
    [3,1,2]                    [3,1,2]
   ![4,1,1]                    
   ```
   
   It does seem legitimate, but it led me to be even more confused about this functionality. It fails at this last step:
   ```
           sql("insert overwrite table t partition(part1=1, part2) select 4, 1")
           checkAnswer(spark.table("t"), Row(4, 1, 1) :: Row(2, 2, 2) :: Row(3, 1, 2) :: Nil)
   ```
   
   I ran through this code in a debugger and it appears that the unit test is relying on the behavior of `LocalFileSystem`, which is different from HDFS: if you try to rename to a nonexistent directory, the directory is silently created for you. I double-checked in a Spark shell to be sure, and this is not what HDFS does; it returns `false` from `rename` as expected.
   
   This means that the unit test works properly on a local FS, but fails when run against HDFS. I verified this by executing the unit test code (slightly modified) in a Spark Shell instance:
   ```scala
   scala> val scheme = "file"
   scala> :paste
   // Entering paste mode (ctrl-D to finish)
   
           val basepath = s"$scheme:/tmp/ekrogentest/base"
           val path1 = s"$scheme:/tmp/ekrogentest/1"
           val path2 = s"$scheme:/tmp/ekrogentest/2"
   
           // refresh everything
           sql("DROP TABLE IF EXISTS t")
           val fs = new Path(basepath).getFileSystem(sc.hadoopConfiguration)
           fs.delete(new Path(basepath).getParent, true)
           Seq(basepath, path1, path2).foreach(p => fs.mkdirs(new Path(p)))
   
           sql(
             s"""
               |create table t(i int, part1 int, part2 int) using parquet
               |partitioned by (part1, part2) location '$basepath'
             """.stripMargin)
   
           //val path1 = Utils.createTempDir()
           sql(s"alter table t add partition(part1=1, part2=1) location '$path1'")
           sql(s"insert into t partition(part1=1, part2=1) select 1")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(1, 1, 1))
   
           sql("insert overwrite table t partition(part1=1, part2=1) select 2")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(2, 1, 1))
   
           sql("insert overwrite table t partition(part1=2, part2) select 2, 2")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(2, 1, 1) :: Row(2, 2, 2) :: Nil)
   
           //val path2 = Utils.createTempDir()
           sql(s"alter table t add partition(part1=1, part2=2) location '$path2'")
           sql("insert overwrite table t partition(part1=1, part2=2) select 3")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(2, 1, 1) :: Row(2, 2, 2) :: Row(3, 1, 2) :: Nil)
   
           sql("insert overwrite table t partition(part1=1, part2) select 4, 1")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(4, 1, 1) :: Row(2, 2, 2) :: Row(3, 1, 2) :: Nil)
   
   // Exiting paste mode, now interpreting.
   
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  1|    1|    1|
   +---+-----+-----+
   
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  2|    1|    1|
   +---+-----+-----+
   
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  2|    2|    2|
   |  2|    1|    1|
   +---+-----+-----+
   
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  3|    1|    2|
   |  2|    2|    2|
   |  2|    1|    1|
   +---+-----+-----+
   
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  3|    1|    2|
   |  2|    2|    2|
   |  4|    1|    1|
   +---+-----+-----+
   ```
   Works fine when using the local file system.
   
   However, when I rerun the same code using HDFS:
   ```scala
   scala> val scheme = "hdfs"
   scheme: String = hdfs
   
   scala> :paste
   // Entering paste mode (ctrl-D to finish)
   
           val basepath = s"$scheme:/tmp/ekrogentest/base"
           val path1 = s"$scheme:/tmp/ekrogentest/1"
           val path2 = s"$scheme:/tmp/ekrogentest/2"
   
           sql("DROP TABLE IF EXISTS t")
           val fs = new Path(basepath).getFileSystem(sc.hadoopConfiguration)
           fs.delete(new Path(basepath).getParent, true)
           Seq(basepath, path1, path2).foreach(p => fs.mkdirs(new Path(p)))
   
           sql(
             s"""
               |create table t(i int, part1 int, part2 int) using parquet
               |partitioned by (part1, part2) location '$basepath'
             """.stripMargin)
   
           //val path1 = Utils.createTempDir()
           sql(s"alter table t add partition(part1=1, part2=1) location '$path1'")
           sql(s"insert into t partition(part1=1, part2=1) select 1")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(1, 1, 1))
   
           sql("insert overwrite table t partition(part1=1, part2=1) select 2")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(2, 1, 1))
   
           sql("insert overwrite table t partition(part1=2, part2) select 2, 2")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(2, 1, 1) :: Row(2, 2, 2) :: Nil)
   
           //val path2 = Utils.createTempDir()
           sql(s"alter table t add partition(part1=1, part2=2) location '$path2'")
           sql("insert overwrite table t partition(part1=1, part2=2) select 3")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(2, 1, 1) :: Row(2, 2, 2) :: Row(3, 1, 2) :: Nil)
   
           sql("insert overwrite table t partition(part1=1, part2) select 4, 1")
           sql("SELECT * FROM t").show()
           //checkAnswer(spark.table("t"), Row(4, 1, 1) :: Row(2, 2, 2) :: Row(3, 1, 2) :: Nil)
   
   // Exiting paste mode, now interpreting.
   
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  1|    1|    1|
   +---+-----+-----+
   
   21/04/16 22:43:38 WARN HadoopFSUtils: The directory hdfs://.../tmp/ekrogentest/1 was not found. Was it deleted very recently?
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   +---+-----+-----+
   
   21/04/16 22:43:39 WARN HadoopFSUtils: The directory hdfs://.../tmp/ekrogentest/1 was not found. Was it deleted very recently?
   
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  2|    2|    2|
   +---+-----+-----+
   
   21/04/16 22:43:39 WARN HadoopFSUtils: The directory hdfs://.../tmp/ekrogentest/2 was not found. Was it deleted very recently?
   21/04/16 22:43:39 WARN HadoopFSUtils: The directory hdfs://.../tmp/ekrogentest/1 was not found. Was it deleted very recently?
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  2|    2|    2|
   +---+-----+-----+
   
   21/04/16 22:43:39 WARN HadoopFSUtils: The directory hdfs://.../tmp/ekrogentest/2 was not found. Was it deleted very recently?
   21/04/16 22:43:40 WARN HadoopFSUtils: The directory hdfs://.../tmp/ekrogentest/1 was not found. Was it deleted very recently?
   +---+-----+-----+
   |  i|part1|part2|
   +---+-----+-----+
   |  2|    2|    2|
   +---+-----+-----+
   
   basepath: String = hdfs:/tmp/ekrogentest/base
   path1: String = hdfs:/tmp/ekrogentest/1
   path2: String = hdfs:/tmp/ekrogentest/2
   fs: org.apache.hadoop.fs.FileSystem = DFS[DFSClient[...]]
   ```
   Now everything is broken.
   
   @cloud-fan, it looks like you added this; what is the expected behavior here? I can't tell if I'm missing something.



[GitHub] [spark] SparkQA removed a comment on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821304399


   **[Test build #137490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137490/testReport)** for PR 32207 at commit [`07f9189`](https://github.com/apache/spark/commit/07f9189e9367f1b5b3f73c0a605d5d9ad72b4d20).



[GitHub] [spark] xkrogen commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

xkrogen commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-840626635


   Closing in favor of #32530 



[GitHub] [spark] YuzhouSun commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

YuzhouSun commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-840387892


   Created https://github.com/apache/spark/pull/32530. @cloud-fan Could you review it? Thanks!



[GitHub] [spark] cloud-fan edited a comment on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

cloud-fan edited a comment on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-822287864


   @xkrogen the test does seem to be legitimate, but this is likely to be a long standing bug. Can you look into it and see if we can support it? If not we need to throw a clear error instead of relying on `LocalFileSystem` behavior.



[GitHub] [spark] xkrogen closed pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

xkrogen closed pull request #32207:
URL: https://github.com/apache/spark/pull/32207


   



[GitHub] [spark] AmplabJenkins commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821335418


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42065/
   



[GitHub] [spark] xkrogen edited a comment on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

xkrogen edited a comment on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-840626635


   Closing in favor of #32530 



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821457366


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137490/
   



[GitHub] [spark] YuzhouSun edited a comment on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

YuzhouSun edited a comment on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-840224412


   Hello, about “we should only run Block 2 in the dynamicPartitionOverwrite == false case”: Block 2 is actually meant for custom partition paths (i.e. absolute partitions), in both the dynamic and the static partition overwrite cases. That’s probably why InsertSuite.test("SPARK-20236: dynamic partition overwrite with custom partition path") failed with the changes.
   
   The fix could be to re-create the parent directories when required. We created another PR for this JIRA.
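   
   The idea of re-creating a parent directory before renaming can be sketched as follows. This is a minimal illustration on the local file system using `java.io.File`, not code from Spark or from the follow-up PR; `RenameWithParent`, `renameEnsuringParent`, and the paths are hypothetical names.
   
   ```scala
   import java.io.File
   import java.nio.file.Files
   
   object RenameWithParent {
     /** Renames src to dst, creating dst's parent directory first if it is missing. */
     def renameEnsuringParent(src: File, dst: File): Boolean = {
       val parent = dst.getParentFile
       if (parent != null && !parent.exists()) parent.mkdirs()
       src.renameTo(dst)
     }
   
     /** Recursively deletes a file or directory tree. */
     private def deleteRecursively(f: File): Unit = {
       Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
       f.delete()
     }
   
     /** Returns (plain rename result, rename result with parent re-created). */
     def demo(): (Boolean, Boolean) = {
       val root = Files.createTempDirectory("rename-parent").toFile
       val src1 = new File(root, "part-1")
       val src2 = new File(root, "part-2")
       src1.createNewFile()
       src2.createNewFile()
   
       // Destination parent is missing: the plain rename reports failure.
       val plain = src1.renameTo(new File(root, "missing1/part-1"))
       // Same situation, but the parent is created before renaming.
       val ensured = renameEnsuringParent(src2, new File(root, "missing2/part-2"))
   
       deleteRecursively(root)
       (plain, ensured)
     }
   
     def main(args: Array[String]): Unit =
       println(demo())
   }
   ```
   
   The same check-then-create step would apply per destination, so files that share a parent directory only trigger one `mkdirs` call that does any work.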



[GitHub] [spark] cloud-fan commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-822578512


   Or we can ignore the test first and move forward. It's a long-standing bug and not caused by this patch.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821335418


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42065/
   



[GitHub] [spark] YuzhouSun commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

YuzhouSun commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-840224412


   Hello, about “we should only run Block 2 in the dynamicPartitionOverwrite == false case”: Block 2 is actually meant for custom partition paths (i.e. absolute partitions), in both the dynamic and the static partition overwrite cases. That’s probably why InsertSuite.test("SPARK-20236: dynamic partition overwrite with custom partition path") failed with the changes.
   
   The fix could be to re-create the parent directories when required. We created another PR for this JIRA.



[GitHub] [spark] SparkQA commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821326603







[GitHub] [spark] SparkQA commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821441672


   **[Test build #137490 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137490/testReport)** for PR 32207 at commit [`07f9189`](https://github.com/apache/spark/commit/07f9189e9367f1b5b3f73c0a605d5d9ad72b4d20).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821457366


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137490/
   



[GitHub] [spark] cloud-fan commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-840342929


   @YuzhouSun Can you help to take over this PR?



[GitHub] [spark] cloud-fan commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-822287864


   @xkrogen the test does seem to be legitimate, but this is likely to be a long-standing bug. Can you look into it and see if we can support it? If not, we need to throw a clear error instead of relying on `LocalFileSystem` behavior.



[GitHub] [spark] xkrogen commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

xkrogen commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821286246


   cc @mridulm and @cloud-fan 



[GitHub] [spark] xkrogen commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

xkrogen commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-822573342


   I don't quite understand the commit sequence under the various modes, so I can't provide any quick input on whether this is easily fixable. I have already spent more time on this issue than I expected, and it's pretty far outside of my normal scope, so I can't devote more time currently. If I do find some spare cycles in the future, I will try to circle back here. Thanks for your input so far!



[GitHub] [spark] SparkQA commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821304399


   **[Test build #137490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137490/testReport)** for PR 32207 at commit [`07f9189`](https://github.com/apache/spark/commit/07f9189e9367f1b5b3f73c0a605d5d9ad72b4d20).



[GitHub] [spark] github-actions[bot] commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-821286204


   **[Test build #756210704](https://github.com/xkrogen/spark/actions/runs/756210704)** for PR 32207 at commit [`07f9189`](https://github.com/xkrogen/spark/commit/07f9189e9367f1b5b3f73c0a605d5d9ad72b4d20).

