Posted to reviews@spark.apache.org by "adrian-wang (via GitHub)" <gi...@apache.org> on 2024/01/14 14:47:39 UTC

[PR] SPARK-46714 overwrite a partition with custom location [spark]

adrian-wang opened a new pull request, #44725:
URL: https://github.com/apache/spark/pull/44725

   ### What changes were proposed in this pull request?
   
   We sometimes use more than one filesystem for a data warehouse, for example one for hot/warm data and another for cold data, backed by different storage to reduce total cost. However, once Spark converts a table write into a data source write, this setup does not work as expected.
   
   Before this patch, when overwriting a partition with a custom location:
   1. If the partition location is on the same filesystem as its table, the partition location remains the same.
   2. Otherwise, Spark throws an exception: `java.io.IOException: Wrong FS ...`
   
   After this patch, the behavior aligns with Hive: the overwritten partition is recreated under the table location.
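
The before/after behavior described above can be sketched as a small model. This is only an illustration, not the code in this PR and not Spark's or Hadoop's implementation: the object `PartitionOverwriteSketch`, the `WrongFs` case class, the helper names, and the example hosts and paths are all hypothetical, and "same filesystem" is assumed here to mean "same URI scheme and authority".

```scala
import java.net.URI

// Hypothetical model of the behavior described in this PR.
// It does not call Spark or Hadoop; it only compares URIs.
object PartitionOverwriteSketch {

  /** Stand-in for the `Wrong FS ...` failure mentioned in the description. */
  final case class WrongFs(path: URI)

  /** Assumption: two locations share a filesystem when scheme and authority match. */
  def sameFileSystem(a: URI, b: URI): Boolean = {
    def norm(s: String): Option[String] = Option(s).map(_.toLowerCase)
    norm(a.getScheme) == norm(b.getScheme) && norm(a.getAuthority) == norm(b.getAuthority)
  }

  /** Before the patch: keep the custom location, or fail across filesystems. */
  def locationBefore(table: URI, customPartition: URI): Either[WrongFs, URI] =
    if (sameFileSystem(table, customPartition)) Right(customPartition)
    else Left(WrongFs(customPartition))

  /** After the patch: the partition is recreated under the table location. */
  def locationAfter(table: URI, partitionDir: String): URI =
    URI.create(table.toString.stripSuffix("/") + "/" + partitionDir)

  def main(args: Array[String]): Unit = {
    // Hypothetical table on a "hot" cluster and a partition kept on "cold" storage.
    val table = URI.create("hdfs://hot-nn:8020/warehouse/t")
    val coldPartition = URI.create("s3a://cold-bucket/t/dt=2024-01-01")

    println(locationBefore(table, coldPartition))
    println(locationAfter(table, "dt=2024-01-01"))
  }
}
```

In this model the cross-filesystem case yields a `Left` before the patch, while after the patch the partition location is always derived from the table location, regardless of where the old partition lived.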
   
   ### Why are the changes needed?
   1. To align the behavior with Hive.
   2. To support existing partitions on a filesystem separate from the table location.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes.
   Before this patch, when overwriting a partition with a custom location:
   1. If the partition location is on the same filesystem as its table, the partition location remains the same.
   2. Otherwise, Spark throws an exception: `java.io.IOException: Wrong FS ...`
   
   After this patch, the behavior aligns with Hive: the overwritten partition is recreated under the table location.
   
   ### How was this patch tested?
   
   Added a unit test case.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46714][SQL] Overwrite a partition with custom location [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44725:
URL: https://github.com/apache/spark/pull/44725#discussion_r1451750371


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala:
##########
@@ -261,7 +299,7 @@ case class InsertIntoHadoopFsRelationCommand(
       fs: FileSystem,
       table: CatalogTable,
       qualifiedOutputPath: Path,
-      partitions: Seq[CatalogTablePartition]): Map[TablePartitionSpec, String] = {
+      partitions: Seq[CatalogTablePartition]) = {

Review Comment:
   Why is this change needed?





Re: [PR] [SPARK-46714][SQL] Overwrite a partition with custom location [spark]

Posted by "adrian-wang (via GitHub)" <gi...@apache.org>.
adrian-wang commented on code in PR #44725:
URL: https://github.com/apache/spark/pull/44725#discussion_r1451879819


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala:
##########
@@ -261,7 +299,7 @@ case class InsertIntoHadoopFsRelationCommand(
       fs: FileSystem,
       table: CatalogTable,
       qualifiedOutputPath: Path,
-      partitions: Seq[CatalogTablePartition]): Map[TablePartitionSpec, String] = {
+      partitions: Seq[CatalogTablePartition]) = {

Review Comment:
   Sorry, this change must have come from the IDE. I'll revert it.





Re: [PR] [SPARK-46714][SQL] Overwrite a partition with custom location [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #44725:
URL: https://github.com/apache/spark/pull/44725#issuecomment-1890978233

   cc @cloud-fan FYI




Re: [PR] [SPARK-46714][SQL] Overwrite a partition with custom location [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #44725: [SPARK-46714][SQL] Overwrite a partition with custom location
URL: https://github.com/apache/spark/pull/44725




Re: [PR] [SPARK-46714][SQL] Overwrite a partition with custom location [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #44725:
URL: https://github.com/apache/spark/pull/44725#issuecomment-2091944051

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

