Posted to reviews@spark.apache.org by "anishshri-db (via GitHub)" <gi...@apache.org> on 2023/03/06 03:50:36 UTC

[GitHub] [spark] anishshri-db opened a new pull request, #40292: [SPARK-42676] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

anishshri-db opened a new pull request, #40292:
URL: https://github.com/apache/spark/pull/40292

   ### What changes were proposed in this pull request?
   Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently
   
   ### Why are the changes needed?
   We have seen cases where the default FS is a remote file system. Because the path for temporary streaming checkpoints is not specified explicitly, checkpoint directories can pile up on that remote FS in two cases:
   
   - query exits with exception and the flag to force checkpoint removal is not set
   - driver/cluster terminates without query being terminated gracefully
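
   As a rough illustration of why a scheme-less temp path can end up on the remote FS, the sketch below uses plain `java.net.URI` resolution as an analogy. It is not Spark's or Hadoop's actual resolution code, and the `hdfs://namenode:8020/` default FS and the `temporary-abc` directory name are hypothetical values chosen for the example.

   ```scala
   import java.net.URI

   object DefaultFsResolution {
     // Analogy only: resolve a checkpoint path against a default filesystem URI.
     // A path without a scheme inherits the scheme and authority of the default FS;
     // a path that already carries a scheme is returned unchanged.
     def qualify(defaultFs: URI, path: String): URI = defaultFs.resolve(path)

     def main(args: Array[String]): Unit = {
       val defaultFs = new URI("hdfs://namenode:8020/") // hypothetical remote default FS

       // Scheme-less temp dir: lands on the remote filesystem.
       println(qualify(defaultFs, "/local_disk0/tmp/temporary-abc"))

       // Temp dir with an explicit file: scheme: stays on the local filesystem.
       println(qualify(defaultFs, "file:/local_disk0/tmp/temporary-abc"))
     }
   }
   ```

   Under this analogy, only the path that already carries the `file:` scheme stays local, which is the effect this PR is after for temporary checkpoints.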
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Verified that the checkpoint is resolved and written to the local FS
   
   ```
   23/03/04 01:42:49 INFO ResolveWriteToStream: Checkpoint root file:/local_disk0/tmp/temporary-c97ab8bd-6b03-4c28-93ea-751d30a2d3f9 resolved to file:/local_disk0/tmp/temporary-c97ab8bd-6b03-4c28-93ea-751d30a2d3f9.
   ...
   23/03/04 01:46:37 INFO MicroBatchExecution: [queryId = 66c4c] Deleting checkpoint file:/local_disk0/tmp/temporary-c97ab8bd-6b03-4c28-93ea-751d30a2d3f9.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40292: [SPARK-42676][SS] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40292:
URL: https://github.com/apache/spark/pull/40292#discussion_r1188090366


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ResolveWriteToStream.scala:
##########
@@ -81,7 +81,9 @@ object ResolveWriteToStream extends Rule[LogicalPlan] with SQLConfHelper {
           s" the query didn't fail: $tempDir. If it's required to delete it under any" +
           s" circumstances, please set ${SQLConf.FORCE_DELETE_TEMP_CHECKPOINT_LOCATION.key} to" +
           s" true. Important to know deleting temp checkpoint folder is best effort.")
-        tempDir
+        // SPARK-42676 - Write temp checkpoints for streaming queries to local filesystem
+        // even if default FS is set differently
+        "file://" + tempDir

Review Comment:
   This actually broke Windows support (https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/46451196).





[GitHub] [spark] anishshri-db commented on pull request #40292: [SPARK-42676] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #40292:
URL: https://github.com/apache/spark/pull/40292#issuecomment-1455397903

   @HeartSaVioR - please take a look. Thx




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40292: [SPARK-42676][SS] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40292:
URL: https://github.com/apache/spark/pull/40292#discussion_r1188090201


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ResolveWriteToStream.scala:
##########
@@ -81,7 +81,9 @@ object ResolveWriteToStream extends Rule[LogicalPlan] with SQLConfHelper {
           s" the query didn't fail: $tempDir. If it's required to delete it under any" +
           s" circumstances, please set ${SQLConf.FORCE_DELETE_TEMP_CHECKPOINT_LOCATION.key} to" +
           s" true. Important to know deleting temp checkpoint folder is best effort.")
-        tempDir
+        // SPARK-42676 - Write temp checkpoints for streaming queries to local filesystem
+        // even if default FS is set differently
+        "file://" + tempDir

Review Comment:
   This actually broke Windows support (https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/46451196). A canonical path is also not necessarily a valid URI: for example, it might contain whitespace.
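
   To make the concern concrete, here is a minimal sketch comparing the string concatenation from the diff with a conversion through `java.io.File.toURI`. The directory names are hypothetical, and `File.toURI` is shown only as one standard way to obtain a properly escaped `file:` URI; it is not necessarily the fix that was adopted in Spark.

   ```scala
   import java.io.File
   import java.net.URI
   import scala.util.Try

   object TempCheckpointUri {
     // The approach from the diff: prepend a scheme to the raw canonical path.
     def naive(path: String): String = "file://" + path

     // Alternative: let java.io.File build the URI, which percent-encodes
     // characters that are not legal in a URI path (such as spaces).
     def viaFile(path: String): URI = new File(path).toURI

     def main(args: Array[String]): Unit = {
       val withSpace = "/tmp/my checkpoints/temporary-1234"  // hypothetical path
       val windows = "C:\\Users\\spark\\temporary-1234"      // hypothetical path

       // Both concatenated strings are rejected by the URI parser:
       // the space and the backslashes are illegal characters.
       println(Try(new URI(naive(withSpace))).isFailure)
       println(Try(new URI(naive(windows))).isFailure)

       // File.toURI yields a parseable file: URI with the space encoded as %20.
       println(viaFile(withSpace))
     }
   }
   ```

   The exact output of `viaFile` depends on the platform and working directory, so the sketch only relies on the result having the `file` scheme and an escaped path.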





[GitHub] [spark] HeartSaVioR closed pull request #40292: [SPARK-42676][SS] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR closed pull request #40292: [SPARK-42676][SS] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently
URL: https://github.com/apache/spark/pull/40292




[GitHub] [spark] HeartSaVioR commented on pull request #40292: [SPARK-42676][SS] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #40292:
URL: https://github.com/apache/spark/pull/40292#issuecomment-1455549225

   Thanks! Merging to master.

