Posted to reviews@spark.apache.org by "zzzzming95 (via GitHub)" <gi...@apache.org> on 2023/05/11 15:36:49 UTC

[GitHub] [spark] zzzzming95 commented on a diff in pull request #41000: [SPARK-43327] Trigger `committer.setupJob` before plan execute in `FileFormatWriter#write`

zzzzming95 commented on code in PR #41000:
URL: https://github.com/apache/spark/pull/41000#discussion_r1191365410


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala:
##########
@@ -159,6 +159,17 @@ object FileFormatWriter extends Logging {
       statsTrackers = statsTrackers
     )
 
+    SQLExecution.checkSQLExecutionId(sparkSession)
+
+    // propagate the description UUID into the jobs, so that committers
+    // get an ID guaranteed to be unique.
+    job.getConfiguration.set("spark.sql.sources.writeJobUUID", description.uuid)
+
+    // This call shouldn't be put into the `try` block below because it only initializes and
+    // prepares the job, any exception thrown from here shouldn't cause abortJob() to be called.
+    // It must be run before `materializeAdaptiveSparkPlan()`
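
A minimal, hypothetical sketch of how a Hadoop committer could consume the `spark.sql.sources.writeJobUUID` value set in the hunk above; the `UuidAwareCommitter` class and its use of the UUID are assumptions for illustration, only the configuration key comes from the patch.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

// Hypothetical committer subclass; not part of Spark or of this PR.
class UuidAwareCommitter(outputPath: Path, context: TaskAttemptContext)
    extends FileOutputCommitter(outputPath, context) {

  override def setupJob(jobContext: JobContext): Unit = {
    // FileFormatWriter (per the hunk above) writes the per-write UUID into the
    // job configuration, so a committer can rely on it being unique per write,
    // e.g. to name a unique staging directory.
    val uuid = jobContext.getConfiguration.get("spark.sql.sources.writeJobUUID")
    println(s"setting up job with write UUID: $uuid")
    super.setupJob(jobContext)
  }
}
```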

Review Comment:
   >  What is the fallout of committer.setupJob(job) not being executed in presence of an error?
   
   Spark deletes the partition location when running `insert overwrite`.
   
   https://github.com/apache/spark/pull/41000#issuecomment-1543974004
   
   The new location is then created in `committer.setupJob(job)`, and only after that is the job executed. But in https://github.com/apache/spark/pull/38358, the job execution was triggered in advance.
   
   So when the job execution fails, the location path has already been deleted but is never re-created.
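
As a self-contained sketch of that ordering (the helper names and the `/tmp` path are hypothetical stand-ins, not Spark's real code paths):

```scala
import java.nio.file.{Files, Paths}

object OverwriteOrderingSketch {
  val partitionLocation = Paths.get("/tmp/warehouse/t/p=1") // stand-in for the partition path

  // What `insert overwrite` does first: drop the existing partition location.
  def dropLocationForOverwrite(): Unit = Files.deleteIfExists(partitionLocation)

  // Stand-in for committer.setupJob(job), which re-creates the output location.
  def setupJob(): Unit = Files.createDirectories(partitionLocation)

  // Stand-in for the plan execution that PR #38358 moved earlier; assume it fails.
  def executePlan(): Unit = throw new RuntimeException("plan execution failed")

  def main(args: Array[String]): Unit = {
    dropLocationForOverwrite()
    // Broken order: executePlan() before setupJob() leaves the location deleted on failure.
    // Fixed order (this PR): setupJob() runs first, so the location exists again even if
    // execution fails afterwards.
    setupJob()
    try executePlan()
    catch {
      case _: RuntimeException =>
        println(s"execution failed, location exists: ${Files.exists(partitionLocation)}")
    }
  }
}
```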



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org