Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/11 10:36:34 UTC

[GitHub] [spark] Clarkkkkk opened a new pull request #26090: [SPARK-29302]Fix writing file collision in dynamic partition overwrite mode within speculative execution

URL: https://github.com/apache/spark/pull/26090
 
 
   ### What changes were proposed in this pull request?
   When inserting into a partitioned DataSource table (the issue does not reproduce with a Hive table) with dynamic partition overwrite and speculative execution enabled, multiple attempts of the same task try to write to the same files.
   
   This PR reuses FileOutputCommitter to avoid the write collision, and renames files from the staging directory to the final output directory using the original logic in HadoopMapReduceCommitProtocol#commitJob.
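
The idea can be illustrated with a toy model. The sketch below is not Spark or Hadoop code: it uses only the Python standard library, and every name in it (`write_attempt`, `commit_job`, `run_demo`, the `.staging-demo` directory, the attempt IDs) is hypothetical. It assumes that each task attempt gets its own working directory under the staging directory, and that only the committed attempt's partitions are moved into the final output directory at job commit.

```python
import os
import shutil
import tempfile


def write_attempt(staging_dir, partition, file_name, attempt_id, data):
    """Write one task attempt's output under an attempt-private directory."""
    attempt_dir = os.path.join(staging_dir, "_temporary", attempt_id, partition)
    os.makedirs(attempt_dir, exist_ok=True)
    path = os.path.join(attempt_dir, file_name)
    # Mode "x" raises FileExistsError if the file exists, so a collision is visible.
    with open(path, "x") as f:
        f.write(data)
    return path


def commit_job(staging_dir, output_dir, committed_attempt_id):
    """Move the committed attempt's partitions into the final output directory."""
    attempt_root = os.path.join(staging_dir, "_temporary", committed_attempt_id)
    moved = []
    for partition in sorted(os.listdir(attempt_root)):
        final_partition = os.path.join(output_dir, partition)
        # Dynamic overwrite: replace only the partitions that were written.
        if os.path.exists(final_partition):
            shutil.rmtree(final_partition)
        shutil.move(os.path.join(attempt_root, partition), final_partition)
        moved.append(partition)
    # Drop the staging directory, including output of uncommitted attempts.
    shutil.rmtree(staging_dir)
    return moved


def read_text(path):
    with open(path) as f:
        return f.read()


def run_demo():
    root = tempfile.mkdtemp()
    try:
        output_dir = os.path.join(root, "table")
        staging_dir = os.path.join(output_dir, ".staging-demo")  # hypothetical name
        # Two partitions already exist in the table.
        for partition, content in (("p=1", "old-1"), ("p=2", "old-2")):
            os.makedirs(os.path.join(output_dir, partition))
            with open(os.path.join(output_dir, partition, "part-00000"), "w") as f:
                f.write(content)
        # The original and the speculative attempt write the same file name,
        # but each into its own attempt directory, so they do not collide.
        write_attempt(staging_dir, "p=1", "part-00000", "attempt_0", "new-1")
        write_attempt(staging_dir, "p=1", "part-00000", "attempt_1", "new-1")
        moved = commit_job(staging_dir, output_dir, "attempt_0")
        return {
            "moved": moved,
            "p1": read_text(os.path.join(output_dir, "p=1", "part-00000")),
            "p2": read_text(os.path.join(output_dir, "p=2", "part-00000")),
            "staging_exists": os.path.exists(staging_dir),
        }
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    print(run_demo())
```

In this model, two writes that share an attempt directory would fail with `FileExistsError`, which stands in for the collision described above; giving each attempt a private directory removes the conflict, and the untouched partition `p=2` keeps its old contents after commit.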
   
   
   ### Why are the changes needed?
   Tasks fail in this circumstance.
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   This patch is tested by existing tests in org.apache.spark.sql.sources.InsertSuite.
   
