Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/06 05:58:43 UTC

[GitHub] [spark] turboFei edited a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

turboFei edited a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-653996517


   Just left some comments.
   
   This PR does resolve the issue, but it also involves some costs.
   In this PR, in dynamic partition overwrite mode, each task might create multiple partition paths under a unique task attempt output path.
   In fact, dynamic partition overwrite always causes too many small files if the user does not repartition by the dynamic partition columns.
   So I am afraid that this PR might create a large number of directories at runtime.
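
To make the directory-count concern concrete, here is a toy model (not Spark code) that counts distinct (task, partition) pairs, i.e. the partition directories that would exist under per-task-attempt output paths. Everything in it is an assumption made for illustration: the function name `count_output_dirs`, the `dt=...` partition values, the round-robin row assignment standing in for un-repartitioned input, and `zlib.crc32` standing in for hash partitioning by the partition column.

```python
import zlib


def count_output_dirs(rows, num_tasks, repartition_by_key):
    """Count distinct (task, partition) pairs for a list of partition values.

    rows: one partition value per record.
    repartition_by_key: if True, every record with the same partition value
    is routed to the same task (a stand-in for repartitioning by the
    dynamic partition columns); otherwise records are spread round-robin.
    """
    dirs = set()
    for i, part in enumerate(rows):
        if repartition_by_key:
            # Deterministic hash so each partition value maps to one task.
            task = zlib.crc32(part.encode("utf-8")) % num_tasks
        else:
            # Hypothetical layout: records land on tasks regardless of key.
            task = i % num_tasks
        dirs.add((task, part))
    return len(dirs)


# Four made-up partition values with 100 records each, written by 8 tasks.
partitions = [f"dt=2020-07-0{d}" for d in range(1, 5)]
rows = [p for p in partitions for _ in range(100)]

print(count_output_dirs(rows, 8, False))  # 32: every task touches every partition
print(count_output_dirs(rows, 8, True))   # 4: one task per partition value
```

In this model the count without repartitioning grows as tasks times partitions, while repartitioning by the partition column caps it at the number of distinct partition values.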
   
   I prefer #28989. In that PR, I define a Spark staging output committer based on the current implementation of HadoopMapReduceCommitProtocol.




