Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/10/01 13:32:35 UTC

[GitHub] [hadoop] steveloughran commented on pull request #2349: MAPREDUCE-7282. Move away from V2 commit algorithm

steveloughran commented on pull request #2349:
URL: https://github.com/apache/hadoop/pull/2349#issuecomment-702137474


   @jbrennan333 what do you think we should say instead of "deprecated"? "Not recommended"?
   
   I was thinking of adding a link to the JIRA and changing the issue text there to clarify
   * safe if the names and content of generated output files are consistent across all task attempts (TAs)
   * unsafe if different TAs generate bad files (biggest risk, as a partial failure of the 1st attempt may leave them behind)
   * unsafe if different TAs generate different content in the same files (only an issue on a network partition where TA #1 generates output as/after TA #2 does its work)
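
A rough sketch of the first two cases, as a toy in-memory model rather than the real committer: `commit_task_v2` and `run_two_attempts` are hypothetical names, the destination is just a dict, and the only property borrowed from the v2 algorithm is that task commit moves files into the final output one at a time, so a failure partway through leaves some of them visible.

```python
def commit_task_v2(dest, attempt_output, fail_after=None):
    """Hypothetical model of a v2-style task commit.

    Files are moved into the destination one by one, so the commit
    is not atomic and can fail after only some files are visible.
    """
    for i, (name, content) in enumerate(sorted(attempt_output.items())):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("task attempt failed mid-commit")
        dest[name] = content  # visible in the final output immediately


def run_two_attempts(first, second, fail_after):
    """Attempt 1 fails partway through its commit, then attempt 2 commits fully."""
    dest = {}
    try:
        commit_task_v2(dest, first, fail_after=fail_after)
    except RuntimeError:
        pass  # the job retries the task with a second attempt
    commit_task_v2(dest, second)
    return dest


# Same names and content in both attempts: the retry overwrites like with like.
same = {"part-0000": "a", "part-0001": "b"}
safe_result = run_two_attempts(same, dict(same), fail_after=1)

# Different file names per attempt: attempt 1's partial output is never overwritten.
stray_result = run_two_attempts(
    {"part-0000-x": "a", "part-0001-x": "b"},
    {"part-0000-y": "a", "part-0001-y": "b"},
    fail_after=1,
)
```

In this model `safe_result` matches a single clean commit, while `stray_result` still holds `part-0000-x` from the failed attempt next to the second attempt's files.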
   
   Job cleanup deletes the whole job attempt dir, so that sets the maximum time during which a partitioned TA may commit work. There's no risk of some VM pausing for 3 hours, restarting, and an in-progress TA completing its work and overwriting the final output. This is good.
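
To illustrate why cleanup bounds the window, here is a minimal local-filesystem sketch. It assumes a late commit simply finds nothing to move once its source directory is gone; `late_commit`, `demo`, and the `attempt_1` directory name are made up for the example, and the layout under `_temporary` is simplified.

```python
import os
import shutil
import tempfile


def late_commit(task_attempt_dir, dest_dir):
    """Hypothetical late task commit: move files out of the attempt dir if it still exists."""
    if not os.path.isdir(task_attempt_dir):
        return 0  # job cleanup already removed it, nothing to commit
    moved = 0
    for name in os.listdir(task_attempt_dir):
        os.replace(os.path.join(task_attempt_dir, name),
                   os.path.join(dest_dir, name))
        moved += 1
    return moved


def demo():
    dest = tempfile.mkdtemp()
    job_attempt_dir = os.path.join(dest, "_temporary", "0")
    ta_dir = os.path.join(job_attempt_dir, "attempt_1")  # illustrative name
    os.makedirs(ta_dir)
    with open(os.path.join(ta_dir, "part-0000"), "w") as f:
        f.write("stale output from a partitioned TA")

    # Job cleanup: the whole job attempt dir goes away.
    shutil.rmtree(os.path.join(dest, "_temporary"))

    # The paused TA wakes up and tries to commit after cleanup.
    moved = late_commit(ta_dir, dest)
    remaining = sorted(os.listdir(dest))
    shutil.rmtree(dest)
    return moved, remaining


moved_files, final_output = demo()
```

Here the late attempt moves nothing and the final output stays untouched, because the data it wanted to commit was deleted along with the job attempt dir.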
   
   
   
   

