Posted to mapreduce-dev@hadoop.apache.org by Steve Loughran <st...@hortonworks.com> on 2017/10/16 14:16:37 UTC

Task commit protocol & cleanup in Task.done()


I'm trying to write down how MR uses committers, as part of HADOOP-13786, and I now have some questions about why some failure codepaths in Task don't call abort().


1. After preemption

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java#L1137


The committer's commitTask() is called, but without the try/catch/abort sequence used in Task.commit() (see the sketch below).

2.   Task exit after a failure of umbilical.commitPending()

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java#L1152

Once the retry count is exhausted, the task calls System.exit() but not OutputCommitter.abort(). As repeated failure of this call is a possible sign of AM failure, I'd have expected an abort() call.
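
For comparison, the protection already used in Task.commit() is roughly the following. This is a paraphrased sketch, not verbatim trunk code: the wrapper method name is mine, and it assumes the usual Task fields (committer, taskContext, LOG) are in scope.

  // Paraphrase of the pattern in Task.commit() / Task.discardOutput().
  private void commitTaskOrAbort() throws IOException {
    try {
      committer.commitTask(taskContext);
    } catch (IOException e) {
      LOG.warn("Failure committing: " + StringUtils.stringifyException(e));
      // commit failed: abort so the committer can clean up its output
      discardOutput(taskContext);
      throw e;
    }
  }

  private void discardOutput(TaskAttemptContext taskContext) {
    try {
      committer.abortTask(taskContext);
    } catch (IOException e) {
      LOG.warn("Failure cleaning up: " + StringUtils.stringifyException(e));
    }
  }

It's that discardOutput()/abortTask() step which is missing on the two paths above.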


For S3A we worry about abort(), as we need to cancel all pending uploads to avoid running up bills: I'd like to be confident that OutputCommitter.abort() is invoked on all failure paths. I know that the AM itself can call abort() for a task if it thinks a container has failed, but if the AM itself has failed, that's not going to happen.
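
To make that concrete, here is a rough sketch of the kind of cleanup an S3 abort has to perform, written directly against the AWS Java SDK rather than the actual S3A committer classes; the class and method names below are illustrative only.

  import com.amazonaws.services.s3.AmazonS3;
  import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
  import com.amazonaws.services.s3.model.ListMultipartUploadsRequest;
  import com.amazonaws.services.s3.model.MultipartUpload;

  // Illustration only: what abort() has to mean for an S3-backed committer.
  public final class PendingUploadCleaner {

    // Abort every outstanding multipart upload under the task's output prefix.
    // A real implementation must also handle truncated (paginated) listings.
    public static void abortPendingUploads(AmazonS3 s3, String bucket, String prefix) {
      ListMultipartUploadsRequest request = new ListMultipartUploadsRequest(bucket);
      request.setPrefix(prefix);
      for (MultipartUpload upload
          : s3.listMultipartUploads(request).getMultipartUploads()) {
        // every pending upload keeps accruing storage charges for its parts
        // until it is aborted or removed by a bucket lifecycle rule
        s3.abortMultipartUpload(
            new AbortMultipartUploadRequest(bucket, upload.getKey(), upload.getUploadId()));
      }
    }
  }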


Have I misunderstood something about the commit protocol's implementation, or would a patch with a couple more calls to Task.discardOutput() be welcomed?
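
To be clear about the size of the change I have in mind, it would be something like this in the commitPending() retry loop (a hypothetical sketch, not an actual patch; the surrounding names mirror Task.java, and only the discardOutput() call is new):

  int retries = MAX_RETRIES;
  while (true) {
    try {
      umbilical.commitPending(taskId, taskStatus);
      break;
    } catch (InterruptedException ie) {
      // ignore and retry
    } catch (IOException ie) {
      LOG.warn("Failure sending commit pending: " + StringUtils.stringifyException(ie));
      if (--retries == 0) {
        discardOutput(taskContext);  // proposed: let the committer abort first
        System.exit(67);
      }
    }
  }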

Thanks,

-Steve