Posted to mapreduce-dev@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2011/11/15 18:05:39 UTC

Purpose of COMMIT_PENDING

Hi,

Hadoop MR tasks can have the state COMMIT_PENDING.

1- What's the purpose of that state?
2- What's the reason for a task being in this state?
3- Is it only the last task before a job finishes that enters this state?

-- 
Thanks,

Re: Purpose of COMMIT_PENDING

Posted by Harsh J <ha...@cloudera.com>.
Pedro,

Simply put: Speculative execution.

When a task enters that state, it means that it has completed its map or reduce work and is waiting for the tracker to let it commit, so that it can run the OutputCommitter process and finalize its outputs. (With FileOutputCommitter, the default OutputCommitter in Hadoop MR, outputs lie in temporary directories until they are committed.)
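To make the "temporary directory until committed" idea concrete, here is a minimal, self-contained sketch using plain java.nio.file. It is NOT Hadoop's FileOutputCommitter; the class name TempDirCommitSketch, the `_temporary/<attemptId>` layout, and the methods attemptDir/commitTask/abortTask are illustrative assumptions that only mimic the idea: each attempt writes privately, and its files become visible in the output directory only when it commits.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of "write to a temp dir, promote on commit".
// Assumes each attempt writes only plain files (no subdirectories).
public class TempDirCommitSketch {
    private final Path outputDir;

    public TempDirCommitSketch(Path outputDir) {
        this.outputDir = outputDir;
    }

    // Private per-attempt directory: speculative attempts never collide.
    private Path tempDirFor(String attemptId) {
        return outputDir.resolve("_temporary").resolve(attemptId);
    }

    public Path attemptDir(String attemptId) throws IOException {
        Path dir = tempDirFor(attemptId);
        Files.createDirectories(dir);
        return dir;
    }

    // Snapshot the directory listing so we never modify it while iterating.
    private static List<Path> listFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    // "Commit": move the attempt's files into the final output directory.
    public void commitTask(String attemptId) throws IOException {
        Path dir = tempDirFor(attemptId);
        for (Path f : listFiles(dir)) {
            Files.move(f, outputDir.resolve(f.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(dir);
    }

    // "Abort": discard everything a losing or killed attempt wrote.
    public void abortTask(String attemptId) throws IOException {
        Path dir = tempDirFor(attemptId);
        if (!Files.exists(dir)) return;
        for (Path f : listFiles(dir)) {
            Files.delete(f);
        }
        Files.delete(dir);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        TempDirCommitSketch committer = new TempDirCommitSketch(out);

        // Two attempts of the same task both produce part-00000.
        Files.writeString(committer.attemptDir("attempt_0").resolve("part-00000"), "from attempt 0");
        Files.writeString(committer.attemptDir("attempt_1").resolve("part-00000"), "from attempt 1");

        committer.commitTask("attempt_0"); // winner's output is promoted
        committer.abortTask("attempt_1");  // loser's output is thrown away

        System.out.println(Files.readString(out.resolve("part-00000"))); // prints "from attempt 0"
    }
}
```

Until commitTask runs, nothing appears directly in the output directory, which is why a finished-but-uncommitted attempt has to sit in an intermediate state.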

This is to avoid conflicting outputs when you have speculative execution turned on. Two attempts of the same task can complete at around the same time, and you do not want both to be committed. So the TaskTracker (TT) will commit the first one that reports back, and kill the other attempt still waiting in COMMIT_PENDING.
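The "first one that reports back wins" rule can be sketched as an atomic claim per task. This is a hypothetical in-memory illustration, not Hadoop's actual code: the class CommitArbiterSketch and the method tryAcquireCommit are invented names, and which daemon holds this state in a real cluster is not modeled here. ConcurrentHashMap.putIfAbsent is atomic, so exactly one attempt per task can win even if two finish concurrently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of commit arbitration between speculative attempts.
public class CommitArbiterSketch {
    // taskId -> attemptId that has been granted permission to commit
    private final Map<String, String> granted = new ConcurrentHashMap<>();

    // Called when an attempt finishes its work and enters COMMIT_PENDING.
    // true  -> this attempt may commit its output.
    // false -> another attempt of the same task already won; kill this one.
    public boolean tryAcquireCommit(String taskId, String attemptId) {
        String winner = granted.putIfAbsent(taskId, attemptId); // atomic claim
        return winner == null || winner.equals(attemptId);      // retries by the winner stay true
    }

    public static void main(String[] args) {
        CommitArbiterSketch arbiter = new CommitArbiterSketch();
        // The original and the speculative attempt of one task both finish.
        System.out.println(arbiter.tryAcquireCommit("task_000003", "attempt_0")); // true
        System.out.println(arbiter.tryAcquireCommit("task_000003", "attempt_1")); // false: kill it
        System.out.println(arbiter.tryAcquireCommit("task_000003", "attempt_0")); // true (same winner)
    }
}
```

Making the check idempotent for the winner means a retried request from the same attempt is not mistaken for a competing attempt.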

You might notice (3) because speculative execution does affect the tail of a job run.
