Posted to hdfs-user@hadoop.apache.org by Calvin <ip...@gmail.com> on 2014/09/10 00:33:39 UTC

unexplained time between map 100% reduce 100% and job completion

Hi,

I'm wondering what happens between the time a MapReduce job reports
100% of its map and reduce tasks complete and the time its status
changes to completed.

An example log:

14/09/09 08:14:26 INFO mapreduce.Job:  map 100% reduce 98%
14/09/09 08:14:47 INFO mapreduce.Job:  map 100% reduce 99%
14/09/09 08:15:25 INFO mapreduce.Job:  map 100% reduce 100%
14/09/09 08:23:22 INFO mapreduce.Job: Job job_1410269811222_0001 completed successfully

In this case, it takes about 8 minutes for it to output that the job
has completed successfully. My initial guess is that it's doing some
filesystem tasks, but I'm not doing any heavy writing via the reducers
(it's a simple wordcount variant):

File Input Format Counters
    Bytes Read=334624311758
File Output Format Counters
    Bytes Written=107785
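
To put an exact number on the gap, here is a minimal Python sketch that
measures the time between the "map 100% reduce 100%" line and the
"completed successfully" line. It assumes the client log format shown
in the example (a "yy/MM/dd HH:mm:ss" timestamp at the start of each
line); the names parse_ts and completion_gap are made up for
illustration and are not part of Hadoop.

```python
from datetime import datetime

# Log lines copied from the example in this post
# (the wrapped final line is joined back into one line).
LOG = """\
14/09/09 08:14:26 INFO mapreduce.Job:  map 100% reduce 98%
14/09/09 08:14:47 INFO mapreduce.Job:  map 100% reduce 99%
14/09/09 08:15:25 INFO mapreduce.Job:  map 100% reduce 100%
14/09/09 08:23:22 INFO mapreduce.Job: Job job_1410269811222_0001 completed successfully
"""


def parse_ts(line):
    """Parse the leading timestamp; assumes the first 17 characters
    are 'yy/MM/dd HH:mm:ss' as in the log above."""
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")


def completion_gap(log_text):
    """Return the time between the first 'map 100% reduce 100%' line
    and the 'completed successfully' line."""
    reduce_done = None
    job_done = None
    for line in log_text.splitlines():
        if "map 100% reduce 100%" in line and reduce_done is None:
            reduce_done = parse_ts(line)
        elif "completed successfully" in line:
            job_done = parse_ts(line)
    if reduce_done is None or job_done is None:
        raise ValueError("log does not contain both marker lines")
    return job_done - reduce_done


gap = completion_gap(LOG)
print(gap)  # 0:07:57
```

For the log above this gives 7 minutes 57 seconds, which is the
"about 8 minutes" I mentioned.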

Any ideas regarding this behavior or where I should look first?

Thanks,
Calvin