Posted to mapreduce-user@hadoop.apache.org by Friso van Vollenhoven <fv...@xebia.com> on 2010/06/21 17:44:49 UTC
reducers run past 100% (does that problem still exist?)
Hi all,
When I run long-running map/reduce jobs, the reducers run past 100% before reaching completion, sometimes as far as 140%. I have searched the mailing list and other resources and found bug reports related to this when using map output compression, but all appear to be fixed by now.
The job I am running reads sequence files from HDFS and in the reducer inserts records into HBase. The reducer has NullWritable as both output key and output value.
Some additional info:
- the job takes in total close to 60 hours to complete
- there are 10 reducers
- the map output is compressed using the default codec and block compression
- speculative execution is turned off (otherwise we could be hitting HBase harder than necessary)
- mapred.job.reuse.jvm.num.tasks = 1
- io.sort.factor = 100
- io.sort.record.percent = 0.3
- io.sort.spill.percent = 0.9
- mapred.inmem.merge.threshold = 100
- mapred.job.reduce.input.buffer.percent = 1.0
I am using Hadoop 0.20.2 on a small cluster (1x NN+JT, 4x DN+TT).
Does anyone have a clue? Or can anyone tell me how the progress info for reducers is calculated? Any help is appreciated.
Regards,
Friso
Re: reducers run past 100% (does that problem still exist?)
Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hi Ravi,
The reducers are all in the reduce phase (so copy and sort had finished by then). We do use compression on the mapper output, but I thought that the issues relating to that were fixed in the 0.20.2 release. Can you (or anyone) confirm there is still such a bug?
Thanks,
Friso
On Jun 22, 2010, at 11:57 AM, Ravi Gummadi wrote:
> A reduce task has three phases: copy, sort, and reduce. Each phase accounts for one third (33.33%) of the task's total progress. Which phase was your reducer in when you saw progress above 100%? (You can see the phase on the web UI in the "state" column, after the ">" symbol.)
>
> If you see progress above 66.7% while the task is in the sort phase, the problem could be in the merge progress calculation, which was fixed in HADOOP-5210. Hadoop 0.20.2 should already contain that fix.
> Otherwise, if the reduce phase starts at 66.7% and progress then goes beyond 100%, there may be a bug in Hadoop where progress is not calculated correctly for compressed input to the reducer.
>
> -Ravi
Re: reducers run past 100% (does that problem still exist?)
Posted by Ravi Gummadi <gr...@yahoo-inc.com>.
A reduce task has three phases: copy, sort, and reduce. Each phase
accounts for one third (33.33%) of the task's total progress. Which
phase was your reducer in when you saw progress above 100%? (You can
see the phase on the web UI in the "state" column, after the ">"
symbol.)
If you see progress above 66.7% while the task is in the sort phase,
the problem could be in the merge progress calculation, which was fixed
in HADOOP-5210. Hadoop 0.20.2 should already contain that fix.
Otherwise, if the reduce phase starts at 66.7% and progress then goes
beyond 100%, there may be a bug in Hadoop where progress is not
calculated correctly for compressed input to the reducer.
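To make the arithmetic concrete, here is a minimal Java sketch. It is not
Hadoop's actual code: the class name, method names, and byte counts are all
made up for illustration. It only assumes that the three phases are weighted
equally, and that the reduce phase measures progress as bytes consumed over
expected bytes. If the two counts use different units (decompressed vs.
compressed), the ratio can exceed 1.0 and the overall figure passes 100%.

```java
// Illustrative sketch only; not taken from the Hadoop source tree.
public class ReduceProgressSketch {

    /** Copy, sort and reduce each contribute one third of overall progress. */
    public static double overall(double copy, double sort, double reduce) {
        return (copy + sort + reduce) / 3.0;
    }

    /**
     * Hypothetical reduce-phase progress: bytes consumed so far divided by
     * the expected input size. Nothing here clamps the result to 1.0.
     */
    public static double phaseProgress(long bytesProcessed, long expectedBytes) {
        return expectedBytes == 0 ? 1.0 : (double) bytesProcessed / expectedBytes;
    }

    public static void main(String[] args) {
        long compressedInput = 1000L;   // made-up compressed size
        long decompressedInput = 2200L; // made-up size after decompression

        // Same unit on both sides: the reduce phase ends at 1.0.
        double consistent =
            overall(1.0, 1.0, phaseProgress(compressedInput, compressedInput));

        // Decompressed bytes counted against a compressed total: ratio > 1.0.
        double mismatched =
            overall(1.0, 1.0, phaseProgress(decompressedInput, compressedInput));

        System.out.println("consistent units: " + consistent * 100 + "%");
        System.out.println("mismatched units: " + mismatched * 100 + "%");
    }
}
```

With these made-up sizes the consistent case finishes at 100% and the
mismatched case ends at roughly 140%. Whether this is what actually happens
in 0.20.2 would need to be checked against the source.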
-Ravi