Posted to common-user@hadoop.apache.org by Rob Stewart <ro...@googlemail.com> on 2010/12/11 12:05:02 UTC

Slow final few reducers

Hi,

I have a problem with a MapReduce job I am trying to run on a 32 node cluster.

The final few reducers take a *lot* longer than the rest. e.g. If I
specify 100 reducers, the first 90 will complete in 5 minutes, and
then the remaining 10 reducers might take 10 minutes.

The same is true for any number of reducers. With 200 reducers, 180-190
will complete in 5 minutes, and the last 10-20 will take 10 minutes.

Is this normal Hadoop behavior? I know that the output of the Reducer
function is not sorted, so I can't figure out why performance declines
at the tail end of the job.

thanks,

Rob Stewart

Re: Slow final few reducers

Posted by Ted Dunning <td...@maprtech.com>.
It sounds like your key distribution is being reflected in the size of your
reduce tasks, thus
making some of them take much longer than the rest.

There are three solutions to this:

a) down-sample.  Particularly for statistical computations, once you have
seen a thousand instances of a key, you have seen them all.  This is
particularly effective for self-join problems, where the reduce skew can be
the input skew squared.  Down-sampling can be done in a combiner.  Remember
to retain a count of the records you drop.

b) combine.  If you are counting or doing some other sort of associative
operation, then combiners will distribute your reduce load more widely.

c) sub-reduce.  This is related to combining, but you have more control (and
less bandwidth reduction).  Simply append a random number in the range [1..n]
to each hot key.  This splits that key's load by a factor of n.  Work on the
pieces and then combine them later.
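
A minimal sketch of option (c), assuming plain string keys and a two-pass
job; the "#" separator and the helper names are illustrative choices, not
part of any Hadoop API:

```java
import java.util.Random;

// Sketch of sub-reduce ("key salting"): append a random salt in [1..n] to
// each hot key so its values spread across n reduce partitions; a second
// pass strips the salt so the n partial results recombine under the
// original key.
public class KeySalting {
    static final int N = 4;               // fan-out factor for hot keys
    static final Random RNG = new Random();

    // First job: the map side emits "the#1" .. "the#4" instead of "the".
    static String salt(String key) {
        return key + "#" + (1 + RNG.nextInt(N));
    }

    // Second job: strip the salt so partial sums merge under "the" again.
    static String unsalt(String saltedKey) {
        int i = saltedKey.lastIndexOf('#');
        return i < 0 ? saltedKey : saltedKey.substring(0, i);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            String s = salt("the");
            System.out.println(s + " -> " + unsalt(s));
        }
    }
}
```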

On Sat, Dec 11, 2010 at 3:38 AM, Rob Stewart <ro...@googlemail.com> wrote:

> - I know that for a fact my key distribution is quite radically skewed
> (some keys with *many* value, most keys with few).
>

Re: Slow final few reducers

Posted by Harsh J <qw...@gmail.com>.
On Sat, Dec 11, 2010 at 7:41 PM, Rob Stewart
<ro...@googlemail.com> wrote:
> Sorry my fault - It's someone running a network simulator on the cluster !
>

Culprit found? *wide grin*

-- 
Harsh J
www.harshj.com

Re: Slow final few reducers

Posted by Rob Stewart <ro...@googlemail.com>.
Sorry, my fault - it's someone running a network simulator on the cluster!

Rob

On 11 December 2010 14:09, Rob Stewart <ro...@googlemail.com> wrote:
> OK, slight update:
>
> Immediately underneath public void reduce(), I have added a:
> System.out.println("Key: " + key.toString());
>
> And I am logged on a node that is still working on a reducer. However,
> it stopped printing "Key:" long ago, so it is not processing new keys.
>
> But looking more closely at "top" on this node, there are *two* linux
> processes going at 100% CPU. The first is java, which, using "jps -l"
> I realize is "Child", but the second is a process called "setdest",
> which I strongly suspect has to do with my Hadoop job.
>
> What is "setdest", and what is it actually doing? And why is it taking so long?
>
> cheers,
>
> Rob Stewart
> [...]

Re: Slow final few reducers

Posted by Rob Stewart <ro...@googlemail.com>.
OK, slight update:

Immediately inside public void reduce(), I have added:
System.out.println("Key: " + key.toString());

I am logged in to a node that is still working on a reducer. However,
it stopped printing "Key:" long ago, so it is not processing new keys.

But looking more closely at "top" on this node, there are *two* Linux
processes running at 100% CPU. The first is java, which "jps -l" tells
me is "Child"; the second is a process called "setdest", which I
strongly suspect has to do with my Hadoop job.

What is "setdest", and what is it actually doing? And why is it taking so long?

cheers,

Rob Stewart



On 11 December 2010 12:26, Harsh J <qw...@gmail.com> wrote:
> On Sat, Dec 11, 2010 at 5:25 PM, Rob Stewart
> <ro...@googlemail.com> wrote:
>> Oh,
>>
>> I should add, of the Java processes running on the remaining nodes for
>> the final wave of reducers, the one taking all the CPU is the "Child"
>> process (not TaskTracker). I log into the Master, and also, the Java
>> process taking all the CPU is "Child".
>>
>> Is this normal?
>
> Yes, "Child" is the Task JVM.
>
>> [...]

Re: Slow final few reducers

Posted by Harsh J <qw...@gmail.com>.
On Sat, Dec 11, 2010 at 5:25 PM, Rob Stewart
<ro...@googlemail.com> wrote:
> Oh,
>
> I should add, of the Java processes running on the remaining nodes for
> the final wave of reducers, the one taking all the CPU is the "Child"
> process (not TaskTracker). I log into the Master, and also, the Java
> process taking all the CPU is "Child".
>
> Is this normal?

Yes, "Child" is the Task JVM.

>
> thanks,
> Rob
>
> [...]



-- 
Harsh J
www.harshj.com

Re: Slow final few reducers

Posted by Rob Stewart <ro...@googlemail.com>.
Oh,

I should add: of the Java processes running on the remaining nodes for
the final wave of reducers, the one taking all the CPU is the "Child"
process (not the TaskTracker). When I log in to the Master, the Java
process taking all the CPU is also "Child".

Is this normal?

thanks,
Rob

On 11 December 2010 11:38, Rob Stewart <ro...@googlemail.com> wrote:
> Hi, many thanks for your response.
>
> A few observations:
> - I know that for a fact my key distribution is quite radically skewed
> (some keys with *many* value, most keys with few).
> - I have overlooked the fact that I need a partitioner. I suspect that
> this will help dramatically.
>
> I realize that the number of partitions should equal the number of
> reducers (e.g. 100).
>
> So if here are my <key>,<values> (where values is a count):
> <the>,<500>
> <a>,<1000>
> <the cat>,<20>
> <the cat sat on the mat>,<1>
>
> and I have 3 reducers, how do I make:
> Reducer-1: <the>
> Reducer-2: <a>
> Reducer-3: <the cat> & <the cat sat on the mat>
>
>
> thanks,
>
> Rob
>
> [...]

Re: Slow final few reducers

Posted by Rob Stewart <ro...@googlemail.com>.
Hi, many thanks for your response.

A few observations:
- I know for a fact that my key distribution is quite radically skewed
(some keys with *many* values, most keys with few).
- I have overlooked the fact that I need a partitioner. I suspect that
this will help dramatically.

I realize that the number of partitions should equal the number of
reducers (e.g. 100).

So if these are my <key>,<value> pairs (where the value is a count):
<the>,<500>
<a>,<1000>
<the cat>,<20>
<the cat sat on the mat>,<1>

and I have 3 reducers, how do I make:
Reducer-1: <the>
Reducer-2: <a>
Reducer-3: <the cat> & <the cat sat on the mat>
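
One way to get exactly that assignment is a custom partition function with
the same contract as Hadoop's Partitioner.getPartition(key, value,
numPartitions). The sketch below is illustrative, not from this thread: it
assumes the hot keys are known in advance (e.g. from a sampling pass), pins
each one to its own partition, and hashes the long tail into the rest.

```java
import java.util.Map;

// Sketch of a skew-aware partition function. Assumes numPartitions is
// larger than the number of known hot keys; the hot-key table below is a
// made-up example.
public class SkewAwarePartitioner {
    // Hot keys pinned to dedicated partitions 0 and 1.
    static final Map<String, Integer> HOT_KEYS = Map.of("the", 0, "a", 1);

    static int getPartition(String key, int numPartitions) {
        Integer pinned = HOT_KEYS.get(key);
        if (pinned != null) {
            return pinned;                // dedicated reducer for a hot key
        }
        // Hash the long tail into the partitions left over after the hot
        // keys, using the same masking trick as Hadoop's HashPartitioner.
        int tail = numPartitions - HOT_KEYS.size();
        return HOT_KEYS.size() + (key.hashCode() & Integer.MAX_VALUE) % tail;
    }

    public static void main(String[] args) {
        // With 3 partitions this reproduces the Reducer-1/2/3 split above
        // (partitions are 0-indexed).
        System.out.println("the -> " + getPartition("the", 3));
        System.out.println("a -> " + getPartition("a", 3));
        System.out.println("the cat -> " + getPartition("the cat", 3));
        System.out.println("the cat sat on the mat -> "
                + getPartition("the cat sat on the mat", 3));
    }
}
```

With 3 partitions and 2 hot keys, every tail key necessarily lands on
partition 2, which is the grouping asked for above.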


thanks,

Rob

On 11 December 2010 11:12, Harsh J <qw...@gmail.com> wrote:
> Hi,
>
> Certain reducers may receive a higher share of data than others
> (Depending on your data/key distribution, the partition function,
> etc.). Compare the longer reduce tasks' counters with the quicker
> ones.
>
> Are you sure that the reducers that take long are definitely the last
> wave, as in with IDs of 180-200 (and not a random bunch of reduce
> tasks taking longer)?
>
> Also take a look at the logs, and the machines that run these
> particular reducers -- ensure nothing is wrong on them.
>
> There's nothing specifically written in Hadoop for the "last wave" of
> Reduce tasks to take longer. Each reducer writes to its own file, and
> is completely independent.
>
> --
> Harsh J
> www.harshj.com
>

Re: Slow final few reducers

Posted by Harsh J <qw...@gmail.com>.
Hi,

Certain reducers may receive a higher share of data than others
(depending on your data/key distribution, the partition function,
etc.). Compare the longer reduce tasks' counters with those of the
quicker ones.

Are you sure that the reducers that take long are definitely the last
wave, as in with IDs of 180-200 (and not a random bunch of reduce
tasks taking longer)?

Also take a look at the logs, and the machines that run these
particular reducers -- ensure nothing is wrong on them.

There's nothing specifically written in Hadoop for the "last wave" of
Reduce tasks to take longer. Each reducer writes to its own file, and
is completely independent.
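
To make the skew point concrete, here is a small standalone sketch (the
record counts are made up) showing that a hash partition function balances
*keys* across reducers, not *records*:

```java
import java.util.HashMap;
import java.util.Map;

// Whichever reduce partition draws a hot key receives far more records
// than the others, even though each partition holds a similar number of
// distinct keys. The counts below are invented for illustration.
public class ReduceSkewDemo {
    // Same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        Map<String, Integer> recordsPerKey = Map.of(
                "a", 1000,
                "the", 500,
                "the cat", 20,
                "the cat sat on the mat", 1);
        int numPartitions = 3;

        // Tally how many records each reduce partition would receive.
        Map<Integer, Integer> load = new HashMap<>();
        for (Map.Entry<String, Integer> e : recordsPerKey.entrySet()) {
            load.merge(partition(e.getKey(), numPartitions),
                    e.getValue(), Integer::sum);
        }
        // The partition holding "a" carries 1000+ of the 1521 records.
        System.out.println(load);
    }
}
```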

-- 
Harsh J
www.harshj.com

Re: Slow final few reducers

Posted by Ted Dunning <td...@maprtech.com>.
The job history program tells you this.  The syntax is hideous, but there is
a parser provided.

On Sat, Dec 11, 2010 at 8:23 AM, Mithila Nagendra <mn...@asu.edu> wrote:

> Just curious and off topic :) How do you find the time taken by each
> reducer? What command/method do you use? I need that for my research.
>

Re: Slow final few reducers

Posted by Mithila Nagendra <mn...@asu.edu>.
Hi Rob,

Just curious and off topic :) How do you find the time taken by each
reducer? What command/method do you use? I need that for my research.

Thanks,
Mithila

On Sat, Dec 11, 2010 at 4:05 AM, Rob Stewart <ro...@googlemail.com> wrote:

> Hi,
>
> I have a problem with a MapReduce job I am trying to run on a 32 node
> cluster.
>
> The final few reducers take a *lot* longer than the rest. e.g. If I
> specify 100 reducers, the first 90 will complete in 5 minutes, and
> then the remaining 10 reducers might take 10 minutes.
>
> Same is true for any number of reducers... 200 reducers: 180/190 will
> complete in 5 minutes, and the last 10/20 will take 10 minutes.
>
> Is this normal Hadoop behavior? I know that the output of the Reducer
> function is not sorted, so can't figure out why this decline of
> performance at the tail end of the job?
>
> thanks,
>
> Rob Stewart
>