Posted to common-user@hadoop.apache.org by Tarandeep Singh <ta...@gmail.com> on 2010/08/18 23:44:04 UTC

Hadoop not utilizing max reducer capacity + reducer stuck in pending state

Hi,

I am seeing some strange behavior in Hadoop. I am running a small test
cluster with a capacity of 18 mappers and 18 reducers. I submit many jobs
simultaneously, and over time I have observed that Hadoop is not using all
18 reducer slots.

And now, even if I run just one job (with no other jobs running), it starts
fewer than 18 reducers. Initially it started all 18, but the number has
gradually decreased. For example, it started only 13 reducers for a job I
just submitted.

Further, one reducer stays in the pending state for a very long time: after
all the other reducers had finished, it remained pending for at least 20-30
minutes.

The mappers seem to be doing fine. Any thoughts or suggestions on what could
be happening here?

Cluster configuration:
1) Master: also runs 4 mappers + 4 reducers
2) 2 slaves: each runs 7 mappers + 7 reducers
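As a quick sanity check on the 18-slot capacity, the per-node counts add up as follows. This is a minimal sketch: the node names are hypothetical labels, and it assumes the per-node counts correspond to the Hadoop 0.20 TaskTracker settings mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, which the post does not actually mention.

```python
# Per-node slot counts taken from the cluster description above.
# Node names are hypothetical labels. The counts are assumed to come from
# mapred.tasktracker.map.tasks.maximum / mapred.tasktracker.reduce.tasks.maximum.
NODES = {
    "master": {"map_slots": 4, "reduce_slots": 4},
    "slave1": {"map_slots": 7, "reduce_slots": 7},
    "slave2": {"map_slots": 7, "reduce_slots": 7},
}


def total_slots(nodes, kind):
    """Sum one kind of slot ("map_slots" or "reduce_slots") across all nodes."""
    return sum(node[kind] for node in nodes.values())


print(total_slots(NODES, "map_slots"))     # 18
print(total_slots(NODES, "reduce_slots"))  # 18
```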

I run the Ganglia monitoring system, and it shows that the system was not
overloaded at any time.

Thanks,
Tarandeep

Re: Hadoop not utilizing max reducer capacity + reducer stuck in pending state

Posted by Tarandeep Singh <ta...@gmail.com>.

OK, I missed one thing: I had turned on speculative execution. That explains
why fewer reducers are running (there are two reducer attempts running for
some tasks).
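A toy model of why speculative execution can lower the number of distinct reducers: when every reduce slot is busy, each duplicate (speculative) attempt holds a slot that would otherwise run a separate task. This is an illustrative sketch, not Hadoop's scheduler. It assumes at most one extra attempt per task, and the figure of 5 duplicates is hypothetical, chosen only to show how 18 slots could yield 13 distinct reducers. In Hadoop 0.20, reduce-side speculation is governed by mapred.reduce.tasks.speculative.execution.

```python
def distinct_reduce_tasks(reduce_slots, duplicate_attempts):
    """Toy model: with every reduce slot busy, each duplicate (speculative)
    attempt occupies a slot that would otherwise run a distinct task.
    Assumes at most one extra attempt per task, so duplicates <= distinct."""
    distinct = reduce_slots - duplicate_attempts
    if duplicate_attempts < 0 or duplicate_attempts > distinct:
        raise ValueError("duplicate attempts must be between 0 and the number of distinct tasks")
    return distinct


# Hypothetical: 18 reduce slots with 5 slots held by speculative duplicates.
print(distinct_reduce_tasks(18, 5))  # 13
```

Real Hadoop schedules speculative attempts dynamically, so the number of duplicates changes over a job's lifetime; the model only captures the slot accounting.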

However, I still cannot explain why one reducer stayed in the pending state
for so long: no other jobs were running, and all the other reducers had
finished.


