Posted to common-user@hadoop.apache.org by Mayuran Yogarajah <ma...@casalemedia.com> on 2009/08/12 22:15:26 UTC

I've probably hit some system limits

I had 3 jobs running and I saw something a bit odd.  Two of the jobs
are reducing; one of them is using all the reducers, so the other is
waiting, which is OK.  However, the 3rd job is still in the mapping phase,
and even though the web interface shows map capacity at 96, I only
see about 7-12 mappers actually running.  I'm wondering if there's
some setting I need to change; perhaps I've hit some system limit.  Can
someone point me in the right direction please?

The other thing was that with the two jobs that are in the reducing phase,
the reducer for one job wouldn't actually start until all the mappers of the
_other_ job completed, which seems kind of odd.  Is this expected?

thanks,
M

Re: I've probably hit some system limits

Posted by Amandeep Khurana <am...@gmail.com>.
On Wed, Aug 12, 2009 at 2:14 PM, Mayuran Yogarajah <mayuran.yogarajah@casalemedia.com> wrote:

> Hello,
>
> Amandeep Khurana wrote:
>
>> So you are running 16 map tasks per node? Plus 2 reducers?
>>
>>
> That's correct.
>
>> I think that's high. With 6 GB RAM, you should be looking at around 2
>> map tasks plus 1 reducer...
>> I have 9 nodes with quad cores + 8 GB RAM, and I run 2M+1R on each node...
>>
>>
>>
> I thought the number of maps should be set to between 1/2x and 2x the
> number of CPUs; that's why we set it so high.  Right now I've set:
> mapred.tasktracker.map.tasks.maximum = 16
> mapred.tasktracker.reduce.tasks.maximum = 16
>

It's 2 * number of nodes.
Moreover, it's not only the CPUs that matter, but also the RAM... plus I/O.
Now, I'm not sure if you are I/O bound on this job or not, but that's also a
consideration.

Reduce the numbers to 2+1 and see how it goes. Once things work stably,
increase the mappers by 2 and see... You'll have to try a few times before
you find the optimal numbers for your setup.
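
For example, as a starting point to iterate from (in hadoop-site.xml, or
mapred-site.xml on 0.20; the property names are the ones from your mail):

mapred.tasktracker.map.tasks.maximum = 2
mapred.tasktracker.reduce.tasks.maximum = 1

These are read by the tasktracker at startup, so restart the tasktrackers
after changing them.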



>
> So the max mappers/reducers is 96/96.
>
>> How much heap size have you given your Hadoop instance?
>>
>> Also, is there a lot of processing going on in the mappers and reducers?
>>
>>
>>
> Yes, these are pretty intensive jobs.
>
> thanks,
> M
>

Re: I've probably hit some system limits

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Hello,

Amandeep Khurana wrote:
> So you are running 16 map tasks per node? Plus 2 reducers?
>   
That's correct.
> I think that's high. With 6 GB RAM, you should be looking at around 2
> map tasks plus 1 reducer...
> I have 9 nodes with quad cores + 8 GB RAM, and I run 2M+1R on each node...
>
>   
I thought the number of maps should be set to between 1/2x and 2x the
number of CPUs; that's why we set it so high.  Right now I've set:
mapred.tasktracker.map.tasks.maximum = 16
mapred.tasktracker.reduce.tasks.maximum = 16

So the max mappers/reducers is 96/96.

> How much heap size have you given your Hadoop instance?
>
> Also, is there a lot of processing going on in the mappers and reducers?
>
>   
Yes, these are pretty intensive jobs.

thanks,
M

Re: I've probably hit some system limits

Posted by Amandeep Khurana <am...@gmail.com>.
So you are running 16 map tasks per node? Plus 2 reducers?
I think that's high. With 6 GB RAM, you should be looking at around 2
map tasks plus 1 reducer...
I have 9 nodes with quad cores + 8 GB RAM, and I run 2M+1R on each node...

How much heap size have you given your Hadoop instance?

Also, is there a lot of processing going on in the mappers and reducers?
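
As a rough sanity check (this assumes the default mapred.child.java.opts of
-Xmx200m; substitute whatever you have actually set), 16 map + 16 reduce
slots per node means up to 32 child JVMs at once:

32 tasks/node * 200 MB heap = 6400 MB

which already exceeds the 6 GB per node before the tasktracker, datanode,
and OS take their share. If you've raised -Xmx, the overcommit is worse.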

On 8/12/09, Mayuran Yogarajah <ma...@casalemedia.com> wrote:
> Amandeep Khurana wrote:
>>
>> Ah... That might be the issue... I don't know the solution to this... Wait
>> for someone else to answer. The mappers not starting could be because of
>> this as well.
>>
>> What's your cluster configuration? How many CPUs, how much RAM, etc.?
>>
>>
> There are 6 servers in the cluster; they all have the same hardware
> CPU/RAM-wise: 2x quad-core and 6 GB of RAM.
>
> thanks,
> M
>
>
>


-- 
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

Re: I've probably hit some system limits

Posted by Bhupesh Bansal <bb...@linkedin.com>.
Hey Mayuran,

One reason might be that the input data is available on only a few nodes, and
hence only those nodes are being used for the mappers. You should be able to
run a DFS fsck on the input path and see how many actual replicas you have.
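
For example (assuming the input lives at /user/mayuran/input; substitute the
real path):

hadoop fsck /user/mayuran/input -files -blocks -locations

That prints every block of the input files along with the datanodes holding
each replica, so you can see whether the data is concentrated on one or two
machines.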


Otherwise, go to the slaves and take a thread dump of all the Java child
processes (kill -3). The thread dumps will go into the Hadoop logs, and you
can look through them via the Hadoop web UI to see whether the mappers are
getting stuck somewhere.
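
A quick way to do that on each slave (jps ships with the JDK, and Hadoop's
task JVMs show up under the class name Child):

for pid in $(jps | grep Child | awk '{print $1}'); do
    kill -3 "$pid"   # SIGQUIT: the JVM prints a thread dump and keeps running
done

The dumps land in each task's stdout log under logs/userlogs, which the web
UI links to from the task details page.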

Best
Bhupesh


On 8/12/09 1:36 PM, "Mayuran Yogarajah" <ma...@casalemedia.com>
wrote:

> Amandeep Khurana wrote:
>> 
>> Ah... That might be the issue... I don't know the solution to this... Wait
>> for someone else to answer. The mappers not starting could be because of
>> this as well.
>> 
>> What's your cluster configuration? How many CPUs, how much RAM, etc.?
>> 
>>   
> There are 6 servers in the cluster; they all have the same hardware
> CPU/RAM-wise: 2x quad-core and 6 GB of RAM.
> 
> thanks,
> M
> 
> 


Re: I've probably hit some system limits

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Amandeep Khurana wrote:
>
> Ah... That might be the issue... I don't know the solution to this... Wait
> for someone else to answer. The mappers not starting could be because of
> this as well.
>
> What's your cluster configuration? How many CPUs, how much RAM, etc.?
>
>   
There are 6 servers in the cluster; they all have the same hardware
CPU/RAM-wise: 2x quad-core and 6 GB of RAM.

thanks,
M



Re: I've probably hit some system limits

Posted by Amandeep Khurana <am...@gmail.com>.
On Wed, Aug 12, 2009 at 1:27 PM, Mayuran Yogarajah <mayuran.yogarajah@casalemedia.com> wrote:

> Hello,
>
> Amandeep Khurana wrote:
>
>> On Wed, Aug 12, 2009 at 1:15 PM, Mayuran Yogarajah <mayuran.yogarajah@casalemedia.com> wrote:
>>
>>
>>
>>> I had 3 jobs running and I saw something a bit odd.  Two of the jobs
>>> are reducing; one of them is using all the reducers, so the other is
>>> waiting, which is OK.  However, the 3rd job is still in the mapping phase,
>>> and even though the web interface shows map capacity at 96, I only
>>> see about 7-12 mappers actually running.  I'm wondering if there's
>>> some setting I need to change; perhaps I've hit some system limit.  Can
>>> someone point me in the right direction please?
>>>
>>>
>>>
>>
>> Are there any pending mappers remaining? Are you using any scheduler?
>>
>>
>>
> Yes, there were pending mappers remaining; I'm not using any scheduler.
>

>
>>
>>> The other thing was that with the two jobs that are in the reducing phase,
>>> the reducer for one job wouldn't actually start until all the mappers of
>>> the _other_ job completed, which seems kind of odd.  Is this expected?
>>>
>>>
>>>
>>
>> Reducers don't really start the "reduce" phase till the mappers are
>> completed. However, the process gets spawned off, and the copying of the
>> intermediate keys from the mappers starts off.
>>
>>
>>
>>
> That was my understanding for the same job, but this was across two
> different jobs.  There were no reduce tasks running from job #2 until all
> of the map tasks of job #1 completed.
>
> On a side note, I just saw this in the task tracker log; I don't know if
> it's related:
> INFO org.mortbay.http.SocketListener: LOW ON THREADS ((40-40+0)<1) on
> SocketListener0@0.0.0.0:50060
> WARN org.mortbay.http.SocketListener: OUT OF THREADS:
> SocketListener0@0.0.0.0:50060
>

Ah... That might be the issue... I don't know the solution to this... Wait for
someone else to answer. The mappers not starting could be because of this as
well.

What's your cluster configuration? How many CPUs, how much RAM, etc.?
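
(One guess, though I have not verified it: the listener on port 50060 is the
tasktracker's HTTP server that serves map output to the reducers, and its
thread pool size comes from tasktracker.http.threads, which defaults to 40;
that matches the (40-40+0)<1 in your log. Raising it, e.g.

tasktracker.http.threads = 100

and restarting the tasktrackers might be worth a try.)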


>
>
> thanks,
> M
>
>

Re: I've probably hit some system limits

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Hello,

Amandeep Khurana wrote:
> On Wed, Aug 12, 2009 at 1:15 PM, Mayuran Yogarajah <mayuran.yogarajah@casalemedia.com> wrote:
>
>   
>> I had 3 jobs running and I saw something a bit odd.  Two of the jobs
>> are reducing; one of them is using all the reducers, so the other is
>> waiting, which is OK.  However, the 3rd job is still in the mapping phase,
>> and even though the web interface shows map capacity at 96, I only
>> see about 7-12 mappers actually running.  I'm wondering if there's
>> some setting I need to change; perhaps I've hit some system limit.  Can
>> someone point me in the right direction please?
>>
>>     
>
> Are there any pending mappers remaining? Are you using any scheduler?
>
>   
Yes, there were pending mappers remaining; I'm not using any scheduler.
>   
>> The other thing was that with the two jobs that are in the reducing phase,
>> the reducer for one job wouldn't actually start until all the mappers of
>> the _other_ job completed, which seems kind of odd.  Is this expected?
>>
>>     
>
> Reducers don't really start the "reduce" phase till the mappers are
> completed. However, the process gets spawned off, and the copying of the
> intermediate keys from the mappers starts off.
>
>
>   
That was my understanding for the same job, but this was across two
different jobs.  There were no reduce tasks running from job #2 until all
of the map tasks of job #1 completed.

On a side note, I just saw this in the task tracker log; I don't know if
it's related:
INFO org.mortbay.http.SocketListener: LOW ON THREADS ((40-40+0)<1) on 
SocketListener0@0.0.0.0:50060
WARN org.mortbay.http.SocketListener: OUT OF THREADS: 
SocketListener0@0.0.0.0:50060


thanks,
M


Re: I've probably hit some system limits

Posted by Amandeep Khurana <am...@gmail.com>.
On Wed, Aug 12, 2009 at 1:15 PM, Mayuran Yogarajah <mayuran.yogarajah@casalemedia.com> wrote:

> I had 3 jobs running and I saw something a bit odd.  Two of the jobs
> are reducing; one of them is using all the reducers, so the other is
> waiting, which is OK.  However, the 3rd job is still in the mapping phase,
> and even though the web interface shows map capacity at 96, I only
> see about 7-12 mappers actually running.  I'm wondering if there's
> some setting I need to change; perhaps I've hit some system limit.  Can
> someone point me in the right direction please?
>

Are there any pending mappers remaining? Are you using any scheduler?


>
> The other thing was that with the two jobs that are in the reducing phase,
> the reducer for one job wouldn't actually start until all the mappers of
> the _other_ job completed, which seems kind of odd.  Is this expected?
>

Reducers don't really start the "reduce" phase till the mappers are
completed. However, the process gets spawned off, and the copying of the
intermediate keys from the mappers starts off.
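
(On the jobtracker web UI this typically shows up as the reducers sitting at
up to 33%, the copy/shuffle third of the reduce progress bar, while the maps
are still finishing.)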


> thanks,
> M
>