Posted to general@hadoop.apache.org by himanshu chandola <hi...@yahoo.com> on 2009/09/12 07:01:42 UTC

nodes lying idle

Hi everyone,
I have a cluster of 40 nodes. The input file has 2^18 lines, and every line is an input to a map task. Every node is a quad core, so I've set mapred.tasktracker.map/reduce.tasks.maximum to a value greater than 4. The first 20 nodes show Hadoop jobs at 100% CPU but with only one process running; since each node is a quad core, I would have liked to see 4 Java processes each at 100% (there are 5 Java processes on each system, but 4 are idle and only one is using 100% of one CPU). On the last half of the nodes, the CPU usage of the Hadoop processes is 0. This is really strange: my map tasks are running very slowly, and I would like to use all nodes and all cores.
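For context, the per-node slot settings mentioned above live in the site configuration on each worker node. A minimal sketch, assuming Hadoop 0.18-era property names (the TaskTrackers must be restarted for changes to take effect):

```xml
<!-- hadoop-site.xml / mapred-site.xml on each worker node (sketch) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```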

What could possibly be wrong? Any suggestions would really help.

thanks

H

 Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.



      

Re: nodes lying idle

Posted by himanshu chandola <hi...@yahoo.com>.
So, I used JobConf.setNumMapTasks and it worked: with setNumMapTasks(40) I ended up with 100 maps rather than the 6 I had initially.
My data is only 32 MB, but every line is converted into an object and the computations are CPU-intensive, so I would like to have as many map tasks as there are cores.
There is no XML entry of the form map.tasks.maximum. I'm using Cloudera's distribution 0.18.3-14.
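For reference, in the old API, JobConf.setNumMapTasks(n) sets the job-level property mapred.map.tasks (a hint to the InputFormat, not a hard limit), which is why there is no entry named map.tasks.maximum. A sketch of the equivalent config entry, assuming the 0.18 property name:

```xml
<!-- job-level equivalent of JobConf.setNumMapTasks(40); a hint, not a hard cap -->
<property>
  <name>mapred.map.tasks</name>
  <value>40</value>
</property>
```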

 Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.




Re: nodes lying idle

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
No, the *mapred.tasktracker.map/reduce.tasks.maximum* values are per node, i.e. that is how many mappers and reducers will run on a single node. For example, if you have set

*mapred.tasktracker.map.tasks.maximum = 4*
*mapred.tasktracker.reduce.tasks.maximum = 4*

and the number of nodes = 40, then the entire cluster's map task capacity = 4 * 40 = 160.

*map.tasks.maximum* = 6 means only 6 maps will run for your job, which will definitely not use all of your cluster's resources.
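The slot arithmetic above can be sketched as a trivial calculation (the values are taken from this thread; the per-node count comes from the mapred.tasktracker.map.tasks.maximum setting):

```java
public class ClusterCapacity {
    public static void main(String[] args) {
        // mapred.tasktracker.map.tasks.maximum: concurrent map slots per node
        int mapSlotsPerNode = 4;
        int nodes = 40;

        // Total map tasks the whole cluster can run at once
        int clusterMapCapacity = mapSlotsPerNode * nodes;
        System.out.println("cluster map capacity = " + clusterMapCapacity); // 160

        // A job split into only 6 map tasks can never fill those slots
        int jobMapTasks = 6;
        System.out.println("slots left idle = " + (clusterMapCapacity - jobMapTasks)); // 154
    }
}
```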

What is the size of your data, and what are your cluster's specifications?

-- 
Thanks & Regards,
Chandra Prakash Bhagtani,



Re: nodes lying idle

Posted by himanshu chandola <hi...@yahoo.com>.
Thanks for the tip.
So, is the value of mapred.tasktracker.map/reduce.tasks.maximum for the entire cluster? I had set map.tasks.maximum to 6, and the web interface shows that the total number of map tasks for my job is just 6. My tasks are CPU-intensive, so I would like each of my quad-core nodes to run at least 4 Hadoop map tasks. The whole cluster is running just 6, and each of those 6 nodes is running 1.

 Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.




Re: nodes lying idle

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
You need to check your cluster's map/reduce task capacity, i.e. how many map/reduce tasks can run on the cluster at once. You can check it at http://JobtrackerServerIP:50030. You should also check the total number of map tasks in your job; it should be greater than the map task capacity of the cluster.

Initially, reduce tasks will be idle until the first batch of map tasks completes.
-- 
Thanks & Regards,
Chandra Prakash Bhagtani,
