Posted to common-user@hadoop.apache.org by jamborta <ja...@gmail.com> on 2010/05/05 00:54:14 UTC

new to hadoop

Hi,

I am trying to set up a small Hadoop cluster with 6 machines. The problem I
have now is that if I set the memory allocated to a task low (e.g. -Xmx512m)
the application does not run; if I set it higher, some of the machines in the
cluster have very little memory (1 or 2GB), and when the computation gets
intensive Hadoop launches so many tasks on these weaker machines that it
brings the whole cluster down.
My question is whether it is possible to specify -Xmx for each machine in
the cluster, and to specify how many tasks can run on a machine. Or what is
the optimal setting in this situation?

thanks for your help

Tom

-- 
View this message in context: http://old.nabble.com/new-to-hadoop-tp28454028p28454028.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: new to hadoop

Posted by Tamas Jambor <ja...@googlemail.com>.
Great, thank you. I'll set it up that way.

Tom

On 05/05/2010 00:37, Ravi Phulari wrote:
> How much RAM?
> With 6-8GB RAM you can go for 4 mappers and 2 reducers (this is my
> personal guess).
>
> -
> Ravi


Re: new to hadoop

Posted by Ravi Phulari <rp...@yahoo-inc.com>.
How much RAM?
With 6-8GB of RAM you can go for 4 mappers and 2 reducers (this is my personal guess).

-
Ravi
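
As a rough sanity check on that guess, assuming each task JVM gets about 1GB
of heap (say mapred.child.java.opts at -Xmx1024m; the 1GB figure is an
assumption, not something stated in this thread):

  4 map slots    x 1GB = 4GB
  2 reduce slots x 1GB = 2GB
  task heap total      = 6GB

which fits a 6-8GB machine while leaving headroom for the TaskTracker and
DataNode daemons and the OS. On the 1-2GB machines the same arithmetic
suggests one or two slots with something like -Xmx512m.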

On 5/4/10 4:33 PM, "Tamas Jambor" <ja...@googlemail.com> wrote:

Thank you. So what would be the optimal setting for mapred.map.tasks and mapred.reduce.tasks, say, on a dual-core machine?

Tom



Re: new to hadoop

Posted by Tamas Jambor <ja...@googlemail.com>.
Thank you. So what would be the optimal setting for mapred.map.tasks and
mapred.reduce.tasks, say, on a dual-core machine?

Tom


Re: new to hadoop

Posted by Ravi Phulari <rp...@yahoo-inc.com>.
You can edit the configuration files (conf/hadoop-env.sh) on each node to specify -Xmx values.
You can use conf/mapred-site.xml to configure default mappers and reducers running on a node.

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>The default number of map tasks per job.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>The default number of reduce tasks per job. Typically set to 99%
  of the cluster's reduce capacity, so that if a node fails the reduces can
  still be executed in a single wave.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>


-
Ravi
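
For what it's worth, mapred.map.tasks and mapred.reduce.tasks above are
per-job defaults, not per-machine limits. The per-node knobs that more
directly match the original question are mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum, which each TaskTracker reads
from its own conf/mapred-site.xml, so the 1-2GB machines can be given fewer
task slots than the rest of the cluster. The heap given to each task JVM is
set with mapred.child.java.opts (normally a per-job or cluster-wide setting),
while HADOOP_HEAPSIZE in conf/hadoop-env.sh controls the heap of the Hadoop
daemons themselves rather than of the tasks. A minimal sketch of a
mapred-site.xml for a low-memory node (property names as in the 0.20-era
mapred-default.xml; the values are only illustrative, not recommendations):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
  <description>The maximum number of map tasks that this TaskTracker
  will run simultaneously.
  </description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
  <description>The maximum number of reduce tasks that this TaskTracker
  will run simultaneously.
  </description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
  <description>Java options passed to each task child JVM; normally set
  per job or cluster-wide rather than per node.
  </description>
</property>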
