Posted to common-user@hadoop.apache.org by jamborta <ja...@gmail.com> on 2010/05/05 00:54:14 UTC
new to hadoop
Hi,
I am trying to set up a small Hadoop cluster with 6 machines. The problem I
have now is that if I set the memory allocated to a task low (e.g. -Xmx512m),
the application does not run. If I set it higher, some machines in the
cluster do not have much memory (1 or 2GB), and when the
computation gets intensive Hadoop creates many tasks and sends them to these
weaker machines, which brings the whole cluster down.
My question is whether it is possible to specify -Xmx for each machine in
the cluster, and to specify how many tasks can run on a machine. Or what is the
optimal setting in this situation?
thanks for your help
Tom
--
View this message in context: http://old.nabble.com/new-to-hadoop-tp28454028p28454028.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: new to hadoop
Posted by Tamas Jambor <ja...@googlemail.com>.
Great, thank you. I'll set it up that way.
Tom
Re: new to hadoop
Posted by Ravi Phulari <rp...@yahoo-inc.com>.
How much RAM?
With 6-8GB RAM you can go for 4 mappers and 2 reducers (this is my personal guess).
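As a sketch, those per-node slot counts would look like this in each node's conf/mapred-site.xml (assuming the 0.20-era tasktracker property names; each tasktracker reads these from its local configuration, so weaker machines can be given smaller values):

<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>

<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.
</description>
</property>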
-
Ravi
On 5/4/10 4:33 PM, "Tamas Jambor" <ja...@googlemail.com> wrote:
thank you. so what would be the optimal setting for mapred.map.tasks and mapred.reduce.tasks, say, on a dual-core machine?
Tom
Re: new to hadoop
Posted by Tamas Jambor <ja...@googlemail.com>.
thank you. so what would be the optimal setting for mapred.map.tasks and
mapred.reduce.tasks, say, on a dual-core machine?
Tom
Re: new to hadoop
Posted by Ravi Phulari <rp...@yahoo-inc.com>.
You can edit the configuration files (conf/hadoop-env.sh) on each node to specify -Xmx values.
You can use conf/mapred-site.xml to configure default mappers and reducers running on a node.
<property>
<name>mapred.map.tasks</name>
<value>2</value>
<description>The default number of map tasks per job.
Ignored when mapred.job.tracker is "local".
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
<description>The default number of reduce tasks per job. Typically set to 99%
of the cluster's reduce capacity, so that if a node fails the reduces can
still be executed in a single wave.
Ignored when mapred.job.tracker is "local".
</description>
</property>
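For the heap of the task JVMs themselves, the usual knob (assuming the 0.20-era property name, whose default is -Xmx200m) is mapred.child.java.opts in mapred-site.xml:

<property>
<name>mapred.child.java.opts</name>
<value>-Xmx512m</value>
<description>Java opts for the task tracker child processes.
</description>
</property>

Note this is a job-level setting rather than a strictly per-node one, so it is typically set on the machine submitting the job.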
-
Ravi