Posted to common-user@hadoop.apache.org by Pramy Bhats <pr...@googlemail.com> on 2010/10/05 15:40:33 UTC

Set number of Reducers per machine.

Hi,

I am trying to run a job on my Hadoop cluster, where I consistently get a
heap space error.

I increased the heap space to 4 GB in hadoop-env.sh and rebooted the cluster.
However, I still get the heap space error.


One of the things I want to try is to reduce the number of map/reduce processes
per machine. Currently each machine can run 2 map and 2 reduce processes.


I want to configure Hadoop to run 1 map and 1 reduce task per machine to give
more heap space to each process.

How can I configure the number of map and reduce tasks per node?


thanks in advance,
-- Pramod

Re: Set number of Reducers per machine.

Posted by ed <ha...@gmail.com>.
Ah yes,

It looks like both the mapper and reducer are using a map structure which
will be created on the heap.  All the values from the reducer are being
inserted into the map structure.  If you have lots of values for a single
key then you're going to run out of heap memory really fast.  Do you have a
rough estimate for the number of values per key?  We had this problem when
we first started using map-reduce (we'd create large arrays in the reducer
to hold data to sort).  Turns out this is generally a very bad idea (it's
particularly bad when the number of values per key is not bounded since
sometimes your algorithm will work and other times you'll get out of
memory errors).  In our case we redesigned our algorithm to not require
holding lots of values in memory by taking advantage of Hadoop's sorting
capability and secondary sorting capability.
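
To make that concrete, here is a rough sketch of the "pairs" formulation (this is
not the Cloud9 code; the class names, the window size of 2, and the whitespace
tokenization are all made up for illustration).  Instead of building a per-term
HashMap ("stripe"), the mapper emits one composite (term, neighbor) key per
co-occurrence and lets Hadoop's shuffle and sort do the grouping:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: illustrates the "pairs" approach, not the actual Cloud9 classes.
public class CooccurrencePairsSketch {

  public static class PairMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text pair = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] terms = value.toString().split("\\s+");
      for (int i = 0; i < terms.length; i++) {
        // hypothetical window of +/- 2 neighboring terms
        for (int j = Math.max(0, i - 2); j <= Math.min(terms.length - 1, i + 2); j++) {
          if (i == j) continue;
          // one small record per co-occurrence instead of one big map per term
          pair.set(terms[i] + "\t" + terms[j]);
          context.write(pair, ONE);
        }
      }
    }
  }

  // The shuffle groups identical (term, neighbor) keys, so the reducer only
  // sums a stream of counts and never holds a large structure on the heap.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable sum = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int total = 0;
      for (IntWritable v : values) {
        total += v.get();
      }
      sum.set(total);
      context.write(key, sum);
    }
  }
}

The trade-off is many more intermediate records (a combiner helps a lot there),
but per-task memory stays flat no matter how many values a key has.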

My guess is you won't be able to use the cloud9 mapper and reducer unless
your data changes so that the number of unique values per key is much
lower.  It's also possible that you're running out of heap space in the
mapper as you create the map there.  How many items are in the terms
array?

String[] terms = text.split("\\s+");

Sorry that's probably not much help to you.

~Ed

On Wed, Oct 6, 2010 at 8:04 AM, Pramy Bhats <pr...@googlemail.com> wrote:

> Hi Ed,
> I was using the following file for the MapReduce job.
>
> Cloud9/src/dist/edu/umd/cloud9/example/cooccur/ComputeCooccurrenceMatrixStripes.java
> thanks,
> --Pramod

Re: Set number of Reducers per machine.

Posted by Pramy Bhats <pr...@googlemail.com>.
Hi Ed,
I was using the following file for the MapReduce job.
Cloud9/src/dist/edu/umd/cloud9/example/cooccur/ComputeCooccurrenceMatrixStripes.java
thanks,
--Pramod

On Tue, Oct 5, 2010 at 10:51 PM, ed <ha...@gmail.com> wrote:

> What are the exact files you are using for the mapper and reducer from the
> cloud9 package?

Re: Set number of Reducers per machine.

Posted by ed <ha...@gmail.com>.
What are the exact files you are using for the mapper and reducer from the
cloud9 package?

On Tue, Oct 5, 2010 at 2:15 PM, Pramy Bhats <pr...@googlemail.com> wrote:

> Hi Ed,
>
> I was trying to benchmark some application code available online.
> http://github.com/lintool/Cloud9
>
> Specifically, the program that computes the co-occurrence matrix stripes. However,
> the code itself is problematic: it throws a heap-space error even for very small
> data sets.
>
> thanks,
> --Pramod

Re: Set number of Reducers per machine.

Posted by Pramy Bhats <pr...@googlemail.com>.
Hi Ed,

I was trying to benchmark some application code available online.
http://github.com/lintool/Cloud9

Specifically, the program that computes the co-occurrence matrix stripes. However,
the code itself is problematic: it throws a heap-space error even for very small
data sets.

thanks,
--Pramod



On Tue, Oct 5, 2010 at 5:50 PM, ed <ha...@gmail.com> wrote:

> Hi Pramod,
>
> How much memory does each node in your cluster have?
>
> What type of processors do those nodes have? (dual core, quad core, dual
> quad core? etc..)
>
> In what step are you seeing the heap space error (mapper or reducer?)
>
> It's quite possible that your mapper or reducer code could be improved to
> reduce heap space usage.
>
> ~Ed

Re: Set number of Reducers per machine.

Posted by ed <ha...@gmail.com>.
Hi Pramod,

How much memory does each node in your cluster have?

What type of processors do those nodes have? (dual core, quad core, dual
quad core? etc..)

In what step are you seeing the heap space error (mapper or reducer?)

It's quite possible that your mapper or reducer code could be improved to
reduce heap space usage.

~Ed

On Tue, Oct 5, 2010 at 10:05 AM, Marcos Medrado Rubinelli <
marcosm@buscape-inc.com> wrote:

> You can set the mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum properties in your mapred-site.xml
> file, but you may also want to check your current mapred.child.java.opts and
> mapred.child.ulimit values to make sure they aren't overriding the 4GB you
> set globally.
>
> Cheers,
> Marcos

Re: Set number of Reducers per machine.

Posted by Marcos Medrado Rubinelli <ma...@buscape-inc.com>.
You can set the mapred.tasktracker.map.tasks.maximum and 
mapred.tasktracker.reduce.tasks.maximum properties in your 
mapred-site.xml file, but you may also want to check your current 
mapred.child.java.opts and mapred.child.ulimit values to make sure they 
aren't overriding the 4GB you set globally.
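
For instance, a mapred-site.xml along these lines (the -Xmx value is only an
example, size it to what your nodes can actually spare) limits each node to one
map slot and one reduce slot and raises the per-task heap:

<configuration>
  <!-- one map task and one reduce task per TaskTracker -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
  <!-- heap for each child task JVM; the heap you set in hadoop-env.sh mainly
       affects the daemons, not the task JVMs -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
</configuration>

As far as I know, the *.tasks.maximum values are read when a TaskTracker starts,
so you will need to restart the TaskTrackers after changing them.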

Cheers,
Marcos
> Hi,
>
> I am trying to run a job on my Hadoop cluster, where I consistently get a
> heap space error.
>
> I increased the heap space to 4 GB in hadoop-env.sh and rebooted the cluster.
> However, I still get the heap space error.
>
>
> One of the things I want to try is to reduce the number of map/reduce
> processes per machine. Currently each machine can run 2 map and 2 reduce
> processes.
>
>
> I want to configure Hadoop to run 1 map and 1 reduce task per machine to
> give more heap space to each process.
>
> How can I configure the number of map and reduce tasks per node?
>
>
> thanks in advance,
> -- Pramod
>