Posted to dev@mahout.apache.org by Li Li <fa...@gmail.com> on 2014/04/03 02:07:19 UTC

how to solve reducer memory problem?

I have a map-reduce program that does some matrix operations. In the
reducer, it averages many large matrices (each matrix takes up 400+ MB,
according to the Map output bytes counter). So if 50 matrices go to one
reducer, the total memory usage is about 20 GB, and the reduce task fails
with this exception:

FATAL org.apache.hadoop.mapred.Child: Error running child :
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:438)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2539)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:661)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

One method I can come up with is to use a Combiner to emit partial sums
of some matrices together with their counts, but that still cannot fully
solve the problem because the combiner is not fully controlled by me
(Hadoop may run it zero or more times).
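The partial-sum idea can be sketched in plain Java (no Hadoop
dependencies; the class and method names below are illustrative, not
from the original code). A combiner would emit one (sum matrix, count)
pair per key, and the reducer folds those pairs into a single running
accumulator, so only one sum matrix is ever held in memory regardless
of how many matrices arrive:

```java
// Sketch of the partial-sum/count accumulation a combiner + reducer
// pair would perform. A matrix is modeled as a flat double[].
public class PartialAverage {
    // Running accumulator: one sum matrix plus a matrix count.
    private double[] sum;
    private long count;

    // Fold one (partialSum, partialCount) pair into the accumulator,
    // as the reducer would while iterating over combiner outputs.
    public void add(double[] partialSum, long partialCount) {
        if (sum == null) {
            sum = partialSum.clone();
        } else {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += partialSum[i];
            }
        }
        count += partialCount;
    }

    // Final element-wise average; memory use stays at one matrix.
    public double[] average() {
        double[] avg = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            avg[i] = sum[i] / count;
        }
        return avg;
    }

    public static void main(String[] args) {
        PartialAverage acc = new PartialAverage();
        acc.add(new double[]{2.0, 4.0}, 1);  // first "matrix"
        acc.add(new double[]{4.0, 8.0}, 1);  // second "matrix"
        double[] avg = acc.average();
        System.out.println(avg[0] + ", " + avg[1]);
    }
}
```

Because the combiner is optional, the reducer itself should do this same
incremental accumulation while iterating its values, instead of
collecting all 50 matrices first; correctness then does not depend on
whether the combiner ran.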

Re: how to solve reducer memory problem?

Posted by James Casaletto <jc...@maprtech.com>.
Hi Li Li
You might try increasing the number of reducers in your mapreduce job to
spread the memory requirement over a larger number of reducers.  This
increases the number of partitions (and therefore decreases the size of
the partition that any single reducer has to work on).  The default
number of reducers is 1.

I don't know the invocation of your mapreduce job, but specifying
mapred.reduce.tasks could help.  Assuming your code uses ToolRunner,
you can specify 10 reducers as follows:

hadoop jar myjar my.driver -D mapred.reduce.tasks=10 <standard args>
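For context, Hadoop's default HashPartitioner assigns a key to a reduce
task as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so raising
the reducer count spreads keys over more, smaller partitions. A minimal
plain-Java sketch of that arithmetic (the keys here are made up for
illustration):

```java
// Mirrors the partition formula used by Hadoop's default
// HashPartitioner: more reduce tasks means fewer keys (and hence
// fewer matrices to average) per reducer.
public class PartitionDemo {
    static int partition(Object key, int numReduceTasks) {
        // Mask off the sign bit so the modulo result is non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"row-0", "row-1", "row-2", "row-3", "row-4"};
        for (int reducers : new int[]{1, 10}) {
            for (String k : keys) {
                System.out.println(reducers + " reducers: " + k
                        + " -> reducer " + partition(k, reducers));
            }
        }
    }
}
```

With 1 reducer every key lands on reducer 0 (the situation causing the
OOM); with 10 reducers the keys scatter across partitions 0-9.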

hope that helps
-james


