Posted to common-user@hadoop.apache.org by Vyacheslav Zholudev <vy...@gmail.com> on 2011/10/21 16:26:11 UTC

High memory usage in Reducer

Hi,

I have a mapred job that has about 60 million input records and groups them by key into units of 1 or 2 elements (that is, a reducer always gets 1 or 2 records with the same key).

I have 2 GB of RAM set up for each map/reduce task, and some of the reduce tasks fail with OutOfMemoryError.
I've got a heap dump of one of the reduce tasks when it was close to OOM. It turned out that most of the memory is consumed by the org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread class, which holds org.apache.hadoop.mapred.Merger$Segment objects that are still reachable (there are about 170 of them, each with a retained size of about 8 MB).

Unfortunately, I'm not an expert in the Hadoop code, so I can't tell whether this is normal behavior or not. However, common sense tells me that the memory consumption is a bit too high.
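
If it helps to frame the question: my understanding is that these segments are map outputs buffered in the reducer's memory during the shuffle, and that the reduce-side shuffle/merge settings bound how many of them are kept. A rough sketch of the knobs I mean, assuming the old mapred API and 0.20.x-era property names (the values are purely illustrative, not something I've verified):

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleMemorySketch {
        // Sketch only, not tested: settings that bound the in-memory segments on the reduce side.
        public static JobConf configure(JobConf conf) {
            // Fraction of the reduce task heap used to buffer fetched map outputs.
            conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.50f); // default 0.70
            // Merge buffered segments to disk once the buffer is this full...
            conf.setFloat("mapred.job.shuffle.merge.percent", 0.50f);        // default 0.66
            // ...or once this many in-memory segments have accumulated.
            conf.setInt("mapred.inmem.merge.threshold", 100);                // default 1000
            // Fraction of heap that may still hold map outputs while the reduce itself runs.
            conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.0f);   // default 0.0
            return conf;
        }
    }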

Do you have any ideas/thoughts about the described issue?

Any pointers are highly appreciated

Vyacheslav

Writing an HBase Result object out to SequenceFileOutputFormat

Posted by Aaron Baff <Aa...@telescope.tv>.
So, I'm trying to write out an HBase Result object (the same one I get from my TableMapper) as the value to a SequenceFileOutputFormat from my Reducer, but I'm getting an error when it tries to get a serializer. It looks like the SerializationFactory can't find a Serialization that will accept the Result object (the only one listed in the job's configuration under io.serializations is WritableSerialization). Which is funny, because from the source code, WritableSerialization.accept() just uses Writable.class.isAssignableFrom(c), and when I run that manually and pass it Result.class, it returns true.
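
(For reference, the manual check I ran was roughly the following; it's the same test that WritableSerialization.accept() applies to a candidate class.)

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.io.Writable;

    public class AcceptCheck {
        public static void main(String[] args) {
            // Same test WritableSerialization.accept() uses; this prints "true" for Result.
            System.out.println(Writable.class.isAssignableFrom(Result.class));
        }
    }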


When I use TextOutputFormat, it outputs fine, which I'm guessing is because it just calls .toString() on the key and value and so never touches the serialization machinery. Any ideas or hints to get this working?

In case it matters, I'm on CDH3u1.

    java.lang.NullPointerException
        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:908)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:393)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:354)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:427)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:61)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
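
For reference, my job setup looks roughly like the sketch below (class names are placeholders, and the explicit io.serializations line is just belt-and-braces, since WritableSerialization is already the only entry listed in the job's configuration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.io.serializer.WritableSerialization;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class ResultToSequenceFileSketch {
        // Rough sketch of the setup in question (new mapreduce API, CDH3u1); not a confirmed fix.
        public static Job createJob() throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Belt-and-braces: make sure WritableSerialization is registered for this job.
            conf.setStrings("io.serializations", WritableSerialization.class.getName());
            Job job = new Job(conf, "result-to-sequencefile");
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            // Both key and value classes need to be accepted by a registered Serialization;
            // otherwise SerializationFactory.getSerializer() presumably hits the NPE in the trace above.
            job.setOutputKeyClass(ImmutableBytesWritable.class); // placeholder key class
            job.setOutputValueClass(Result.class);
            return job;
        }
    }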