Posted to user@hbase.apache.org by John <jo...@gmail.com> on 2013/11/01 14:48:47 UTC

OutOfMemoryError in MapReduce Job

Hi,

I have a problem with memory. My use case is the following: I've created
a MapReduce job that iterates over every row. If a row has more than, for
example, 10k columns, I create a Bloom filter (a BitSet) for that row and
store it in the HBase structure. This has worked fine so far.

BUT now I am trying to store a BitSet with 1,000,000,000 elements, which is
~120 MB in size. In every map() call there are 2 BitSets. When I execute
the MR job I get this error: http://pastebin.com/DxFYNuBG

Obviously, the TaskTracker does not have enough memory. I tried to adjust
the memory configuration, but I'm not sure which setting is the right one.
I changed the "MapReduce Child Java Maximum Heap Size" value from 1 GB
to 2 GB, but I still get the same error.

Which parameters do I have to adjust? BTW, I'm using CDH 4.4.0 with
Cloudera Manager.

kind regards

Re: OutOfMemoryError in MapReduce Job

Posted by John <jo...@gmail.com>.
Okay, thank you for your help. Snappy works just fine for me.



Re: OutOfMemoryError in MapReduce Job

Posted by Asaf Mesika <as...@gmail.com>.
HBase will compress the entire KeyValue - that's one thing.
Second thing: if you use TableOutputFormat, I believe the Put will be
inserted into HBase on the reducer side.
Third, the compression only takes place at flush time, which means your
data travels uncompressed between the mapper, the reducer, and the HBase
WAL / MemStore.

Compressing in Java can be done through
http://commons.apache.org/proper/commons-compress/zip.html.
For speed, go for Snappy, but for a POC zip should do the trick.
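
A minimal sketch of the plain-JDK route (java.util.zip; GzipUtil is just an
illustrative name, and the byte[] is assumed to be the serialized BitSet):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public final class GzipUtil {

    // Compresses an arbitrary byte array (e.g. a serialized BitSet) with gzip.
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(raw.length / 4 + 64);
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(raw);
        gz.close(); // finishes the gzip stream and flushes everything into bos
        return bos.toByteArray();
    }
}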




Re: OutOfMemoryError in MapReduce Job

Posted by John <jo...@gmail.com>.
@Ted: okay, thanks for the information

@Asaf: It seems to work if I compress the bytes myself. I use Snappy for
that ( https://code.google.com/p/snappy/ ). The 120 MB BitSet is compressed
down to a 5 MB byte array. So far the HBase region servers have not
crashed. Thanks!
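
A minimal sketch of this step, assuming the snappy-java binding
(org.xerial.snappy) is on the classpath; BitSetCompressor is just an
illustrative name:

import java.io.IOException;

import org.xerial.snappy.Snappy;

public final class BitSetCompressor {

    // Compress the serialized BitSet bytes before putting them into HBase.
    public static byte[] compress(byte[] rawBitSetBytes) throws IOException {
        return Snappy.compress(rawBitSetBytes);
    }

    // Counterpart for the read path.
    public static byte[] decompress(byte[] stored) throws IOException {
        return Snappy.uncompress(stored);
    }
}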

kind regards



Re: OutOfMemoryError in MapReduce Job

Posted by Ted Yu <yu...@gmail.com>.
Compression happens on the server.
See src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java (0.94)

In 0.96 and beyond, see http://hbase.apache.org/book.html#rpc.configs
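
For 0.96+, a sketch of what enabling client/server RPC compression looks
like; the property name is the one described in the linked section, so
verify it against your exact release:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RpcCompressionConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Property name taken from the HBase book's RPC configuration section;
        // the codec class must be available on both client and server.
        conf.set("hbase.client.rpc.compressor",
                 "org.apache.hadoop.io.compress.GzipCodec");
    }
}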

Cheers


Re: OutOfMemoryError in MapReduce Job

Posted by John <jo...@gmail.com>.
You mean I should take the BitSet, transform it into bytes and then
compress it on my own in the map() function? Hmmm ... I could try it. What
is the best way to compress it in Java?

BTW, I'm not sure exactly how the HBase compression works. As I mentioned,
I have already enabled LZO compression for the column family. The question
is: where are the bytes compressed? Directly in the map() function (and if
not, is it possible to compress them there with LZO?!), or on the region
server?

kind regards



Re: OutOfMemoryError in MapReduce Job

Posted by Asaf Mesika <as...@gmail.com>.
I mean, if you take all those bytes of the bit set and zip them, wouldn't
you reduce them significantly? Less traffic on the wire, less memory in
HBase, etc.


Re: OutOfMemoryError in MapReduce Job

Posted by John <jo...@gmail.com>.
I already use LZO compression in HBase. Or do you mean a compressed Java
object? Do you know an implementation?

kind regards



Re: OutOfMemoryError in MapReduce Job

Posted by Asaf Mesika <as...@gmail.com>.
I would try to compress this bit set.


Re: OutOfMemoryError in MapReduce Job

Posted by John <jo...@gmail.com>.
Hi,

thanks for your answer! I increased the "Map Task Maximum Heap Size" to
2 GB and it seems to work. The OutOfMemoryError is gone. But the HBase
region servers are now crashing all the time :-/ I try to store the
bitvector (120 MB in size) for some rows. This seems to be very memory
intensive; the usedHeapMB metric increases very fast (up to 2 GB). I'm not
sure if it is the reading or the writing part that causes this, but I think
it's the writing part. Any idea how to minimize the memory usage? My mapper
looks like this:

public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {

    // Signature reconstructed here for readability; name, cf and bitvector
    // come from the surrounding map() code that is not shown in the mail.
    private void storeBitvectorToHBase(byte[] name, byte[] cf, BitSet bitvector,
            Context context) throws IOException, InterruptedException {
        Put row = new Put(name);
        row.setWriteToWAL(false); // skip the WAL for this bulk-style write
        row.add(cf, Bytes.toBytes("columname"), toByteArray(bitvector));
        ImmutableBytesWritable key = new ImmutableBytesWritable(name);
        context.write(key, row);
    }
}
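
For completeness, toByteArray() is not shown in the original mail; inside
MyMapper, a helper along these lines would do it (on Java 7+,
BitSet#toByteArray() does this directly):

// Hypothetical helper mirroring what toByteArray(bitvector) presumably does:
// serialize a java.util.BitSet into a byte[], with bit i landing in byte i / 8.
private static byte[] toByteArray(BitSet bits) {
    byte[] bytes = new byte[(bits.length() + 7) / 8];
    for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
        bytes[i / 8] |= (byte) (1 << (i % 8));
    }
    return bytes;
}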


kind regards



Re: OutOfMemoryError in MapReduce Job

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi John,

You might be better off asking this on the CDH mailing list, since it's
more related to Cloudera Manager than to HBase.

In the meantime, can you try updating the "Map Task Maximum Heap Size"
parameter too?
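
For reference, a sketch of the equivalent per-task heap settings in the job
driver (MRv1-style property names; these differ between Hadoop/CDH
releases, so verify against your version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobHeapConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Sketch only: property names differ between MRv1 and MRv2 releases.
        conf.set("mapred.map.child.java.opts", "-Xmx2048m");    // map task JVM heap
        conf.set("mapred.reduce.child.java.opts", "-Xmx1024m"); // reduce task JVM heap
        Job job = new Job(conf, "bloomfilter job");
        // ... then set mapper, input/output formats, etc., and submit the job.
    }
}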

JM

