Posted to user@hbase.apache.org by 祝海通 <zh...@gmail.com> on 2011/01/03 10:56:25 UTC

HBase auto table operations using a script

In our HBase tests for the YCSB benchmark, we want to create and drop tables
with a script, automatically.
But I found that in our script, "/bin/hbase shell; disable 'usertable'; drop
'usertable'" does not work: the disable 'usertable'; drop 'usertable' part is
never executed, because the semicolon-separated commands are consumed by the
OS shell after the HBase shell exits, not by the HBase shell itself.
Wrapping the commands in "" does not work either.
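
The usual approach (a sketch, assuming the stock bin/hbase shell, which reads
its commands from standard input) is to pipe the commands into the shell
instead of placing them after it:

-----------------------------------
# one-liner: feed the commands to the HBase shell on stdin
echo "disable 'usertable'; drop 'usertable'" | bin/hbase shell

# or use a here-document for longer sequences
bin/hbase shell <<EOF
disable 'usertable'
drop 'usertable'
create 'usertable', 'family'
EOF
-----------------------------------

Both forms run non-interactively, which is what an automated benchmark setup
script needs.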

haitong

Re: problem with LZO compressor on write only loads

Posted by Stack <st...@duboce.net>.
Friso:

Did you cc Kevin? He might have an idea.

Good on you,
St.Ack

On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven
<fv...@xebia.com> wrote:
> Hi all,
>
> I seem to run into a problem that occurs when using LZO compression on a heavy write-only load. I am using 0.90 RC1 and, thus, the LZO compressor code that supports the reinit() method (from Kevin Weil's github, version 0.4.8). There are some more Hadoop LZO incarnations, so I am pointing my question to this list.
>
> It looks like the compressor uses direct byte buffers to store the original and compressed bytes in memory, so the native code can work with them without the JVM having to copy anything around. The direct buffers are possibly reused after a reinit() call, but will often be newly created in the init() method, because the existing buffer can be the wrong size for reuse. The latter case leaves the buffers previously used by the compressor instance eligible for garbage collection. I think the problem is that this collection never occurs (in time), because the GC does not consider it necessary yet. The GC does not know about the native heap, and based on the state of the JVM heap there is no reason to finalize these objects yet. However, direct byte buffers are only freed in the finalizer, so the native heap keeps growing. On write-only loads, a full GC will rarely happen, because the max heap will not grow far beyond the mem stores (no block cache is used). So what happens is that the machine starts using swap before the GC ever cleans up the direct byte buffers. I am guessing that without the reinit() support, the buffers were collected earlier, because the referring objects would also be collected every now and then, or perhaps things would just never promote to an older generation.
>
> When I do a pmap on a running RS after it has grown to some 40Gb resident size (with a 16Gb heap), it shows a lot of near-64M anon blocks (presumably native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but that was under normal load. After that I went back to an HBase version that does not require the reinit(). Now I am on 0.90 with the new LZO, but had never done a heavy load like this one with it, until now...
>
> Can anyone with a better understanding of the LZO code confirm that the above could be the case? If so, would it be possible to change the LZO compressor (and decompressor) to use maybe just one fixed-size buffer (they all appear near 64M anyway), or possibly reuse an existing buffer also when it is not the exact required size but just large enough to make do? Having short-lived direct byte buffers is apparently a discouraged practice. If anyone can provide some pointers on what to look out for, I could invest some time in creating a patch.
>
>
> Thanks,
> Friso
>
>
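
For anyone who wants to reproduce the pmap check described in the quote above,
a sketch (assuming the standard Linux procps pmap and a JDK jps on the region
server box):

-----------------------------------
# count anonymous mappings of roughly 64 MB in the region server's
# address space; pmap -x prints sizes in KB in the second column
RS_PID=$(jps | grep HRegionServer | awk '{print $1}')
pmap -x $RS_PID | grep anon | awk '$2 > 60000 && $2 < 70000' | wc -l
-----------------------------------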

Re: problem with LZO compressor on write only loads

Posted by Andrey Stepachev <oc...@gmail.com>.
Yes, I tried.

2011/1/12 Sandy Pratt <pr...@adobe.com>

> I'm curious if you've tried -XX:MaxDirectMemorySize=256m (or whatever
> value).
>
> > -----Original Message-----
> > From: Andrey Stepachev [mailto:octo47@gmail.com]
> > Sent: Tuesday, January 11, 2011 12:58
> > To: user@hbase.apache.org
> > Subject: Re: problem with LZO compressor on write only loads
> >
> > Not only with LZO, but with regular gzip I got the same issue (on sun and
> > jrocket jvm). Looks like some bug for me. Don't know how to beat this
> bug.
> >
> > 2011/1/3 Friso van Vollenhoven <fv...@xebia.com>
> >
> > > Hi all,
> > >
> > > I seem to run into a problem that occurs when using LZO compression on
> > > a heavy write only load. I am using 0.90 RC1 and, thus, the LZO
> > > compressor code that supports the reinit() method (from Kevin Weil's
> > > github, version 0.4.8). There are some more Hadoop LZO incarnations,
> > > so I am pointing my question to this list.
> > >
> > > Thanks,
> > > Friso
> > >
> > >
>

RE: problem with LZO compressor on write only loads

Posted by Sandy Pratt <pr...@adobe.com>.
I'm curious if you've tried -XX:MaxDirectMemorySize=256m (or whatever value).
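
If not, one place to set it is the region server JVM options in hbase-env.sh,
along these lines (a sketch; 256m is only an example value):

-----------------------------------
# conf/hbase-env.sh: cap direct buffer allocations so a native buffer
# leak fails fast with an OutOfMemoryError instead of swapping the box
export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=256m"
-----------------------------------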

> -----Original Message-----
> From: Andrey Stepachev [mailto:octo47@gmail.com]
> Sent: Tuesday, January 11, 2011 12:58
> To: user@hbase.apache.org
> Subject: Re: problem with LZO compressor on write only loads
> 
> Not only with LZO, but with regular gzip I got the same issue (on sun and
> jrocket jvm). Looks like some bug for me. Don't know how to beat this bug.
> 
> 2011/1/3 Friso van Vollenhoven <fv...@xebia.com>
> 
> > Hi all,
> >
> > I seem to run into a problem that occurs when using LZO compression on
> > a heavy write only load. I am using 0.90 RC1 and, thus, the LZO
> > compressor code that supports the reinit() method (from Kevin Weil's
> > github, version 0.4.8). There are some more Hadoop LZO incarnations,
> > so I am pointing my question to this list.
> >
> > Thanks,
> > Friso
> >
> >

Re: problem with LZO compressor on write only loads

Posted by Andrey Stepachev <oc...@gmail.com>.
I got the same issue not only with LZO but also with regular gzip (on both
the Sun and JRockit JVMs). Looks like a bug to me; I don't know how to work
around it.

2011/1/3 Friso van Vollenhoven <fv...@xebia.com>

> Hi all,
>
> I seem to run into a problem that occurs when using LZO compression on a
> heavy write only load. I am using 0.90 RC1 and, thus, the LZO compressor
> code that supports the reinit() method (from Kevin Weil's github, version
> 0.4.8). There are some more Hadoop LZO incarnations, so I am pointing my
> question to this list.
>
> Thanks,
> Friso
>
>

Re: problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hi Todd,

I am on Centos 5.5 with glibc 2.5. uname -a = Linux m1r1.inrdb.ripe.net 2.6.18-194.11.4.el5 #1 SMP Tue Sep 21 05:04:09 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux.

I will try a run with that env var in my hbase-env.sh.
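
Something like this, I assume (a sketch; as I understand it, MALLOC_ARENA_MAX
only changes behaviour on glibc 2.10+, which introduced per-thread malloc
arenas):

-----------------------------------
# conf/hbase-env.sh: restrict glibc malloc to a single arena
export MALLOC_ARENA_MAX=1
-----------------------------------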


Thanks,
Friso



On 3 jan 2011, at 19:18, Todd Lipcon wrote:

> Hi Friso,
> 
> Which OS are you running? Particularly, which version of glibc?
> 
> Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
> 
> Thanks
> -Todd
> 
> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com> wrote:
> 
>> Hi all,
>> 
>> I seem to run into a problem that occurs when using LZO compression on a
>> heavy write only load. I am using 0.90 RC1 and, thus, the LZO compressor
>> code that supports the reinit() method (from Kevin Weil's github, version
>> 0.4.8). There are some more Hadoop LZO incarnations, so I am pointing my
>> question to this list.
>> 
>> It looks like the compressor uses direct byte buffers to store the original
>> and compressed bytes in memory, so the native code can work with it without
>> the JVM having to copy anything around. The direct buffers are possibly
>> reused after a reinit() call, but will often be newly created in the init()
>> method, because the existing buffer can be the wrong size for reusing. The
>> latter case will leave the previously used buffers by the compressor
>> instance eligible for garbage collection. I think the problem is that this
>> collection never occurs (in time), because the GC does not consider it
>> necessary yet. The GC does not know about the native heap and based on the
>> state of the JVM heap, there is no reason to finalize these objects yet.
>> However, direct byte buffers are only freed in the finalizer, so the native
>> heap keeps growing. On write only loads, a full GC will rarely happen,
>> because the max heap will not grow far beyond the mem stores (no block cache
>> is used). So what happens is that the machine starts using swap before the
>> GC will ever clean up the direct byte buffers. I am guessing that without
>> the reinit() support, the buffers were collected earlier because the
>> referring objects would also be collected every now and then or things would
>> perhaps just never promote to an older generation.
>> 
>> When I do a pmap on a running RS after it has grown to some 40Gb resident
>> size (with a 16Gb heap), it will show a lot of near 64M anon blocks
>> (presumably native heap). I show this before with the 0.4.6 version of
>> Hadoop LZO, but that was under normal load. After that I went back to a
>> HBase version that does not require the reinit(). Now I am on 0.90 with the
>> new LZO, but never did a heavy load like this one with that, until now...
>> 
>> Can anyone with a better understanding of the LZO code confirm that the
>> above could be the case? If so, would it be possible to change the LZO
>> compressor (and decompressor) to use maybe just one fixed size buffer (they
>> all appear near 64M anyway) or possibly reuse an existing buffer also when
>> it is not the exact required size but just large enough to make do? Having
>> short lived direct byte buffers is apparently a discouraged practice. If
>> anyone can provide some pointers on what to look out for, I could invest
>> some time in creating a patch.
>> 
>> 
>> Thanks,
>> Friso
>> 
>> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Nothing out of the ordinary. HFile blocks are the default 64KB. Max file size is 1GB. Writes are done without the WAL. The client-side write buffer is larger than the default, at 16MB. The memstore flush size is 128M. Compaction threshold and blocking store files are 5 and 9, respectively. Everything else is at the defaults.

It could be that the writes are a bit poorly distributed in the beginning of the job. The tables are created with pre-created regions, but it still took one or two splits to get it nicely distributed across all machines last time I ran it (which was on 0.89 with an old / ancient LZO version).

What I also notice is that with 0.90, HBase reports close to an order of magnitude fewer requests per second (in the master UI). I used to do about 300K req/s and now it rarely gets above 40K. I am guessing that swapping and the OS not having any memory left for buffers aren't helping here, but the drop is still significant. But if it is frequently allocating 64M blocks where it shouldn't, then that would explain a bit as well.


Friso



On 4 jan 2011, at 01:54, Todd Lipcon wrote:

> Fishy. Are your cells particularly large? Or have you tuned the HFile block
> size at all?
> 
> -Todd
> 
> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com> wrote:
> 
>> I tried it, but it doesn't seem to help. The RS processes grow to 30Gb in
>> minutes after the job started.
>> 
>> Any ideas?
>> 
>> 
>> Friso
>> 
>> 
>> 
>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>> 
>>> Hi Friso,
>>> 
>>> Which OS are you running? Particularly, which version of glibc?
>>> 
>>> Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
>>> 
>>> Thanks
>>> -Todd
>>> 
>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
>>> fvanvollenhoven@xebia.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I seem to run into a problem that occurs when using LZO compression on a
>>>> heavy write only load. I am using 0.90 RC1 and, thus, the LZO compressor
>>>> code that supports the reinit() method (from Kevin Weil's github,
>> version
>>>> 0.4.8). There are some more Hadoop LZO incarnations, so I am pointing my
>>>> question to this list.
>>>> 
>>>> It looks like the compressor uses direct byte buffers to store the
>> original
>>>> and compressed bytes in memory, so the native code can work with it
>> without
>>>> the JVM having to copy anything around. The direct buffers are possibly
>>>> reused after a reinit() call, but will often be newly created in the
>> init()
>>>> method, because the existing buffer can be the wrong size for reusing.
>> The
>>>> latter case will leave the previously used buffers by the compressor
>>>> instance eligible for garbage collection. I think the problem is that
>> this
>>>> collection never occurs (in time), because the GC does not consider it
>>>> necessary yet. The GC does not know about the native heap and based on
>> the
>>>> state of the JVM heap, there is no reason to finalize these objects yet.
>>>> However, direct byte buffers are only freed in the finalizer, so the
>> native
>>>> heap keeps growing. On write only loads, a full GC will rarely happen,
>>>> because the max heap will not grow far beyond the mem stores (no block
>> cache
>>>> is used). So what happens is that the machine starts using swap before
>> the
>>>> GC will ever clean up the direct byte buffers. I am guessing that
>> without
>>>> the reinit() support, the buffers were collected earlier because the
>>>> referring objects would also be collected every now and then or things
>> would
>>>> perhaps just never promote to an older generation.
>>>> 
>>>> When I do a pmap on a running RS after it has grown to some 40Gb
>> resident
>>>> size (with a 16Gb heap), it will show a lot of near 64M anon blocks
>>>> (presumably native heap). I show this before with the 0.4.6 version of
>>>> Hadoop LZO, but that was under normal load. After that I went back to a
>>>> HBase version that does not require the reinit(). Now I am on 0.90 with
>> the
>>>> new LZO, but never did a heavy load like this one with that, until
>> now...
>>>> 
>>>> Can anyone with a better understanding of the LZO code confirm that the
>>>> above could be the case? If so, would it be possible to change the LZO
>>>> compressor (and decompressor) to use maybe just one fixed size buffer
>> (they
>>>> all appear near 64M anyway) or possibly reuse an existing buffer also
>> when
>>>> it is not the exact required size but just large enough to make do?
>> Having
>>>> short lived direct byte buffers is apparently a discouraged practice. If
>>>> anyone can provide some pointers on what to look out for, I could invest
>>>> some time in creating a patch.
>>>> 
>>>> 
>>>> Thanks,
>>>> Friso
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>> 
>> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hey Todd,

Just FYI: I have only tried the 0.4.8 LZO version with the G1 collector, not CMS. When I saw the problem with earlier versions, I did a run with both G1 and CMS and it looked the same.

I am not sure it makes a difference, though. My guess is that the problem occurs because the byte buffers created by the compressor objects are reused a couple of times, which makes them longer lived and promotes them out of the young gen; that keeps them from being finalized for a long time, which in turn means their native allocations are never released. But this is just my hunch; I have not looked into verifying it...
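
One way to check that hunch would be to turn on GC logging and see how rarely
the old generation actually gets collected (a sketch, using standard HotSpot
flags in hbase-env.sh):

-----------------------------------
# conf/hbase-env.sh: log GC activity; if old-gen collections (the point
# at which promoted DirectByteBuffers can be finalized) almost never
# run, the native memory behind those buffers is effectively never freed
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc-regionserver.log"
-----------------------------------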


Friso



On 9 jan 2011, at 03:48, Todd Lipcon wrote:

> Hey everyone,
>
> Just wanted to let you know that I will be looking into this this coming
> week - we've marked it as an important thing to investigate prior to our next
> beta release.
>
> Thanks
> -Todd
>
> On Sat, Jan 8, 2011 at 4:59 AM, Tatsuya Kawano <ta...@gmail.com>wrote:
>
>>
>> Hi Friso,
>>
>> So you found HBase 0.89 on CDH3b2 doesn't have the problem. I wonder what
>> would happen if you replace hadoop-core-*.jar in CDH3b3 with the one
>> contained in HBase 0.90RC distribution
>> (hadoop-core-0.20-append-r1056497.jar) and then rebuild hadoop-lzo against
>> it.
>>
>> Here is the comment on the LzoCompressor#reinit() method:
>>
>> -----------------------------------
>> // ... this method isn't in vanilla 0.20.2, but is in CDH3b3 and YDH
>> public void reinit(Configuration conf) {
>> -----------------------------------
>>
>>
>> https://github.com/kevinweil/hadoop-lzo/blob/6cbf4e232d7972c94107600567333a372ea08c0a/src/java/com/hadoop/compression/lzo/LzoCompressor.java#L196
>>
>>
>> I don't know if hadoop-core-0.20-append-r1056497.jar is a vanilla 0.20.2 or
>> more like CDH3b3. Maybe I'm wrong, but if it doesn't call reinit(), you'll
>> have a good chance to get a stable HBase 0.90.
>>
>> Good luck!
>>
>> Tatsuya
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>> http://twitter.com/#!/tatsuya6502
>>
>>
>>
>>
>> On 01/08/2011, at 6:33 PM, Friso van Vollenhoven wrote:
>>
>>> Hey Ryan,
>>> I went back to the older version. Problem is that going to HBase 0.90
>> requires a API change on the compressor side, which forces you to a version
>> newer than 0.4.6 or so. So I also had to go back to HBase 0.89, which is
>> again not compatible with CDH3b3, so I am back on CDH3b2 again. HBase 0.89
>> is stable for us, so this is not at all a problem. But this LZO problem is
>> really in the way of our projected upgrade path (my client would like to end
>> up with CDH3 everything in the end, because of the support options available
>> in case things go wrong and the Cloudera administration courses available
>> when new ops people are hired).
>>>
>>> Cheers,
>>> Friso
>>>
>>>
>>>
>>> On 7 jan 2011, at 22:28, Ryan Rawson wrote:
>>>
>>>> Hey,
>>>>
>>>> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression.
>>>> I know some of the newer versions had bugs which leaked
>>>> DirectByteBuffer space, which might be what you are running in to.
>>>>
>>>> Give the older version a shot, there really hasnt been much in the way
>>>> of how LZO works in a while, most of the 'extra' stuff added was to
>>>> support features hbase does not use.
>>>>
>>>> Good luck!
>>>>
>>>> -ryan
>>>>
>>>> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
>>>>
>>>>
>>>> On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven
>>>> <fv...@xebia.com> wrote:
>>>>> Thanks Sandy.
>>>>>
>>>>> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're
>> reaching that limit? Or does it just OOME before the actual RAM is exhausted
>> (then you prevent swapping, which is nicer, though)?
>>>>>
>>>>> I guess LZO is not a solution that fits all, but we do a lot of random
>> reads and latency can be an issue for us, so I suppose we have to stick with
>> it.
>>>>>
>>>>>
>>>>> Friso
>>>>>
>>>>>
>>>>>
>>>>> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
>>>>>
>>>>>> I was in a similar situation recently, with similar symptoms, and I
>> experienced a crash very similar to yours.  I don't have the specifics handy
>> at the moment, but I did post to this list about it a few weeks ago.  My
>> workload is fairly write-heavy.  I write about 10-20 million smallish
>> protobuf/xml blobs per day to an HBase cluster of 12 very underpowered
>> machines.
>>>>>>
>>>>>> The suggestions I received were two: 1) update to the latest
>> hadoop-lzo and 2) specify a max direct memory size to the JVM (e.g.
>> -XX:MaxDirectMemorySize=256m).
>>>>>>
>>>>>> I took a third route - change my tables back to gz compression for the
>> time being while I figure out what to do.  Since then, my memory usage has
>> been rock steady, but more importantly my tables are roughly half the size
>> on disk that they were with LZO, and there has been no noticeable drop in
>> performance (but remember this is a write heavy workload, I'm not trying to
>> serve an online workload with low latency or anything like that).  At this
>> point, I might not return to LZO.
>>>>>>
>>>>>> In general, I'm not convinced that "use LZO" is universally good
>> advice for all HBase users.  For one thing, I think it assumes that all
>> installations are focused towards low latency, which is not always the case
>> (sometimes merely good latency is enough and great latency is not needed).
>> Secondly, it assumes some things about where the performance bottleneck
>> lives.   For example, LZO performs well in micro-benchmarks, but if you find
>> yourself in an IO-bound batch processing situation, you might be better
>> served by a higher compression ratio, even if it's more computationally
>> expensive.
>>>>>>
>>>>>> Sandy
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
>>>>>>> Sent: Tuesday, January 04, 2011 08:00
>>>>>>> To: <us...@hbase.apache.org>
>>>>>>> Subject: Re: problem with LZO compressor on write only loads
>>>>>>>
>>>>>>> I ran the job again, but with less other processes running on the
>> same
>>>>>>> machine, so with more physical memory available to HBase. This was to
>> see
>>>>>>> whether there was a point where it would stop allocating more
>> buffers.
>>>>>>> When I do this, after many hours, one of the RSes crashed with a
>> OOME. See
>>>>>>> here:
>>>>>>>
>>>>>>> 2011-01-04 11:32:01,332 FATAL
>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>>>>>>> server serverName=w5r1.inrdb.ripe.net,60020,1294091507228,
>>>>>>> load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000):
>>>>>>> Uncaught exception in service thread regionserver60020.compactor
>>>>>>> java.lang.OutOfMemoryError: Direct buffer memory
>>>>>>>     at java.nio.Bits.reserveMemory(Bits.java:633)
>>>>>>>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>>>>>>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>>>>>>     at
>>>>>>> com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>>>>>>     at
>>>>>>>
>> com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207
>>>>>>> )
>>>>>>>     at
>>>>>>> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:
>>>>>>> 105)
>>>>>>>     at
>>>>>>> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:
>>>>>>> 112)
>>>>>>>     at
>>>>>>>
>> org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(C
>>>>>>> ompression.java:200)
>>>>>>>     at
>>>>>>>
>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile
>>>>>>> .java:397)
>>>>>>>     at
>>>>>>>
>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>>>>>     at
>>>>>>>
>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.ja
>>>>>>> va:354)
>>>>>>>     at
>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>>>>>     at
>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>>>>>     at
>>>>>>>
>> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.j
>>>>>>> ava:836)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>>>>>>     at
>>>>>>>
>> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.jav
>>>>>>> a:764)
>>>>>>>     at
>>>>>>>
>> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.jav
>>>>>>> a:709)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSp
>>>>>>> litThread.java:81)
>>>>>>> 2011-01-04 11:32:01,369 INFO
>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>>>>>>> request=0.0, regions=258, stores=516, storefiles=186,
>>>>>>> storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2,
>>>>>>> usedHeap=1797, maxHeap=16000, blockCacheSize=55051488,
>>>>>>> blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0,
>>>>>>> blockCacheMissCount=2397107, blockCacheEvictedCount=0,
>>>>>>> blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>>>>>>
>>>>>>> I am guessing the OS won't allocate any more memory to the process.
>> As you
>>>>>>> can see, the used heap is nowhere near the max heap.
>>>>>>>
>>>>>>> Also, this happens from the compaction, it seems. I had not
>> considered those
>>>>>>> as a suspect yet. I could try running with a larger compaction
>> threshold and
>>>>>>> blocking store files. Since this is a write only load, it should not
>> be a problem.
>>>>>>> In our normal operation, compactions and splits are quite common,
>> though,
>>>>>>> because we do read-modify-write cycles a lot. Anyone else doing
>> update
>>>>>>> heavy work with LZO?
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Friso
>>>>>>>
>>>>>>>
>>>>>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>>>>>>>
>>>>>>>> Fishy. Are your cells particularly large? Or have you tuned the
>> HFile
>>>>>>>> block size at all?
>>>>>>>>
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
>>>>>>>> fvanvollenhoven@xebia.com> wrote:
>>>>>>>>
>>>>>>>>> I tried it, but it doesn't seem to help. The RS processes grow to
>>>>>>>>> 30Gb in minutes after the job started.
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Friso
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>>>>>>>
>>>>>>>>>> Hi Friso,
>>>>>>>>>>
>>>>>>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>>>>>>
>>>>>>>>>> Can you try running with the environment variable
>>>>>>> MALLOC_ARENA_MAX=1 set?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
>>>>>>>>>> fvanvollenhoven@xebia.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I seem to run into a problem that occurs when using LZO
>> compression
>>>>>>>>>>> on a heavy write only load. I am using 0.90 RC1 and, thus, the
>> LZO
>>>>>>>>>>> compressor code that supports the reinit() method (from Kevin
>>>>>>>>>>> Weil's github,
>>>>>>>>> version
>>>>>>>>>>> 0.4.8). There are some more Hadoop LZO incarnations, so I am
>>>>>>>>>>> pointing my question to this list.
>>>>>>>>>>>
>>>>>>>>>>> It looks like the compressor uses direct byte buffers to store
>> the
>>>>>>>>> original
>>>>>>>>>>> and compressed bytes in memory, so the native code can work with
>> it
>>>>>>>>> without
>>>>>>>>>>> the JVM having to copy anything around. The direct buffers are
>>>>>>>>>>> possibly reused after a reinit() call, but will often be newly
>>>>>>>>>>> created in the
>>>>>>>>> init()
>>>>>>>>>>> method, because the existing buffer can be the wrong size for
>> reusing.
>>>>>>>>> The
>>>>>>>>>>> latter case will leave the previously used buffers by the
>>>>>>>>>>> compressor instance eligible for garbage collection. I think the
>>>>>>>>>>> problem is that
>>>>>>>>> this
>>>>>>>>>>> collection never occurs (in time), because the GC does not
>> consider
>>>>>>>>>>> it necessary yet. The GC does not know about the native heap and
>>>>>>>>>>> based on
>>>>>>>>> the
>>>>>>>>>>> state of the JVM heap, there is no reason to finalize these
>> objects yet.
>>>>>>>>>>> However, direct byte buffers are only freed in the finalizer, so
>>>>>>>>>>> the
>>>>>>>>> native
>>>>>>>>>>> heap keeps growing. On write only loads, a full GC will rarely
>>>>>>>>>>> happen, because the max heap will not grow far beyond the mem
>>>>>>>>>>> stores (no block
>>>>>>>>> cache
>>>>>>>>>>> is used). So what happens is that the machine starts using swap
>>>>>>>>>>> before
>>>>>>>>> the
>>>>>>>>>>> GC will ever clean up the direct byte buffers. I am guessing that
>>>>>>>>> without
>>>>>>>>>>> the reinit() support, the buffers were collected earlier because
>>>>>>>>>>> the referring objects would also be collected every now and then
>> or
>>>>>>>>>>> things
>>>>>>>>> would
>>>>>>>>>>> perhaps just never promote to an older generation.
>>>>>>>>>>>
>>>>>>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb
>>>>>>>>> resident
>>>>>>>>>>> size (with a 16Gb heap), it will show a lot of near 64M anon
>> blocks
>>>>>>>>>>> (presumably native heap). I show this before with the 0.4.6
>> version
>>>>>>>>>>> of Hadoop LZO, but that was under normal load. After that I went
>>>>>>>>>>> back to a HBase version that does not require the reinit(). Now I
>>>>>>>>>>> am on 0.90 with
>>>>>>>>> the
>>>>>>>>>>> new LZO, but never did a heavy load like this one with that,
>> until
>>>>>>>>> now...
>>>>>>>>>>>
>>>>>>>>>>> Can anyone with a better understanding of the LZO code confirm
>> that
>>>>>>>>>>> the above could be the case? If so, would it be possible to
>> change
>>>>>>>>>>> the LZO compressor (and decompressor) to use maybe just one fixed
>>>>>>>>>>> size buffer
>>>>>>>>> (they
>>>>>>>>>>> all appear near 64M anyway) or possibly reuse an existing buffer
>>>>>>>>>>> also
>>>>>>>>> when
>>>>>>>>>>> it is not the exact required size but just large enough to make
>> do?
>>>>>>>>> Having
>>>>>>>>>>> short lived direct byte buffers is apparently a discouraged
>>>>>>>>>>> practice. If anyone can provide some pointers on what to look out
>>>>>>>>>>> for, I could invest some time in creating a patch.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Friso
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Todd Lipcon
>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>
>>>>>
>>>
>>
>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera


Re: problem with LZO compressor on write only loads

Posted by Todd Lipcon <to...@cloudera.com>.
Hey everyone,

Just wanted to let you know that I will be looking into this this coming
week - we've marked it as an important thing to investigate prior to our next
beta release.

Thanks
-Todd

On Sat, Jan 8, 2011 at 4:59 AM, Tatsuya Kawano <ta...@gmail.com>wrote:

>
> Hi Friso,
>
> So you found HBase 0.89 on CDH3b2 doesn't have the problem. I wonder what
> would happen if you replace hadoop-core-*.jar in CDH3b3 with the one
> contained in HBase 0.90RC distribution
> (hadoop-core-0.20-append-r1056497.jar) and then rebuild hadoop-lzo against
> it.
>
> Here is the comment on the LzoCompressor#reinit() method:
>
> -----------------------------------
> // ... this method isn't in vanilla 0.20.2, but is in CDH3b3 and YDH
>  public void reinit(Configuration conf) {
> -----------------------------------
>
>
> https://github.com/kevinweil/hadoop-lzo/blob/6cbf4e232d7972c94107600567333a372ea08c0a/src/java/com/hadoop/compression/lzo/LzoCompressor.java#L196
>
>
> I don't know if hadoop-core-0.20-append-r1056497.jar is a vanilla 0.20.2 or
> more like CDH3b3. Maybe I'm wrong, but if it doesn't call reinit(), you'll
> have a good chance to get a stable HBase 0.90.
>
> Good luck!
>
> Tatsuya
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
> http://twitter.com/#!/tatsuya6502
>
>
>
>
> On 01/08/2011, at 6:33 PM, Friso van Vollenhoven wrote:
>
> > Hey Ryan,
> > I went back to the older version. Problem is that going to HBase 0.90
> requires a API change on the compressor side, which forces you to a version
> newer than 0.4.6 or so. So I also had to go back to HBase 0.89, which is
> again not compatible with CDH3b3, so I am back on CDH3b2 again. HBase 0.89
> is stable for us, so this is not at all a problem. But this LZO problem is
> really in the way of our projected upgrade path (my client would like to end
> up with CDH3 everything in the end, because of the support options available
> in case things go wrong and the Cloudera administration courses available
> when new ops people are hired).
> >
> > Cheers,
> > Friso
> >
> >
> >
> > On 7 jan 2011, at 22:28, Ryan Rawson wrote:
> >
> >> Hey,
> >>
> >> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression.
> >> I know some of the newer versions had bugs which leaked
> >> DirectByteBuffer space, which might be what you are running in to.
> >>
> >> Give the older version a shot, there really hasnt been much in the way
> >> of how LZO works in a while, most of the 'extra' stuff added was to
> >> support features hbase does not use.
> >>
> >> Good luck!
> >>
> >> -ryan
> >>
> >> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
> >>
> >>
> >> On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven
> >> <fv...@xebia.com> wrote:
> >>> Thanks Sandy.
> >>>
> >>> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're
> reaching that limit? Or does it just OOME before the actual RAM is exhausted
> (then you prevent swapping, which is nicer, though)?
> >>>
> >>> I guess LZO is not a solution that fits all, but we do a lot of random
> reads and latency can be an issue for us, so I suppose we have to stick with
> it.
> >>>
> >>>
> >>> Friso
> >>>
> >>>
> >>>
> >>> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
> >>>
> >>>> I was in a similar situation recently, with similar symptoms, and I
> experienced a crash very similar to yours.  I don't have the specifics handy
> at the moment, but I did post to this list about it a few weeks ago.  My
> workload is fairly write-heavy.  I write about 10-20 million smallish
> protobuf/xml blobs per day to an HBase cluster of 12 very underpowered
> machines.
> >>>>
> >>>> The suggestions I received were two: 1) update to the latest
> hadoop-lzo and 2) specify a max direct memory size to the JVM (e.g.
> -XX:MaxDirectMemorySize=256m).
> >>>>
> >>>> I took a third route - change my tables back to gz compression for the
> time being while I figure out what to do.  Since then, my memory usage has
> been rock steady, but more importantly my tables are roughly half the size
> on disk that they were with LZO, and there has been no noticeable drop in
> performance (but remember this is a write heavy workload, I'm not trying to
> serve an online workload with low latency or anything like that).  At this
> point, I might not return to LZO.
> >>>>
> >>>> In general, I'm not convinced that "use LZO" is universally good
> advice for all HBase users.  For one thing, I think it assumes that all
> installations are focused towards low latency, which is not always the case
> (sometimes merely good latency is enough and great latency is not needed).
>  Secondly, it assumes some things about where the performance bottleneck
> lives.   For example, LZO performs well in micro-benchmarks, but if you find
> yourself in an IO-bound batch processing situation, you might be better
> served by a higher compression ratio, even if it's more computationally
> expensive.
> >>>>
> >>>> Sandy
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
> >>>>> Sent: Tuesday, January 04, 2011 08:00
> >>>>> To: <us...@hbase.apache.org>
> >>>>> Subject: Re: problem with LZO compressor on write only loads
> >>>>>
> >>>>> I ran the job again, but with less other processes running on the
> same
> >>>>> machine, so with more physical memory available to HBase. This was to
> see
> >>>>> whether there was a point where it would stop allocating more
> buffers.
> >>>>> When I do this, after many hours, one of the RSes crashed with a
> OOME. See
> >>>>> here:
> >>>>>
> >>>>> 2011-01-04 11:32:01,332 FATAL
> >>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> >>>>> server serverName=w5r1.inrdb.ripe.net,60020,1294091507228,
> >>>>> load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000):
> >>>>> Uncaught exception in service thread regionserver60020.compactor
> >>>>> java.lang.OutOfMemoryError: Direct buffer memory
> >>>>>      at java.nio.Bits.reserveMemory(Bits.java:633)
> >>>>>      at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
> >>>>>      at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> >>>>>      at
> >>>>> com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
> >>>>>      at
> >>>>>
> com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207
> >>>>> )
> >>>>>      at
> >>>>> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:
> >>>>> 105)
> >>>>>      at
> >>>>> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:
> >>>>> 112)
> >>>>>      at
> >>>>>
> org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(C
> >>>>> ompression.java:200)
> >>>>>      at
> >>>>>
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile
> >>>>> .java:397)
> >>>>>      at
> >>>>>
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
> >>>>>      at
> >>>>>
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.ja
> >>>>> va:354)
> >>>>>      at
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
> >>>>>      at
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
> >>>>>      at
> >>>>>
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.j
> >>>>> ava:836)
> >>>>>      at
> >>>>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
> >>>>>      at
> >>>>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
> >>>>>      at
> >>>>>
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.jav
> >>>>> a:764)
> >>>>>      at
> >>>>>
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.jav
> >>>>> a:709)
> >>>>>      at
> >>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSp
> >>>>> litThread.java:81)
> >>>>> 2011-01-04 11:32:01,369 INFO
> >>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> >>>>> request=0.0, regions=258, stores=516, storefiles=186,
> >>>>> storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2,
> >>>>> usedHeap=1797, maxHeap=16000, blockCacheSize=55051488,
> >>>>> blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0,
> >>>>> blockCacheMissCount=2397107, blockCacheEvictedCount=0,
> >>>>> blockCacheHitRatio=0, blockCacheHitCachingRatio=0
> >>>>>
> >>>>> I am guessing the OS won't allocate any more memory to the process.
> As you
> >>>>> can see, the used heap is nowhere near the max heap.
> >>>>>
> >>>>> Also, this happens from the compaction, it seems. I had not
> considered those
> >>>>> as a suspect yet. I could try running with a larger compaction
> threshold and
> >>>>> blocking store files. Since this is a write only load, it should not
> be a problem.
> >>>>> In our normal operation, compactions and splits are quite common,
> though,
> >>>>> because we do read-modify-write cycles a lot. Anyone else doing
> update
> >>>>> heavy work with LZO?
> >>>>>
> >>>>>
> >>>>> Cheers,
> >>>>> Friso
> >>>>>
> >>>>>
> >>>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
> >>>>>
> >>>>>> Fishy. Are your cells particularly large? Or have you tuned the
> HFile
> >>>>>> block size at all?
> >>>>>>
> >>>>>> -Todd
> >>>>>>
> >>>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
> >>>>>> fvanvollenhoven@xebia.com> wrote:
> >>>>>>
> >>>>>>> I tried it, but it doesn't seem to help. The RS processes grow to
> >>>>>>> 30Gb in minutes after the job started.
> >>>>>>>
> >>>>>>> Any ideas?
> >>>>>>>
> >>>>>>>
> >>>>>>> Friso
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
> >>>>>>>
> >>>>>>>> Hi Friso,
> >>>>>>>>
> >>>>>>>> Which OS are you running? Particularly, which version of glibc?
> >>>>>>>>
> >>>>>>>> Can you try running with the environment variable
> >>>>> MALLOC_ARENA_MAX=1 set?
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> -Todd
> >>>>>>>>
> >>>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
> >>>>>>>> fvanvollenhoven@xebia.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I seem to run into a problem that occurs when using LZO
> compression
> >>>>>>>>> on a heavy write only load. I am using 0.90 RC1 and, thus, the
> LZO
> >>>>>>>>> compressor code that supports the reinit() method (from Kevin
> >>>>>>>>> Weil's github,
> >>>>>>> version
> >>>>>>>>> 0.4.8). There are some more Hadoop LZO incarnations, so I am
> >>>>>>>>> pointing my question to this list.
> >>>>>>>>>
> >>>>>>>>> It looks like the compressor uses direct byte buffers to store
> the
> >>>>>>> original
> >>>>>>>>> and compressed bytes in memory, so the native code can work with
> it
> >>>>>>> without
> >>>>>>>>> the JVM having to copy anything around. The direct buffers are
> >>>>>>>>> possibly reused after a reinit() call, but will often be newly
> >>>>>>>>> created in the
> >>>>>>> init()
> >>>>>>>>> method, because the existing buffer can be the wrong size for
> reusing.
> >>>>>>> The
> >>>>>>>>> latter case will leave the previously used buffers by the
> >>>>>>>>> compressor instance eligible for garbage collection. I think the
> >>>>>>>>> problem is that
> >>>>>>> this
> >>>>>>>>> collection never occurs (in time), because the GC does not
> consider
> >>>>>>>>> it necessary yet. The GC does not know about the native heap and
> >>>>>>>>> based on
> >>>>>>> the
> >>>>>>>>> state of the JVM heap, there is no reason to finalize these
> objects yet.
> >>>>>>>>> However, direct byte buffers are only freed in the finalizer, so
> >>>>>>>>> the
> >>>>>>> native
> >>>>>>>>> heap keeps growing. On write only loads, a full GC will rarely
> >>>>>>>>> happen, because the max heap will not grow far beyond the mem
> >>>>>>>>> stores (no block
> >>>>>>> cache
> >>>>>>>>> is used). So what happens is that the machine starts using swap
> >>>>>>>>> before
> >>>>>>> the
> >>>>>>>>> GC will ever clean up the direct byte buffers. I am guessing that
> >>>>>>> without
> >>>>>>>>> the reinit() support, the buffers were collected earlier because
> >>>>>>>>> the referring objects would also be collected every now and then
> or
> >>>>>>>>> things
> >>>>>>> would
> >>>>>>>>> perhaps just never promote to an older generation.
> >>>>>>>>>
> >>>>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb
> >>>>>>> resident
> >>>>>>>>> size (with a 16Gb heap), it will show a lot of near 64M anon
> blocks
> >>>>>>>>> (presumably native heap). I show this before with the 0.4.6
> version
> >>>>>>>>> of Hadoop LZO, but that was under normal load. After that I went
> >>>>>>>>> back to a HBase version that does not require the reinit(). Now I
> >>>>>>>>> am on 0.90 with
> >>>>>>> the
> >>>>>>>>> new LZO, but never did a heavy load like this one with that,
> until
> >>>>>>> now...
> >>>>>>>>>
> >>>>>>>>> Can anyone with a better understanding of the LZO code confirm
> that
> >>>>>>>>> the above could be the case? If so, would it be possible to
> change
> >>>>>>>>> the LZO compressor (and decompressor) to use maybe just one fixed
> >>>>>>>>> size buffer
> >>>>>>> (they
> >>>>>>>>> all appear near 64M anyway) or possibly reuse an existing buffer
> >>>>>>>>> also
> >>>>>>> when
> >>>>>>>>> it is not the exact required size but just large enough to make
> do?
> >>>>>>> Having
> >>>>>>>>> short lived direct byte buffers is apparently a discouraged
> >>>>>>>>> practice. If anyone can provide some pointers on what to look out
> >>>>>>>>> for, I could invest some time in creating a patch.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Friso
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Todd Lipcon
> >>>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Todd Lipcon
> >>>>>> Software Engineer, Cloudera
> >>>>
> >>>
> >>>
> >
>
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: problem with LZO compressor on write only loads

Posted by Tatsuya Kawano <ta...@gmail.com>.
Hi Friso, 

So you found HBase 0.89 on CDH3b2 doesn't have the problem. I wonder what would happen if you replace hadoop-core-*.jar in CDH3b3 with the one contained in HBase 0.90RC distribution (hadoop-core-0.20-append-r1056497.jar) and then rebuild hadoop-lzo against it. 

Here is the comment on the LzoCompressor#reinit() method: 

-----------------------------------
// ... this method isn't in vanilla 0.20.2, but is in CDH3b3 and YDH
  public void reinit(Configuration conf) {
-----------------------------------

https://github.com/kevinweil/hadoop-lzo/blob/6cbf4e232d7972c94107600567333a372ea08c0a/src/java/com/hadoop/compression/lzo/LzoCompressor.java#L196


I don't know if hadoop-core-0.20-append-r1056497.jar is a vanilla 0.20.2 or more like CDH3b3. Maybe I'm wrong, but if it doesn't call reinit(), you'll have a good chance of getting a stable HBase 0.90.
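
The swap would look roughly like this (a sketch; the paths are hypothetical,
and hadoop-lzo's build.xml determines where it actually picks up hadoop-core):

-----------------------------------
# hypothetical sketch: rebuild hadoop-lzo against the core jar shipped
# with the HBase 0.90 RC (the compile-native step needs the LZO headers)
cp hbase-0.90.0/lib/hadoop-core-0.20-append-r1056497.jar hadoop-lzo/lib/
cd hadoop-lzo
ant clean compile-native tar
-----------------------------------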

Good luck! 

Tatsuya 

--
Tatsuya Kawano (Mr.)
Tokyo, Japan

http://twitter.com/#!/tatsuya6502




On 01/08/2011, at 6:33 PM, Friso van Vollenhoven wrote:

> Hey Ryan,
> I went back to the older version. Problem is that going to HBase 0.90 requires a API change on the compressor side, which forces you to a version newer than 0.4.6 or so. So I also had to go back to HBase 0.89, which is again not compatible with CDH3b3, so I am back on CDH3b2 again. HBase 0.89 is stable for us, so this is not at all a problem. But this LZO problem is really in the way of our projected upgrade path (my client would like to end up with CDH3 everything in the end, because of the support options available in case things go wrong and the Cloudera administration courses available when new ops people are hired).
> 
> Cheers,
> Friso
> 
> 
> 
> On 7 jan 2011, at 22:28, Ryan Rawson wrote:
> 
>> Hey,
>> 
>> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression.
>> I know some of the newer versions had bugs which leaked
>> DirectByteBuffer space, which might be what you are running in to.
>> 
>> Give the older version a shot, there really hasnt been much in the way
>> of how LZO works in a while, most of the 'extra' stuff added was to
>> support features hbase does not use.
>> 
>> Good luck!
>> 
>> -ryan
>> 
>> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
>> 
>> 
>> On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven
>> <fv...@xebia.com> wrote:
>>> Thanks Sandy.
>>> 
>>> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted (then you prevent swapping, which is nicer, though)?
>>> 
>>> I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.
>>> 
>>> 
>>> Friso
>>> 
>>> 
>>> 
>>> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
>>> 
>>>> I was in a similar situation recently, with similar symptoms, and I experienced a crash very similar to yours.  I don't have the specifics handy at the moment, but I did post to this list about it a few weeks ago.  My workload is fairly write-heavy.  I write about 10-20 million smallish protobuf/xml blobs per day to an HBase cluster of 12 very underpowered machines.
>>>> 
>>>> The suggestions I received were two: 1) update to the latest hadoop-lzo and 2) specify a max direct memory size to the JVM (e.g. -XX:MaxDirectMemorySize=256m).
>>>> 
>>>> I took a third route - change my tables back to gz compression for the time being while I figure out what to do.  Since then, my memory usage has been rock steady, but more importantly my tables are roughly half the size on disk that they were with LZO, and there has been no noticeable drop in performance (but remember this is a write heavy workload, I'm not trying to serve an online workload with low latency or anything like that).  At this point, I might not return to LZO.
>>>> 
>>>> In general, I'm not convinced that "use LZO" is universally good advice for all HBase users.  For one thing, I think it assumes that all installations are focused towards low latency, which is not always the case (sometimes merely good latency is enough and great latency is not needed).  Secondly, it assumes some things about where the performance bottleneck lives.   For example, LZO performs well in micro-benchmarks, but if you find yourself in an IO-bound batch processing situation, you might be better served by a higher compression ratio, even if it's more computationally expensive.
>>>> 
>>>> Sandy
>>>> 
>>>>> -----Original Message-----
>>>>> From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
>>>>> Sent: Tuesday, January 04, 2011 08:00
>>>>> To: <us...@hbase.apache.org>
>>>>> Subject: Re: problem with LZO compressor on write only loads
>>>>> 
>>>>> I ran the job again, but with less other processes running on the same
>>>>> machine, so with more physical memory available to HBase. This was to see
>>>>> whether there was a point where it would stop allocating more buffers.
>>>>> When I do this, after many hours, one of the RSes crashed with a OOME. See
>>>>> here:
>>>>> 
>>>>> 2011-01-04 11:32:01,332 FATAL
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>>>>> server serverName=w5r1.inrdb.ripe.net,60020,1294091507228,
>>>>> load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000):
>>>>> Uncaught exception in service thread regionserver60020.compactor
>>>>> java.lang.OutOfMemoryError: Direct buffer memory
>>>>>      at java.nio.Bits.reserveMemory(Bits.java:633)
>>>>>      at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>>>>      at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>>>>      at
>>>>> com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>>>>      at
>>>>> com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207
>>>>> )
>>>>>      at
>>>>> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:
>>>>> 105)
>>>>>      at
>>>>> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:
>>>>> 112)
>>>>>      at
>>>>> org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(C
>>>>> ompression.java:200)
>>>>>      at
>>>>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile
>>>>> .java:397)
>>>>>      at
>>>>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>>>      at
>>>>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.ja
>>>>> va:354)
>>>>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>>>      at
>>>>> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.j
>>>>> ava:836)
>>>>>      at
>>>>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>>>>      at
>>>>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>>>>      at
>>>>> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.jav
>>>>> a:764)
>>>>>      at
>>>>> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.jav
>>>>> a:709)
>>>>>      at
>>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSp
>>>>> litThread.java:81)
>>>>> 2011-01-04 11:32:01,369 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>>>>> request=0.0, regions=258, stores=516, storefiles=186,
>>>>> storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2,
>>>>> usedHeap=1797, maxHeap=16000, blockCacheSize=55051488,
>>>>> blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0,
>>>>> blockCacheMissCount=2397107, blockCacheEvictedCount=0,
>>>>> blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>>>> 
>>>>> I am guessing the OS won't allocate any more memory to the process. As you
>>>>> can see, the used heap is nowhere near the max heap.
>>>>> 
>>>>> Also, this happens from the compaction, it seems. I had not considered those
>>>>> as a suspect yet. I could try running with a larger compaction threshold and
>>>>> blocking store files. Since this is a write only load, it should not be a problem.
>>>>> In our normal operation, compactions and splits are quite common, though,
>>>>> because we do read-modify-write cycles a lot. Anyone else doing update
>>>>> heavy work with LZO?
>>>>> 
>>>>> 
>>>>> Cheers,
>>>>> Friso
>>>>> 
>>>>> 
>>>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>>>>> 
>>>>>> Fishy. Are your cells particularly large? Or have you tuned the HFile
>>>>>> block size at all?
>>>>>> 
>>>>>> -Todd
>>>>>> 
>>>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
>>>>>> fvanvollenhoven@xebia.com> wrote:
>>>>>> 
>>>>>>> I tried it, but it doesn't seem to help. The RS processes grow to
>>>>>>> 30Gb in minutes after the job started.
>>>>>>> 
>>>>>>> Any ideas?
>>>>>>> 
>>>>>>> 
>>>>>>> Friso
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>>>>> 
>>>>>>>> Hi Friso,
>>>>>>>> 
>>>>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>>>> 
>>>>>>>> Can you try running with the environment variable
>>>>> MALLOC_ARENA_MAX=1 set?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> -Todd
>>>>>>>> 
>>>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
>>>>>>>> fvanvollenhoven@xebia.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I seem to run into a problem that occurs when using LZO compression
>>>>>>>>> on a heavy write only load. I am using 0.90 RC1 and, thus, the LZO
>>>>>>>>> compressor code that supports the reinit() method (from Kevin
>>>>>>>>> Weil's github,
>>>>>>> version
>>>>>>>>> 0.4.8). There are some more Hadoop LZO incarnations, so I am
>>>>>>>>> pointing my question to this list.
>>>>>>>>> 
>>>>>>>>> It looks like the compressor uses direct byte buffers to store the
>>>>>>> original
>>>>>>>>> and compressed bytes in memory, so the native code can work with it
>>>>>>> without
>>>>>>>>> the JVM having to copy anything around. The direct buffers are
>>>>>>>>> possibly reused after a reinit() call, but will often be newly
>>>>>>>>> created in the
>>>>>>> init()
>>>>>>>>> method, because the existing buffer can be the wrong size for reusing.
>>>>>>> The
>>>>>>>>> latter case will leave the previously used buffers by the
>>>>>>>>> compressor instance eligible for garbage collection. I think the
>>>>>>>>> problem is that
>>>>>>> this
>>>>>>>>> collection never occurs (in time), because the GC does not consider
>>>>>>>>> it necessary yet. The GC does not know about the native heap and
>>>>>>>>> based on
>>>>>>> the
>>>>>>>>> state of the JVM heap, there is no reason to finalize these objects yet.
>>>>>>>>> However, direct byte buffers are only freed in the finalizer, so
>>>>>>>>> the
>>>>>>> native
>>>>>>>>> heap keeps growing. On write only loads, a full GC will rarely
>>>>>>>>> happen, because the max heap will not grow far beyond the mem
>>>>>>>>> stores (no block
>>>>>>> cache
>>>>>>>>> is used). So what happens is that the machine starts using swap
>>>>>>>>> before
>>>>>>> the
>>>>>>>>> GC will ever clean up the direct byte buffers. I am guessing that
>>>>>>> without
>>>>>>>>> the reinit() support, the buffers were collected earlier because
>>>>>>>>> the referring objects would also be collected every now and then or
>>>>>>>>> things
>>>>>>> would
>>>>>>>>> perhaps just never promote to an older generation.
>>>>>>>>> 
>>>>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb
>>>>>>> resident
>>>>>>>>> size (with a 16Gb heap), it will show a lot of near 64M anon blocks
>>>>>>>>> (presumably native heap). I show this before with the 0.4.6 version
>>>>>>>>> of Hadoop LZO, but that was under normal load. After that I went
>>>>>>>>> back to a HBase version that does not require the reinit(). Now I
>>>>>>>>> am on 0.90 with
>>>>>>> the
>>>>>>>>> new LZO, but never did a heavy load like this one with that, until
>>>>>>> now...
>>>>>>>>> 
>>>>>>>>> Can anyone with a better understanding of the LZO code confirm that
>>>>>>>>> the above could be the case? If so, would it be possible to change
>>>>>>>>> the LZO compressor (and decompressor) to use maybe just one fixed
>>>>>>>>> size buffer
>>>>>>> (they
>>>>>>>>> all appear near 64M anyway) or possibly reuse an existing buffer
>>>>>>>>> also
>>>>>>> when
>>>>>>>>> it is not the exact required size but just large enough to make do?
>>>>>>> Having
>>>>>>>>> short lived direct byte buffers is apparently a discouraged
>>>>>>>>> practice. If anyone can provide some pointers on what to look out
>>>>>>>>> for, I could invest some time in creating a patch.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Friso
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>> 
>>> 
>>> 
> 




Re: problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hey Ryan,
I went back to the older version. Problem is that going to HBase 0.90 requires an API change on the compressor side, which forces you to a version newer than 0.4.6 or so. So I also had to go back to HBase 0.89, which is again not compatible with CDH3b3, so I am back on CDH3b2 again. HBase 0.89 is stable for us, so this is not at all a problem. But this LZO problem really stands in the way of our projected upgrade path (my client would like to end up with CDH3 for everything, because of the support options available in case things go wrong and the Cloudera administration courses available when new ops people are hired).

Cheers,
Friso



On 7 jan 2011, at 22:28, Ryan Rawson wrote:

> Hey,
> 
> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression.
> I know some of the newer versions had bugs which leaked
> DirectByteBuffer space, which might be what you are running into.
> 
> Give the older version a shot; there really hasn't been much change in
> how LZO itself works in a while, and most of the 'extra' stuff added
> since was to support features HBase does not use.
> 
> Good luck!
> 
> -ryan
> 
> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list


Re: problem with LZO compressor on write only loads

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression.
I know some of the newer versions had bugs which leaked
DirectByteBuffer space, which might be what you are running into.

Give the older version a shot; there really hasn't been much change in
how LZO itself works in a while, and most of the 'extra' stuff added
since was to support features HBase does not use.

Good luck!

-ryan

ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
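
For readers less familiar with the compressor internals, the pattern being blamed here looks roughly like the minimal Java sketch below. This is only an illustration of the behaviour described in this thread, not the actual hadoop-lzo source; the class and field names are made up.

    import java.nio.ByteBuffer;

    class CompressorBufferSketch {
        private ByteBuffer uncompressed;
        private ByteBuffer compressed;

        // Called on every reuse of a pooled compressor. Whenever the
        // requested size differs from the current capacity, fresh direct
        // buffers are allocated and the old ones become unreachable. Their
        // native memory is only released once a GC eventually runs their
        // finalizers, which on a write-only load (little Java heap
        // pressure) can take a very long time.
        void reinit(int bufferSize) {
            if (uncompressed == null || uncompressed.capacity() != bufferSize) {
                uncompressed = ByteBuffer.allocateDirect(bufferSize);
                compressed = ByteBuffer.allocateDirect(bufferSize);
            } else {
                uncompressed.clear();
                compressed.clear();
            }
        }
    }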


On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven
<fv...@xebia.com> wrote:
> Thanks Sandy.
>
> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted (then you prevent swapping, which is nicer, though)?
>
> I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.
>
>
> Friso

Re: problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hey Sandy,

I tried to find the answer myself at first, too. There are indeed different answers for different JVMs. I think it is unbounded, because when I don't set it, my JVM process grows to over 50GB before dying. If there is a bound, it is surely quite high.

The problem only occurs with the later LZO versions; this was on 0.4.8. It does not occur with versions that lack support for re-initializing compressor objects. The problem is that the latest HBase that works with those is 0.89, which kind of blocks our projected upgrade path...


Thanks,
Friso
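
A quick way to put numbers on this "native heap grows while the Java heap stays small" symptom is to log the JVM heap next to the process resident size. The snippet below is a hedged, Linux-only sketch (it parses VmRSS from /proc/self/status); something like it can be run periodically inside a test process that exercises the compressor.

    import java.io.BufferedReader;
    import java.io.FileReader;

    public class RssVsHeap {
        public static void main(String[] args) throws Exception {
            Runtime rt = Runtime.getRuntime();
            long heapUsedKb = (rt.totalMemory() - rt.freeMemory()) / 1024;
            String rss = "unknown";
            // VmRSS in /proc/self/status is the resident set size of this process.
            BufferedReader reader = new BufferedReader(new FileReader("/proc/self/status"));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.startsWith("VmRSS:")) {
                        rss = line.substring("VmRSS:".length()).trim();
                        break;
                    }
                }
            } finally {
                reader.close();
            }
            System.out.println("JVM heap used: " + heapUsedKb + " kB, process RSS: " + rss);
        }
    }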



On 6 jan 2011, at 07:38, Sandy Pratt wrote:

> Friso,
> 
> I can't answer that.  I tried to find the default for that parameter, and I thought I had an answer (unbounded), but it turned out the doc I was looking at pertained to JRockit, not Sun.  I found a post on Stack Overflow that purported to answer the question, but the source it cited was Apache Harmony code, IIRC.  So I'm not sure whether setting that parameter will help, because I don't even know whether the value is bounded by default.  I set it anyway, and it certainly hasn't broken anything.
> 
> If you update to the latest LZO, I'd be curious to hear how it goes.
> 
> Sandy


RE: problem with LZO compressor on write only loads

Posted by Sandy Pratt <pr...@adobe.com>.
Friso,

I can't answer that.  I tried to find the default for that parameter, and I thought I had an answer (unbounded), but it turned out the doc I was looking at pertained to JRockit, not Sun.  I found a post on Stack Overflow that purported to answer the question, but the source it cited was Apache Harmony code, IIRC.  So I'm not sure whether setting that parameter will help, because I don't even know whether the value is bounded by default.  I set it anyway, and it certainly hasn't broken anything.

If you update to the latest LZO, I'd be curious to hear how it goes.

Sandy
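
One way to settle the bounded-or-not question empirically is a tiny standalone probe like the sketch below (plain Java, no HBase involved): it allocates and pins 64 MB direct buffers until the JVM refuses. Running it with and without -XX:MaxDirectMemorySize=256m and comparing where it stops shows what the default on a given JVM actually is.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    public class DirectMemoryProbe {
        public static void main(String[] args) {
            // Keep references so the GC cannot reclaim anything; the OOME
            // then marks the real direct-memory ceiling.
            List<ByteBuffer> pinned = new ArrayList<ByteBuffer>();
            try {
                while (true) {
                    pinned.add(ByteBuffer.allocateDirect(64 * 1024 * 1024));
                    System.out.println("Allocated " + (pinned.size() * 64) + " MB of direct memory");
                }
            } catch (OutOfMemoryError e) {
                System.out.println("Limit hit after " + (pinned.size() * 64) + " MB");
            }
        }
    }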


-----Original Message-----
From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com] 
Sent: Wednesday, January 05, 2011 10:27 PM
To: <us...@hbase.apache.org>
Subject: Re: problem with LZO compressor on write only loads

Thanks Sandy.

Does setting -XX:MaxDirectMemorySize help in triggering GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted (then you prevent swapping, which is nicer, though)?

I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.


Friso





Re: problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Thanks Sandy.

Does setting -XX:MaxDirectMemorySize help in triggering GC when you're reaching that limit? Or does it just OOME before the actual RAM is exhausted (then you prevent swapping, which is nicer, though)?

I guess LZO is not a solution that fits all, but we do a lot of random reads and latency can be an issue for us, so I suppose we have to stick with it.


Friso



On 5 jan 2011, at 20:36, Sandy Pratt wrote:

> I was in a similar situation recently, with the same symptoms, and I experienced a crash much like yours.  I don't have the specifics handy at the moment, but I did post to this list about it a few weeks ago.  My workload is fairly write-heavy: I write about 10-20 million smallish protobuf/xml blobs per day to an HBase cluster of 12 very underpowered machines.
> 
> I received two suggestions: 1) update to the latest hadoop-lzo, and 2) specify a max direct memory size to the JVM (e.g. -XX:MaxDirectMemorySize=256m).
> 
> I took a third route: changing my tables back to gz compression for the time being while I figure out what to do.  Since then, my memory usage has been rock steady, but more importantly my tables are roughly half the size on disk that they were with LZO, and there has been no noticeable drop in performance (but remember this is a write-heavy workload; I'm not trying to serve an online workload with low latency or anything like that).  At this point, I might not return to LZO.
> 
> In general, I'm not convinced that "use LZO" is universally good advice for all HBase users.  For one thing, it assumes that all installations are focused on low latency, which is not always the case (sometimes merely good latency is enough and great latency is not needed).  Secondly, it assumes some things about where the performance bottleneck lives.  For example, LZO performs well in micro-benchmarks, but if you find yourself in an IO-bound batch-processing situation, you might be better served by a higher compression ratio, even if it is more computationally expensive.
> 
> Sandy


RE: problem with LZO compressor on write only loads

Posted by Sandy Pratt <pr...@adobe.com>.
I was in a similar situation recently, with the same symptoms, and I experienced a crash much like yours.  I don't have the specifics handy at the moment, but I did post to this list about it a few weeks ago.  My workload is fairly write-heavy: I write about 10-20 million smallish protobuf/xml blobs per day to an HBase cluster of 12 very underpowered machines.

I received two suggestions: 1) update to the latest hadoop-lzo, and 2) specify a max direct memory size to the JVM (e.g. -XX:MaxDirectMemorySize=256m).

I took a third route: changing my tables back to gz compression for the time being while I figure out what to do.  Since then, my memory usage has been rock steady, but more importantly my tables are roughly half the size on disk that they were with LZO, and there has been no noticeable drop in performance (but remember this is a write-heavy workload; I'm not trying to serve an online workload with low latency or anything like that).  At this point, I might not return to LZO.

In general, I'm not convinced that "use LZO" is universally good advice for all HBase users.  For one thing, it assumes that all installations are focused on low latency, which is not always the case (sometimes merely good latency is enough and great latency is not needed).  Secondly, it assumes some things about where the performance bottleneck lives.  For example, LZO performs well in micro-benchmarks, but if you find yourself in an IO-bound batch-processing situation, you might be better served by a higher compression ratio, even if it is more computationally expensive.

Sandy
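
For anyone wanting to try the same switch, a minimal sketch using the 0.90-era Java admin API follows; the table and family names ('mytable', 'cf') are made up, and the HBase shell's alter command achieves the same thing. Note that existing store files keep their old codec until they are rewritten, which is why the sketch ends with a major compaction.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SwitchFamilyToGz {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // The table must be disabled before its schema can be changed.
            admin.disableTable("mytable");
            HTableDescriptor table = admin.getTableDescriptor(Bytes.toBytes("mytable"));
            HColumnDescriptor family = table.getFamily(Bytes.toBytes("cf"));
            family.setCompressionType(Compression.Algorithm.GZ);
            admin.modifyTable(Bytes.toBytes("mytable"), table);
            admin.enableTable("mytable");
            // Rewrite existing store files so they actually pick up GZ.
            admin.majorCompact("mytable");
        }
    }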

> -----Original Message-----
> From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
> Sent: Tuesday, January 04, 2011 08:00
> To: <us...@hbase.apache.org>
> Subject: Re: problem with LZO compressor on write only loads
> 
> I ran the job again, but with fewer other processes running on the same
> machine, so with more physical memory available to HBase. This was to see
> whether there was a point where it would stop allocating more buffers.
> When I did this, after many hours, one of the RSes crashed with an OOME. See
> here:
> 
> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
> java.lang.OutOfMemoryError: Direct buffer memory
>         at java.nio.Bits.reserveMemory(Bits.java:633)
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>         at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>         at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0
> 
> I am guessing the OS won't allocate any more memory to the process. As you
> can see, the used heap is nowhere near the max heap.
> 
> Also, this happens from the compaction, it seems. I had not considered those
> as a suspect yet. I could try running with a larger compaction threshold and
> blocking store files. Since this is a write only load, it should not be a problem.
> In our normal operation, compactions and splits are quite common, though,
> because we do read-modify-write cycles a lot. Anyone else doing update
> heavy work with LZO?
> 
> 
> Cheers,
> Friso
> 
> 
> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
> 
> > Fishy. Are your cells particularly large? Or have you tuned the HFile
> > block size at all?
> >
> > -Todd
> >
> > On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
> > fvanvollenhoven@xebia.com> wrote:
> >
> >> I tried it, but it doesn't seem to help. The RS processes grow to
> >> 30Gb in minutes after the job started.
> >>
> >> Any ideas?
> >>
> >>
> >> Friso
> >>
> >>
> >>
> >> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
> >>
> >>> Hi Friso,
> >>>
> >>> Which OS are you running? Particularly, which version of glibc?
> >>>
> >>> Can you try running with the environment variable
> MALLOC_ARENA_MAX=1 set?
> >>>
> >>> Thanks
> >>> -Todd
> >>>
> >>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
> >>> fvanvollenhoven@xebia.com> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I seem to run into a problem that occurs when using LZO compression
> >>>> on a heavy write only load. I am using 0.90 RC1 and, thus, the LZO
> >>>> compressor code that supports the reinit() method (from Kevin
> >>>> Weil's github,
> >> version
> >>>> 0.4.8). There are some more Hadoop LZO incarnations, so I am
> >>>> pointing my question to this list.
> >>>>
> >>>> It looks like the compressor uses direct byte buffers to store the
> >> original
> >>>> and compressed bytes in memory, so the native code can work with it
> >> without
> >>>> the JVM having to copy anything around. The direct buffers are
> >>>> possibly reused after a reinit() call, but will often be newly
> >>>> created in the
> >> init()
> >>>> method, because the existing buffer can be the wrong size for reusing.
> >> The
> >>>> latter case will leave the previously used buffers by the
> >>>> compressor instance eligible for garbage collection. I think the
> >>>> problem is that
> >> this
> >>>> collection never occurs (in time), because the GC does not consider
> >>>> it necessary yet. The GC does not know about the native heap and
> >>>> based on
> >> the
> >>>> state of the JVM heap, there is no reason to finalize these objects yet.
> >>>> However, direct byte buffers are only freed in the finalizer, so
> >>>> the
> >> native
> >>>> heap keeps growing. On write only loads, a full GC will rarely
> >>>> happen, because the max heap will not grow far beyond the mem
> >>>> stores (no block
> >> cache
> >>>> is used). So what happens is that the machine starts using swap
> >>>> before
> >> the
> >>>> GC will ever clean up the direct byte buffers. I am guessing that
> >> without
> >>>> the reinit() support, the buffers were collected earlier because
> >>>> the referring objects would also be collected every now and then or
> >>>> things
> >> would
> >>>> perhaps just never promote to an older generation.
> >>>>
> >>>> When I do a pmap on a running RS after it has grown to some 40Gb
> >> resident
> >>>> size (with a 16Gb heap), it will show a lot of near 64M anon blocks
> >>>> (presumably native heap). I show this before with the 0.4.6 version
> >>>> of Hadoop LZO, but that was under normal load. After that I went
> >>>> back to a HBase version that does not require the reinit(). Now I
> >>>> am on 0.90 with
> >> the
> >>>> new LZO, but never did a heavy load like this one with that, until
> >> now...
> >>>>
> >>>> Can anyone with a better understanding of the LZO code confirm that
> >>>> the above could be the case? If so, would it be possible to change
> >>>> the LZO compressor (and decompressor) to use maybe just one fixed
> >>>> size buffer
> >> (they
> >>>> all appear near 64M anyway) or possibly reuse an existing buffer
> >>>> also
> >> when
> >>>> it is not the exact required size but just large enough to make do?
> >> Having
> >>>> short lived direct byte buffers is apparently a discouraged
> >>>> practice. If anyone can provide some pointers on what to look out
> >>>> for, I could invest some time in creating a patch.
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Friso
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Todd Lipcon
> >>> Software Engineer, Cloudera
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera


Re: problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
I ran the job again, but with fewer other processes running on the same machine, so with more physical memory available to HBase. This was to see whether there was a point where it would stop allocating more buffers. When I did this, after many hours, one of the RSes crashed with an OOME. See here:

2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228, load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000): Uncaught exception in service thread regionserver60020.compactor
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:633)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
        at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
        at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
        at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186, storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2, usedHeap=1797, maxHeap=16000, blockCacheSize=55051488, blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0, blockCacheMissCount=2397107, blockCacheEvictedCount=0, blockCacheHitRatio=0, blockCacheHitCachingRatio=0

I am guessing the OS won't allocate any more memory to the process. As you can see, the used heap is nowhere near the max heap.

Also, this seems to happen during compaction; I had not considered compactions as a suspect yet. I could try running with a larger compaction threshold and a higher blocking store files limit. Since this is a write only load, that should not be a problem. In our normal operation, compactions and splits are quite common, though, because we do read-modify-write cycles a lot. Anyone else doing update heavy work with LZO?
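
For reference, a sketch of the hbase-site.xml settings I mean (the key names are the standard ones; the values are illustrative, not tested):

<!-- hbase-site.xml: postpone compactions during a pure bulk load -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>10</value> <!-- default is 3 -->
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>20</value> <!-- default is 7 -->
</property>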


Cheers,
Friso


On 4 jan 2011, at 01:54, Todd Lipcon wrote:

> [quoted message trimmed]


Re: problem with LZO compressor on write only loads

Posted by Todd Lipcon <to...@cloudera.com>.
Fishy. Are your cells particularly large? Or have you tuned the HFile block
size at all?

-Todd

On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com> wrote:

> [quoted message trimmed]


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
I tried it, but it doesn't seem to help. The RS processes grew to 30Gb within minutes after the job started.

Any ideas?


Friso



On 3 jan 2011, at 19:18, Todd Lipcon wrote:

> [quoted message trimmed]


Re: problem with LZO compressor on write only loads

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Friso,

Which OS are you running? Particularly, which version of glibc?

Can you try running with the environment variable MALLOC_ARENA_MAX=1 set?
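
A minimal sketch of how to apply that, assuming the stock scripts pick up conf/hbase-env.sh (recent glibc versions create per-thread malloc arenas of up to 64M on 64-bit; this caps them at one):

# conf/hbase-env.sh
export MALLOC_ARENA_MAX=1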

Thanks
-Todd

On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com> wrote:

> [quoted message trimmed]


-- 
Todd Lipcon
Software Engineer, Cloudera

problem with LZO compressor on write only loads

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hi all,

I seem to have run into a problem that occurs when using LZO compression on a heavy write only load. I am using 0.90 RC1 and, thus, the LZO compressor code that supports the reinit() method (from Kevin Weil's github, version 0.4.8). There are several other Hadoop LZO incarnations around, so I am pointing my question to this list.

It looks like the compressor uses direct byte buffers to store the original and compressed bytes in memory, so the native code can work with them without the JVM having to copy anything around. The direct buffers are possibly reused after a reinit() call, but will often be newly created in the init() method, because the existing buffer can be the wrong size for reuse. The latter case leaves the buffers previously used by the compressor instance eligible for garbage collection. I think the problem is that this collection never occurs (in time), because the GC does not consider it necessary yet. The GC does not know about the native heap, and based on the state of the JVM heap there is no reason to finalize these objects yet. However, direct byte buffers are only freed in the finalizer, so the native heap keeps growing. On write only loads, a full GC will rarely happen, because the max heap will not grow far beyond the mem stores (no block cache is used). So what happens is that the machine starts using swap before the GC ever cleans up the direct byte buffers. I am guessing that without the reinit() support, the buffers were collected earlier, because the referring objects would also be collected every now and then, or perhaps things would just never promote to an older generation.

When I do a pmap on a running RS after it has grown to some 40Gb resident size (with a 16Gb heap), it shows a lot of near 64M anon blocks (presumably native heap). I saw this before with the 0.4.6 version of Hadoop LZO, but that was under normal load. After that I went back to an HBase version that does not require the reinit(). Now I am on 0.90 with the new LZO, but never did a heavy load like this one with that, until now...
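
For anyone who wants to check their own RSes, a rough sketch of that inspection (assuming a single HRegionServer process per box; the second column of pmap -x is the mapping size in KB):

# list the largest mappings; the ~64M [ anon ] entries are the suspects
pmap -x $(pgrep -f HRegionServer) | sort -n -k2 | tail -20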

Can anyone with a better understanding of the LZO code confirm that the above could be the case? If so, would it be possible to change the LZO compressor (and decompressor) to use maybe just one fixed size buffer (they all appear to be near 64M anyway), or to reuse an existing buffer even when it is not the exact required size but just large enough to make do? Having short lived direct byte buffers is apparently a discouraged practice. If anyone can provide some pointers on what to look out for, I could invest some time in creating a patch.
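
To make the reuse idea concrete, a minimal sketch in Java (the names are illustrative, not the actual hadoop-lzo internals):

import java.nio.ByteBuffer;

// Sketch of the proposed policy: keep an existing direct buffer whenever
// it is already large enough, instead of allocating a fresh one and
// leaving the old one to wait for a rare full GC to free its native memory.
public class DirectBufferReuse {
    static ByteBuffer reuseOrAllocate(ByteBuffer current, int requiredSize) {
        if (current != null && current.capacity() >= requiredSize) {
            current.clear(); // reset position/limit, keep the native allocation
            return current;
        }
        return ByteBuffer.allocateDirect(requiredSize);
    }
}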


Thanks,
Friso


Re: Hbase auto table operation using script

Posted by 祝海通 <zh...@gmail.com>.
Thank you very much, Sebastian Bauer! It works.

Best wishes for the winter season,
Haitong

On Mon, Jan 3, 2011 at 7:10 PM, Sebastian Bauer <ad...@ugame.net.pl> wrote:

> [quoted message trimmed]

Re: Hbase auto table operation using script

Posted by Sebastian Bauer <ad...@ugame.net.pl>.
On 03.01.2011 10:56, 祝海通 wrote:
> [quoted message trimmed]
echo "disable 'usertable' " | ./bin/hbase shell

-- 

Regards
Sebastian Bauer
-----------------------------------------------------
http://tikecik.pl