Posted to user@hbase.apache.org by Gaojinchao <ga...@huawei.com> on 2011/05/24 14:29:49 UTC

a question storefileIndexSize

My observation is that storefileIndexSize is large.
Is there a way to reduce it?

Region server metric:
requests=11447, regions=10394, stores=10394, storefiles=3103, storefileIndexSize=3717,
memstoreSize=1002, compactionQueueSize=1234, flushQueueSize=0, usedHeap=6916,
maxHeap=8165, blockCacheSize=1394662632, blockCacheFree=317661976, blockCacheCount=53394,
blockCacheHitCount=16229024, blockCacheMissCount=91803814, blockCacheEvictedCount=22381853,
blockCacheHitRatio=15, blockCacheHitCachingRatio=41
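
A minimal sketch for reading the metrics line above: it parses the name=value pairs and derives two ratios. It assumes storefileIndexSize and maxHeap are both reported in MB (the post does not state units); the helper name parse_metrics is hypothetical.

```python
# Metrics line copied from the post above.
METRICS = (
    "requests=11447, regions=10394, stores=10394, storefiles=3103, "
    "storefileIndexSize=3717, memstoreSize=1002, compactionQueueSize=1234, "
    "flushQueueSize=0, usedHeap=6916, maxHeap=8165, blockCacheSize=1394662632, "
    "blockCacheFree=317661976, blockCacheCount=53394, "
    "blockCacheHitCount=16229024, blockCacheMissCount=91803814, "
    "blockCacheEvictedCount=22381853, blockCacheHitRatio=15, "
    "blockCacheHitCachingRatio=41"
)


def parse_metrics(line):
    """Turn 'name=value, name=value, ...' into a dict of ints."""
    pairs = (item.split("=") for item in line.split(","))
    return {name.strip(): int(value) for name, value in pairs}


m = parse_metrics(METRICS)

# Assumption: both values are in MB, so the ratio is unit-free.
index_share = m["storefileIndexSize"] / m["maxHeap"]

# Hit ratio recomputed from the raw counters, as a cross-check on
# the reported blockCacheHitRatio.
hit_ratio = m["blockCacheHitCount"] / (
    m["blockCacheHitCount"] + m["blockCacheMissCount"]
)

print(f"storefile index as a share of max heap: {index_share:.1%}")
print(f"block cache hit ratio from counters:    {hit_ratio:.1%}")
```

Under that unit assumption the index takes a bit under half of the maximum heap, and the recomputed hit ratio agrees with the reported blockCacheHitRatio=15.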

Re: a question storefileIndexSize

Posted by Ted Yu <yu...@gmail.com>.
See https://issues.apache.org/jira/browse/HBASE-3857 and
https://issues.apache.org/jira/browse/HBASE-3856

Cheers

On Tue, May 24, 2011 at 5:29 AM, Gaojinchao <ga...@huawei.com> wrote:

> My observation is that storefileIndexSize is large.
> Is there a way to reduce it ?
>
> Region server metric:
> requests=11447, regions=10394, stores=10394, storefiles=3103,
> storefileIndexSize=3717,
> memstoreSize=1002, compactionQueueSize=1234, flushQueueSize=0,
> usedHeap=6916,
> maxHeap=8165, blockCacheSize=1394662632, blockCacheFree=317661976,
> blockCacheCount=53394,
> blockCacheHitCount=16229024, blockCacheMissCount=91803814,
> blockCacheEvictedCount=22381853,
> blockCacheHitRatio=15, blockCacheHitCachingRatio=41
>

Re: a question storefileIndexSize

Posted by Stack <st...@duboce.net>.
On Wed, May 25, 2011 at 4:49 PM, Matt Corgan <mc...@hotpads.com> wrote:
> I was thinking it would be a nice feature if each time an hfile was written
> it kept a count of the raw bytes (before compression) to make it easy to
> compare to the file size on disk.  It could report it in the web interface
> next to the disk size.
>
>

It's logged, IIRC.

Please open an issue, Matt, to add this facility.
St.Ack

Re: a question storefileIndexSize

Posted by Matt Corgan <mc...@hotpads.com>.
I was thinking it would be a nice feature if each time an hfile was written
it kept a count of the raw bytes (before compression) to make it easy to
compare to the file size on disk.  It could report it in the web interface
next to the disk size.


2011/5/25 Stack <st...@duboce.net>

> Good point Matt.  I forgot about compression.  Let me add a note to the
> above-referenced section in the book....
> St.Ack

Re: a question storefileIndexSize

Posted by Stack <st...@duboce.net>.
Good point Matt.  I forgot about compression.  Let me add a note to the
above-referenced section in the book....
St.Ack

On Wed, May 25, 2011 at 7:47 AM, Matt Corgan <mc...@hotpads.com> wrote:
> I have a table that compresses by 30x using gzip, so the default block size
> of 64 KB was writing 2 KB blocks to disk.  To reduce storefileIndexSize, I
> raised the block size to 256 KB, presumably writing ~8KB disk blocks which
> is still pretty small.  Maybe you could go even higher depending on your
> compression ratio.
>
> btw - why 10394 regions with only 3103 storefiles?

Re: a question storefileIndexSize

Posted by Matt Corgan <mc...@hotpads.com>.
Also, how long are your column family names and column qualifiers?  They are
stored with each row key in the index, so you want to make them as short as
possible.
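
A rough sketch of why the names matter, assuming the standard KeyValue key layout (2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type). The family and qualifier names are made up for illustration, and treating the 26 bytes reported in this thread as the row key length is also an assumption.

```python
def keyvalue_key_length(row, family, qualifier):
    """Bytes in the key portion of one KeyValue.

    Layout assumed: 2-byte row length, row, 1-byte family length,
    family, qualifier, 8-byte timestamp, 1-byte key type.
    """
    return 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1


# Assumption: the 26-byte key mentioned in the thread is the row key.
row = b"r" * 26

# Hypothetical names, chosen only to contrast long and short.
long_names = keyvalue_key_length(row, b"customer_profile", b"last_login_timestamp")
short_names = keyvalue_key_length(row, b"p", b"ll")

print(f"key bytes with long names:  {long_names}")
print(f"key bytes with short names: {short_names}")
```

Each block index entry carries one such key, so trimming the family and qualifier shrinks every entry by the same number of bytes.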


On Wed, May 25, 2011 at 10:47 AM, Matt Corgan <mc...@hotpads.com> wrote:

> I have a table that compresses by 30x using gzip, so the default block size
> of 64 KB was writing 2 KB blocks to disk.  To reduce storefileIndexSize, I
> raised the block size to 256 KB, presumably writing ~8KB disk blocks which
> is still pretty small.  Maybe you could go even higher depending on your
> compression ratio.
>
> btw - why 10394 regions with only 3103 storefiles?

Re: a question storefileIndexSize

Posted by Matt Corgan <mc...@hotpads.com>.
I have a table that compresses by 30x using gzip, so the default block size
of 64 KB was writing 2 KB blocks to disk.  To reduce storefileIndexSize, I
raised the block size to 256 KB, presumably writing ~8KB disk blocks which
is still pretty small.  Maybe you could go even higher depending on your
compression ratio.

btw - why 10394 regions with only 3103 storefiles?
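
The arithmetic behind the block size suggestion can be sketched as follows. It assumes one index entry per HFile block and treats the configured block size as exact (in practice it is only a target); the 1 GiB of uncompressed data is an arbitrary figure for comparison, not a number from this thread.

```python
KB = 1024
MB = 1024 * KB


def on_disk_block_bytes(block_size, compression_ratio):
    """Approximate compressed size of one block."""
    return block_size / compression_ratio


def index_entries(uncompressed_bytes, block_size):
    """One index entry per block, rounding up for a partial last block."""
    return -(-uncompressed_bytes // block_size)


COMPRESSION_RATIO = 30        # the 30x gzip ratio mentioned above
SAMPLE_DATA = 1024 * MB       # arbitrary amount of uncompressed data

for block_size in (64 * KB, 256 * KB):
    disk_kb = on_disk_block_bytes(block_size, COMPRESSION_RATIO) / KB
    entries = index_entries(SAMPLE_DATA, block_size)
    print(
        f"block size {block_size // KB:>3} KB: "
        f"~{disk_kb:.1f} KB on disk, {entries} index entries"
    )
```

Quadrupling the block size cuts the number of index entries to a quarter, while the blocks written to disk stay small because of the compression ratio.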


2011/5/25 Gaojinchao <ga...@huawei.com>

> Region size is 512M
>
> hbase.regionserver.handler.count 50
> hbase.regionserver.global.memstore.upperLimit 0.4
> hbase.regionserver.global.memstore.lowerLimit 0.35
> hbase.hregion.memstore.flush.size 128M
> hbase.hregion.max.filesize 512M
> hbase.client.scanner.caching 1
> hfile.block.cache.size 0.2
> hbase.hregion.memstore.block.multiplier 3
> hbase.hstore.blockingStoreFiles 10
> hbase.hstore.compaction.min.size 64M
>
> compress: gz
>
> dfs.block.size 256M

Re: a question storefileIndexSize

Posted by Gaojinchao <ga...@huawei.com>.
Region size is 512M

hbase.regionserver.handler.count 50
hbase.regionserver.global.memstore.upperLimit 0.4 
hbase.regionserver.global.memstore.lowerLimit 0.35 
hbase.hregion.memstore.flush.size 128M 
hbase.hregion.max.filesize 512M 
hbase.client.scanner.caching 1
hfile.block.cache.size 0.2
hbase.hregion.memstore.block.multiplier 3
hbase.hstore.blockingStoreFiles 10 
hbase.hstore.compaction.min.size 64M 

compress: gz

dfs.block.size 256M
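
To put these settings next to the metrics from the first post, here is a back-of-the-envelope heap budget. It assumes maxHeap and storefileIndexSize are in MB, that the block cache counters are in bytes, and that the memstore upper limit and block cache fraction are both applied to maxHeap. The memstore figure is a ceiling, not memory that is always in use.

```python
MB = 1024 * 1024

# Values reported in this thread.
max_heap_mb = 8165
storefile_index_mb = 3717
block_cache_size_bytes = 1394662632
block_cache_free_bytes = 317661976
memstore_upper_limit = 0.4      # hbase.regionserver.global.memstore.upperLimit
block_cache_fraction = 0.2      # hfile.block.cache.size

# Block cache capacity two ways: from the byte counters, and from the config.
cache_capacity_mb = (block_cache_size_bytes + block_cache_free_bytes) // MB
configured_cache_mb = round(max_heap_mb * block_cache_fraction)

# Worst case for the memstores, then whatever heap is left over.
memstore_ceiling_mb = round(max_heap_mb * memstore_upper_limit)
headroom_mb = max_heap_mb - memstore_ceiling_mb - configured_cache_mb

print(f"block cache capacity (counters): {cache_capacity_mb} MB")
print(f"block cache capacity (config):   {configured_cache_mb} MB")
print(f"memstore ceiling:                {memstore_ceiling_mb} MB")
print(f"heap left for everything else:   {headroom_mb} MB")
print(f"storefile index:                 {storefile_index_mb} MB")
```

With the memstores at their ceiling and the block cache full, the heap left over is smaller than the reported storefile index, which is one way to see why the index size is a concern here.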

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: May 25, 2011 11:57
To: user@hbase.apache.org
Subject: Re: a question storefileIndexSize

2011/5/24 Gaojinchao <ga...@huawei.com>:
> Stack, Thanks for your reply.
> block size is default.
> My Key length is 26 bytes and value is 300~400 bytes.
> Is it big keys and small values ?
>

Looks like you have 'small' keys.

It looks like the index is about 1MB per storefile (storefiles=3103,
storefileIndexSize=3717).  Does this seem about right?  What size are
your regions?

St.Ack

Re: a question storefileIndexSize

Posted by Stack <st...@duboce.net>.
Oh, I forgot about this suggestion:
http://hbase.apache.org/book.html#keysize  I mention it because it
cites a study done by Marc Limotte where he had a similar relatively
big storefile index and he dug in.  You might be interested in how he
did his research.
St.Ack

On Tue, May 24, 2011 at 8:57 PM, Stack <st...@duboce.net> wrote:
> 2011/5/24 Gaojinchao <ga...@huawei.com>:
>> Stack, Thanks for your reply.
>> block size is default.
>> My Key length is 26 bytes and value is 300~400 bytes.
>> Is it big keys and small values ?
>>
>
> Looks like you have 'small' keys.
>
> It looks like the index is about 1MB per storefile (storefiles=3103,
> storefileIndexSize=3717).  Does this seem about right?  What size are
> your regions?
>
> St.Ack
>

Re: a question storefileIndexSize

Posted by Stack <st...@duboce.net>.
2011/5/24 Gaojinchao <ga...@huawei.com>:
> Stack, Thanks for your reply.
> block size is default.
> My Key length is 26 bytes and value is 300~400 bytes.
> Is it big keys and small values ?
>

Looks like you have 'small' keys.

It looks like the index is about 1MB per storefile (storefiles=3103,
storefileIndexSize=3717).  Does this seem about right?  What size are
your regions?

St.Ack
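
The per-storefile estimate above works out as follows, assuming storefileIndexSize is reported in MB. The per-region average is added only for comparison.

```python
# Values from the region server metrics in the first post.
storefile_index_mb = 3717   # assumed to be in MB
storefiles = 3103
regions = 10394

index_per_storefile_mb = storefile_index_mb / storefiles
index_per_region_mb = storefile_index_mb / regions

print(f"index per storefile: {index_per_storefile_mb:.2f} MB")
print(f"index per region:    {index_per_region_mb:.2f} MB")
```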

Re: a question storefileIndexSize

Posted by Gaojinchao <ga...@huawei.com>.
Stack, thanks for your reply.
The block size is the default.
My key length is 26 bytes and the values are 300~400 bytes.
Does that count as big keys and small values?


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: May 25, 2011 1:01
To: user@hbase.apache.org
Subject: Re: a question storefileIndexSize

What Ted says, or you could change the hfile block size; currently it's
64k.  Make it bigger?  Do you have big keys and small values?   If so,
can you make do with smaller keys?  That would help with index size
too.

St.Ack

On Tue, May 24, 2011 at 5:29 AM, Gaojinchao <ga...@huawei.com> wrote:
> My observation is that storefileIndexSize is large.
> Is there a way to reduce it ?
>
> Region server metric:
> requests=11447, regions=10394, stores=10394, storefiles=3103, storefileIndexSize=3717,
> memstoreSize=1002, compactionQueueSize=1234, flushQueueSize=0, usedHeap=6916,
> maxHeap=8165, blockCacheSize=1394662632, blockCacheFree=317661976, blockCacheCount=53394,
> blockCacheHitCount=16229024, blockCacheMissCount=91803814, blockCacheEvictedCount=22381853,
> blockCacheHitRatio=15, blockCacheHitCachingRatio=41
>

Re: a question storefileIndexSize

Posted by Stack <st...@duboce.net>.
What Ted says, or you could change the hfile block size; currently it's
64k.  Make it bigger?  Do you have big keys and small values?   If so,
can you make do with smaller keys?  That would help with index size
too.

St.Ack
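
Both knobs mentioned above (block size and key size) can be put into one rough model of index size. This is a sketch, not HBase's actual accounting: it assumes one index entry per block, and the 12 bytes of per-entry overhead (an 8-byte offset plus a 4-byte size) is an assumption about the index layout. The 1 GiB data size and the key lengths of 74 and 41 bytes are illustrative values only.

```python
KB = 1024
MB = 1024 * KB


def estimate_index_bytes(uncompressed_bytes, block_size, key_bytes,
                         per_entry_overhead=12):
    """Rough block index size: one entry (key + overhead) per block.

    per_entry_overhead=12 is an assumed 8-byte offset plus 4-byte size.
    """
    entries = -(-uncompressed_bytes // block_size)  # ceiling division
    return entries * (key_bytes + per_entry_overhead)


DATA = 1024 * MB  # illustrative amount of uncompressed data

baseline = estimate_index_bytes(DATA, 64 * KB, key_bytes=74)
bigger_blocks = estimate_index_bytes(DATA, 256 * KB, key_bytes=74)
smaller_keys = estimate_index_bytes(DATA, 64 * KB, key_bytes=41)

print(f"64 KB blocks, 74-byte keys:  {baseline / MB:.2f} MB")
print(f"256 KB blocks, 74-byte keys: {bigger_blocks / MB:.2f} MB")
print(f"64 KB blocks, 41-byte keys:  {smaller_keys / MB:.2f} MB")
```

In this model the index shrinks in direct proportion to the block size increase, while shorter keys help by a smaller, fixed amount per entry.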

On Tue, May 24, 2011 at 5:29 AM, Gaojinchao <ga...@huawei.com> wrote:
> My observation is that storefileIndexSize is large.
> Is there a way to reduce it ?
>
> Region server metric:
> requests=11447, regions=10394, stores=10394, storefiles=3103, storefileIndexSize=3717,
> memstoreSize=1002, compactionQueueSize=1234, flushQueueSize=0, usedHeap=6916,
> maxHeap=8165, blockCacheSize=1394662632, blockCacheFree=317661976, blockCacheCount=53394,
> blockCacheHitCount=16229024, blockCacheMissCount=91803814, blockCacheEvictedCount=22381853,
> blockCacheHitRatio=15, blockCacheHitCachingRatio=41
>