Posted to user@hbase.apache.org by ChingShen <ch...@gmail.com> on 2010/10/21 05:19:17 UTC

hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Hi all,

  I ran a random-read performance test, and hfile.block.cache.size = 0
performed better than the default. Is that possible?

 My cluster (4 nodes):
 Hadoop 0.20.2, HBase 0.20.6
 1 * namenode & hmaster & zookeeper
 3 * datanode & regionserver

 P.S. Replication factor = 3, HBase heap size is 3500 MB

There are 10 million records in my test table, each approximately 1 KB.
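
For reference, hfile.block.cache.size is a regionserver-side setting giving the fraction of the heap used for the LRU block cache; a minimal hbase-site.xml sketch, assuming it is set on every regionserver and the regionservers are restarted afterwards:

<property>
  <name>hfile.block.cache.size</name>
  <!-- 0 disables the block cache; the default 0.2 gives it 20% of the heap -->
  <value>0</value>
</property>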

*hfile.block.cache.size = 0*:
==============================================
java benchmark.HReadR *10000* 1
initial cost: 297 ms.
read: 198358 ms
read per: 19.8358 ms
*read thput: 50.4139 ops/sec*
==============================================
java benchmark.HReadR *100000* 1
initial cost: 285 ms.
read: 772474 ms
read per: 7.72474 ms
*read thput: 129.4542 ops/sec*
==============================================
java benchmark.HReadR *10000* 1
initial cost: 291 ms.
read: 43939 ms
read per: 4.3939 ms
*read thput: 227.58826 ops/sec*
==============================================
java benchmark.HReadR *100000* 1
initial cost: 292 ms.
read: 296763 ms
read per: 2.96763 ms
*read thput: 336.96924 ops/sec*
==============================================


*hfile.block.cache.size = 0.2 (default)*:
==============================================
java benchmark.HReadR *10000* 1
initial cost: 282 ms.
read: 157538 ms
read per: 15.7538 ms
read thput: *63.47675* ops/sec
==============================================
java benchmark.HReadR *100000* 1
initial cost: 292 ms.
read: 983083 ms
read per: 9.83083 ms
read thput: *101.72081* ops/sec
==============================================
java benchmark.HReadR *10000* 1
initial cost: 286 ms.
read: 83260 ms
read per: 8.326 ms
read thput: *120.10569* ops/sec
==============================================
java benchmark.HReadR *100000* 1
initial cost: 288 ms.
read: 839874 ms
read per: 8.39874 ms
read thput: *119.065475* ops/sec
==============================================


Shen

RE: hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Posted by Jonathan Gray <jg...@facebook.com>.
When the block cache is in use, read blocks are held by the block cache data structures and stay referenced for longer than they would if they were never cached.

This definitely adds extra stress on the GC.

If you expect a very low hit ratio, it can be advantageous to not use the block cache.

You can also turn the block cache off on a per-query basis with setCacheBlocks(), though that's only supported on Scan right now. It still makes sense for Gets, so we should add it there too.
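
A minimal sketch of that per-query switch against the 0.20-era client API (the table name and row key below are made up; a single-row Scan stands in for a Get here, since Get doesn't expose the flag yet):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class NoCacheRead {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "testtable"); // hypothetical table name
    Scan scan = new Scan(Bytes.toBytes("row-0000042"));               // hypothetical row key
    scan.setCacheBlocks(false);  // don't pull this query's blocks into the LRU block cache
    ResultScanner scanner = table.getScanner(scan);
    try {
      Result first = scanner.next();  // first row at or after the start key
      System.out.println(first);
    } finally {
      scanner.close();
    }
  }
}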

JG

Re: hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Posted by ChingShen <ch...@gmail.com>.
Hi Ryan,

*hfile.block.cache.size = 0, GC log:*
2010-10-21T15:53:27.486+0800: 1428.317: [GC [PSYoungGen:
18270K->320K(17728K)] 62008K->44066K(61696K), *0.0043520 secs*] [Times:
user=0.00 sys=0.00, real=0.00 secs]
2010-10-21T15:53:27.933+0800: 1428.764: [GC [PSYoungGen:
17641K->256K(17024K)] 61386K->44025K(60992K), *0.0036030 secs*] [Times:
user=0.00 sys=0.00, real=0.00 secs]
2010-10-21T15:53:28.380+0800: 1429.212: [GC [PSYoungGen:
17024K->288K(19648K)] 60793K->44130K(63616K), *0.0044410 secs*] [Times:
user=0.00 sys=0.00, real=0.01 secs]
2010-10-21T15:53:28.385+0800: 1429.216: [Full GC [PSYoungGen:
288K->0K(19648K)] [PSOldGen: 43841K->32920K(41536K)] 44130K->32920K(61184K)
[PSPermGen: 15684K->15683K(24640K)], *0.0480800 secs*] [Times: user=0.05
sys=0.00, real=0.05 secs]

*hfile.block.cache.size = 0.2, GC log:*
2010-10-21T16:18:31.884+0800: 1234.166: [GC [PSYoungGen:
469577K->182750K(534208K)] 1183254K->1013795K(1663424K), *0.1265180 secs*]
[Times: user=0.49 sys=0.00, real=0.13 secs]
2010-10-21T16:18:56.837+0800: 1259.119: [GC [PSYoungGen:
460382K->179115K(451392K)] 1291427K->1116923K(1580608K), *0.1231190 secs*]
[Times: user=0.48 sys=0.00, real=0.13 secs]
2010-10-21T16:19:20.121+0800: 1282.403: [GC [PSYoungGen:
451371K->175649K(510016K)] 1389179K->1206321K(1639232K), *0.1153410 secs*]
[Times: user=0.31 sys=0.01, real=0.11 secs]
2010-10-21T16:19:20.236+0800: 1282.518: [Full GC [PSYoungGen:
175649K->0K(510016K)] [PSOldGen: 1030672K->582437K(1179200K)]
1206321K->582437K(1689216K) [PSPermGen: 16041K->16041K(21248K)], *0.2538730
secs*] [Times: user=0.26 sys=0.00, real=0.26 secs]

hfile.block.cache.size = 0:
avg. per minor gc ~ *4ms*
avg. per full gc ~ *50ms*

hfile.block.cache.size = 0.2:
avg. per minor gc ~* 120ms*
avg. per full gc ~ *250ms *

Does this mean that, because I get a low hit ratio on random reads, the
LruBlockCache creates too many CachedBlock objects?

Thanks.

Shen

Re: hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Posted by Ryan Rawson <ry...@gmail.com>.
Block cache invalidation is done asynchronously, not inline, so that
shouldn't be an issue.

It could be related to GC... if you could produce and compare GC logs,
that'd be helpful.
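
One way to get them, sketched under the assumption that the standard HotSpot GC-logging flags go into conf/hbase-env.sh on each regionserver (the log path is made up) and the processes are restarted:

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/hbase-regionserver-gc.log"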

Re: hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Posted by ChingShen <ch...@gmail.com>.
Yes, only 14.7% :-(

2010-10-21 10:59:06,675 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
Total=572.39557MB (600200272), Free=125.94194MB (132059696), Max=698.3375MB
(732259968), Counts: Blocks=8992, Access=235641, Hit=34786, Miss=200855,
Evictions=173, Evicted=191863, Ratios: Hit Ratio=14.762286841869354%,
Miss Ratio=85.23771166801453%, Evicted/Run=1109.03466796875
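
(That ratio is just hits over total accesses: 34786 / 235641 ≈ 0.1476, i.e. about 14.76%.)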

I agree that more cache should give much better performance, since it can improve
the hit ratio on random reads, but I want to know why I got better throughput with
the block cache disabled than with the default.

Re: hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Posted by Antonio Alvarado Hernández <aa...@gmail.com>.
Hi all,
Could it be related to a low hit ratio?
-aah

-- 
Sent from my mobile device

Re: hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Posted by Tao Xie <xi...@gmail.com>.
BTW, Ryan, can you share some configuration tips for running YCSB to get
better random-read performance?
Or can you provide some YCSB test results? In my experiments I get 40~50k inserts/sec
but only ~2k reads/sec.
I wonder if there is something wrong with my configuration.
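
For context, a YCSB random-read run against HBase from that era is driven by a CoreWorkload properties file plus the HBaseClient binding, roughly as sketched below; the property names and flags are standard YCSB ones, but the counts, column family, and classpath are illustrative:

# workload.randomread (illustrative values)
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=10000000
operationcount=100000
readproportion=1.0
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=uniform

# run phase (classpath is illustrative)
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -t \
    -db com.yahoo.ycsb.db.HBaseClient -P workload.randomread -p columnfamily=family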

Thanks in advance.

Re: hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Posted by Tao Xie <xi...@gmail.com>.
I also see a similar result with YCSB. I disabled the block cache (set it to 0) and
got better throughput than with the default.
In my case the dataset is 160M records and the block cache hit ratio is very low,
so frequent cache evictions cause long pauses.

Re: hfile.block.cache.size = 0 performs better than the default (0.2) for random reads? Is it possible?

Posted by Ryan Rawson <ry...@gmail.com>.
Our own systems show much better performance with more cache; not sure
why your test is weird. Maybe you could try to reproduce your results
under YCSB, then we might have a chance of running your benchmark.

