Posted to dev@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2013/04/25 03:01:41 UTC

Unscientific comparison of fully-cached zipfian reading

Hey guys,

I did a little benchmarking to see what kind of numbers we get from the
block cache and the OS cache. Please see:

https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

Hopefully it gives you some ballpark numbers for further discussion.

J-D

Re: Unscientific comparison of fully-cached zipfian reading

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hey guys,

I published a few more numbers after talking to Stack, Elliott, and Todd
(thanks guys). It's the same link BTW.

First, it's interesting that the block cache is slower than direct access
to the OS buffer without CRC. One caveat is that the latter setup still
stores the meta blocks in the BC, so you're still hitting it. So I ran a
"pure" OS buffer + SCR + no CRC test, with the BC completely disabled, and,
well, it turns out that it's slower than the pure BC test. Interesting!
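
For reference, here is roughly what that "pure" setup looks like as a
config sketch. The property names below (hfile.block.cache.size,
dfs.client.read.shortcircuit, and the per-family flag) are what I believe
apply to 0.94/CDH4, so treat it as illustrative rather than the exact
configuration I used:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;

// Sketch: approximate a "pure OS buffer" read path by turning the block
// cache off entirely and relying on short-circuit reads (SCR).
public class PureOsBufferSetup {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Disable the region server block cache completely (0.94-era setting).
    conf.setFloat("hfile.block.cache.size", 0f);

    // Let the DFS client read local blocks straight from the local
    // filesystem, and hence the OS page cache.
    conf.setBoolean("dfs.client.read.shortcircuit", true);

    // Milder variant: keep the cache for meta blocks but stop caching
    // data blocks for the test table's family ("info" is just an example).
    HColumnDescriptor family = new HColumnDescriptor("info");
    family.setBlockCacheEnabled(false);

    System.out.println(family);
  }
}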

Second, we thought that the BC might scale badly with a lot of blocks, so I
tried swapping our concurrent hash map with Cliff Click's drop-in
replacement. It turns out that it is slower than the Java CHM at that scale
(8 threads hitting 9 machines). I also ran a test with 80 threads, but it
was still slower (4.2ms for Cliff Click vs 3.8ms for Java).
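
The swap itself is trivial since Cliff Click's NonBlockingHashMap
implements ConcurrentMap. The sketch below is illustrative only (generic
key/value types, not the actual LruBlockCache internals):

import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.cliffc.high_scale_lib.NonBlockingHashMap;

// Illustrative only: a block-key -> block-buffer map such as the one
// backing the block cache, with the implementation picked at creation.
public class BlockMapSwap {

  static ConcurrentMap<String, ByteBuffer> newBlockMap(boolean cliffClick) {
    if (cliffClick) {
      // Drop-in ConcurrentMap replacement from Cliff Click's high-scale-lib.
      return new NonBlockingHashMap<String, ByteBuffer>();
    }
    return new ConcurrentHashMap<String, ByteBuffer>();
  }

  public static void main(String[] args) {
    ConcurrentMap<String, ByteBuffer> blocks = newBlockMap(true);
    blocks.putIfAbsent("hfile-123:0", ByteBuffer.allocate(64 * 1024));
    System.out.println(blocks.size());
  }
}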

Third, I ran a test with checksumming inside HBase (OS buffer + SCR) and
HDFS checksumming disabled. Keep in mind that HBase uses PureCRC32 whereas
HDFS will use faster native SSE4 calls. The result is that it was about
300us faster to checksum in HBase even though the checksumming itself is
slower. Fewer OS calls means much greater speed?

It seems to me that people running in production with a Hadoop version that
has PureCRC32 (Hadoop 1.1.x, 2.0) will benefit from using HBase checksums.
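
For anyone who wants to try it, this is roughly the switch involved. The
property names are what I believe apply to 0.94 and CDH4, so double-check
them against your version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: have HBase store/verify its own HFile checksums and skip the
// HDFS checksum pass on short-circuit reads, since HBase already checks.
public class HBaseChecksumSwitch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // HBase-level checksums (HBASE-5074).
    conf.setBoolean("hbase.regionserver.checksum.verify", true);

    // Don't checksum again at the HDFS layer for short-circuit reads.
    conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true);

    System.out.println("HBase checksums enabled: "
        + conf.getBoolean("hbase.regionserver.checksum.verify", false));
  }
}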

We also agreed that all those numbers can be improved. HBase could use the
native checksumming for example. The block cache could also be profiled.

Anyone interested in the above might want to run micro-benchmarks instead
of the macro testing I did, to understand exactly what needs improving.
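
As a starting point, a checksum micro-benchmark is easy to throw together.
The sketch below assumes org.apache.hadoop.util.PureJavaCrc32 is the
implementation referred to above as PureCRC32, and times it against
java.util.zip.CRC32 over 64KB blocks:

import java.util.zip.CRC32;
import java.util.zip.Checksum;

import org.apache.hadoop.util.PureJavaCrc32;

// Tiny checksum micro-benchmark. Numbers depend heavily on JVM warm-up,
// so both implementations get a warm-up pass before being measured.
public class CrcMicroBenchmark {

  static long timeIt(Checksum sum, byte[] block, int iterations) {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      sum.reset();
      sum.update(block, 0, block.length);
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    byte[] block = new byte[64 * 1024];
    int iterations = 100000;

    // Warm up.
    timeIt(new PureJavaCrc32(), block, iterations);
    timeIt(new CRC32(), block, iterations);

    System.out.printf("PureJavaCrc32:       %d ms%n",
        timeIt(new PureJavaCrc32(), block, iterations) / 1000000);
    System.out.printf("java.util.zip.CRC32: %d ms%n",
        timeIt(new CRC32(), block, iterations) / 1000000);
  }
}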

Cheers,

J-D


On Wed, Apr 24, 2013 at 6:01 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> Hey guys,
>
> I did a little benchmarking to see what kind of numbers we get from the
> block cache and the OS cache. Please see:
>
>
> https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html
>
> Hopefully it gives you some ballpark numbers for further discussion.
>
> J-D
>

Re: Unscientific comparison of fully-cached zipfian reading

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Wed, May 22, 2013 at 10:36 AM, lars hofhansl <la...@apache.org> wrote:
> The 2000-4000 was just from glancing at the HMaster page every now and then.

Ah sorry to have focused on that.

> The main point I was trying to make is that the only difference is the
> number of block cache misses (which is low in the SequentialRead case and
> very high in the RandomRead case), and the number of cache misses is almost
> the same as the number of requests.

Re-reading your email, it seems we tested different things. In my
case, whatever cache I was hitting was the only one I was planning to
hit. If I was reading from the OS cache, I was disabling the block
cache.

> (The cache misses are traced via OpenTSDB).
>
> I'll repeat my test with a single region server only. Was your test in a
> cluster or with a single region server?

The whole setup is described in the document.

J-D

Re: Unscientific comparison of fully-cached zipfian reading

Posted by lars hofhansl <la...@apache.org>.
The 2000-4000 was just from glancing at the HMaster page every now and then.
The main point I was trying to make is that the only difference is the number of block cache misses (which is low in the SequentialRead case and very high in the RandomRead case), and the number of cache misses is almost the same as the number of requests.

(The cache misses are traced via OpenTSDB).

I'll repeat my test with a single region server only. Was your test in a cluster or with a single region server?


-- Lars



________________________________
 From: Jean-Daniel Cryans <jd...@apache.org>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <la...@apache.org> 
Sent: Wednesday, May 22, 2013 10:23 AM
Subject: Re: Unscientific comparison of fully-cached zipfian reading
 

On Tue, May 21, 2013 at 6:08 PM, lars hofhansl <la...@apache.org> wrote:
> I just did a similar test using PE on a test cluster (16 DNs/RSs, 158 mappers).
> I set it up such that the data does not fit into the aggregate block cache, but does fit into the aggregate OS buffer cache; in my case that turned out to be 100M 1KB rows.
> Now I ran the SequentialRead and RandomRead tests.
>
> In both cases I see no disk activity (since the data fits into the OS cache). The SequentialRead run finishes in about 7mins, whereas the RandomRead run takes over 34mins.
> This is with CDH4.2.1 and HBase 0.94.7 compiled against it and with SCR enabled.
>
> The only difference is that in the SequentialRead case it is likely that the next Get can still use the previously cached block, whereas in the RandomRead case almost every Get needs to fetch a block from the OS cache (as verified by the cache miss rate, which is roughly the same as the request count per RegionServer). Except for enabling SCR, all other settings are close to the defaults.
>
> I see 2000-4000 req/s per RegionServer and the same number of cache misses per second per RegionServer in the RandomRead case, meaning each RegionServer brought in about 125-200 MB/s from the OS cache, which seems a tad low.

That's a lot of variance. In my test, the latencies I reported there were
stable around those numbers. So maybe we have different ways of measuring?

>
>
> So this would imply that reading from the OS cache is almost 5x slower than reading from the block cache. It would be interesting to explore the discrepancy.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Jean-Daniel Cryans <jd...@apache.org>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>
> Sent: Wednesday, April 24, 2013 6:01 PM
> Subject: Unscientific comparison of fully-cached zipfian reading
>
>
> Hey guys,
>
> I did a little benchmarking to see what kind of numbers we get from the
> block cache and the OS cache. Please see:
>
> https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html
>
> Hopefully it gives you some ballpark numbers for further discussion.
>
> J-D

Re: Unscientific comparison of fully-cached zipfian reading

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Tue, May 21, 2013 at 6:08 PM, lars hofhansl <la...@apache.org> wrote:
> I just did a similar test using PE on a test cluster (16 DNs/RSs, 158 mappers).
> I set it up such that the data does not fit into the aggregate block cache, but does fit into the aggregate OS buffer cache; in my case that turned out to be 100M 1KB rows.
> Now I ran the SequentialRead and RandomRead tests.
>
> In both cases I see no disk activity (since the data fits into the OS cache). The SequentialRead run finishes in about 7mins, whereas the RandomRead run takes over 34mins.
> This is with CDH4.2.1 and HBase 0.94.7 compiled against it and with SCR enabled.
>
> The only difference is that in the SequentialRead case it is likely that the next Get can still use the previously cached block, whereas in the RandomRead case almost every Get needs to fetch a block from the OS cache (as verified by the cache miss rate, which is roughly the same as the request count per RegionServer). Except for enabling SCR, all other settings are close to the defaults.
>
> I see 2000-4000 req/s per RegionServer and the same number of cache misses per second per RegionServer in the RandomRead case, meaning each RegionServer brought in about 125-200 MB/s from the OS cache, which seems a tad low.

That's a lot of variance. In my test, the latencies I reported there were
stable around those numbers. So maybe we have different ways of measuring?

>
>
> So this would imply that reading from the OS cache is almost 5x slower than reading from the block cache. It would be interesting to explore the discrepancy.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Jean-Daniel Cryans <jd...@apache.org>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>
> Sent: Wednesday, April 24, 2013 6:01 PM
> Subject: Unscientific comparison of fully-cached zipfian reading
>
>
> Hey guys,
>
> I did a little benchmarking to see what kind of numbers we get from the
> block cache and the OS cache. Please see:
>
> https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html
>
> Hopefully it gives you some ballpark numbers for further discussion.
>
> J-D

Re: Unscientific comparison of fully-cached zipfian reading

Posted by lars hofhansl <la...@apache.org>.
I just did a similar test using PE on a test cluster (16 DNs/RSs, 158 mappers).
I set it up such that the data does not fit into the aggregate block cache, but does fit into the aggregate OS buffer cache; in my case that turned out to be 100M 1KB rows.
Now I ran the SequentialRead and RandomRead tests.
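
(In case anyone wants to reproduce this, it was the stock
PerformanceEvaluation tool. A minimal, illustrative way to drive it from
Java is below; the exact options and commands vary a bit between versions,
so treat it as a sketch.)

import org.apache.hadoop.hbase.PerformanceEvaluation;

// Illustrative only: drive the stock PE tool programmatically with the
// same command names used from the shell. Each run is typically its own
// invocation; randomRead would be a second run with "randomRead" in
// place of "sequentialRead".
public class RunPerfEval {
  public static void main(String[] args) throws Exception {
    // Assumes the table was already populated (e.g. by a sequentialWrite run).
    PerformanceEvaluation.main(new String[] { "sequentialRead", "158" });
  }
}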

In both cases I see no disk activity (since the data fits into the OS cache). The SequentialRead run finishes in about 7mins, whereas the RandomRead run takes over 34mins.
This is with CDH4.2.1 and HBase 0.94.7 compiled against it and with SCR enabled.

The only difference is that in the SequentialRead case it is likely that the next Get can still use the previously cached block, whereas in the RandomRead case almost every Get needs to fetch a block from the OS cache (as verified by the cache miss rate, which is roughly the same as the request count per RegionServer). Except for enabling SCR, all other settings are close to the defaults.

I see 2000-4000 req/s per RegionServer and the same number of cache misses per second per RegionServer in the RandomRead case, meaning each RegionServer brought in about 125-200 MB/s from the OS cache, which seems a tad low.
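
(As a rough sanity check on that number, assuming the default 64 KB HFile
block size, which is an assumption on my part:)

// Back-of-the-envelope check: cache misses per second times the block
// size fetched per miss. The 64 KB default HFile block size is assumed.
public class ThroughputCheck {
  public static void main(String[] args) {
    long blockSize = 64 * 1024;            // bytes per cache miss (assumed)
    double mb = 1024.0 * 1024.0;
    System.out.printf("2000 misses/s -> %.0f MB/s%n", 2000 * blockSize / mb);
    System.out.printf("4000 misses/s -> %.0f MB/s%n", 4000 * blockSize / mb);
    // Roughly 125-250 MB/s, in the same ballpark as the observed 125-200 MB/s.
  }
}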


So this would imply that reading from the OS cache is almost 5x slower than reading from the block cache. It would be interesting to explore the discrepancy.


-- Lars



________________________________
 From: Jean-Daniel Cryans <jd...@apache.org>
To: "dev@hbase.apache.org" <de...@hbase.apache.org> 
Sent: Wednesday, April 24, 2013 6:01 PM
Subject: Unscientific comparison of fully-cached zipfian reading
 

Hey guys,

I did a little benchmarking to see what kind of numbers we get from the
block cache and the OS cache. Please see:

https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

Hopefully it gives you some ballpark numbers for further discussion.

J-D