Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2014/04/05 06:22:08 UTC

blockcache 101

Nick:

+ You measure the 99th percentile.  Did you take a measure of average/mean
response times in your blockcache comparison?  (Our LarsHofhansl had it
that on average reads out of bucket cache were a good bit slower).  Or is
this a TODO?
+ We should just remove slabcache because bucket cache is consistently
better; why have two means of doing the same thing?  Or, do you need more
proof that bucketcache subsumes slabcache?

Thanks boss,
St.Ack

Re: blockcache 101

Posted by Nick Dimiduk <nd...@gmail.com>.
Nice work!

On Friday, August 8, 2014, Stack <st...@duboce.net> wrote:

> Here is a follow up to Nick's blockcache 101 that compares a number of
> deploys x loadings and makes recommendation:
> https://blogs.apache.org/hbase/
> St.Ack
>
>
> On Fri, Apr 4, 2014 at 9:22 PM, Stack <st...@duboce.net> wrote:
>
> > Nick:
> >
> > + You measure the 99th percentile.  Did you take a measure of average/mean
> > response times in your blockcache comparison?  (Our LarsHofhansl had it
> > that on average reads out of bucket cache were a good bit slower).  Or is
> > this a TODO?
> > + We should just remove slabcache because bucket cache is consistently
> > better; why have two means of doing the same thing?  Or, do you need more
> > proof that bucketcache subsumes slabcache?
> >
> > Thanks boss,
> > St.Ack
> >
> >
>

Re: blockcache 101

Posted by Stack <st...@duboce.net>.
Here is a follow up to Nick's blockcache 101 that compares a number of
deploys x loadings and makes recommendation: https://blogs.apache.org/hbase/
St.Ack


On Fri, Apr 4, 2014 at 9:22 PM, Stack <st...@duboce.net> wrote:

> Nick:
>
> + You measure the 99th percentile.  Did you take a measure of average/mean
> response times in your blockcache comparison?  (Our LarsHofhansl had it
> that on average reads out of bucket cache were a good bit slower).  Or is
> this a TODO?
> + We should just remove slabcache because bucket cache is consistently
> better; why have two means of doing the same thing?  Or, do you need more
> proof that bucketcache subsumes slabcache?
>
> Thanks boss,
> St.Ack
>
>

Re: blockcache 101

Posted by Nick Dimiduk <nd...@gmail.com>.
On Mon, Apr 14, 2014 at 10:12 PM, Todd Lipcon <to...@cloudera.com> wrote:

>
> Hmm... in "v2.pdf" here you're looking at different ratios of DB size
> to cache size, but there's also the secondary cache on the system (the
> OS block cache), right?


Yes, this is true.

> So when you say only 20GB "memory under management", in fact you're still
> probably getting 100% hit rate on the case where the DB is bigger than RAM,
> right?
>

I can speculate that it likely is, but I don't know this for certain. At
the moment, the only points of instrumentation in the harness are in the
HBase client. The next steps include pushing instrumentation down into the
RS and DN, and from there into the OS itself.

> Maybe it would be better to have each graph show the different cache
> implementations overlaid, rather than the different ratios overlaid? That
> would better differentiate the scaling behavior of the implementations vs
> each other.


I did experiment with that initially, but I found the graphs became dense
and unreadable. I need to spend more time studying Tufte to present all
these data points in a single figure. The data is all included, so please,
by all means have a crack at it. Maybe you'll see something I didn't.

> As you've got it, the results seem somewhat obvious ("as the hit ratio
> gets worse, it gets slower").
>

Yes, that's true. Of interest in this particular experiment was the
relative performance of different caches under identical workloads.

Re: blockcache 101

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Apr 9, 2014 at 10:24 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> > the trend lines drawn on the graphs seem to be based on some assumption
> > that there is an exponential scaling pattern.
>
>
> Which charts are you specifically referring to? Indeed, the trend lines
> were generated rather casually with Excel and may be misleading. Perhaps a
> more responsible representation would be to simply connect each data point
> with a line to aid visibility.
>

Was referring to these graphs:
http://www.n10k.com/assets/perfeval_blockcache_v2.pdf

And yep, I think straight lines between the points (or just the points
themselves) might be more accurate.


>
> > In practice I would think it would be sigmoid [...] As soon as it starts to
> > be larger than the cache capacity [...] as the dataset gets larger, the
> > latency will level out as a flat line, not continue to grow as your trend
> > lines are showing.
>
>
> When decoupling cache size from database size, you're presumably correct. I
> believe that's what's shown in the figures in perfeval_blockcache_v1.pdf,
> especially as total memory increases. The plateau effect is suggested in
> the 20G and 50G charts in that book. This is why I included the second set
> of charts in perfeval_blockcache_v2.pdf. The intention is to couple the
> cache size to dataset size and demonstrate how an implementation performs
> as the absolute values increase. That is, assuming hit/eviction rates
> remain roughly constant, how well does an implementation "scale up" to a
> larger memory footprint?
>

Hmm... in "v2.pdf" here you're looking at different ratios of DB size to
cache size, but there's also the secondary cache on the system (the OS
block cache), right? So when you say only 20GB "memory under management",
in fact you're still probably getting 100% hit rate on the case where the
DB is bigger than RAM, right?
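The two-tier effect Todd describes can be put in a toy model (a sketch of
mine, not from the measurements: it assumes uniform random access, so each
tier's hit rate is roughly capacity divided by dataset size, and it treats
the OS page cache as covering roughly all of RAM):

```python
def tier_hit_rate(cache_gb, dataset_gb):
    """Expected hit rate for one cache tier under uniform random access."""
    return min(1.0, cache_gb / dataset_gb)

def effective_hit_rate(l1_gb, ram_gb, dataset_gb):
    """Combined hit rate: misses in the block cache (L1) may still be
    served by the OS page cache, which spans roughly all of RAM."""
    h1 = tier_hit_rate(l1_gb, dataset_gb)
    h2 = tier_hit_rate(ram_gb, dataset_gb)  # page cache ~= all of RAM
    return h1 + (1.0 - h1) * h2

# 20 GB block cache, 128 GB RAM, 100 GB dataset: the block cache alone
# misses often, but the page cache absorbs the rest of the misses.
print(effective_hit_rate(20, 128, 100))
```

Under those assumptions, a 20GB block cache in front of a 100GB dataset on
a 128GB box still sees an effective hit rate of 1.0, which is Todd's point.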

I guess I just find it a little hard to understand what the graphs are
trying to demonstrate. Maybe it would be better to have each graph show the
different cache implementations overlaid, rather than the different ratios
overlaid? That would better differentiate the scaling behavior of the
implementations vs each other. As you've got it, the results seem somewhat
obvious ("as the hit ratio gets worse, it gets slower").



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: blockcache 101

Posted by Nick Dimiduk <nd...@gmail.com>.
Stack:

> Did you take a measure of average/mean response times in your blockcache
> comparison?


Yes; in total I collected the mean, 50th, 95th, 99th, and 99.9th percentile
latency values. I only performed the analysis over the 99th percentile in
the post. I looked briefly at the 99.9th as well, but it wasn't immediately
relevant to the context of the experiment. All of these data are included
in the "raw results" csv I uploaded and linked from the "Showdown" post.

> do you need more proof that bucketcache subsumes slabcache?


I'd like more vetting, yes. As you alluded to in the previous question, a
more holistic view of response times would be good, and I'd also like to
see how they perform with a mixed workload. The next step is probably to
exercise them with some YCSB workloads at varying RAM:DB ratios.
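Such a run could be parameterized along these lines (property names are
YCSB's standard core-workload knobs; the sizes are made-up examples chosen
to hit a target RAM:DB ratio, not values from this thread):

```properties
# Hypothetical workload: ~100 GB of data (recordcount x ~1 KB default
# row size) against a deploy with, say, 20 GB of cache, i.e. a 1:5
# RAM:DB ratio. Mixed reads and updates, zipfian key popularity.
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=100000000
operationcount=10000000
readproportion=0.8
updateproportion=0.2
requestdistribution=zipfian
```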

Todd:

> the trend lines drawn on the graphs seem to be based on some assumption
> that there is an exponential scaling pattern.


Which charts are you specifically referring to? Indeed, the trend lines
were generated rather casually with Excel and may be misleading. Perhaps a
more responsible representation would be to simply connect each data point
with a line to aid visibility.

> In practice I would think it would be sigmoid [...] As soon as it starts to
> be larger than the cache capacity [...] as the dataset gets larger, the
> latency will level out as a flat line, not continue to grow as your trend
> lines are showing.


When decoupling cache size from database size, you're presumably correct. I
believe that's what's shown in the figures in perfeval_blockcache_v1.pdf,
especially as total memory increases. The plateau effect is suggested in
the 20G and 50G charts in that book. This is why I included the second set
of charts in perfeval_blockcache_v2.pdf. The intention is to couple the
cache size to dataset size and demonstrate how an implementation performs
as the absolute values increase. That is, assuming hit/eviction rates
remain roughly constant, how well does an implementation "scale up" to a
larger memory footprint?

-n

Re: blockcache 101

Posted by Todd Lipcon <to...@cloudera.com>.
Another quick question: the trend lines drawn on the graphs seem to be
based on some assumption that there is an exponential scaling pattern. In
practice I would think it would be sigmoid -- while the dataset size is
smaller than cache capacity, changing the dataset size should have little
to no effect on the latency (since you'd get 100% hit rate). As soon as it
starts to be larger than the cache capacity, you'd expect the hit rate to
be on average equal to (size of cache / size of data). The average latency,
then, should be just about equal to the cache miss latency multiplied by
the cache miss ratio. That is to say, as the dataset gets larger, the
latency will level out as a flat line, not continue to grow as your trend
lines are showing.
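That back-of-envelope model is easy to sketch (the latency constants below
are placeholders of my choosing, not measured values; the shape, flat up to
cache capacity and then leveling toward the miss latency, is the point):

```python
def avg_latency_ms(dataset_gb, cache_gb, hit_ms=0.5, miss_ms=10.0):
    """Expected average read latency under Todd's model: hit rate is
    ~100% while the dataset fits in cache, otherwise roughly
    cache_size / dataset_size; average latency is the mix of the two."""
    hit_ratio = min(1.0, cache_gb / dataset_gb)
    return hit_ms * hit_ratio + miss_ms * (1.0 - hit_ratio)

# Flat while dataset <= 20 GB cache, then rising toward miss_ms, not
# growing without bound as an exponential trend line would suggest.
for size_gb in (10, 20, 40, 80, 160, 320):
    print(size_gb, avg_latency_ms(size_gb, cache_gb=20))
```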

-Todd


On Fri, Apr 4, 2014 at 9:40 PM, Stack <st...@duboce.net> wrote:

> Pardon, my questions are around Nick's blog on blockcache in case folks are
> confused: http://www.n10k.com/blog/blockcache-101/
> St.Ack
>
>
> On Fri, Apr 4, 2014 at 9:22 PM, Stack <st...@duboce.net> wrote:
>
> > Nick:
> >
> > + You measure the 99th percentile.  Did you take a measure of average/mean
> > response times in your blockcache comparison?  (Our LarsHofhansl had it
> > that on average reads out of bucket cache were a good bit slower).  Or is
> > this a TODO?
> > + We should just remove slabcache because bucket cache is consistently
> > better; why have two means of doing the same thing?  Or, do you need more
> > proof that bucketcache subsumes slabcache?
> >
> > Thanks boss,
> > St.Ack
> >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: blockcache 101

Posted by Stack <st...@duboce.net>.
Pardon, my questions are around Nick's blog on blockcache in case folks are
confused: http://www.n10k.com/blog/blockcache-101/
St.Ack


On Fri, Apr 4, 2014 at 9:22 PM, Stack <st...@duboce.net> wrote:

> Nick:
>
> + You measure the 99th percentile.  Did you take a measure of average/mean
> response times in your blockcache comparison?  (Our LarsHofhansl had it
> that on average reads out of bucket cache were a good bit slower).  Or is
> this a TODO?
> + We should just remove slabcache because bucket cache is consistently
> better; why have two means of doing the same thing?  Or, do you need more
> proof that bucketcache subsumes slabcache?
>
> Thanks boss,
> St.Ack
>
>