Posted to user@hbase.apache.org by "Liu, Raymond" <ra...@intel.com> on 2013/06/03 10:30:49 UTC

what's the typical scan latency?

Hi

	Suppose all the data is already in the RS block cache.
	What is the typical latency for scanning a few rows from a table of, say, several GB (with dozens of regions) on a small cluster with, say, 4 RS?

	A few ms? Tens of ms? Or more?

Best Regards,
Raymond Liu

Re: what's the typical scan latency?

Posted by Amit Mor <am...@gmail.com>.

What's your blockCacheHitCachingRatio? It tells you what fraction of the
reads that asked for block caching (the default) were actually served from
the block cache. You can get it from the RS web UI. What you are seeing
could come from almost anything. For example: is scanner caching (client
side) enabled? If so, how many rows are cached (that is, how many rows does
each scanner.next RPC call return)? What are your HFile block size, block
cache % of total RS heap, max number of RPCs per RS for client connections,
tcpnodelay setting, network topology and jitter, and number of NICs? Are
you using an HTableInterface connection pool? The HBase client is
synchronous, so how do you achieve concurrency? What about your
percentiles? Is 5ms the mean? The median? Is 20ms seen only at the 99th
percentile? And so on. I am far from considering myself an expert on the
general topic of HBase, so take my tips with a pinch of salt. These are
just factors I've considered when trying to optimize my read latency. Hope
that helps.
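
To make the percentile question concrete, here is a minimal sketch in plain
Java (no HBase dependency) that reports mean, median and 99th percentile for
a set of per-scan latencies. The class name LatencyStats and the sample
values are made up for illustration, and nearest-rank is only one of several
common percentile definitions.

```java
import java.util.Arrays;

/** Summarises per-scan latencies (in ms) so mean, median and tail can be compared. */
final class LatencyStats {

    /** Arithmetic mean of the samples. */
    static double mean(double[] samplesMs) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        double sum = 0.0;
        for (double s : samplesMs) sum += s;
        return sum / samplesMs.length;
    }

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples <= it. */
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0.0 || p > 100.0) throw new IllegalArgumentException("p must be in (0, 100]");
        double[] sorted = samplesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies: mostly fast scans plus one slow outlier.
        double[] ms = {3, 4, 3, 5, 4, 3, 4, 20, 4, 3};
        System.out.printf("mean=%.1f median=%.1f p99=%.1f%n",
                mean(ms), percentile(ms, 50), percentile(ms, 99));
    }
}
```

With samples like these, a single slow scan pulls the mean well above the
median, which is why the mean alone says little about what most scans see.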

On Tue, Jun 4, 2013 at 4:02 AM, Liu, Raymond <ra...@intel.com> wrote:

> Thanks Amit
>
> In my environment, I run dozens of clients concurrently, each reading about
> 5-20K of data per scan, and the average read latency for cached data is
> around 5-20ms.
> So it seems there must be something wrong with my cluster env or
> application. Or did you run that with multiple clients?

RE: what's the typical scan latency?

Posted by "Liu, Raymond" <ra...@intel.com>.

Thanks Amit

In my environment, I run dozens of clients concurrently, each reading about 5-20K of data per scan, and the average read latency for cached data is around 5-20ms.
So it seems there must be something wrong with my cluster env or application. Or did you run that with multiple clients?
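
For comparing numbers like these, it helps to be explicit about how the
concurrent measurement is taken. Below is a rough sketch in plain Java of
timing one blocking call from several client threads at once. The class
ConcurrentScanTimer and the Runnable parameter are hypothetical stand-ins:
in a real test the Runnable would wrap a scan through the HBase client,
which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Times a blocking operation from several threads, one thread per simulated client. */
final class ConcurrentScanTimer {

    /**
     * Runs `scan` from `clients` threads, `scansPerClient` times each,
     * and returns the latency of every individual call in milliseconds.
     */
    static List<Double> measure(int clients, int scansPerClient, Runnable scan) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            List<Future<List<Double>>> futures = new ArrayList<>();
            for (int c = 0; c < clients; c++) {
                Callable<List<Double>> task = () -> {
                    List<Double> local = new ArrayList<>();
                    for (int i = 0; i < scansPerClient; i++) {
                        long start = System.nanoTime();
                        scan.run();   // placeholder for one synchronous scan
                        local.add((System.nanoTime() - start) / 1_000_000.0);
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }
            // Collect the per-thread samples once every client has finished.
            List<Double> all = new ArrayList<>();
            for (Future<List<Double>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("measurement failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each thread blocks on its own call, so concurrency comes from the number of
threads. The returned samples can then be summarised as mean, median and
tail percentiles rather than a single average.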


>It depends on so many environment-related variables, and on the data as well.
>But to give you a number after all:
>One of our clusters is on EC2: 6 RS on m1.xlarge machines (network performance 'high' according to AWS). We do reads 90% of the time; our avg data size is 2K, block cache at 20K, 100 rows per scan on average, bloom filters 'on' at the 'ROW' level, and 40% of heap dedicated to the block cache (note that it contains several other bits and pieces). I would say our average latency for cached data (~97% blockCacheHitCachingRatio) is 3-4ms. File system access is much, much more painful, especially on EC2 m1.xlarge, where you really can't tell what's going on, as far as I can tell. To tell you the truth as I see it, this is an abuse (for our use case) of the HBase store, and for cache-like behavior I would recommend going to something like Redis.



Re: what's the typical scan latency?

Posted by Amit Mor <am...@gmail.com>.

It depends on so many environment-related variables, and on the data as
well. But to give you a number after all:
One of our clusters is on EC2: 6 RS on m1.xlarge machines (network
performance 'high' according to AWS). We do reads 90% of the time; our
avg data size is 2K, block cache at 20K, 100 rows per scan on average, bloom
filters 'on' at the 'ROW' level, and 40% of heap dedicated to the block
cache (note that it contains several other bits and pieces). I would say our
average latency for cached data (~97% blockCacheHitCachingRatio) is 3-4ms.
File system access is much, much more painful, especially on EC2 m1.xlarge,
where you really can't tell what's going on, as far as I can tell. To tell
you the truth as I see it, this is an abuse (for our use case) of the HBase
store, and for cache-like behavior I would recommend going to something like
Redis.
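
As a back-of-the-envelope illustration of how the hit ratio and the miss
penalty combine into an average, here is a small sketch in plain Java. The
class name BlendedLatency is invented for this example, and the 30ms miss
latency used in main is purely a hypothetical figure, not a number from this
cluster.

```java
/** Simple weighted-average model of read latency with a cache in front of slower storage. */
final class BlendedLatency {

    /** Fraction of lookups that were served from the cache. */
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        if (total <= 0) throw new IllegalArgumentException("no lookups recorded");
        return (double) hits / total;
    }

    /** Expected mean latency: hits cost cachedMs, misses cost missMs. */
    static double expectedMs(double hitRatio, double cachedMs, double missMs) {
        if (hitRatio < 0.0 || hitRatio > 1.0) {
            throw new IllegalArgumentException("hitRatio must be in [0, 1]");
        }
        return hitRatio * cachedMs + (1.0 - hitRatio) * missMs;
    }

    public static void main(String[] args) {
        double ratio = hitRatio(97, 3);   // ~97% hit ratio, as quoted above
        double cachedMs = 3.5;            // midpoint of the 3-4ms quoted above
        double missMs = 30.0;             // ASSUMPTION: hypothetical miss latency
        System.out.printf("expected mean latency: %.3f ms%n",
                expectedMs(ratio, cachedMs, missMs));
    }
}
```

The point of the model is only that even a few percent of misses can add a
noticeable amount to the mean when the miss path is much slower than the
cached path.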

On Mon, Jun 3, 2013 at 12:13 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> What is it that you are observing now?
>
> Regards
> Ram

Re: what's the typical scan latency?

Posted by Azuryy Yu <az...@gmail.com>.

HBase doesn't know in advance that all the data is in the block cache. It
first has to look at the HTable to get the "block_id" (tablename + offset),
and then look that up in the block cache.

So if all the data is in the block cache, you only avoid reading it from the
HFile directly, which saves some I/O time. But it depends on your data size:
if you always return a large amount of data, the read performance will not
improve.
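
The lookup described above (build a key, then check the cache, and only go
to the file on a miss) can be sketched as a toy model in plain Java. This is
not HBase's actual implementation: ToyBlockCache, BlockKey and the loader
function are hypothetical names, the key simply follows the "name + offset"
description in this message, and eviction is a basic LRU.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Toy LRU cache of data blocks, keyed by a name plus a block offset. */
final class ToyBlockCache {

    /** Cache key: a name and the offset of the block. */
    static final class BlockKey {
        final String name;
        final long offset;

        BlockKey(String name, long offset) {
            this.name = Objects.requireNonNull(name);
            this.offset = offset;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof BlockKey)) return false;
            BlockKey other = (BlockKey) o;
            return offset == other.offset && name.equals(other.name);
        }

        @Override public int hashCode() {
            return Objects.hash(name, offset);
        }
    }

    private final int capacity;
    private final LinkedHashMap<BlockKey, byte[]> lru;
    long hits;
    long misses;

    ToyBlockCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<BlockKey, byte[]>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<BlockKey, byte[]> eldest) {
                return size() > ToyBlockCache.this.capacity;
            }
        };
    }

    /** Returns the cached block, or loads it via loadFromFile (the slow path) and caches it. */
    byte[] getBlock(BlockKey key, Function<BlockKey, byte[]> loadFromFile) {
        byte[] block = lru.get(key);
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = loadFromFile.apply(key);   // stands in for reading the block from disk
        lru.put(key, block);
        return block;
    }
}
```

Even on a hit, the key still has to be computed and the map consulted, so a
fully cached read saves the file I/O but not the lookup work itself.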

On Mon, Jun 3, 2013 at 5:13 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> What is it that you are observing now?
>
> Regards
> Ram

Re: what's the typical scan latency?

Posted by ramkrishna vasudevan <ra...@gmail.com>.

What is it that you are observing now?

Regards
Ram


On Mon, Jun 3, 2013 at 2:00 PM, Liu, Raymond <ra...@intel.com> wrote:

> Hi
>
>         Suppose all the data is already in the RS block cache.
>         What is the typical latency for scanning a few rows from a table
> of, say, several GB (with dozens of regions) on a small cluster with, say,
> 4 RS?
>
>         A few ms? Tens of ms? Or more?
>
> Best Regards,
> Raymond Liu
>