Posted to dev@hbase.apache.org by Vladimir Rodionov <vr...@carrieriq.com> on 2013/07/30 20:23:00 UTC

HBase read performance and HBase client

I have been doing quite extensive testing of different read scenarios:

1. blockcache disabled/enabled
2. data is local/remote (no good hdfs locality)

and it turned out that I cannot saturate one RS using a single client host (comparable to the RS in CPU power and RAM):

 I am running a client app with 60 active read threads (using multi-get), all directed at one particular RS, and
that RS's load is 100-150% (out of 3200% available), i.e. roughly 5% utilization.

All threads in the RS are either in the BLOCKED (wait) or IN_NATIVE (epoll) state.

I attribute this to the HBase client implementation, which does not seem to scale well (I am going to dig into the client later today).

Some numbers: the maximum I could get from single Get (60 threads) is 30K ops per sec. Multi-get gives ~75K ops per sec (60 threads).

What are my options? I want to measure the limits, but I do not want to run a whole cluster of clients against just ONE RegionServer.

RS config: 96GB RAM, 16 (32) CPU cores
Client:    48GB RAM,  8 (16) CPU cores
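
A minimal sketch of the kind of client described above (60 threads, multi-get), assuming the 0.94-era client API; the table name, key layout and batch size are placeholders, not the actual test harness:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicLong;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetLoad {
      static final int THREADS = 60;             // 60 read threads, as above
      static final int BATCH = 30;               // multi-get batch size
      static final AtomicLong ops = new AtomicLong();

      public static void main(String[] args) {
        final Configuration conf = HBaseConfiguration.create();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int i = 0; i < THREADS; i++) {
          pool.submit(new Runnable() {
            public void run() {
              try {
                HTable table = new HTable(conf, "read_perf_test");  // placeholder table
                Random rnd = new Random();
                while (true) {
                  List<Get> batch = new ArrayList<Get>(BATCH);
                  for (int j = 0; j < BATCH; j++) {
                    // keys drawn from a range that maps to the region under test
                    batch.add(new Get(Bytes.toBytes("row-" + rnd.nextInt(1000000))));
                  }
                  Result[] results = table.get(batch);              // multi-get
                  ops.addAndGet(results.length);
                }
              } catch (Exception e) {
                e.printStackTrace();
              }
            }
          });
        }
        // a real harness would periodically report ops.get() / elapsed seconds
      }
    }

Note that all threads here share one Configuration (and therefore one underlying connection); the effect of per-HTable Configurations versus a shared one comes up later in the thread.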

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

Re: HBase read performance and HBase client

Posted by Varun Sharma <va...@pinterest.com>.
Last time around, we found that block checksums were the #1 cost (though if we
are serving from the block cache, that should not matter) and that binary search
into the indices was also high in terms of CPU.

We use a multi-level index for HFile v2, right? Is that just a multi-level
binary search over the start keys of the data blocks, or do we ever sequentially
scan index blocks the way we do data blocks?
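
For reference, the lookup being described boils down to a "floor" binary search over block first-keys at each index level; a rough illustration only, not the actual HFile v2 code:

    // Bytes = org.apache.hadoop.hbase.util.Bytes (unsigned lexicographic compare).
    // Return the index of the block whose first key is the largest key <= searchKey.
    static int findBlock(byte[][] blockFirstKeys, byte[] searchKey) {
      int lo = 0, hi = blockFirstKeys.length - 1, found = 0;
      while (lo <= hi) {
        int mid = (lo + hi) >>> 1;
        int cmp = Bytes.compareTo(blockFirstKeys[mid], searchKey);
        if (cmp <= 0) { found = mid; lo = mid + 1; }   // candidate block, search right
        else          { hi = mid - 1; }                // first key too large, search left
      }
      return found;
    }
    // With a multi-level index the same search is repeated per level: root index
    // entries point at intermediate/leaf index blocks, leaf entries point at data
    // blocks, and only inside the data block does the reader scan to the exact KV.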


On Wed, Jul 31, 2013 at 11:15 PM, lars hofhansl <la...@apache.org> wrote:

> Yeah, that would seem to indicate that seeking into the block is not a
> bottleneck (and you said earlier that everything fits into the blockcache).
> Need to profile to know more. If you have time, it would be cool if you could
> start jvisualvm, attach it to the RS, start the profiling, and let the
> workload run for a bit.
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Vladimir Rodionov <vl...@gmail.com>
> To: dev@hbase.apache.org; lars hofhansl <la...@apache.org>
> Cc:
> Sent: Wednesday, July 31, 2013 9:57 PM
> Subject: Re: HBase read performance and HBase client
>
> A smaller block size (32K) does not give any performance gain, which is
> strange, to say the least.
>
>
> On Wed, Jul 31, 2013 at 9:33 PM, lars hofhansl <la...@apache.org> wrote:
>
> > Would be interesting to profile MultiGet. With RTT of 0.1ms, the internal
> > RS friction is probably the main contributor.
> > In fact MultiGet just loops over the set at the RS and calls single gets
> > on the various regions.
> >
> > Each Get needs to reseek into the block (even when it is cached, since
> KVs
> > have variable size).
> >
> > There are HBASE-6136 and HBASE-8362.
> >
> >
> > -- Lars
> >
> > ________________________________
> > From: Vladimir Rodionov <vl...@gmail.com>
> > To: dev@hbase.apache.org; lars hofhansl <la...@apache.org>
> > Sent: Wednesday, July 31, 2013 7:27 PM
> > Subject: Re: HBase read performance and HBase client
> >
> >
> > Some final numbers:
> >
> > Test config:
> >
> > HBase 0.94.6
> > blockcache=true, block size = 64K, KV size = 62 bytes (raw).
> >
> > 5 clients: 96GB RAM, 16 (32) CPUs @ 2.2GHz, CentOS 5.7
> > 1 RS server: the same config.
> >
> > Local network with ping between hosts: 0.1 ms
> >
> >
> > 1. The HBase client hits the wall at ~50K ops per sec regardless of # of CPUs,
> > threads, IO pool size and other settings.
> > 2. The HBase server was able to sustain 170K ops per sec (with 64K block size),
> > all from block cache. KV size = 62 bytes (very small). This is for single Get
> > ops, 60 threads per client, 5 clients (on different hosts).
> > 3. Multi-get hits the wall at the same 170K-200K ops per sec. Batch sizes
> > tested: 30 and 100 - absolutely the same performance as with batch size = 1.
> > Multi-get has some internal issue on the RegionServer side, maybe excessive
> > locking or something else.
> >
> >
> >
> >
> >
> > On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
> > <vl...@gmail.com>wrote:
> >
> > > 1. SCR (short-circuit reads) is enabled
> > > 2. A single Configuration for all tables did not work well, but I will try
> > > it again
> > > 3. With Nagle's I had 0.8ms avg latency, without it 0.4ms - I see the
> > > difference
> > >
> > >
> > > On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org>
> wrote:
> > >
> > >> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
> > >> RTT is bad, right? Are you seeing ~40ms latencies?
> > >>
> > >> This thread has gotten confusing.
> > >>
> > >> I would try these:
> > >> * one Configuration for all tables. Or even use a single
> > >> HConnection/Threadpool and use the HTable(byte[], HConnection,
> > >> ExecutorService) constructor
> > >> * disable Nagle's: set both ipc.server.tcpnodelay and
> > >> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client
> *and*
> > >> server)
> > >> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
> > >> * enable short circuit reads (details depend on exact version of
> > Hadoop).
> > >> Google will help :)
> > >>
> > >> -- Lars
> > >>
> > >>
> > >> ----- Original Message -----
> > >> From: Vladimir Rodionov <vl...@gmail.com>
> > >> To: dev@hbase.apache.org
> > >> Cc:
> > >> Sent: Tuesday, July 30, 2013 1:30 PM
> > >> Subject: Re: HBase read performance and HBase client
> > >>
> > >> So hbase.ipc.client.tcpnodelay (default: false) explains the poor
> > >> single-thread performance and high latency (0.8ms on a local network)?
> > >>
> > >>
> > >> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
> > >> <vl...@gmail.com>wrote:
> > >>
> > >> > One more observation: one Configuration instance per HTable gives a
> > >> > 50% boost compared to a single Configuration object for all HTables -
> > >> > from 20K to 30K
> > >> >
> > >> >
> > >> > On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
> > >> vladrodionov@gmail.com
> > >> > > wrote:
> > >> >
> > >> >> This thread dump was taken while the client was sending 60 requests in
> > >> >> parallel (at least in theory). There are 50 server handler threads.
> > >> >>
> > >> >>
> > >> >> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
> > >> >> vladrodionov@gmail.com> wrote:
> > >> >>
> > >> >>> Sure, here it is:
> > >> >>>
> > >> >>> http://pastebin.com/8TjyrKRT
> > >> >>>
> > >> >>> Isn't epoll used not only to read/write HDFS but also to accept and
> > >> >>> listen for client connections?
> > >> >>>
> > >> >>>
> > >> >>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
> > >> >>> jdcryans@apache.org> wrote:
> > >> >>>
> > >> >>>> Can you show us what the thread dump looks like when the threads
> > are
> > >> >>>> BLOCKED? There aren't that many locks on the read path when
> reading
> > >> >>>> out of the block cache, and epoll would only happen if you need
> to
> > >> hit
> > >> >>>> HDFS, which you're saying is not happening.
> > >> >>>>
> > >> >>>> J-D
> > >> >>>>
> > >> >>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> > >> >>>> <vl...@gmail.com> wrote:
> > >> >>>> > I am hitting data in the block cache, of course. The data set is
> > >> >>>> > very small and fits comfortably into the block cache, and all
> > >> >>>> > requests are directed to the same Region to guarantee single-RS
> > >> >>>> > testing.
> > >> >>>> >
> > >> >>>> > To Ted:
> > >> >>>> >
> > >> >>>> > Yes, it's CDH 4.3. What is the difference between 0.94.10 and
> > >> >>>> > 0.94.6 with respect to read performance?
> > >> >>>> >
> > >> >>>> >
> > >> >>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
> > >> >>>> jdcryans@apache.org>wrote:
> > >> >>>> >
> > >> >>>> >> That's a tough one.
> > >> >>>> >>
> > >> >>>> >> One thing that comes to mind is socket reuse. It used to come up
> > >> >>>> >> more often, but this is an issue that people hit when doing loads
> > >> >>>> >> of random reads. Try enabling tcp_tw_recycle, but I'm not
> > >> >>>> >> guaranteeing anything :)
> > >> >>>> >>
> > >> >>>> >> Also, if you _just_ want to saturate something, be it CPU or
> > >> >>>> >> network, wouldn't it be better to hit data only in the block
> > >> >>>> >> cache? That way it has the lowest overhead.
> > >> >>>> >>
> > >> >>>> >> Last thing I wanted to mention is that yes, the client doesn't
> > >> scale
> > >> >>>> >> very well. I would suggest you give the asynchbase client a
> run.
> > >> >>>> >>
> > >> >>>> >> J-D
> > >> >>>> >>
> > >> >>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> > >> >>>> >> <vr...@carrieriq.com> wrote:
> > >> >>>> >>
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>
> > >> >
> > >>
> > >>
> > >
> >
>
>
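
A minimal sketch of the client-side settings suggested in the quoted thread above (one shared connection/thread pool, Nagle's disabled, a bigger IPC pool), assuming the 0.94-era API; the pool sizes and table name are placeholders:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SharedConnectionExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Disable Nagle's on the client; the server side also needs
        // ipc.server.tcpnodelay=true in its own hbase-site.xml.
        conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
        // More connections per server from this client (value is a guess, tune it).
        conf.setInt("hbase.client.ipc.pool.size", 10);

        HConnection connection = HConnectionManager.getConnection(conf);
        ExecutorService pool = Executors.newFixedThreadPool(60);

        // One shared HConnection/ExecutorService for every HTable instance, via the
        // HTable(byte[], HConnection, ExecutorService) constructor mentioned above.
        HTable table = new HTable(Bytes.toBytes("read_perf_test"), connection, pool);
        // ... issue Gets/multi-gets with `table` as usual ...
      }
    }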

Re: HBase read performance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
Yes, I think HBASE-9087 is what I have been observing in my load tests. It
seems that Store access (scanner creation) is not only over-synchronized but
also CPU-intensive.
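
A quick sanity check on how little serialized work it takes to produce that kind of ceiling (my arithmetic, using the 170K-200K ops/sec numbers reported earlier in the thread):

    max ops/sec <= 1 / t_serialized
    t_serialized ~ 1 / 170,000 per sec ~ 5.9 microseconds

So roughly 5-6 microseconds of fully serialized work per Get (for example in scanner creation on a single hot Store) would be enough to explain the cap, no matter how many handler or client threads are added.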


On Thu, Aug 1, 2013 at 9:27 AM, Ted Yu <yu...@gmail.com> wrote:

> Vlad:
> You might want to look at HBASE-9087 Handlers being blocked during reads
>
> On Thu, Aug 1, 2013 at 9:24 AM, Vladimir Rodionov <vladrodionov@gmail.com
> >wrote:
>
> > All tests I have run were hitting single region on a region server. I
> > suspect this is not a right scenario. There are some points in the Store
> > class which are heavily synchronized:
> >
> > For example this one:
> >   // All access must be synchronized.
> >   private final CopyOnWriteArraySet<ChangedReadersObserver>
> > changedReaderObservers =
> >     new CopyOnWriteArraySet<ChangedReadersObserver>();
> >
> > I will re-run tests against all available regions on a RS and will post
> > results later on today.
> >

Re: HBase read performance and HBase client

Posted by Ted Yu <yu...@gmail.com>.
Vlad:
You might want to look at HBASE-9087, "Handlers being blocked during reads".

On Thu, Aug 1, 2013 at 9:24 AM, Vladimir Rodionov <vl...@gmail.com>wrote:

> All tests I have run were hitting single region on a region server. I
> suspect this is not a right scenario. There are some points in the Store
> class which are heavily synchronized:
>
> For example this one:
>   // All access must be synchronized.
>   private final CopyOnWriteArraySet<ChangedReadersObserver>
> changedReaderObservers =
>     new CopyOnWriteArraySet<ChangedReadersObserver>();
>
> I will re-run tests against all available regions on a RS and will post
> results later on today.
>

Re: HBase read performance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
All tests I have run were hitting a single region on a region server. I
suspect this is not the right scenario. There are some places in the Store
class which are heavily synchronized:

For example this one:
  // All access must be synchronized.
  private final CopyOnWriteArraySet<ChangedReadersObserver>
changedReaderObservers =
    new CopyOnWriteArraySet<ChangedReadersObserver>();

I will re-run tests against all available regions on the RS and will post
results later today.
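
For what it's worth, a sketch of the only change that should be needed in the load generator for that re-run, assuming the table was loaded with a simple numeric key scheme (the scheme and region count here are placeholders):

    // Fragment: Bytes = org.apache.hadoop.hbase.util.Bytes,
    //           Get   = org.apache.hadoop.hbase.client.Get
    static Get randomGet(Random rnd, boolean allRegions) {
      int KEYS_PER_REGION = 1000000;   // assumption about how the data was loaded
      int NUM_REGIONS = 16;            // placeholder region count
      int range = allRegions ? KEYS_PER_REGION * NUM_REGIONS : KEYS_PER_REGION;
      // same key scheme as the loader; only the range it draws from changes,
      // so gets either stay inside one region or spread across every Store on the RS
      return new Get(Bytes.toBytes(String.format("row-%09d", rnd.nextInt(range))));
    }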





Re: HBase read performance and HBase client

Posted by lars hofhansl <la...@apache.org>.
Yeah, that would seem to indicate that seeking into the block is not a bottleneck (and you said earlier that everything fits into the blockcache).
Need to profile to know more. If you have time, it would be cool if you could start jvisualvm, attach it to the RS, start the profiling, and let the workload run for a bit.
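
One way to make the RS reachable from jvisualvm running on another box is to expose JMX in hbase-env.sh and restart the RS; the port below is arbitrary, and whether you get only the sampler or full profiling over the connection depends on the VisualVM version:

    # hbase-env.sh on the RegionServer (example port; do not leave
    # authentication disabled outside a test environment)
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=10102 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"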

-- Lars





Re: HBase read performance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
A smaller block size (32K) does not give any performance gain, which is
strange, to say the least.
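
For reference, this is how a 32K-block variant of the test table would typically be set up (0.94-era admin API; the table and family names are placeholders). Note that already-written HFiles keep their old block size until they are rewritten, so a change like this only takes full effect after the data is reloaded or major-compacted:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateSmallBlockTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HColumnDescriptor cf = new HColumnDescriptor("d");   // placeholder family
        cf.setBlocksize(32 * 1024);                          // 32K data blocks (default 64K)
        cf.setBlockCacheEnabled(true);

        HTableDescriptor td = new HTableDescriptor("read_perf_test_32k"); // placeholder
        td.addFamily(cf);
        admin.createTable(td);
      }
    }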


> >> more
> >> >>>> >> more often but this is an issue that people hit when doing loads
> >> of
> >> >>>> >> random reads. Try enabling tcp_tw_recycle but I'm not
> guaranteeing
> >> >>>> >> anything :)
> >> >>>> >>
> >> >>>> >> Also if you _just_ want to saturate something, be it CPU or
> >> network,
> >> >>>> >> wouldn't it be better to hit data only in the block cache? This
> >> way
> >> >>>> it
> >> >>>> >> has the lowest overhead?
> >> >>>> >>
> >> >>>> >> Last thing I wanted to mention is that yes, the client doesn't
> >> scale
> >> >>>> >> very well. I would suggest you give the asynchbase client a run.
> >> >>>> >>
> >> >>>> >> J-D
> >> >>>> >>
> >> >>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >> >>>> >> <vr...@carrieriq.com> wrote:
> >> >>>> >> > I have been doing quite extensive testing of different read
> >> >>>> scenarios:
> >> >>>> >> >
> >> >>>> >> > 1. blockcache disabled/enabled
> >> >>>> >> > 2. data is local/remote (no good hdfs locality)
> >> >>>> >> >
> >> >>>> >> > and it turned out that that I can not saturate 1 RS using one
> >> >>>> >> (comparable in CPU power and RAM) client host:
> >> >>>> >> >
> >> >>>> >> >  I am running client app with 60 read threads active (with
> >> >>>> multi-get)
> >> >>>> >> that is going to one particular RS and
> >> >>>> >> > this RS's load is 100 -150% (out of 3200% available) - it
> means
> >> >>>> that
> >> >>>> >> load is ~5%
> >> >>>> >> >
> >> >>>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
> >> >>>> states
> >> >>>> >> (epoll)
> >> >>>> >> >
> >> >>>> >> > I attribute this  to the HBase client implementation which
> seems
> >> >>>> to be
> >> >>>> >> not scalable (I am going dig into client later on today).
> >> >>>> >> >
> >> >>>> >> > Some numbers: The maximum what I could get from Single get (60
> >> >>>> threads):
> >> >>>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
> >> >>>> >> >
> >> >>>> >> > What are my options? I want to measure the limits and I do not
> >> >>>> want to
> >> >>>> >> run Cluster of clients against just ONE Region Server?
> >> >>>> >> >
> >> >>>> >> > RS config: 96GB RAM, 16(32) CPU
> >> >>>> >> > Client     : 48GB RAM   8 (16) CPU
> >> >>>> >> >
> >> >>>> >> > Best regards,
> >> >>>> >> > Vladimir Rodionov
> >> >>>> >> > Principal Platform Engineer
> >> >>>> >> > Carrier IQ, www.carrieriq.com
> >> >>>> >> > e-mail: vrodionov@carrieriq.com
> >> >>>> >> >
> >> >>>> >> >
> >> >>>> >>
> >> >>>>
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >>
> >
>

Re: HBase read perfomnance and HBase client

Posted by lars hofhansl <la...@apache.org>.
Would be interesting to profile MultiGet. With RTT of 0.1ms, the internal RS friction is probably the main contributor.
In fact MultiGet just loops over the set at the RS and calls single gets on the various regions.

Each Get needs to reseek into the block (even when it is cached, since KVs have variable size).

There are HBASE-6136 and HBASE-8362.
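
To make the comparison concrete, here is a minimal client-side sketch of the two call
shapes being discussed - one batched table.get(List<Get>) versus a loop of single gets -
against the 0.94 client API. The table name and row keys are placeholders, not the
benchmark's actual schema:

// Sketch: batched multi-get vs. a loop of single gets.
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetVsSingleGet {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "perf_test");       // hypothetical table
    List<Get> batch = new ArrayList<Get>();
    for (int i = 0; i < 100; i++) {
      batch.add(new Get(Bytes.toBytes("row-" + i)));    // hypothetical keys
    }
    long t0 = System.nanoTime();
    Result[] multi = table.get(batch);                  // one batched call per RS; the RS
                                                        // still does one seek per Get
    long t1 = System.nanoTime();
    for (Get g : batch) {
      table.get(g);                                     // N separate RPCs, same per-Get work
    }
    long t2 = System.nanoTime();
    System.out.printf("multi-get: %.2f ms for %d rows, single gets: %.2f ms%n",
        (t1 - t0) / 1e6, multi.length, (t2 - t1) / 1e6);
    table.close();
  }
}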


-- Lars

________________________________
From: Vladimir Rodionov <vl...@gmail.com>
To: dev@hbase.apache.org; lars hofhansl <la...@apache.org> 
Sent: Wednesday, July 31, 2013 7:27 PM
Subject: Re: HBase read perfomnance and HBase client


Some final numbers :

Test config:

HBase 0.94.6
blockcache=true, block size = 64K, KV size = 62 bytes (raw).

5 Clients: 96GB, 16(32) CPUs (2.2Ghz), CentOS 5.7
1 RS Server: the same config.

Local network with ping between hosts: 0.1 ms


1. HBase client hits the wall at ~ 50K per sec regardless of # of CPU,
threads, IO pool size and other settings.
2. HBase server was able to sustain 170K per sec (with 64K block size). All
from block cache. KV size = 62 bytes (very small). This is for single Get
op, 60 threads per client, 5 clients (on different hosts)
3. Multi - get hits the wall at the same 170K-200K per sec. Batch size
tested: 30, 100. The same performance absolutely as with batch size = 1.
Multi get has some internal issues on RegionServer side. May be excessive
locking or some thing else.





On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
<vl...@gmail.com>wrote:

> 1. SCR are enabled
> 2. Single Configuration for all table did not work well, but I will try it
> again
> 3. With Nagel I had 0.8ms avg, w/o - 0.4ms - I see the difference
>
>
> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org> wrote:
>
>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
>> RTT is bad, right? Are you seeing ~40ms latencies?
>>
>> This thread has gotten confusing.
>>
>> I would try these:
>> * one Configuration for all tables. Or even use a single
>> HConnection/Threadpool and use the HTable(byte[], HConnection,
>> ExecutorService) constructor
>> * disable Nagle's: set both ipc.server.tcpnodelay and
>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
>> server)
>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
>> * enable short circuit reads (details depend on exact version of Hadoop).
>> Google will help :)
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: Vladimir Rodionov <vl...@gmail.com>
>> To: dev@hbase.apache.org
>> Cc:
>> Sent: Tuesday, July 30, 2013 1:30 PM
>> Subject: Re: HBase read perfomnance and HBase client
>>
>> This hbase.ipc.client.tcpnodelay (default - false) explains poor single
>> thread performance and high latency ( 0.8ms in local network)?
>>
>>
>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
>> <vl...@gmail.com>wrote:
>>
>> > One more observation: One Configuration instance per HTable gives 50%
>> > boost as compared to single Configuration object for all HTable's - from
>> > 20K to 30K
>> >
>> >
>> > On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
>> vladrodionov@gmail.com
>> > > wrote:
>> >
>> >> This thread dump has been taken when client was sending 60 requests in
>> >> parallel (at least, in theory). There are 50 server handler threads.
>> >>
>> >>
>> >> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>> >> vladrodionov@gmail.com> wrote:
>> >>
>> >>> Sure, here it is:
>> >>>
>> >>> http://pastebin.com/8TjyrKRT
>> >>>
>> >>> epoll is not only to read/write HDFS but to connect/listen to clients
>> as
>> >>> well?
>> >>>
>> >>>
>> >>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>> >>> jdcryans@apache.org> wrote:
>> >>>
>> >>>> Can you show us what the thread dump looks like when the threads are
>> >>>> BLOCKED? There aren't that many locks on the read path when reading
>> >>>> out of the block cache, and epoll would only happen if you need to
>> hit
>> >>>> HDFS, which you're saying is not happening.
>> >>>>
>> >>>> J-D
>> >>>>
>> >>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>> >>>> <vl...@gmail.com> wrote:
>> >>>> > I am hitting data in a block cache, of course. The data set is very
>> >>>> small
>> >>>> > to fit comfortably into block cache and all request are directed to
>> >>>> the
>> >>>> > same Region to guarantee single RS testing.
>> >>>> >
>> >>>> > To Ted:
>> >>>> >
>> >>>> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>> >>>> respect
>> >>>> > to read performance?
>> >>>> >
>> >>>> >
>> >>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>> >>>> jdcryans@apache.org>wrote:
>> >>>> >
>> >>>> >> That's a tough one.
>> >>>> >>
>> >>>> >> One thing that comes to mind is socket reuse. It used to come up
>> more
>> >>>> >> more often but this is an issue that people hit when doing loads
>> of
>> >>>> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>> >>>> >> anything :)
>> >>>> >>
>> >>>> >> Also if you _just_ want to saturate something, be it CPU or
>> network,
>> >>>> >> wouldn't it be better to hit data only in the block cache? This
>> way
>> >>>> it
>> >>>> >> has the lowest overhead?
>> >>>> >>
>> >>>> >> Last thing I wanted to mention is that yes, the client doesn't
>> scale
>> >>>> >> very well. I would suggest you give the asynchbase client a run.
>> >>>> >>
>> >>>> >> J-D
>> >>>> >>
>> >>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>> >>>> >> <vr...@carrieriq.com> wrote:
>> >>>> >> > I have been doing quite extensive testing of different read
>> >>>> scenarios:
>> >>>> >> >
>> >>>> >> > 1. blockcache disabled/enabled
>> >>>> >> > 2. data is local/remote (no good hdfs locality)
>> >>>> >> >
>> >>>> >> > and it turned out that that I can not saturate 1 RS using one
>> >>>> >> (comparable in CPU power and RAM) client host:
>> >>>> >> >
>> >>>> >> >  I am running client app with 60 read threads active (with
>> >>>> multi-get)
>> >>>> >> that is going to one particular RS and
>> >>>> >> > this RS's load is 100 -150% (out of 3200% available) - it means
>> >>>> that
>> >>>> >> load is ~5%
>> >>>> >> >
>> >>>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
>> >>>> states
>> >>>> >> (epoll)
>> >>>> >> >
>> >>>> >> > I attribute this  to the HBase client implementation which seems
>> >>>> to be
>> >>>> >> not scalable (I am going dig into client later on today).
>> >>>> >> >
>> >>>> >> > Some numbers: The maximum what I could get from Single get (60
>> >>>> threads):
>> >>>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
>> >>>> >> >
>> >>>> >> > What are my options? I want to measure the limits and I do not
>> >>>> want to
>> >>>> >> run Cluster of clients against just ONE Region Server?
>> >>>> >> >
>> >>>> >> > RS config: 96GB RAM, 16(32) CPU
>> >>>> >> > Client     : 48GB RAM   8 (16) CPU
>> >>>> >> >
>> >>>> >> > Best regards,
>> >>>> >> > Vladimir Rodionov
>> >>>> >> > Principal Platform Engineer
>> >>>> >> > Carrier IQ, www.carrieriq.com
>> >>>> >> > e-mail: vrodionov@carrieriq.com
>> >>>> >> >
>> >>>> >> >
>> >>>> >>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
Michael, the network is not a bottleneck since the raw KV size is 62 bytes; 1GbE
can pump > 1M of these objects per sec.

The block cache is enabled, its size is ~ 2GB, the query data set is less than 1MB, and the
block cache hit rate is 99% (I think it's 99.99% in reality).
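
As a rough sanity check on that claim (using the link speed and KV size above, and ignoring
RPC framing and protocol overhead):

  1 Gb/s ~ 125 MB/s
  125 MB/s / 62 bytes per KV ~ 2M raw KVs per sec per link (~4M per sec for 2x1Gb bonded)

which is an order of magnitude above the observed 170K-200K gets per sec, so the wire itself
is clearly not the limit.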


On Thu, Aug 1, 2013 at 12:10 PM, Michael Segel <ms...@hotmail.com>wrote:

> Ok... Bonded 1GbE is less than 2GbE, not sure of actual max throughput.
>
> Are you hitting data in cache or are you fetching data from disk?
> I mean can we rule out disk I/O because the data would most likely be in
> cache?
>
> Are you monitoring your cluster w Ganglia? What do you see in terms of
> network traffic?
> Are all of the nodes in the test cluster on the same switch? Including the
> client?
>
>
> (Sorry, I'm currently looking at a network problem so now everything I see
> may be a networking problem. And a guy from Arista found me after our
> meetup last night so I am thinking about the impact on networking in the
> ecosystem. :-).  )
>
>
> -Just some guy out in left field...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Aug 1, 2013, at 1:11 PM, "Vladimir Rodionov" <vl...@gmail.com>
> wrote:
>
> > 2x1Gb bonded, I think. This is our standard config.
> >
> >
> > On Thu, Aug 1, 2013 at 10:27 AM, Michael Segel <
> msegel_hadoop@hotmail.com>wrote:
> >
> >> Network? 1GbE or 10GbE?
> >>
> >> Sent from a remote device. Please excuse any typos...
> >>
> >> Mike Segel
> >>
> >> On Jul 31, 2013, at 9:27 PM, "Vladimir Rodionov" <
> vladrodionov@gmail.com>
> >> wrote:
> >>
> >>> Some final numbers :
> >>>
> >>> Test config:
> >>>
> >>> HBase 0.94.6
> >>> blockcache=true, block size = 64K, KV size = 62 bytes (raw).
> >>>
> >>> 5 Clients: 96GB, 16(32) CPUs (2.2Ghz), CentOS 5.7
> >>> 1 RS Server: the same config.
> >>>
> >>> Local network with ping between hosts: 0.1 ms
> >>>
> >>>
> >>> 1. HBase client hits the wall at ~ 50K per sec regardless of # of CPU,
> >>> threads, IO pool size and other settings.
> >>> 2. HBase server was able to sustain 170K per sec (with 64K block size).
> >> All
> >>> from block cache. KV size = 62 bytes (very small). This is for single
> Get
> >>> op, 60 threads per client, 5 clients (on different hosts)
> >>> 3. Multi - get hits the wall at the same 170K-200K per sec. Batch size
> >>> tested: 30, 100. The same performance absolutely as with batch size =
> 1.
> >>> Multi get has some internal issues on RegionServer side. May be
> excessive
> >>> locking or some thing else.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
> >>> <vl...@gmail.com>wrote:
> >>>
> >>>> 1. SCR are enabled
> >>>> 2. Single Configuration for all table did not work well, but I will
> try
> >> it
> >>>> again
> >>>> 3. With Nagel I had 0.8ms avg, w/o - 0.4ms - I see the difference
> >>>>
> >>>>
> >>>> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org>
> >> wrote:
> >>>>
> >>>>> With Nagle's you'd see something around 40ms. You are not saying
> 0.8ms
> >>>>> RTT is bad, right? Are you seeing ~40ms latencies?
> >>>>>
> >>>>> This thread has gotten confusing.
> >>>>>
> >>>>> I would try these:
> >>>>> * one Configuration for all tables. Or even use a single
> >>>>> HConnection/Threadpool and use the HTable(byte[], HConnection,
> >>>>> ExecutorService) constructor
> >>>>> * disable Nagle's: set both ipc.server.tcpnodelay and
> >>>>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client
> >> *and*
> >>>>> server)
> >>>>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
> >>>>> * enable short circuit reads (details depend on exact version of
> >> Hadoop).
> >>>>> Google will help :)
> >>>>>
> >>>>> -- Lars
> >>>>>
> >>>>>
> >>>>> ----- Original Message -----
> >>>>> From: Vladimir Rodionov <vl...@gmail.com>
> >>>>> To: dev@hbase.apache.org
> >>>>> Cc:
> >>>>> Sent: Tuesday, July 30, 2013 1:30 PM
> >>>>> Subject: Re: HBase read perfomnance and HBase client
> >>>>>
> >>>>> This hbase.ipc.client.tcpnodelay (default - false) explains poor
> single
> >>>>> thread performance and high latency ( 0.8ms in local network)?
> >>>>>
> >>>>>
> >>>>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
> >>>>> <vl...@gmail.com>wrote:
> >>>>>
> >>>>>> One more observation: One Configuration instance per HTable gives
> 50%
> >>>>>> boost as compared to single Configuration object for all HTable's -
> >> from
> >>>>>> 20K to 30K
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
> >>>>> vladrodionov@gmail.com
> >>>>>>> wrote:
> >>>>>>
> >>>>>>> This thread dump has been taken when client was sending 60 requests
> >> in
> >>>>>>> parallel (at least, in theory). There are 50 server handler
> threads.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
> >>>>>>> vladrodionov@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Sure, here it is:
> >>>>>>>>
> >>>>>>>> http://pastebin.com/8TjyrKRT
> >>>>>>>>
> >>>>>>>> epoll is not only to read/write HDFS but to connect/listen to
> >> clients
> >>>>> as
> >>>>>>>> well?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
> >>>>>>>> jdcryans@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> Can you show us what the thread dump looks like when the threads
> >> are
> >>>>>>>>> BLOCKED? There aren't that many locks on the read path when
> reading
> >>>>>>>>> out of the block cache, and epoll would only happen if you need
> to
> >>>>> hit
> >>>>>>>>> HDFS, which you're saying is not happening.
> >>>>>>>>>
> >>>>>>>>> J-D
> >>>>>>>>>
> >>>>>>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> >>>>>>>>> <vl...@gmail.com> wrote:
> >>>>>>>>>> I am hitting data in a block cache, of course. The data set is
> >> very
> >>>>>>>>> small
> >>>>>>>>>> to fit comfortably into block cache and all request are directed
> >> to
> >>>>>>>>> the
> >>>>>>>>>> same Region to guarantee single RS testing.
> >>>>>>>>>>
> >>>>>>>>>> To Ted:
> >>>>>>>>>>
> >>>>>>>>>> Yes, its CDH 4.3 . What the difference between 94.10 and 94.6
> with
> >>>>>>>>> respect
> >>>>>>>>>> to read performance?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
> >>>>>>>>> jdcryans@apache.org>wrote:
> >>>>>>>>>>
> >>>>>>>>>>> That's a tough one.
> >>>>>>>>>>>
> >>>>>>>>>>> One thing that comes to mind is socket reuse. It used to come
> up
> >>>>> more
> >>>>>>>>>>> more often but this is an issue that people hit when doing
> loads
> >>>>> of
> >>>>>>>>>>> random reads. Try enabling tcp_tw_recycle but I'm not
> >> guaranteeing
> >>>>>>>>>>> anything :)
> >>>>>>>>>>>
> >>>>>>>>>>> Also if you _just_ want to saturate something, be it CPU or
> >>>>> network,
> >>>>>>>>>>> wouldn't it be better to hit data only in the block cache? This
> >>>>> way
> >>>>>>>>> it
> >>>>>>>>>>> has the lowest overhead?
> >>>>>>>>>>>
> >>>>>>>>>>> Last thing I wanted to mention is that yes, the client doesn't
> >>>>> scale
> >>>>>>>>>>> very well. I would suggest you give the asynchbase client a
> run.
> >>>>>>>>>>>
> >>>>>>>>>>> J-D
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >>>>>>>>>>> <vr...@carrieriq.com> wrote:
> >>>>>>>>>>>> I have been doing quite extensive testing of different read
> >>>>>>>>> scenarios:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. blockcache disabled/enabled
> >>>>>>>>>>>> 2. data is local/remote (no good hdfs locality)
> >>>>>>>>>>>>
> >>>>>>>>>>>> and it turned out that that I can not saturate 1 RS using one
> >>>>>>>>>>> (comparable in CPU power and RAM) client host:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am running client app with 60 read threads active (with
> >>>>>>>>> multi-get)
> >>>>>>>>>>> that is going to one particular RS and
> >>>>>>>>>>>> this RS's load is 100 -150% (out of 3200% available) - it
> means
> >>>>>>>>> that
> >>>>>>>>>>> load is ~5%
> >>>>>>>>>>>>
> >>>>>>>>>>>> All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
> >>>>>>>>> states
> >>>>>>>>>>> (epoll)
> >>>>>>>>>>>>
> >>>>>>>>>>>> I attribute this  to the HBase client implementation which
> seems
> >>>>>>>>> to be
> >>>>>>>>>>> not scalable (I am going dig into client later on today).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Some numbers: The maximum what I could get from Single get (60
> >>>>>>>>> threads):
> >>>>>>>>>>> 30K per sec. Multiget gives ~ 75K (60 threads)
> >>>>>>>>>>>>
> >>>>>>>>>>>> What are my options? I want to measure the limits and I do not
> >>>>>>>>> want to
> >>>>>>>>>>> run Cluster of clients against just ONE Region Server?
> >>>>>>>>>>>>
> >>>>>>>>>>>> RS config: 96GB RAM, 16(32) CPU
> >>>>>>>>>>>> Client     : 48GB RAM   8 (16) CPU
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>> Vladimir Rodionov
> >>>>>>>>>>>> Principal Platform Engineer
> >>>>>>>>>>>> Carrier IQ, www.carrieriq.com
> >>>>>>>>>>>> e-mail: vrodionov@carrieriq.com
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>
>

Re: HBase read perfomnance and HBase client

Posted by Michael Segel <ms...@hotmail.com>.
Ok... Bonded 1GbE is less than 2GbE; I'm not sure of the actual max throughput.

Are you hitting data in the cache or are you fetching data from disk?
I mean, can we rule out disk I/O because the data would most likely be in the cache?

Are you monitoring your cluster with Ganglia? What do you see in terms of network traffic?
Are all of the nodes in the test cluster on the same switch? Including the client?


(Sorry, I'm currently looking at a network problem, so now everything I see may be a networking problem. And a guy from Arista found me after our meetup last night, so I am thinking about the impact of networking in the ecosystem. :-) )


-Just some guy out in left field... 

Sent from a remote device. Please excuse any typos...

Mike Segel

On Aug 1, 2013, at 1:11 PM, "Vladimir Rodionov" <vl...@gmail.com> wrote:

> 2x1Gb bonded, I think. This is our standard config.
> 
> 
> On Thu, Aug 1, 2013 at 10:27 AM, Michael Segel <ms...@hotmail.com>wrote:
> 
>> Network? 1GbE or 10GbE?
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Jul 31, 2013, at 9:27 PM, "Vladimir Rodionov" <vl...@gmail.com>
>> wrote:
>> 
>>> Some final numbers :
>>> 
>>> Test config:
>>> 
>>> HBase 0.94.6
>>> blockcache=true, block size = 64K, KV size = 62 bytes (raw).
>>> 
>>> 5 Clients: 96GB, 16(32) CPUs (2.2Ghz), CentOS 5.7
>>> 1 RS Server: the same config.
>>> 
>>> Local network with ping between hosts: 0.1 ms
>>> 
>>> 
>>> 1. HBase client hits the wall at ~ 50K per sec regardless of # of CPU,
>>> threads, IO pool size and other settings.
>>> 2. HBase server was able to sustain 170K per sec (with 64K block size).
>> All
>>> from block cache. KV size = 62 bytes (very small). This is for single Get
>>> op, 60 threads per client, 5 clients (on different hosts)
>>> 3. Multi - get hits the wall at the same 170K-200K per sec. Batch size
>>> tested: 30, 100. The same performance absolutely as with batch size = 1.
>>> Multi get has some internal issues on RegionServer side. May be excessive
>>> locking or some thing else.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
>>> <vl...@gmail.com>wrote:
>>> 
>>>> 1. SCR are enabled
>>>> 2. Single Configuration for all table did not work well, but I will try
>> it
>>>> again
>>>> 3. With Nagel I had 0.8ms avg, w/o - 0.4ms - I see the difference
>>>> 
>>>> 
>>>> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org>
>> wrote:
>>>> 
>>>>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
>>>>> RTT is bad, right? Are you seeing ~40ms latencies?
>>>>> 
>>>>> This thread has gotten confusing.
>>>>> 
>>>>> I would try these:
>>>>> * one Configuration for all tables. Or even use a single
>>>>> HConnection/Threadpool and use the HTable(byte[], HConnection,
>>>>> ExecutorService) constructor
>>>>> * disable Nagle's: set both ipc.server.tcpnodelay and
>>>>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client
>> *and*
>>>>> server)
>>>>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
>>>>> * enable short circuit reads (details depend on exact version of
>> Hadoop).
>>>>> Google will help :)
>>>>> 
>>>>> -- Lars
>>>>> 
>>>>> 
>>>>> ----- Original Message -----
>>>>> From: Vladimir Rodionov <vl...@gmail.com>
>>>>> To: dev@hbase.apache.org
>>>>> Cc:
>>>>> Sent: Tuesday, July 30, 2013 1:30 PM
>>>>> Subject: Re: HBase read perfomnance and HBase client
>>>>> 
>>>>> This hbase.ipc.client.tcpnodelay (default - false) explains poor single
>>>>> thread performance and high latency ( 0.8ms in local network)?
>>>>> 
>>>>> 
>>>>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
>>>>> <vl...@gmail.com>wrote:
>>>>> 
>>>>>> One more observation: One Configuration instance per HTable gives 50%
>>>>>> boost as compared to single Configuration object for all HTable's -
>> from
>>>>>> 20K to 30K
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
>>>>> vladrodionov@gmail.com
>>>>>>> wrote:
>>>>>> 
>>>>>>> This thread dump has been taken when client was sending 60 requests
>> in
>>>>>>> parallel (at least, in theory). There are 50 server handler threads.
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>>>>>>> vladrodionov@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Sure, here it is:
>>>>>>>> 
>>>>>>>> http://pastebin.com/8TjyrKRT
>>>>>>>> 
>>>>>>>> epoll is not only to read/write HDFS but to connect/listen to
>> clients
>>>>> as
>>>>>>>> well?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>>>>>>>> jdcryans@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Can you show us what the thread dump looks like when the threads
>> are
>>>>>>>>> BLOCKED? There aren't that many locks on the read path when reading
>>>>>>>>> out of the block cache, and epoll would only happen if you need to
>>>>> hit
>>>>>>>>> HDFS, which you're saying is not happening.
>>>>>>>>> 
>>>>>>>>> J-D
>>>>>>>>> 
>>>>>>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>>>>>>>>> <vl...@gmail.com> wrote:
>>>>>>>>>> I am hitting data in a block cache, of course. The data set is
>> very
>>>>>>>>> small
>>>>>>>>>> to fit comfortably into block cache and all request are directed
>> to
>>>>>>>>> the
>>>>>>>>>> same Region to guarantee single RS testing.
>>>>>>>>>> 
>>>>>>>>>> To Ted:
>>>>>>>>>> 
>>>>>>>>>> Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>>>>>>>>> respect
>>>>>>>>>> to read performance?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>>>>>>>>> jdcryans@apache.org>wrote:
>>>>>>>>>> 
>>>>>>>>>>> That's a tough one.
>>>>>>>>>>> 
>>>>>>>>>>> One thing that comes to mind is socket reuse. It used to come up
>>>>> more
>>>>>>>>>>> more often but this is an issue that people hit when doing loads
>>>>> of
>>>>>>>>>>> random reads. Try enabling tcp_tw_recycle but I'm not
>> guaranteeing
>>>>>>>>>>> anything :)
>>>>>>>>>>> 
>>>>>>>>>>> Also if you _just_ want to saturate something, be it CPU or
>>>>> network,
>>>>>>>>>>> wouldn't it be better to hit data only in the block cache? This
>>>>> way
>>>>>>>>> it
>>>>>>>>>>> has the lowest overhead?
>>>>>>>>>>> 
>>>>>>>>>>> Last thing I wanted to mention is that yes, the client doesn't
>>>>> scale
>>>>>>>>>>> very well. I would suggest you give the asynchbase client a run.
>>>>>>>>>>> 
>>>>>>>>>>> J-D
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>>>>>>>>>>> <vr...@carrieriq.com> wrote:
>>>>>>>>>>>> I have been doing quite extensive testing of different read
>>>>>>>>> scenarios:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. blockcache disabled/enabled
>>>>>>>>>>>> 2. data is local/remote (no good hdfs locality)
>>>>>>>>>>>> 
>>>>>>>>>>>> and it turned out that that I can not saturate 1 RS using one
>>>>>>>>>>> (comparable in CPU power and RAM) client host:
>>>>>>>>>>>> 
>>>>>>>>>>>> I am running client app with 60 read threads active (with
>>>>>>>>> multi-get)
>>>>>>>>>>> that is going to one particular RS and
>>>>>>>>>>>> this RS's load is 100 -150% (out of 3200% available) - it means
>>>>>>>>> that
>>>>>>>>>>> load is ~5%
>>>>>>>>>>>> 
>>>>>>>>>>>> All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
>>>>>>>>> states
>>>>>>>>>>> (epoll)
>>>>>>>>>>>> 
>>>>>>>>>>>> I attribute this  to the HBase client implementation which seems
>>>>>>>>> to be
>>>>>>>>>>> not scalable (I am going dig into client later on today).
>>>>>>>>>>>> 
>>>>>>>>>>>> Some numbers: The maximum what I could get from Single get (60
>>>>>>>>> threads):
>>>>>>>>>>> 30K per sec. Multiget gives ~ 75K (60 threads)
>>>>>>>>>>>> 
>>>>>>>>>>>> What are my options? I want to measure the limits and I do not
>>>>>>>>> want to
>>>>>>>>>>> run Cluster of clients against just ONE Region Server?
>>>>>>>>>>>> 
>>>>>>>>>>>> RS config: 96GB RAM, 16(32) CPU
>>>>>>>>>>>> Client     : 48GB RAM   8 (16) CPU
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Vladimir Rodionov
>>>>>>>>>>>> Principal Platform Engineer
>>>>>>>>>>>> Carrier IQ, www.carrieriq.com
>>>>>>>>>>>> e-mail: vrodionov@carrieriq.com
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>> 

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
2x1Gb bonded, I think. This is our standard config.


On Thu, Aug 1, 2013 at 10:27 AM, Michael Segel <ms...@hotmail.com>wrote:

> Network? 1GbE or 10GbE?
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jul 31, 2013, at 9:27 PM, "Vladimir Rodionov" <vl...@gmail.com>
> wrote:
>
> > Some final numbers :
> >
> > Test config:
> >
> > HBase 0.94.6
> > blockcache=true, block size = 64K, KV size = 62 bytes (raw).
> >
> > 5 Clients: 96GB, 16(32) CPUs (2.2Ghz), CentOS 5.7
> > 1 RS Server: the same config.
> >
> > Local network with ping between hosts: 0.1 ms
> >
> >
> > 1. HBase client hits the wall at ~ 50K per sec regardless of # of CPU,
> > threads, IO pool size and other settings.
> > 2. HBase server was able to sustain 170K per sec (with 64K block size).
> All
> > from block cache. KV size = 62 bytes (very small). This is for single Get
> > op, 60 threads per client, 5 clients (on different hosts)
> > 3. Multi - get hits the wall at the same 170K-200K per sec. Batch size
> > tested: 30, 100. The same performance absolutely as with batch size = 1.
> > Multi get has some internal issues on RegionServer side. May be excessive
> > locking or some thing else.
> >
> >
> >
> >
> >
> > On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
> > <vl...@gmail.com>wrote:
> >
> >> 1. SCR are enabled
> >> 2. Single Configuration for all table did not work well, but I will try
> it
> >> again
> >> 3. With Nagel I had 0.8ms avg, w/o - 0.4ms - I see the difference
> >>
> >>
> >> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org>
> wrote:
> >>
> >>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
> >>> RTT is bad, right? Are you seeing ~40ms latencies?
> >>>
> >>> This thread has gotten confusing.
> >>>
> >>> I would try these:
> >>> * one Configuration for all tables. Or even use a single
> >>> HConnection/Threadpool and use the HTable(byte[], HConnection,
> >>> ExecutorService) constructor
> >>> * disable Nagle's: set both ipc.server.tcpnodelay and
> >>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client
> *and*
> >>> server)
> >>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
> >>> * enable short circuit reads (details depend on exact version of
> Hadoop).
> >>> Google will help :)
> >>>
> >>> -- Lars
> >>>
> >>>
> >>> ----- Original Message -----
> >>> From: Vladimir Rodionov <vl...@gmail.com>
> >>> To: dev@hbase.apache.org
> >>> Cc:
> >>> Sent: Tuesday, July 30, 2013 1:30 PM
> >>> Subject: Re: HBase read perfomnance and HBase client
> >>>
> >>> This hbase.ipc.client.tcpnodelay (default - false) explains poor single
> >>> thread performance and high latency ( 0.8ms in local network)?
> >>>
> >>>
> >>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
> >>> <vl...@gmail.com>wrote:
> >>>
> >>>> One more observation: One Configuration instance per HTable gives 50%
> >>>> boost as compared to single Configuration object for all HTable's -
> from
> >>>> 20K to 30K
> >>>>
> >>>>
> >>>> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
> >>> vladrodionov@gmail.com
> >>>>> wrote:
> >>>>
> >>>>> This thread dump has been taken when client was sending 60 requests
> in
> >>>>> parallel (at least, in theory). There are 50 server handler threads.
> >>>>>
> >>>>>
> >>>>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
> >>>>> vladrodionov@gmail.com> wrote:
> >>>>>
> >>>>>> Sure, here it is:
> >>>>>>
> >>>>>> http://pastebin.com/8TjyrKRT
> >>>>>>
> >>>>>> epoll is not only to read/write HDFS but to connect/listen to
> clients
> >>> as
> >>>>>> well?
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
> >>>>>> jdcryans@apache.org> wrote:
> >>>>>>
> >>>>>>> Can you show us what the thread dump looks like when the threads
> are
> >>>>>>> BLOCKED? There aren't that many locks on the read path when reading
> >>>>>>> out of the block cache, and epoll would only happen if you need to
> >>> hit
> >>>>>>> HDFS, which you're saying is not happening.
> >>>>>>>
> >>>>>>> J-D
> >>>>>>>
> >>>>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> >>>>>>> <vl...@gmail.com> wrote:
> >>>>>>>> I am hitting data in a block cache, of course. The data set is
> very
> >>>>>>> small
> >>>>>>>> to fit comfortably into block cache and all request are directed
> to
> >>>>>>> the
> >>>>>>>> same Region to guarantee single RS testing.
> >>>>>>>>
> >>>>>>>> To Ted:
> >>>>>>>>
> >>>>>>>> Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
> >>>>>>> respect
> >>>>>>>> to read performance?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
> >>>>>>> jdcryans@apache.org>wrote:
> >>>>>>>>
> >>>>>>>>> That's a tough one.
> >>>>>>>>>
> >>>>>>>>> One thing that comes to mind is socket reuse. It used to come up
> >>> more
> >>>>>>>>> more often but this is an issue that people hit when doing loads
> >>> of
> >>>>>>>>> random reads. Try enabling tcp_tw_recycle but I'm not
> guaranteeing
> >>>>>>>>> anything :)
> >>>>>>>>>
> >>>>>>>>> Also if you _just_ want to saturate something, be it CPU or
> >>> network,
> >>>>>>>>> wouldn't it be better to hit data only in the block cache? This
> >>> way
> >>>>>>> it
> >>>>>>>>> has the lowest overhead?
> >>>>>>>>>
> >>>>>>>>> Last thing I wanted to mention is that yes, the client doesn't
> >>> scale
> >>>>>>>>> very well. I would suggest you give the asynchbase client a run.
> >>>>>>>>>
> >>>>>>>>> J-D
> >>>>>>>>>
> >>>>>>>>> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >>>>>>>>> <vr...@carrieriq.com> wrote:
> >>>>>>>>>> I have been doing quite extensive testing of different read
> >>>>>>> scenarios:
> >>>>>>>>>>
> >>>>>>>>>> 1. blockcache disabled/enabled
> >>>>>>>>>> 2. data is local/remote (no good hdfs locality)
> >>>>>>>>>>
> >>>>>>>>>> and it turned out that that I can not saturate 1 RS using one
> >>>>>>>>> (comparable in CPU power and RAM) client host:
> >>>>>>>>>>
> >>>>>>>>>> I am running client app with 60 read threads active (with
> >>>>>>> multi-get)
> >>>>>>>>> that is going to one particular RS and
> >>>>>>>>>> this RS's load is 100 -150% (out of 3200% available) - it means
> >>>>>>> that
> >>>>>>>>> load is ~5%
> >>>>>>>>>>
> >>>>>>>>>> All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
> >>>>>>> states
> >>>>>>>>> (epoll)
> >>>>>>>>>>
> >>>>>>>>>> I attribute this  to the HBase client implementation which seems
> >>>>>>> to be
> >>>>>>>>> not scalable (I am going dig into client later on today).
> >>>>>>>>>>
> >>>>>>>>>> Some numbers: The maximum what I could get from Single get (60
> >>>>>>> threads):
> >>>>>>>>> 30K per sec. Multiget gives ~ 75K (60 threads)
> >>>>>>>>>>
> >>>>>>>>>> What are my options? I want to measure the limits and I do not
> >>>>>>> want to
> >>>>>>>>> run Cluster of clients against just ONE Region Server?
> >>>>>>>>>>
> >>>>>>>>>> RS config: 96GB RAM, 16(32) CPU
> >>>>>>>>>> Client     : 48GB RAM   8 (16) CPU
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Vladimir Rodionov
> >>>>>>>>>> Principal Platform Engineer
> >>>>>>>>>> Carrier IQ, www.carrieriq.com
> >>>>>>>>>> e-mail: vrodionov@carrieriq.com
> >>>>>>>>>>
> >>>>>>>>>>
> >>
>

Re: HBase read perfomnance and HBase client

Posted by Michael Segel <ms...@hotmail.com>.
Network? 1GbE or 10GbE?

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jul 31, 2013, at 9:27 PM, "Vladimir Rodionov" <vl...@gmail.com> wrote:

> Some final numbers :
> 
> Test config:
> 
> HBase 0.94.6
> blockcache=true, block size = 64K, KV size = 62 bytes (raw).
> 
> 5 Clients: 96GB, 16(32) CPUs (2.2Ghz), CentOS 5.7
> 1 RS Server: the same config.
> 
> Local network with ping between hosts: 0.1 ms
> 
> 
> 1. HBase client hits the wall at ~ 50K per sec regardless of # of CPU,
> threads, IO pool size and other settings.
> 2. HBase server was able to sustain 170K per sec (with 64K block size). All
> from block cache. KV size = 62 bytes (very small). This is for single Get
> op, 60 threads per client, 5 clients (on different hosts)
> 3. Multi - get hits the wall at the same 170K-200K per sec. Batch size
> tested: 30, 100. The same performance absolutely as with batch size = 1.
> Multi get has some internal issues on RegionServer side. May be excessive
> locking or some thing else.
> 
> 
> 
> 
> 
> On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
> <vl...@gmail.com>wrote:
> 
>> 1. SCR are enabled
>> 2. Single Configuration for all table did not work well, but I will try it
>> again
>> 3. With Nagel I had 0.8ms avg, w/o - 0.4ms - I see the difference
>> 
>> 
>> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org> wrote:
>> 
>>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
>>> RTT is bad, right? Are you seeing ~40ms latencies?
>>> 
>>> This thread has gotten confusing.
>>> 
>>> I would try these:
>>> * one Configuration for all tables. Or even use a single
>>> HConnection/Threadpool and use the HTable(byte[], HConnection,
>>> ExecutorService) constructor
>>> * disable Nagle's: set both ipc.server.tcpnodelay and
>>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
>>> server)
>>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
>>> * enable short circuit reads (details depend on exact version of Hadoop).
>>> Google will help :)
>>> 
>>> -- Lars
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Vladimir Rodionov <vl...@gmail.com>
>>> To: dev@hbase.apache.org
>>> Cc:
>>> Sent: Tuesday, July 30, 2013 1:30 PM
>>> Subject: Re: HBase read perfomnance and HBase client
>>> 
>>> This hbase.ipc.client.tcpnodelay (default - false) explains poor single
>>> thread performance and high latency ( 0.8ms in local network)?
>>> 
>>> 
>>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
>>> <vl...@gmail.com>wrote:
>>> 
>>>> One more observation: One Configuration instance per HTable gives 50%
>>>> boost as compared to single Configuration object for all HTable's - from
>>>> 20K to 30K
>>>> 
>>>> 
>>>> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
>>> vladrodionov@gmail.com
>>>>> wrote:
>>>> 
>>>>> This thread dump has been taken when client was sending 60 requests in
>>>>> parallel (at least, in theory). There are 50 server handler threads.
>>>>> 
>>>>> 
>>>>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>>>>> vladrodionov@gmail.com> wrote:
>>>>> 
>>>>>> Sure, here it is:
>>>>>> 
>>>>>> http://pastebin.com/8TjyrKRT
>>>>>> 
>>>>>> epoll is not only to read/write HDFS but to connect/listen to clients
>>> as
>>>>>> well?
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>>>>>> jdcryans@apache.org> wrote:
>>>>>> 
>>>>>>> Can you show us what the thread dump looks like when the threads are
>>>>>>> BLOCKED? There aren't that many locks on the read path when reading
>>>>>>> out of the block cache, and epoll would only happen if you need to
>>> hit
>>>>>>> HDFS, which you're saying is not happening.
>>>>>>> 
>>>>>>> J-D
>>>>>>> 
>>>>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>>>>>>> <vl...@gmail.com> wrote:
>>>>>>>> I am hitting data in a block cache, of course. The data set is very
>>>>>>> small
>>>>>>>> to fit comfortably into block cache and all request are directed to
>>>>>>> the
>>>>>>>> same Region to guarantee single RS testing.
>>>>>>>> 
>>>>>>>> To Ted:
>>>>>>>> 
>>>>>>>> Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>>>>>>> respect
>>>>>>>> to read performance?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>>>>>>> jdcryans@apache.org>wrote:
>>>>>>>> 
>>>>>>>>> That's a tough one.
>>>>>>>>> 
>>>>>>>>> One thing that comes to mind is socket reuse. It used to come up
>>> more
>>>>>>>>> more often but this is an issue that people hit when doing loads
>>> of
>>>>>>>>> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>>>>>>>>> anything :)
>>>>>>>>> 
>>>>>>>>> Also if you _just_ want to saturate something, be it CPU or
>>> network,
>>>>>>>>> wouldn't it be better to hit data only in the block cache? This
>>> way
>>>>>>> it
>>>>>>>>> has the lowest overhead?
>>>>>>>>> 
>>>>>>>>> Last thing I wanted to mention is that yes, the client doesn't
>>> scale
>>>>>>>>> very well. I would suggest you give the asynchbase client a run.
>>>>>>>>> 
>>>>>>>>> J-D
>>>>>>>>> 
>>>>>>>>> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>>>>>>>>> <vr...@carrieriq.com> wrote:
>>>>>>>>>> I have been doing quite extensive testing of different read
>>>>>>> scenarios:
>>>>>>>>>> 
>>>>>>>>>> 1. blockcache disabled/enabled
>>>>>>>>>> 2. data is local/remote (no good hdfs locality)
>>>>>>>>>> 
>>>>>>>>>> and it turned out that that I can not saturate 1 RS using one
>>>>>>>>> (comparable in CPU power and RAM) client host:
>>>>>>>>>> 
>>>>>>>>>> I am running client app with 60 read threads active (with
>>>>>>> multi-get)
>>>>>>>>> that is going to one particular RS and
>>>>>>>>>> this RS's load is 100 -150% (out of 3200% available) - it means
>>>>>>> that
>>>>>>>>> load is ~5%
>>>>>>>>>> 
>>>>>>>>>> All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
>>>>>>> states
>>>>>>>>> (epoll)
>>>>>>>>>> 
>>>>>>>>>> I attribute this  to the HBase client implementation which seems
>>>>>>> to be
>>>>>>>>> not scalable (I am going dig into client later on today).
>>>>>>>>>> 
>>>>>>>>>> Some numbers: The maximum what I could get from Single get (60
>>>>>>> threads):
>>>>>>>>> 30K per sec. Multiget gives ~ 75K (60 threads)
>>>>>>>>>> 
>>>>>>>>>> What are my options? I want to measure the limits and I do not
>>>>>>> want to
>>>>>>>>> run Cluster of clients against just ONE Region Server?
>>>>>>>>>> 
>>>>>>>>>> RS config: 96GB RAM, 16(32) CPU
>>>>>>>>>> Client     : 48GB RAM   8 (16) CPU
>>>>>>>>>> 
>>>>>>>>>> Best regards,
>>>>>>>>>> Vladimir Rodionov
>>>>>>>>>> Principal Platform Engineer
>>>>>>>>>> Carrier IQ, www.carrieriq.com
>>>>>>>>>> e-mail: vrodionov@carrieriq.com
>>>>>>>>>> 
>>>>>>>>>> 
>> 

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
Some final numbers:

Test config:

HBase 0.94.6
blockcache=true, block size = 64K, KV size = 62 bytes (raw).

5 clients: 96GB RAM, 16(32) CPUs (2.2GHz), CentOS 5.7
1 RS server: the same config.

Local network with ping between hosts: 0.1 ms


1. The HBase client hits the wall at ~ 50K per sec regardless of # of CPUs,
threads, IO pool size and other settings.
2. The HBase server was able to sustain 170K per sec (with 64K block size), all
from the block cache. KV size = 62 bytes (very small). This is for single Get
ops, 60 threads per client, 5 clients (on different hosts).
3. Multi-get hits the wall at the same 170K-200K per sec. Batch sizes
tested: 30, 100. Absolutely the same performance as with batch size = 1.
Multi-get has some internal issue on the RegionServer side, maybe excessive
locking or something else.
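
For context, a rough skeleton of the kind of 60-thread single-Get driver described above,
against the 0.94 client API; the table name, key space, and run length are assumptions,
not the actual benchmark harness:

// Sketch: 60 client threads issuing random single Gets and reporting throughput.
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class GetLoadDriver {
  private static volatile boolean stop = false;

  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final AtomicLong ops = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(60);
    for (int t = 0; t < 60; t++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            HTable table = new HTable(conf, "perf_test");   // one HTable per thread
            Random rnd = new Random();
            while (!stop) {
              table.get(new Get(Bytes.toBytes("row-" + rnd.nextInt(10000))));
              ops.incrementAndGet();
            }
            table.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    long start = System.currentTimeMillis();
    Thread.sleep(60000);                  // measure for one minute
    stop = true;
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);
    double secs = (System.currentTimeMillis() - start) / 1000.0;
    System.out.printf("throughput: %.0f gets/sec%n", ops.get() / secs);
  }
}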





On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
<vl...@gmail.com>wrote:

> 1. SCR are enabled
> 2. Single Configuration for all table did not work well, but I will try it
> again
> 3. With Nagel I had 0.8ms avg, w/o - 0.4ms - I see the difference
>
>
> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org> wrote:
>
>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
>> RTT is bad, right? Are you seeing ~40ms latencies?
>>
>> This thread has gotten confusing.
>>
>> I would try these:
>> * one Configuration for all tables. Or even use a single
>> HConnection/Threadpool and use the HTable(byte[], HConnection,
>> ExecutorService) constructor
>> * disable Nagle's: set both ipc.server.tcpnodelay and
>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
>> server)
>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
>> * enable short circuit reads (details depend on exact version of Hadoop).
>> Google will help :)
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: Vladimir Rodionov <vl...@gmail.com>
>> To: dev@hbase.apache.org
>> Cc:
>> Sent: Tuesday, July 30, 2013 1:30 PM
>> Subject: Re: HBase read perfomnance and HBase client
>>
>> This hbase.ipc.client.tcpnodelay (default - false) explains poor single
>> thread performance and high latency ( 0.8ms in local network)?
>>
>>
>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
>> <vl...@gmail.com>wrote:
>>
>> > One more observation: One Configuration instance per HTable gives 50%
>> > boost as compared to single Configuration object for all HTable's - from
>> > 20K to 30K
>> >
>> >
>> > On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
>> vladrodionov@gmail.com
>> > > wrote:
>> >
>> >> This thread dump has been taken when client was sending 60 requests in
>> >> parallel (at least, in theory). There are 50 server handler threads.
>> >>
>> >>
>> >> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>> >> vladrodionov@gmail.com> wrote:
>> >>
>> >>> Sure, here it is:
>> >>>
>> >>> http://pastebin.com/8TjyrKRT
>> >>>
>> >>> epoll is not only to read/write HDFS but to connect/listen to clients
>> as
>> >>> well?
>> >>>
>> >>>
>> >>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>> >>> jdcryans@apache.org> wrote:
>> >>>
>> >>>> Can you show us what the thread dump looks like when the threads are
>> >>>> BLOCKED? There aren't that many locks on the read path when reading
>> >>>> out of the block cache, and epoll would only happen if you need to
>> hit
>> >>>> HDFS, which you're saying is not happening.
>> >>>>
>> >>>> J-D
>> >>>>
>> >>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>> >>>> <vl...@gmail.com> wrote:
>> >>>> > I am hitting data in a block cache, of course. The data set is very
>> >>>> small
>> >>>> > to fit comfortably into block cache and all request are directed to
>> >>>> the
>> >>>> > same Region to guarantee single RS testing.
>> >>>> >
>> >>>> > To Ted:
>> >>>> >
>> >>>> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>> >>>> respect
>> >>>> > to read performance?
>> >>>> >
>> >>>> >
>> >>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>> >>>> jdcryans@apache.org>wrote:
>> >>>> >
>> >>>> >> That's a tough one.
>> >>>> >>
>> >>>> >> One thing that comes to mind is socket reuse. It used to come up
>> more
>> >>>> >> more often but this is an issue that people hit when doing loads
>> of
>> >>>> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>> >>>> >> anything :)
>> >>>> >>
>> >>>> >> Also if you _just_ want to saturate something, be it CPU or
>> network,
>> >>>> >> wouldn't it be better to hit data only in the block cache? This
>> way
>> >>>> it
>> >>>> >> has the lowest overhead?
>> >>>> >>
>> >>>> >> Last thing I wanted to mention is that yes, the client doesn't
>> scale
>> >>>> >> very well. I would suggest you give the asynchbase client a run.
>> >>>> >>
>> >>>> >> J-D
>> >>>> >>
>> >>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>> >>>> >> <vr...@carrieriq.com> wrote:
>> >>>> >> > I have been doing quite extensive testing of different read
>> >>>> scenarios:
>> >>>> >> >
>> >>>> >> > 1. blockcache disabled/enabled
>> >>>> >> > 2. data is local/remote (no good hdfs locality)
>> >>>> >> >
>> >>>> >> > and it turned out that that I can not saturate 1 RS using one
>> >>>> >> (comparable in CPU power and RAM) client host:
>> >>>> >> >
>> >>>> >> >  I am running client app with 60 read threads active (with
>> >>>> multi-get)
>> >>>> >> that is going to one particular RS and
>> >>>> >> > this RS's load is 100 -150% (out of 3200% available) - it means
>> >>>> that
>> >>>> >> load is ~5%
>> >>>> >> >
>> >>>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
>> >>>> states
>> >>>> >> (epoll)
>> >>>> >> >
>> >>>> >> > I attribute this  to the HBase client implementation which seems
>> >>>> to be
>> >>>> >> not scalable (I am going dig into client later on today).
>> >>>> >> >
>> >>>> >> > Some numbers: The maximum what I could get from Single get (60
>> >>>> threads):
>> >>>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
>> >>>> >> >
>> >>>> >> > What are my options? I want to measure the limits and I do not
>> >>>> want to
>> >>>> >> run Cluster of clients against just ONE Region Server?
>> >>>> >> >
>> >>>> >> > RS config: 96GB RAM, 16(32) CPU
>> >>>> >> > Client     : 48GB RAM   8 (16) CPU
>> >>>> >> >
>> >>>> >> > Best regards,
>> >>>> >> > Vladimir Rodionov
>> >>>> >> > Principal Platform Engineer
>> >>>> >> > Carrier IQ, www.carrieriq.com
>> >>>> >> > e-mail: vrodionov@carrieriq.com
>> >>>> >> >
>> >>>> >> >
>> >>>> >>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
1. SCR (short-circuit reads) are enabled
2. A single Configuration for all tables did not work well, but I will try it
again
3. With Nagle's enabled I had 0.8ms avg latency; without it, 0.4ms, so I do
see the difference


On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org> wrote:

> With Nagle's you'd see something around 40ms. You are not saying 0.8ms RTT
> is bad, right? Are you seeing ~40ms latencies?
>
> This thread has gotten confusing.
>
> I would try these:
> * one Configuration for all tables. Or even use a single
> HConnection/Threadpool and use the HTable(byte[], HConnection,
> ExecutorService) constructor
> * disable Nagle's: set both ipc.server.tcpnodelay and
> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
> server)
> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
> * enable short circuit reads (details depend on exact version of Hadoop).
> Google will help :)
>
> -- Lars
>
>
> ----- Original Message -----
> From: Vladimir Rodionov <vl...@gmail.com>
> To: dev@hbase.apache.org
> Cc:
> Sent: Tuesday, July 30, 2013 1:30 PM
> Subject: Re: HBase read perfomnance and HBase client
>
> This hbase.ipc.client.tcpnodelay (default - false) explains poor single
> thread performance and high latency ( 0.8ms in local network)?
>
>
> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
> <vl...@gmail.com>wrote:
>
> > One more observation: One Configuration instance per HTable gives 50%
> > boost as compared to single Configuration object for all HTable's - from
> > 20K to 30K
> >
> >
> > On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
> vladrodionov@gmail.com
> > > wrote:
> >
> >> This thread dump has been taken when client was sending 60 requests in
> >> parallel (at least, in theory). There are 50 server handler threads.
> >>
> >>
> >> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
> >> vladrodionov@gmail.com> wrote:
> >>
> >>> Sure, here it is:
> >>>
> >>> http://pastebin.com/8TjyrKRT
> >>>
> >>> epoll is not only to read/write HDFS but to connect/listen to clients
> as
> >>> well?
> >>>
> >>>
> >>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
> >>> jdcryans@apache.org> wrote:
> >>>
> >>>> Can you show us what the thread dump looks like when the threads are
> >>>> BLOCKED? There aren't that many locks on the read path when reading
> >>>> out of the block cache, and epoll would only happen if you need to hit
> >>>> HDFS, which you're saying is not happening.
> >>>>
> >>>> J-D
> >>>>
> >>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> >>>> <vl...@gmail.com> wrote:
> >>>> > I am hitting data in a block cache, of course. The data set is very
> >>>> small
> >>>> > to fit comfortably into block cache and all request are directed to
> >>>> the
> >>>> > same Region to guarantee single RS testing.
> >>>> >
> >>>> > To Ted:
> >>>> >
> >>>> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
> >>>> respect
> >>>> > to read performance?
> >>>> >
> >>>> >
> >>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
> >>>> jdcryans@apache.org>wrote:
> >>>> >
> >>>> >> That's a tough one.
> >>>> >>
> >>>> >> One thing that comes to mind is socket reuse. It used to come up
> more
> >>>> >> more often but this is an issue that people hit when doing loads of
> >>>> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
> >>>> >> anything :)
> >>>> >>
> >>>> >> Also if you _just_ want to saturate something, be it CPU or
> network,
> >>>> >> wouldn't it be better to hit data only in the block cache? This way
> >>>> it
> >>>> >> has the lowest overhead?
> >>>> >>
> >>>> >> Last thing I wanted to mention is that yes, the client doesn't
> scale
> >>>> >> very well. I would suggest you give the asynchbase client a run.
> >>>> >>
> >>>> >> J-D
> >>>> >>
> >>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >>>> >> <vr...@carrieriq.com> wrote:
> >>>> >> > I have been doing quite extensive testing of different read
> >>>> scenarios:
> >>>> >> >
> >>>> >> > 1. blockcache disabled/enabled
> >>>> >> > 2. data is local/remote (no good hdfs locality)
> >>>> >> >
> >>>> >> > and it turned out that that I can not saturate 1 RS using one
> >>>> >> (comparable in CPU power and RAM) client host:
> >>>> >> >
> >>>> >> >  I am running client app with 60 read threads active (with
> >>>> multi-get)
> >>>> >> that is going to one particular RS and
> >>>> >> > this RS's load is 100 -150% (out of 3200% available) - it means
> >>>> that
> >>>> >> load is ~5%
> >>>> >> >
> >>>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
> >>>> states
> >>>> >> (epoll)
> >>>> >> >
> >>>> >> > I attribute this  to the HBase client implementation which seems
> >>>> to be
> >>>> >> not scalable (I am going dig into client later on today).
> >>>> >> >
> >>>> >> > Some numbers: The maximum what I could get from Single get (60
> >>>> threads):
> >>>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
> >>>> >> >
> >>>> >> > What are my options? I want to measure the limits and I do not
> >>>> want to
> >>>> >> run Cluster of clients against just ONE Region Server?
> >>>> >> >
> >>>> >> > RS config: 96GB RAM, 16(32) CPU
> >>>> >> > Client     : 48GB RAM   8 (16) CPU
> >>>> >> >
> >>>> >> > Best regards,
> >>>> >> > Vladimir Rodionov
> >>>> >> > Principal Platform Engineer
> >>>> >> > Carrier IQ, www.carrieriq.com
> >>>> >> > e-mail: vrodionov@carrieriq.com
> >>>> >> >
> >>>> >> >
> >>>> >> > Confidentiality Notice:  The information contained in this
> message,
> >>>> >> including any attachments hereto, may be confidential and is
> >>>> intended to be
> >>>> >> read only by the individual or entity to whom this message is
> >>>> addressed. If
> >>>> >> the reader of this message is not the intended recipient or an
> agent
> >>>> or
> >>>> >> designee of the intended recipient, please note that any review,
> use,
> >>>> >> disclosure or distribution of this message or its attachments, in
> >>>> any form,
> >>>> >> is strictly prohibited.  If you have received this message in
> error,
> >>>> please
> >>>> >> immediately notify the sender and/or Notifications@carrieriq.comand
> >>>> >> delete or destroy any copy of this message and its attachments.
> >>>> >>
> >>>>
> >>>
> >>>
> >>
> >
>
>

Re: HBase read perfomnance and HBase client

Posted by lars hofhansl <la...@apache.org>.
With Nagle's you'd see something around 40ms. You are not saying 0.8ms RTT is bad, right? Are you seeing ~40ms latencies?

This thread has gotten confusing.

I would try these:
* one Configuration for all tables. Or even use a single HConnection/Threadpool and use the HTable(byte[], HConnection, ExecutorService) constructor (see the sketch after this list)
* disable Nagle's: set both ipc.server.tcpnodelay and hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and* server)
* increase hbase.client.ipc.pool.size in client's hbase-site.xml
* enable short circuit reads (details depend on exact version of Hadoop). Google will help :)
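
Something like this is what I mean by the first point. It is just a sketch
against the 0.94-era client API; the table name, row key and pool size are
made-up placeholders:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionSketch {
  public static void main(String[] args) throws Exception {
    // One Configuration, one HConnection and one thread pool for the whole
    // client process; each worker thread builds its own cheap HTable on top
    // of them (HTable itself is not thread safe).
    Configuration conf = HBaseConfiguration.create();
    HConnection conn = HConnectionManager.createConnection(conf);
    ExecutorService pool = Executors.newFixedThreadPool(60);

    HTable table = new HTable(Bytes.toBytes("test_table"), conn, pool);
    try {
      Result r = table.get(new Get(Bytes.toBytes("row-0001")));
      System.out.println("cells returned: " + r.size());
    } finally {
      table.close();
      pool.shutdown();
      conn.close();
    }
  }
}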

-- Lars


----- Original Message -----
From: Vladimir Rodionov <vl...@gmail.com>
To: dev@hbase.apache.org
Cc: 
Sent: Tuesday, July 30, 2013 1:30 PM
Subject: Re: HBase read perfomnance and HBase client

This hbase.ipc.client.tcpnodelay (default - false) explains poor single
thread performance and high latency ( 0.8ms in local network)?


On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
<vl...@gmail.com>wrote:

> One more observation: One Configuration instance per HTable gives 50%
> boost as compared to single Configuration object for all HTable's - from
> 20K to 30K
>
>
> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <vladrodionov@gmail.com
> > wrote:
>
>> This thread dump has been taken when client was sending 60 requests in
>> parallel (at least, in theory). There are 50 server handler threads.
>>
>>
>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>> vladrodionov@gmail.com> wrote:
>>
>>> Sure, here it is:
>>>
>>> http://pastebin.com/8TjyrKRT
>>>
>>> epoll is not only to read/write HDFS but to connect/listen to clients as
>>> well?
>>>
>>>
>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>>> jdcryans@apache.org> wrote:
>>>
>>>> Can you show us what the thread dump looks like when the threads are
>>>> BLOCKED? There aren't that many locks on the read path when reading
>>>> out of the block cache, and epoll would only happen if you need to hit
>>>> HDFS, which you're saying is not happening.
>>>>
>>>> J-D
>>>>
>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>>>> <vl...@gmail.com> wrote:
>>>> > I am hitting data in a block cache, of course. The data set is very
>>>> small
>>>> > to fit comfortably into block cache and all request are directed to
>>>> the
>>>> > same Region to guarantee single RS testing.
>>>> >
>>>> > To Ted:
>>>> >
>>>> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>>>> respect
>>>> > to read performance?
>>>> >
>>>> >
>>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>>>> jdcryans@apache.org>wrote:
>>>> >
>>>> >> That's a tough one.
>>>> >>
>>>> >> One thing that comes to mind is socket reuse. It used to come up more
>>>> >> more often but this is an issue that people hit when doing loads of
>>>> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>>>> >> anything :)
>>>> >>
>>>> >> Also if you _just_ want to saturate something, be it CPU or network,
>>>> >> wouldn't it be better to hit data only in the block cache? This way
>>>> it
>>>> >> has the lowest overhead?
>>>> >>
>>>> >> Last thing I wanted to mention is that yes, the client doesn't scale
>>>> >> very well. I would suggest you give the asynchbase client a run.
>>>> >>
>>>> >> J-D
>>>> >>
>>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>>>> >> <vr...@carrieriq.com> wrote:
>>>> >> > I have been doing quite extensive testing of different read
>>>> scenarios:
>>>> >> >
>>>> >> > 1. blockcache disabled/enabled
>>>> >> > 2. data is local/remote (no good hdfs locality)
>>>> >> >
>>>> >> > and it turned out that that I can not saturate 1 RS using one
>>>> >> (comparable in CPU power and RAM) client host:
>>>> >> >
>>>> >> >  I am running client app with 60 read threads active (with
>>>> multi-get)
>>>> >> that is going to one particular RS and
>>>> >> > this RS's load is 100 -150% (out of 3200% available) - it means
>>>> that
>>>> >> load is ~5%
>>>> >> >
>>>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
>>>> states
>>>> >> (epoll)
>>>> >> >
>>>> >> > I attribute this  to the HBase client implementation which seems
>>>> to be
>>>> >> not scalable (I am going dig into client later on today).
>>>> >> >
>>>> >> > Some numbers: The maximum what I could get from Single get (60
>>>> threads):
>>>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
>>>> >> >
>>>> >> > What are my options? I want to measure the limits and I do not
>>>> want to
>>>> >> run Cluster of clients against just ONE Region Server?
>>>> >> >
>>>> >> > RS config: 96GB RAM, 16(32) CPU
>>>> >> > Client     : 48GB RAM   8 (16) CPU
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Vladimir Rodionov
>>>> >> > Principal Platform Engineer
>>>> >> > Carrier IQ, www.carrieriq.com
>>>> >> > e-mail: vrodionov@carrieriq.com
>>>> >> >
>>>> >> >
>>>> >> > Confidentiality Notice:  The information contained in this message,
>>>> >> including any attachments hereto, may be confidential and is
>>>> intended to be
>>>> >> read only by the individual or entity to whom this message is
>>>> addressed. If
>>>> >> the reader of this message is not the intended recipient or an agent
>>>> or
>>>> >> designee of the intended recipient, please note that any review, use,
>>>> >> disclosure or distribution of this message or its attachments, in
>>>> any form,
>>>> >> is strictly prohibited.  If you have received this message in error,
>>>> please
>>>> >> immediately notify the sender and/or Notifications@carrieriq.com and
>>>> >> delete or destroy any copy of this message and its attachments.
>>>> >>
>>>>
>>>
>>>
>>
>


Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
Does hbase.ipc.client.tcpnodelay (default: false) explain the poor
single-thread performance and the high latency (0.8ms on a local network)?


On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
<vl...@gmail.com>wrote:

> One more observation: One Configuration instance per HTable gives 50%
> boost as compared to single Configuration object for all HTable's - from
> 20K to 30K
>
>
> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <vladrodionov@gmail.com
> > wrote:
>
>> This thread dump has been taken when client was sending 60 requests in
>> parallel (at least, in theory). There are 50 server handler threads.
>>
>>
>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>> vladrodionov@gmail.com> wrote:
>>
>>> Sure, here it is:
>>>
>>> http://pastebin.com/8TjyrKRT
>>>
>>> epoll is not only to read/write HDFS but to connect/listen to clients as
>>> well?
>>>
>>>
>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>>> jdcryans@apache.org> wrote:
>>>
>>>> Can you show us what the thread dump looks like when the threads are
>>>> BLOCKED? There aren't that many locks on the read path when reading
>>>> out of the block cache, and epoll would only happen if you need to hit
>>>> HDFS, which you're saying is not happening.
>>>>
>>>> J-D
>>>>
>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>>>> <vl...@gmail.com> wrote:
>>>> > I am hitting data in a block cache, of course. The data set is very
>>>> small
>>>> > to fit comfortably into block cache and all request are directed to
>>>> the
>>>> > same Region to guarantee single RS testing.
>>>> >
>>>> > To Ted:
>>>> >
>>>> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>>>> respect
>>>> > to read performance?
>>>> >
>>>> >
>>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>>>> jdcryans@apache.org>wrote:
>>>> >
>>>> >> That's a tough one.
>>>> >>
>>>> >> One thing that comes to mind is socket reuse. It used to come up more
>>>> >> more often but this is an issue that people hit when doing loads of
>>>> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>>>> >> anything :)
>>>> >>
>>>> >> Also if you _just_ want to saturate something, be it CPU or network,
>>>> >> wouldn't it be better to hit data only in the block cache? This way
>>>> it
>>>> >> has the lowest overhead?
>>>> >>
>>>> >> Last thing I wanted to mention is that yes, the client doesn't scale
>>>> >> very well. I would suggest you give the asynchbase client a run.
>>>> >>
>>>> >> J-D
>>>> >>
>>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>>>> >> <vr...@carrieriq.com> wrote:
>>>> >> > I have been doing quite extensive testing of different read
>>>> scenarios:
>>>> >> >
>>>> >> > 1. blockcache disabled/enabled
>>>> >> > 2. data is local/remote (no good hdfs locality)
>>>> >> >
>>>> >> > and it turned out that that I can not saturate 1 RS using one
>>>> >> (comparable in CPU power and RAM) client host:
>>>> >> >
>>>> >> >  I am running client app with 60 read threads active (with
>>>> multi-get)
>>>> >> that is going to one particular RS and
>>>> >> > this RS's load is 100 -150% (out of 3200% available) - it means
>>>> that
>>>> >> load is ~5%
>>>> >> >
>>>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
>>>> states
>>>> >> (epoll)
>>>> >> >
>>>> >> > I attribute this  to the HBase client implementation which seems
>>>> to be
>>>> >> not scalable (I am going dig into client later on today).
>>>> >> >
>>>> >> > Some numbers: The maximum what I could get from Single get (60
>>>> threads):
>>>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
>>>> >> >
>>>> >> > What are my options? I want to measure the limits and I do not
>>>> want to
>>>> >> run Cluster of clients against just ONE Region Server?
>>>> >> >
>>>> >> > RS config: 96GB RAM, 16(32) CPU
>>>> >> > Client     : 48GB RAM   8 (16) CPU
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Vladimir Rodionov
>>>> >> > Principal Platform Engineer
>>>> >> > Carrier IQ, www.carrieriq.com
>>>> >> > e-mail: vrodionov@carrieriq.com
>>>> >> >
>>>> >> >
>>>> >> > Confidentiality Notice:  The information contained in this message,
>>>> >> including any attachments hereto, may be confidential and is
>>>> intended to be
>>>> >> read only by the individual or entity to whom this message is
>>>> addressed. If
>>>> >> the reader of this message is not the intended recipient or an agent
>>>> or
>>>> >> designee of the intended recipient, please note that any review, use,
>>>> >> disclosure or distribution of this message or its attachments, in
>>>> any form,
>>>> >> is strictly prohibited.  If you have received this message in error,
>>>> please
>>>> >> immediately notify the sender and/or Notifications@carrieriq.com and
>>>> >> delete or destroy any copy of this message and its attachments.
>>>> >>
>>>>
>>>
>>>
>>
>

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
One more observation: one Configuration instance per HTable gives a 50%
boost compared to a single Configuration object shared by all HTables - from
20K to 30K.
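
To make it concrete, these are the two setups I am comparing (0.94-style
API, "test_table" is a placeholder). My guess, and it is only a guess, is
that separate Configuration instances end up with more underlying
connections to the RS, which is what hbase.client.ipc.pool.size controls
more directly:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class ConfigPerTableSketch {
  public static void main(String[] args) throws Exception {
    // Setup 1: one shared Configuration for every HTable (~20K in my test).
    Configuration shared = HBaseConfiguration.create();
    HTable t1 = new HTable(shared, "test_table");

    // Setup 2: a fresh Configuration per HTable (~30K in my test).
    Configuration own = HBaseConfiguration.create();
    HTable t2 = new HTable(own, "test_table");

    t1.close();
    t2.close();
  }
}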


On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov
<vl...@gmail.com>wrote:

> This thread dump has been taken when client was sending 60 requests in
> parallel (at least, in theory). There are 50 server handler threads.
>
>
> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <vladrodionov@gmail.com
> > wrote:
>
>> Sure, here it is:
>>
>> http://pastebin.com/8TjyrKRT
>>
>> epoll is not only to read/write HDFS but to connect/listen to clients as
>> well?
>>
>>
>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <jdcryans@apache.org
>> > wrote:
>>
>>> Can you show us what the thread dump looks like when the threads are
>>> BLOCKED? There aren't that many locks on the read path when reading
>>> out of the block cache, and epoll would only happen if you need to hit
>>> HDFS, which you're saying is not happening.
>>>
>>> J-D
>>>
>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>>> <vl...@gmail.com> wrote:
>>> > I am hitting data in a block cache, of course. The data set is very
>>> small
>>> > to fit comfortably into block cache and all request are directed to the
>>> > same Region to guarantee single RS testing.
>>> >
>>> > To Ted:
>>> >
>>> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>>> respect
>>> > to read performance?
>>> >
>>> >
>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>>> jdcryans@apache.org>wrote:
>>> >
>>> >> That's a tough one.
>>> >>
>>> >> One thing that comes to mind is socket reuse. It used to come up more
>>> >> more often but this is an issue that people hit when doing loads of
>>> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>>> >> anything :)
>>> >>
>>> >> Also if you _just_ want to saturate something, be it CPU or network,
>>> >> wouldn't it be better to hit data only in the block cache? This way it
>>> >> has the lowest overhead?
>>> >>
>>> >> Last thing I wanted to mention is that yes, the client doesn't scale
>>> >> very well. I would suggest you give the asynchbase client a run.
>>> >>
>>> >> J-D
>>> >>
>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>>> >> <vr...@carrieriq.com> wrote:
>>> >> > I have been doing quite extensive testing of different read
>>> scenarios:
>>> >> >
>>> >> > 1. blockcache disabled/enabled
>>> >> > 2. data is local/remote (no good hdfs locality)
>>> >> >
>>> >> > and it turned out that that I can not saturate 1 RS using one
>>> >> (comparable in CPU power and RAM) client host:
>>> >> >
>>> >> >  I am running client app with 60 read threads active (with
>>> multi-get)
>>> >> that is going to one particular RS and
>>> >> > this RS's load is 100 -150% (out of 3200% available) - it means that
>>> >> load is ~5%
>>> >> >
>>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
>>> states
>>> >> (epoll)
>>> >> >
>>> >> > I attribute this  to the HBase client implementation which seems to
>>> be
>>> >> not scalable (I am going dig into client later on today).
>>> >> >
>>> >> > Some numbers: The maximum what I could get from Single get (60
>>> threads):
>>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
>>> >> >
>>> >> > What are my options? I want to measure the limits and I do not want
>>> to
>>> >> run Cluster of clients against just ONE Region Server?
>>> >> >
>>> >> > RS config: 96GB RAM, 16(32) CPU
>>> >> > Client     : 48GB RAM   8 (16) CPU
>>> >> >
>>> >> > Best regards,
>>> >> > Vladimir Rodionov
>>> >> > Principal Platform Engineer
>>> >> > Carrier IQ, www.carrieriq.com
>>> >> > e-mail: vrodionov@carrieriq.com
>>> >> >
>>> >> >
>>> >> > Confidentiality Notice:  The information contained in this message,
>>> >> including any attachments hereto, may be confidential and is intended
>>> to be
>>> >> read only by the individual or entity to whom this message is
>>> addressed. If
>>> >> the reader of this message is not the intended recipient or an agent
>>> or
>>> >> designee of the intended recipient, please note that any review, use,
>>> >> disclosure or distribution of this message or its attachments, in any
>>> form,
>>> >> is strictly prohibited.  If you have received this message in error,
>>> please
>>> >> immediately notify the sender and/or Notifications@carrieriq.com and
>>> >> delete or destroy any copy of this message and its attachments.
>>> >>
>>>
>>
>>
>

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
This thread dump was taken while the client was sending 60 requests in
parallel (at least in theory). There are 50 server handler threads.
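
One thing I still need to rule out (an assumption on my part, nothing I
have measured yet): with 60 client threads against 50 handlers, the handler
count itself could be the ceiling. It is set via
hbase.regionserver.handler.count in the region server's hbase-site.xml,
for example:

<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- illustrative value only; the dump above shows 50 handlers today -->
  <value>100</value>
</property>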


On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov
<vl...@gmail.com>wrote:

> Sure, here it is:
>
> http://pastebin.com/8TjyrKRT
>
> epoll is not only to read/write HDFS but to connect/listen to clients as
> well?
>
>
> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Can you show us what the thread dump looks like when the threads are
>> BLOCKED? There aren't that many locks on the read path when reading
>> out of the block cache, and epoll would only happen if you need to hit
>> HDFS, which you're saying is not happening.
>>
>> J-D
>>
>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>> <vl...@gmail.com> wrote:
>> > I am hitting data in a block cache, of course. The data set is very
>> small
>> > to fit comfortably into block cache and all request are directed to the
>> > same Region to guarantee single RS testing.
>> >
>> > To Ted:
>> >
>> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>> respect
>> > to read performance?
>> >
>> >
>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >
>> >> That's a tough one.
>> >>
>> >> One thing that comes to mind is socket reuse. It used to come up more
>> >> more often but this is an issue that people hit when doing loads of
>> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>> >> anything :)
>> >>
>> >> Also if you _just_ want to saturate something, be it CPU or network,
>> >> wouldn't it be better to hit data only in the block cache? This way it
>> >> has the lowest overhead?
>> >>
>> >> Last thing I wanted to mention is that yes, the client doesn't scale
>> >> very well. I would suggest you give the asynchbase client a run.
>> >>
>> >> J-D
>> >>
>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>> >> <vr...@carrieriq.com> wrote:
>> >> > I have been doing quite extensive testing of different read
>> scenarios:
>> >> >
>> >> > 1. blockcache disabled/enabled
>> >> > 2. data is local/remote (no good hdfs locality)
>> >> >
>> >> > and it turned out that that I can not saturate 1 RS using one
>> >> (comparable in CPU power and RAM) client host:
>> >> >
>> >> >  I am running client app with 60 read threads active (with multi-get)
>> >> that is going to one particular RS and
>> >> > this RS's load is 100 -150% (out of 3200% available) - it means that
>> >> load is ~5%
>> >> >
>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE states
>> >> (epoll)
>> >> >
>> >> > I attribute this  to the HBase client implementation which seems to
>> be
>> >> not scalable (I am going dig into client later on today).
>> >> >
>> >> > Some numbers: The maximum what I could get from Single get (60
>> threads):
>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
>> >> >
>> >> > What are my options? I want to measure the limits and I do not want
>> to
>> >> run Cluster of clients against just ONE Region Server?
>> >> >
>> >> > RS config: 96GB RAM, 16(32) CPU
>> >> > Client     : 48GB RAM   8 (16) CPU
>> >> >
>> >> > Best regards,
>> >> > Vladimir Rodionov
>> >> > Principal Platform Engineer
>> >> > Carrier IQ, www.carrieriq.com
>> >> > e-mail: vrodionov@carrieriq.com
>> >> >
>> >> >
>> >> > Confidentiality Notice:  The information contained in this message,
>> >> including any attachments hereto, may be confidential and is intended
>> to be
>> >> read only by the individual or entity to whom this message is
>> addressed. If
>> >> the reader of this message is not the intended recipient or an agent or
>> >> designee of the intended recipient, please note that any review, use,
>> >> disclosure or distribution of this message or its attachments, in any
>> form,
>> >> is strictly prohibited.  If you have received this message in error,
>> please
>> >> immediately notify the sender and/or Notifications@carrieriq.com and
>> >> delete or destroy any copy of this message and its attachments.
>> >>
>>
>
>

Re: HBase read perfomnance and HBase client

Posted by Ted Yu <yu...@gmail.com>.
In hdfs-site.xml, can you check the values of the following configs?

<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>
<property>
  <name>ipc.client.tcpnodelay</name>
  <value>true</value>
</property>


On Tue, Jul 30, 2013 at 1:58 PM, Vladimir Rodionov
<vl...@gmail.com>wrote:

> With :
>
> hbase.ipc.client.tcpnodelay= true
> hbase.client.ipc.pool.size =5
>
> I was able to achieve 50K per sec for single get operations. No progress
> for multi-gets.
>
>
> On Tue, Jul 30, 2013 at 1:52 PM, Vladimir Rodionov
> <vl...@gmail.com>wrote:
>
> > Exactly, but this thread dump is from RS under load nevertheless (you can
> > see that one thread is in JAVA and reading data from socket)
> >
> >
> > On Tue, Jul 30, 2013 at 1:35 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
> >
> >> FWIW nothing is happening in that thread dump.
> >>
> >> J-D
> >>
> >> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov
> >> <vl...@gmail.com> wrote:
> >> > Sure, here it is:
> >> >
> >> > http://pastebin.com/8TjyrKRT
> >> >
> >> > epoll is not only to read/write HDFS but to connect/listen to clients
> as
> >> > well?
> >> >
> >> >
> >> > On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
> >> jdcryans@apache.org>wrote:
> >> >
> >> >> Can you show us what the thread dump looks like when the threads are
> >> >> BLOCKED? There aren't that many locks on the read path when reading
> >> >> out of the block cache, and epoll would only happen if you need to
> hit
> >> >> HDFS, which you're saying is not happening.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> >> >> <vl...@gmail.com> wrote:
> >> >> > I am hitting data in a block cache, of course. The data set is very
> >> small
> >> >> > to fit comfortably into block cache and all request are directed to
> >> the
> >> >> > same Region to guarantee single RS testing.
> >> >> >
> >> >> > To Ted:
> >> >> >
> >> >> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
> >> >> respect
> >> >> > to read performance?
> >> >> >
> >> >> >
> >> >> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
> >> >> jdcryans@apache.org>wrote:
> >> >> >
> >> >> >> That's a tough one.
> >> >> >>
> >> >> >> One thing that comes to mind is socket reuse. It used to come up
> >> more
> >> >> >> more often but this is an issue that people hit when doing loads
> of
> >> >> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
> >> >> >> anything :)
> >> >> >>
> >> >> >> Also if you _just_ want to saturate something, be it CPU or
> network,
> >> >> >> wouldn't it be better to hit data only in the block cache? This
> way
> >> it
> >> >> >> has the lowest overhead?
> >> >> >>
> >> >> >> Last thing I wanted to mention is that yes, the client doesn't
> scale
> >> >> >> very well. I would suggest you give the asynchbase client a run.
> >> >> >>
> >> >> >> J-D
> >> >> >>
> >> >> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >> >> >> <vr...@carrieriq.com> wrote:
> >> >> >> > I have been doing quite extensive testing of different read
> >> scenarios:
> >> >> >> >
> >> >> >> > 1. blockcache disabled/enabled
> >> >> >> > 2. data is local/remote (no good hdfs locality)
> >> >> >> >
> >> >> >> > and it turned out that that I can not saturate 1 RS using one
> >> >> >> (comparable in CPU power and RAM) client host:
> >> >> >> >
> >> >> >> >  I am running client app with 60 read threads active (with
> >> multi-get)
> >> >> >> that is going to one particular RS and
> >> >> >> > this RS's load is 100 -150% (out of 3200% available) - it means
> >> that
> >> >> >> load is ~5%
> >> >> >> >
> >> >> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
> >> states
> >> >> >> (epoll)
> >> >> >> >
> >> >> >> > I attribute this  to the HBase client implementation which seems
> >> to be
> >> >> >> not scalable (I am going dig into client later on today).
> >> >> >> >
> >> >> >> > Some numbers: The maximum what I could get from Single get (60
> >> >> threads):
> >> >> >> 30K per sec. Multiget gives ~ 75K (60 threads)
> >> >> >> >
> >> >> >> > What are my options? I want to measure the limits and I do not
> >> want to
> >> >> >> run Cluster of clients against just ONE Region Server?
> >> >> >> >
> >> >> >> > RS config: 96GB RAM, 16(32) CPU
> >> >> >> > Client     : 48GB RAM   8 (16) CPU
> >> >> >> >
> >> >> >> > Best regards,
> >> >> >> > Vladimir Rodionov
> >> >> >> > Principal Platform Engineer
> >> >> >> > Carrier IQ, www.carrieriq.com
> >> >> >> > e-mail: vrodionov@carrieriq.com
> >> >> >> >
> >> >> >> >
> >> >> >> > Confidentiality Notice:  The information contained in this
> >> message,
> >> >> >> including any attachments hereto, may be confidential and is
> >> intended
> >> >> to be
> >> >> >> read only by the individual or entity to whom this message is
> >> >> addressed. If
> >> >> >> the reader of this message is not the intended recipient or an
> >> agent or
> >> >> >> designee of the intended recipient, please note that any review,
> >> use,
> >> >> >> disclosure or distribution of this message or its attachments, in
> >> any
> >> >> form,
> >> >> >> is strictly prohibited.  If you have received this message in
> error,
> >> >> please
> >> >> >> immediately notify the sender and/or
> Notifications@carrieriq.comand
> >> >> >> delete or destroy any copy of this message and its attachments.
> >> >> >>
> >> >>
> >>
> >
> >
>

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
With:

hbase.ipc.client.tcpnodelay = true
hbase.client.ipc.pool.size = 5

I was able to achieve 50K per sec for single get operations. No progress
for multi-gets.
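
In the client-side hbase-site.xml that is:

<property>
  <name>hbase.ipc.client.tcpnodelay</name>
  <value>true</value>
</property>
<property>
  <name>hbase.client.ipc.pool.size</name>
  <value>5</value>
</property>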


On Tue, Jul 30, 2013 at 1:52 PM, Vladimir Rodionov
<vl...@gmail.com>wrote:

> Exactly, but this thread dump is from RS under load nevertheless (you can
> see that one thread is in JAVA and reading data from socket)
>
>
> On Tue, Jul 30, 2013 at 1:35 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> FWIW nothing is happening in that thread dump.
>>
>> J-D
>>
>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov
>> <vl...@gmail.com> wrote:
>> > Sure, here it is:
>> >
>> > http://pastebin.com/8TjyrKRT
>> >
>> > epoll is not only to read/write HDFS but to connect/listen to clients as
>> > well?
>> >
>> >
>> > On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >
>> >> Can you show us what the thread dump looks like when the threads are
>> >> BLOCKED? There aren't that many locks on the read path when reading
>> >> out of the block cache, and epoll would only happen if you need to hit
>> >> HDFS, which you're saying is not happening.
>> >>
>> >> J-D
>> >>
>> >> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>> >> <vl...@gmail.com> wrote:
>> >> > I am hitting data in a block cache, of course. The data set is very
>> small
>> >> > to fit comfortably into block cache and all request are directed to
>> the
>> >> > same Region to guarantee single RS testing.
>> >> >
>> >> > To Ted:
>> >> >
>> >> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>> >> respect
>> >> > to read performance?
>> >> >
>> >> >
>> >> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>> >> jdcryans@apache.org>wrote:
>> >> >
>> >> >> That's a tough one.
>> >> >>
>> >> >> One thing that comes to mind is socket reuse. It used to come up
>> more
>> >> >> more often but this is an issue that people hit when doing loads of
>> >> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>> >> >> anything :)
>> >> >>
>> >> >> Also if you _just_ want to saturate something, be it CPU or network,
>> >> >> wouldn't it be better to hit data only in the block cache? This way
>> it
>> >> >> has the lowest overhead?
>> >> >>
>> >> >> Last thing I wanted to mention is that yes, the client doesn't scale
>> >> >> very well. I would suggest you give the asynchbase client a run.
>> >> >>
>> >> >> J-D
>> >> >>
>> >> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>> >> >> <vr...@carrieriq.com> wrote:
>> >> >> > I have been doing quite extensive testing of different read
>> scenarios:
>> >> >> >
>> >> >> > 1. blockcache disabled/enabled
>> >> >> > 2. data is local/remote (no good hdfs locality)
>> >> >> >
>> >> >> > and it turned out that that I can not saturate 1 RS using one
>> >> >> (comparable in CPU power and RAM) client host:
>> >> >> >
>> >> >> >  I am running client app with 60 read threads active (with
>> multi-get)
>> >> >> that is going to one particular RS and
>> >> >> > this RS's load is 100 -150% (out of 3200% available) - it means
>> that
>> >> >> load is ~5%
>> >> >> >
>> >> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
>> states
>> >> >> (epoll)
>> >> >> >
>> >> >> > I attribute this  to the HBase client implementation which seems
>> to be
>> >> >> not scalable (I am going dig into client later on today).
>> >> >> >
>> >> >> > Some numbers: The maximum what I could get from Single get (60
>> >> threads):
>> >> >> 30K per sec. Multiget gives ~ 75K (60 threads)
>> >> >> >
>> >> >> > What are my options? I want to measure the limits and I do not
>> want to
>> >> >> run Cluster of clients against just ONE Region Server?
>> >> >> >
>> >> >> > RS config: 96GB RAM, 16(32) CPU
>> >> >> > Client     : 48GB RAM   8 (16) CPU
>> >> >> >
>> >> >> > Best regards,
>> >> >> > Vladimir Rodionov
>> >> >> > Principal Platform Engineer
>> >> >> > Carrier IQ, www.carrieriq.com
>> >> >> > e-mail: vrodionov@carrieriq.com
>> >> >> >
>> >> >> >
>> >> >> > Confidentiality Notice:  The information contained in this
>> message,
>> >> >> including any attachments hereto, may be confidential and is
>> intended
>> >> to be
>> >> >> read only by the individual or entity to whom this message is
>> >> addressed. If
>> >> >> the reader of this message is not the intended recipient or an
>> agent or
>> >> >> designee of the intended recipient, please note that any review,
>> use,
>> >> >> disclosure or distribution of this message or its attachments, in
>> any
>> >> form,
>> >> >> is strictly prohibited.  If you have received this message in error,
>> >> please
>> >> >> immediately notify the sender and/or Notifications@carrieriq.comand
>> >> >> delete or destroy any copy of this message and its attachments.
>> >> >>
>> >>
>>
>
>

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
Exactly, but this thread dump is nevertheless from an RS under load (you can
see that one thread is in the JAVA state, reading data from a socket).


On Tue, Jul 30, 2013 at 1:35 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> FWIW nothing is happening in that thread dump.
>
> J-D
>
> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov
> <vl...@gmail.com> wrote:
> > Sure, here it is:
> >
> > http://pastebin.com/8TjyrKRT
> >
> > epoll is not only to read/write HDFS but to connect/listen to clients as
> > well?
> >
> >
> > On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> Can you show us what the thread dump looks like when the threads are
> >> BLOCKED? There aren't that many locks on the read path when reading
> >> out of the block cache, and epoll would only happen if you need to hit
> >> HDFS, which you're saying is not happening.
> >>
> >> J-D
> >>
> >> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> >> <vl...@gmail.com> wrote:
> >> > I am hitting data in a block cache, of course. The data set is very
> small
> >> > to fit comfortably into block cache and all request are directed to
> the
> >> > same Region to guarantee single RS testing.
> >> >
> >> > To Ted:
> >> >
> >> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
> >> respect
> >> > to read performance?
> >> >
> >> >
> >> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
> >> jdcryans@apache.org>wrote:
> >> >
> >> >> That's a tough one.
> >> >>
> >> >> One thing that comes to mind is socket reuse. It used to come up more
> >> >> more often but this is an issue that people hit when doing loads of
> >> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
> >> >> anything :)
> >> >>
> >> >> Also if you _just_ want to saturate something, be it CPU or network,
> >> >> wouldn't it be better to hit data only in the block cache? This way
> it
> >> >> has the lowest overhead?
> >> >>
> >> >> Last thing I wanted to mention is that yes, the client doesn't scale
> >> >> very well. I would suggest you give the asynchbase client a run.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >> >> <vr...@carrieriq.com> wrote:
> >> >> > I have been doing quite extensive testing of different read
> scenarios:
> >> >> >
> >> >> > 1. blockcache disabled/enabled
> >> >> > 2. data is local/remote (no good hdfs locality)
> >> >> >
> >> >> > and it turned out that that I can not saturate 1 RS using one
> >> >> (comparable in CPU power and RAM) client host:
> >> >> >
> >> >> >  I am running client app with 60 read threads active (with
> multi-get)
> >> >> that is going to one particular RS and
> >> >> > this RS's load is 100 -150% (out of 3200% available) - it means
> that
> >> >> load is ~5%
> >> >> >
> >> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE
> states
> >> >> (epoll)
> >> >> >
> >> >> > I attribute this  to the HBase client implementation which seems
> to be
> >> >> not scalable (I am going dig into client later on today).
> >> >> >
> >> >> > Some numbers: The maximum what I could get from Single get (60
> >> threads):
> >> >> 30K per sec. Multiget gives ~ 75K (60 threads)
> >> >> >
> >> >> > What are my options? I want to measure the limits and I do not
> want to
> >> >> run Cluster of clients against just ONE Region Server?
> >> >> >
> >> >> > RS config: 96GB RAM, 16(32) CPU
> >> >> > Client     : 48GB RAM   8 (16) CPU
> >> >> >
> >> >> > Best regards,
> >> >> > Vladimir Rodionov
> >> >> > Principal Platform Engineer
> >> >> > Carrier IQ, www.carrieriq.com
> >> >> > e-mail: vrodionov@carrieriq.com
> >> >> >
> >> >> >
> >> >> > Confidentiality Notice:  The information contained in this message,
> >> >> including any attachments hereto, may be confidential and is intended
> >> to be
> >> >> read only by the individual or entity to whom this message is
> >> addressed. If
> >> >> the reader of this message is not the intended recipient or an agent
> or
> >> >> designee of the intended recipient, please note that any review, use,
> >> >> disclosure or distribution of this message or its attachments, in any
> >> form,
> >> >> is strictly prohibited.  If you have received this message in error,
> >> please
> >> >> immediately notify the sender and/or Notifications@carrieriq.com and
> >> >> delete or destroy any copy of this message and its attachments.
> >> >>
> >>
>

Re: HBase read perfomnance and HBase client

Posted by Jean-Daniel Cryans <jd...@apache.org>.
FWIW nothing is happening in that thread dump.

J-D

On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov
<vl...@gmail.com> wrote:
> Sure, here it is:
>
> http://pastebin.com/8TjyrKRT
>
> epoll is not only to read/write HDFS but to connect/listen to clients as
> well?
>
>
> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Can you show us what the thread dump looks like when the threads are
>> BLOCKED? There aren't that many locks on the read path when reading
>> out of the block cache, and epoll would only happen if you need to hit
>> HDFS, which you're saying is not happening.
>>
>> J-D
>>
>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>> <vl...@gmail.com> wrote:
>> > I am hitting data in a block cache, of course. The data set is very small
>> > to fit comfortably into block cache and all request are directed to the
>> > same Region to guarantee single RS testing.
>> >
>> > To Ted:
>> >
>> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
>> respect
>> > to read performance?
>> >
>> >
>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >
>> >> That's a tough one.
>> >>
>> >> One thing that comes to mind is socket reuse. It used to come up more
>> >> more often but this is an issue that people hit when doing loads of
>> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>> >> anything :)
>> >>
>> >> Also if you _just_ want to saturate something, be it CPU or network,
>> >> wouldn't it be better to hit data only in the block cache? This way it
>> >> has the lowest overhead?
>> >>
>> >> Last thing I wanted to mention is that yes, the client doesn't scale
>> >> very well. I would suggest you give the asynchbase client a run.
>> >>
>> >> J-D
>> >>
>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>> >> <vr...@carrieriq.com> wrote:
>> >> > I have been doing quite extensive testing of different read scenarios:
>> >> >
>> >> > 1. blockcache disabled/enabled
>> >> > 2. data is local/remote (no good hdfs locality)
>> >> >
>> >> > and it turned out that that I can not saturate 1 RS using one
>> >> (comparable in CPU power and RAM) client host:
>> >> >
>> >> >  I am running client app with 60 read threads active (with multi-get)
>> >> that is going to one particular RS and
>> >> > this RS's load is 100 -150% (out of 3200% available) - it means that
>> >> load is ~5%
>> >> >
>> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE states
>> >> (epoll)
>> >> >
>> >> > I attribute this  to the HBase client implementation which seems to be
>> >> not scalable (I am going dig into client later on today).
>> >> >
>> >> > Some numbers: The maximum what I could get from Single get (60
>> threads):
>> >> 30K per sec. Multiget gives ~ 75K (60 threads)
>> >> >
>> >> > What are my options? I want to measure the limits and I do not want to
>> >> run Cluster of clients against just ONE Region Server?
>> >> >
>> >> > RS config: 96GB RAM, 16(32) CPU
>> >> > Client     : 48GB RAM   8 (16) CPU
>> >> >
>> >> > Best regards,
>> >> > Vladimir Rodionov
>> >> > Principal Platform Engineer
>> >> > Carrier IQ, www.carrieriq.com
>> >> > e-mail: vrodionov@carrieriq.com
>> >> >
>> >> >
>> >> > Confidentiality Notice:  The information contained in this message,
>> >> including any attachments hereto, may be confidential and is intended
>> to be
>> >> read only by the individual or entity to whom this message is
>> addressed. If
>> >> the reader of this message is not the intended recipient or an agent or
>> >> designee of the intended recipient, please note that any review, use,
>> >> disclosure or distribution of this message or its attachments, in any
>> form,
>> >> is strictly prohibited.  If you have received this message in error,
>> please
>> >> immediately notify the sender and/or Notifications@carrieriq.com and
>> >> delete or destroy any copy of this message and its attachments.
>> >>
>>

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
Sure, here it is:

http://pastebin.com/8TjyrKRT

epoll is used not only to read/write HDFS but also to accept/listen for
client connections, isn't it?


On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Can you show us what the thread dump looks like when the threads are
> BLOCKED? There aren't that many locks on the read path when reading
> out of the block cache, and epoll would only happen if you need to hit
> HDFS, which you're saying is not happening.
>
> J-D
>
> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> <vl...@gmail.com> wrote:
> > I am hitting data in a block cache, of course. The data set is very small
> > to fit comfortably into block cache and all request are directed to the
> > same Region to guarantee single RS testing.
> >
> > To Ted:
> >
> > Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with
> respect
> > to read performance?
> >
> >
> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> That's a tough one.
> >>
> >> One thing that comes to mind is socket reuse. It used to come up more
> >> more often but this is an issue that people hit when doing loads of
> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
> >> anything :)
> >>
> >> Also if you _just_ want to saturate something, be it CPU or network,
> >> wouldn't it be better to hit data only in the block cache? This way it
> >> has the lowest overhead?
> >>
> >> Last thing I wanted to mention is that yes, the client doesn't scale
> >> very well. I would suggest you give the asynchbase client a run.
> >>
> >> J-D
> >>
> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >> <vr...@carrieriq.com> wrote:
> >> > I have been doing quite extensive testing of different read scenarios:
> >> >
> >> > 1. blockcache disabled/enabled
> >> > 2. data is local/remote (no good hdfs locality)
> >> >
> >> > and it turned out that that I can not saturate 1 RS using one
> >> (comparable in CPU power and RAM) client host:
> >> >
> >> >  I am running client app with 60 read threads active (with multi-get)
> >> that is going to one particular RS and
> >> > this RS's load is 100 -150% (out of 3200% available) - it means that
> >> load is ~5%
> >> >
> >> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE states
> >> (epoll)
> >> >
> >> > I attribute this  to the HBase client implementation which seems to be
> >> not scalable (I am going dig into client later on today).
> >> >
> >> > Some numbers: The maximum what I could get from Single get (60
> threads):
> >> 30K per sec. Multiget gives ~ 75K (60 threads)
> >> >
> >> > What are my options? I want to measure the limits and I do not want to
> >> run Cluster of clients against just ONE Region Server?
> >> >
> >> > RS config: 96GB RAM, 16(32) CPU
> >> > Client     : 48GB RAM   8 (16) CPU
> >> >
> >> > Best regards,
> >> > Vladimir Rodionov
> >> > Principal Platform Engineer
> >> > Carrier IQ, www.carrieriq.com
> >> > e-mail: vrodionov@carrieriq.com
> >> >
> >> >
> >> > Confidentiality Notice:  The information contained in this message,
> >> including any attachments hereto, may be confidential and is intended
> to be
> >> read only by the individual or entity to whom this message is
> addressed. If
> >> the reader of this message is not the intended recipient or an agent or
> >> designee of the intended recipient, please note that any review, use,
> >> disclosure or distribution of this message or its attachments, in any
> form,
> >> is strictly prohibited.  If you have received this message in error,
> please
> >> immediately notify the sender and/or Notifications@carrieriq.com and
> >> delete or destroy any copy of this message and its attachments.
> >>
>

Re: HBase read perfomnance and HBase client

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Can you show us what the thread dump looks like when the threads are
BLOCKED? There aren't that many locks on the read path when reading
out of the block cache, and epoll would only happen if you need to hit
HDFS, which you're saying is not happening.

J-D

On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
<vl...@gmail.com> wrote:
> I am hitting data in a block cache, of course. The data set is very small
> to fit comfortably into block cache and all request are directed to the
> same Region to guarantee single RS testing.
>
> To Ted:
>
> Yes, its CDH 4.3 . What the difference between 94.10 and 94.6 with respect
> to read performance?
>
>
> On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> That's a tough one.
>>
>> One thing that comes to mind is socket reuse. It used to come up more
>> more often but this is an issue that people hit when doing loads of
>> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
>> anything :)
>>
>> Also if you _just_ want to saturate something, be it CPU or network,
>> wouldn't it be better to hit data only in the block cache? This way it
>> has the lowest overhead?
>>
>> Last thing I wanted to mention is that yes, the client doesn't scale
>> very well. I would suggest you give the asynchbase client a run.
>>
>> J-D
>>
>> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>> <vr...@carrieriq.com> wrote:
>> > I have been doing quite extensive testing of different read scenarios:
>> >
>> > 1. blockcache disabled/enabled
>> > 2. data is local/remote (no good hdfs locality)
>> >
>> > and it turned out that that I can not saturate 1 RS using one
>> (comparable in CPU power and RAM) client host:
>> >
>> >  I am running client app with 60 read threads active (with multi-get)
>> that is going to one particular RS and
>> > this RS's load is 100 -150% (out of 3200% available) - it means that
>> load is ~5%
>> >
>> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE states
>> (epoll)
>> >
>> > I attribute this  to the HBase client implementation which seems to be
>> not scalable (I am going dig into client later on today).
>> >
>> > Some numbers: The maximum what I could get from Single get (60 threads):
>> 30K per sec. Multiget gives ~ 75K (60 threads)
>> >
>> > What are my options? I want to measure the limits and I do not want to
>> run Cluster of clients against just ONE Region Server?
>> >
>> > RS config: 96GB RAM, 16(32) CPU
>> > Client     : 48GB RAM   8 (16) CPU
>> >
>> > Best regards,
>> > Vladimir Rodionov
>> > Principal Platform Engineer
>> > Carrier IQ, www.carrieriq.com
>> > e-mail: vrodionov@carrieriq.com
>> >
>> >
>> > Confidentiality Notice:  The information contained in this message,
>> including any attachments hereto, may be confidential and is intended to be
>> read only by the individual or entity to whom this message is addressed. If
>> the reader of this message is not the intended recipient or an agent or
>> designee of the intended recipient, please note that any review, use,
>> disclosure or distribution of this message or its attachments, in any form,
>> is strictly prohibited.  If you have received this message in error, please
>> immediately notify the sender and/or Notifications@carrieriq.com and
>> delete or destroy any copy of this message and its attachments.
>>

Re: HBase read perfomnance and HBase client

Posted by Vladimir Rodionov <vl...@gmail.com>.
I am hitting data in the block cache, of course. The data set is small
enough to fit comfortably into the block cache, and all requests are
directed to the same Region to guarantee single-RS testing.

To Ted:

Yes, it's CDH 4.3. What is the difference between 94.10 and 94.6 with
respect to read performance?


On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> That's a tough one.
>
> One thing that comes to mind is socket reuse. It used to come up more
> more often but this is an issue that people hit when doing loads of
> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
> anything :)
>
> Also if you _just_ want to saturate something, be it CPU or network,
> wouldn't it be better to hit data only in the block cache? This way it
> has the lowest overhead?
>
> Last thing I wanted to mention is that yes, the client doesn't scale
> very well. I would suggest you give the asynchbase client a run.
>
> J-D
>
> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> <vr...@carrieriq.com> wrote:
> > I have been doing quite extensive testing of different read scenarios:
> >
> > 1. blockcache disabled/enabled
> > 2. data is local/remote (no good hdfs locality)
> >
> > and it turned out that that I can not saturate 1 RS using one
> (comparable in CPU power and RAM) client host:
> >
> >  I am running client app with 60 read threads active (with multi-get)
> that is going to one particular RS and
> > this RS's load is 100 -150% (out of 3200% available) - it means that
> load is ~5%
> >
> > All threads in RS are either in BLOCKED (wait) or in IN_NATIVE states
> (epoll)
> >
> > I attribute this  to the HBase client implementation which seems to be
> not scalable (I am going dig into client later on today).
> >
> > Some numbers: The maximum what I could get from Single get (60 threads):
> 30K per sec. Multiget gives ~ 75K (60 threads)
> >
> > What are my options? I want to measure the limits and I do not want to
> run Cluster of clients against just ONE Region Server?
> >
> > RS config: 96GB RAM, 16(32) CPU
> > Client     : 48GB RAM   8 (16) CPU
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodionov@carrieriq.com
> >
> >
> > Confidentiality Notice:  The information contained in this message,
> including any attachments hereto, may be confidential and is intended to be
> read only by the individual or entity to whom this message is addressed. If
> the reader of this message is not the intended recipient or an agent or
> designee of the intended recipient, please note that any review, use,
> disclosure or distribution of this message or its attachments, in any form,
> is strictly prohibited.  If you have received this message in error, please
> immediately notify the sender and/or Notifications@carrieriq.com and
> delete or destroy any copy of this message and its attachments.
>

Re: HBase read perfomnance and HBase client

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's a tough one.

One thing that comes to mind is socket reuse. It used to come up more
often, but this is an issue that people hit when doing loads of
random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
anything :)

Also if you _just_ want to saturate something, be it CPU or network,
wouldn't it be better to hit data only in the block cache? This way it
has the lowest overhead?

Last thing I wanted to mention is that yes, the client doesn't scale
very well. I would suggest you give the asynchbase client a run.
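
If you want to try it, a get with asynchbase looks roughly like this (the
quorum spec, table and row key below are placeholders); the main difference
is that a single HBaseClient instance is shared by all your threads:

import java.util.ArrayList;
import org.hbase.async.GetRequest;
import org.hbase.async.HBaseClient;
import org.hbase.async.KeyValue;

public class AsyncGetSketch {
  public static void main(String[] args) throws Exception {
    // One client for the whole process; it multiplexes requests to the
    // region servers instead of holding a connection per caller.
    final HBaseClient client = new HBaseClient("zkhost1,zkhost2,zkhost3");
    try {
      ArrayList<KeyValue> row =
          client.get(new GetRequest("test_table", "row-0001")).join();
      System.out.println("cells returned: " + row.size());
    } finally {
      client.shutdown().join();  // flush and close cleanly
    }
  }
}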

J-D

On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
<vr...@carrieriq.com> wrote:
> I have been doing quite extensive testing of different read scenarios:
>
> 1. blockcache disabled/enabled
> 2. data is local/remote (no good hdfs locality)
>
> and it turned out that that I can not saturate 1 RS using one (comparable in CPU power and RAM) client host:
>
>  I am running client app with 60 read threads active (with multi-get) that is going to one particular RS and
> this RS's load is 100 -150% (out of 3200% available) - it means that load is ~5%
>
> All threads in RS are either in BLOCKED (wait) or in IN_NATIVE states (epoll)
>
> I attribute this  to the HBase client implementation which seems to be not scalable (I am going dig into client later on today).
>
> Some numbers: The maximum what I could get from Single get (60 threads): 30K per sec. Multiget gives ~ 75K (60 threads)
>
> What are my options? I want to measure the limits and I do not want to run Cluster of clients against just ONE Region Server?
>
> RS config: 96GB RAM, 16(32) CPU
> Client     : 48GB RAM   8 (16) CPU
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
>

Re: HBase read perfomnance and HBase client

Posted by Stack <st...@duboce.net>.
On Tue, Jul 30, 2013 at 11:25 AM, Ted Yu <yu...@gmail.com> wrote:
...

>
> I wonder if 0.94.10 would make a difference.
>
>
Folks come to these lists looking for expertise, not "wonderings".  I
suggest that if you have nothing to offer, do not make a reply.  The world
is full of distractions and noise.  There is no need to add to it.

St.Ack

Re: HBase read perfomnance and HBase client

Posted by Ted Yu <yu...@gmail.com>.
Was this obtained on top of 0.94.6 ?

I wonder if 0.94.10 would make a difference.

On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov <vrodionov@carrieriq.com
> wrote:

> I have been doing quite extensive testing of different read scenarios:
>
> 1. blockcache disabled/enabled
> 2. data is local/remote (no good hdfs locality)
>
> and it turned out that that I can not saturate 1 RS using one (comparable
> in CPU power and RAM) client host:
>
>  I am running client app with 60 read threads active (with multi-get) that
> is going to one particular RS and
> this RS's load is 100 -150% (out of 3200% available) - it means that load
> is ~5%
>
> All threads in RS are either in BLOCKED (wait) or in IN_NATIVE states
> (epoll)
>
> I attribute this  to the HBase client implementation which seems to be not
> scalable (I am going dig into client later on today).
>
> Some numbers: The maximum what I could get from Single get (60 threads):
> 30K per sec. Multiget gives ~ 75K (60 threads)
>
> What are my options? I want to measure the limits and I do not want to run
> Cluster of clients against just ONE Region Server?
>
> RS config: 96GB RAM, 16(32) CPU
> Client     : 48GB RAM   8 (16) CPU
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
>
>

Re: HBase read perfomnance and HBase client

Posted by lars hofhansl <la...@apache.org>.
Try to set "hbase.client.ipc.pool.size" to a larger number on your client. This is the number of TCP connections the client will maintain for each region server it talks to. Default is 1.
Could also play with the pool type, but I would leave that at "round robin".
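
A rough sketch of what that looks like with the 0.94-era client API (the
table name, row keys, pool size of 10 and batch size are placeholders, not
recommendations):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PooledMultiGet {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Keep several TCP connections per region server instead of 1.
    conf.setInt("hbase.client.ipc.pool.size", 10);
    // Pool type left at its default (round robin), as suggested above.

    HTable table = new HTable(conf, "testtable");
    try {
      List<Get> batch = new ArrayList<Get>(100);
      for (int i = 0; i < 100; i++) {
        batch.add(new Get(Bytes.toBytes("row-" + i)));
      }
      // One multi-get call; the client groups the Gets per region server.
      Result[] results = table.get(batch);
      System.out.println("got " + results.length + " results");
    } finally {
      table.close();
    }
  }
}

Every thread that shares this Configuration (and therefore the same
HConnection) spreads its RPCs over those pooled sockets.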


-- Lars



----- Original Message -----
From: Vladimir Rodionov <vr...@carrieriq.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>
Cc: 
Sent: Tuesday, July 30, 2013 11:23 AM
Subject: HBase read perfomnance and HBase client

I have been doing quite extensive testing of different read scenarios:

1. blockcache disabled/enabled
2. data is local/remote (no good hdfs locality)

and it turned out that that I can not saturate 1 RS using one (comparable in CPU power and RAM) client host:

I am running client app with 60 read threads active (with multi-get) that is going to one particular RS and
this RS's load is 100 -150% (out of 3200% available) - it means that load is ~5%

All threads in RS are either in BLOCKED (wait) or in IN_NATIVE states (epoll)

I attribute this  to the HBase client implementation which seems to be not scalable (I am going dig into client later on today).

Some numbers: The maximum what I could get from Single get (60 threads): 30K per sec. Multiget gives ~ 75K (60 threads)

What are my options? I want to measure the limits and I do not want to run Cluster of clients against just ONE Region Server?

RS config: 96GB RAM, 16(32) CPU
Client     : 48GB RAM   8 (16) CPU

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

