
Posted to user@hbase.apache.org by "Hiller, Dean (Contractor)" <de...@broadridge.com> on 2011/02/27 19:29:23 UTC

asynch api mailing list?

I had a question on the hbase asynch api.

 

I find it odd that there is a PleaseThrottleException.  On most asynch
systems I have worked on, each node has a queue, and when that queue
fills up, the node's nic buffer fills up, which means the remote nic
buffer then fills up, and then the client should be blocking all writes,
which means his incoming buffer fills up, etc.  (Or he can spin
needlessly, but that is usually a very bad choice; in fact, I have never
seen that work out well, and it was always reverted.)

 

The nice thing I always found with asynch systems is that you completely
control the memory footprint of the server with the incoming queue:
there is a direct relationship between that queue size and the memory
used by the node.  The other thing that was always done was asynch reads
but synch writes (or blocking on a lock until the write can go through,
to slow down the upstream system and throttle it), i.e. it is a
self-throttling system when done this way, so there is no need for a
PleaseThrottleException, which I find odd.
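
As a rough illustration of the self-throttling design described above,
here is a minimal Java sketch.  It is not HBase or asynchbase code: the
class BlockingWriteQueue and its methods are hypothetical names, and the
sketch only assumes that pending writes sit in a bounded in-memory queue.
The capacity caps the memory used, and a writer is held up whenever the
queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of HBase or asynchbase.
class BlockingWriteQueue {
    private final BlockingQueue<String> pending;

    BlockingWriteQueue(int capacity) {
        // The queue never holds more than `capacity` writes, which bounds memory.
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    // Blocks the calling thread while the queue is full (the "synch write").
    void write(String edit) {
        try {
            pending.put(edit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for room", e);
        }
    }

    // Same idea with a bounded wait: returns false if no slot frees up in time.
    boolean tryWrite(String edit, long timeoutMillis) {
        try {
            return pending.offer(edit, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called by the sending side once a write has gone out; frees one slot.
    String drainOne() {
        return pending.poll();
    }

    int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        BlockingWriteQueue queue = new BlockingWriteQueue(2);
        queue.write("edit-1");
        queue.write("edit-2");
        System.out.println("accepted while full: " + queue.tryWrite("edit-3", 10));
        queue.drainOne();
        System.out.println("accepted after drain: " + queue.tryWrite("edit-3", 10));
    }
}
```

The trade-off is that write() parks the calling thread, which is exactly
the behaviour a non-blocking client tries to avoid.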

 

Maybe I am missing something though????  As a client, I definitely want
async reads but my writes should only return if there was room in the
nic buffer to write it out, otherwise it should block and hold up my
client so my client doesn't have to do any extra coding for a
PleaseThrottleException.

 

In this solution, most of my writes will be asynch right up until hbase
starts becoming a bottleneck.

 

Thanks,

Dean


Re: asynch api mailing list?

Posted by tsuna <ts...@gmail.com>.

On Sun, Feb 27, 2011 at 10:29 AM, Hiller, Dean  (Contractor)
<de...@broadridge.com> wrote:
> I had a question on the hbase asynch api.

Unless other people do not want to discuss asynchbase stuff on this
mailing list, I'm happy to discuss any issue or question here.  The
good thing about using a single list is that it makes it easier to
exchange ideas.

> Maybe I am missing something though????  As a client, I definitely want
> async reads but my writes should only return if there was room in the
> nic buffer to write it out, otherwise it should block and hold up my
> client so my client doesn't have to do any extra coding for a
> PleaseThrottleException.

asynchbase is non-blocking.  It tries hard to never ever block you.
This is very important for high-throughput, low-latency serving
systems.  If you produce writes faster than they can go through the
system, asynchbase will buffer them only up to a certain point
(hardcoded: 10000 edits per region).  Beyond this point,
asynchbase will ask you to throttle yourself since it doesn't wanna
block you to throttle you.
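
To make the contrast with a blocking queue concrete, here is a minimal
Java sketch of a buffer that refuses work instead of blocking.  It is not
asynchbase's implementation: RegionEditBuffer and ThrottleSignal are
hypothetical names standing in for the idea behind
PleaseThrottleException, and the limit is passed in rather than
hardcoded.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not asynchbase's actual classes.
class RegionEditBuffer {
    // Stand-in for the idea of PleaseThrottleException: "slow down, please".
    static class ThrottleSignal extends RuntimeException {
        ThrottleSignal(String message) {
            super(message);
        }
    }

    private final int limit;
    private final Deque<String> edits = new ArrayDeque<>();

    RegionEditBuffer(int limit) {
        this.limit = limit;
    }

    // Never waits: the edit is either buffered or rejected immediately.
    synchronized void add(String edit) {
        if (edits.size() >= limit) {
            throw new ThrottleSignal("buffer full at " + limit + " edits");
        }
        edits.addLast(edit);
    }

    // Called once an edit has been sent out; frees one slot in the buffer.
    synchronized String flushOne() {
        return edits.pollFirst();
    }

    synchronized int size() {
        return edits.size();
    }

    public static void main(String[] args) {
        RegionEditBuffer buffer = new RegionEditBuffer(2);
        buffer.add("edit-1");
        buffer.add("edit-2");
        try {
            buffer.add("edit-3");
        } catch (ThrottleSignal signal) {
            // The caller's thread is still running and can decide what to do.
            System.out.println("asked to throttle: " + signal.getMessage());
        }
    }
}
```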

If you have a user-facing server, and this server is accepting a lot
of edits, and because of a problem with this server it's unable to
process edits as fast as it's receiving them, then blocking the server
will cause unpredictable consequences, depending on how you use the
HBase client.  Most likely all your threads will end up blocked and
your server will look like it locked up.

By instead generating the PleaseThrottleException, the asynchbase
client gives you a chance to become aware of the problem and to take
action, based on your application's specific needs.
For instance you could take this chance to tell some load-balancing
system sitting in front of your server that you're overwhelmed and
that you would like to receive less traffic.  Or you could close some
persistent connections with some of your clients to force them to
re-open a connection, which will hopefully direct them to another
backend.  Or you could stop accepting new connections until you're no
longer overloaded.
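
The last option (stop accepting new connections until the overload
clears) could be sketched in Java roughly as follows.  Everything here is
an assumption made for illustration: ConnectionGate is a hypothetical
class, and the "recovered" future is a made-up signal that the
application completes once the backlog has drained; it is not presented
as asynchbase's API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not part of asynchbase.
class ConnectionGate {
    private final AtomicBoolean accepting = new AtomicBoolean(true);

    // The server's accept loop would consult this before taking a connection.
    boolean acceptingNewConnections() {
        return accepting.get();
    }

    // Call this from the code path that handles the throttle exception.
    // `recovered` is an assumed signal that completes when the backlog drains.
    void onOverload(CompletableFuture<Void> recovered) {
        accepting.set(false);
        // Re-open the gate as soon as recovery is signalled; nothing blocks here.
        recovered.thenRun(() -> accepting.set(true));
    }

    public static void main(String[] args) {
        ConnectionGate gate = new ConnectionGate();
        CompletableFuture<Void> recovered = new CompletableFuture<>();

        gate.onOverload(recovered);
        System.out.println("accepting during overload: " + gate.acceptingNewConnections());

        recovered.complete(null);
        System.out.println("accepting after recovery: " + gate.acceptingNewConnections());
    }
}
```

The same hook could instead notify a load balancer or close some
persistent connections; the point is that the application, not the
client library, picks the policy.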

The HBase client has no idea what you do with HBase or how your
application is written, so the most reasonable thing to do IMO is to
throw a PleaseThrottleException and let the application Do The Right
Thing.  Blocking is not an option, especially for a client that
promises to be non-blocking.

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Re: asynch api mailing list?

Posted by tsuna <ts...@gmail.com>.

On Sun, Feb 27, 2011 at 1:15 PM, Edward Capriolo <ed...@gmail.com> wrote:
> Sorry to hijack the thread but I noticed thrift 5.0 generates both
> synchronous and async stubs. Why would someone pick async-hbase vs
> thrift-async?

If you don't wanna use Thrift.  Or if you want a real HBase client
that supports the HBase API, and not only whatever API is exposed
through Thrift (IIRC, you can't do everything through the Thrift API,
although I'm sure this could easily be fixed).

If you use the Thrift gateway, you still end up using the normal HBase
client on the Thrift gateway anyway.  The reason I wrote asynchbase is
so I could have a thread-safe and highly scalable client.  asynchbase
uses far fewer locks and produces less garbage for the GC to collect,
which makes it more suitable for low-latency high-throughput servers
that want to use HBase as backend.

Here's the result of a micro-benchmark I ran in late November at
StumbleUpon.  It shows that asynchbase takes roughly 40% less wall-clock
time on those super straightforward micro-benchmarks:

Simple edit
===========
             real    user   sys   csw    icsw  syscalls  futex  rss
HBase:       1005.1  539.5  36.0  230.8  55.7  7915      754    38488
HBase Async: 606.7   428.0  40.5  165.6  51.8  8074      546    36825

Simple get
==========
             real    user   sys   csw    icsw  syscalls  futex  rss
HBase:       1046.0  532.0  38.0  232.7  47.2  7920      700    38511
HBase Async: 614.0   429.0  35.5  166.2  51.5  8067      570    36730

Multiple edits
==============
             real    user   sys   csw    icsw  syscalls  futex  rss
HBase:       1193.5  587.5  42.5  731.5  60.8  10190     2229   39107
HBase Async: 626.3   439.5  40.5  181.7  49.9  8187      580    36811

The tests are done on a clean, truncated table.  The first test simply
creates a Put and stores it.  The second reads it back.  The third
creates 100 Puts, each for a different row key, but they all fall in
the same region (since the table was truncated, it has only a single
region).  The timings are end-to-end, including JVM startup time.
Nothing fancy going on, just a trivial main function of a few lines,
no other threads, no JVM tuning parameters.  Not a single GC cycle
happens during those short tests.  Values are averages over 40 runs.

As you can see, asynchbase spends only slightly less user+system time
than HBase's traditional client, but the wall clock time (real) is
significantly higher when using HTable.  I think this is entirely
attributable to context switches (csw = number of context switches, icsw
= number of involuntary context switches).  The number of calls to
futex is consistently higher with HTable, which indicates higher lock
contention (and causes more context switches).  Other fields are:
syscalls = number of system calls executed by the process, rss =
actual memory used ("resident set size") in KB before exiting main.
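
For readers who want to check the comparison, the following small Java
sketch recomputes the relative savings from the figures posted above.
The class and method names (BenchmarkMath, reduction) are made up for
this illustration; only the input numbers come from the benchmark
results.

```java
// Hypothetical helper for re-deriving percentages from the posted numbers.
class BenchmarkMath {
    // Fraction by which `after` is smaller than `before`, e.g. 0.25 means 25% less.
    static double reduction(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        String[] names = {"Simple edit", "Simple get", "Multiple edits"};
        // Wall-clock ("real") times: HTable first, then asynchbase.
        double[] hbaseReal = {1005.1, 1046.0, 1193.5};
        double[] asyncReal = {606.7, 614.0, 626.3};
        // CPU time taken as user + sys for each test.
        double[] hbaseCpu = {539.5 + 36.0, 532.0 + 38.0, 587.5 + 42.5};
        double[] asyncCpu = {428.0 + 40.5, 429.0 + 35.5, 439.5 + 40.5};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-15s real: %.1f%% less, user+sys: %.1f%% less%n",
                    names[i],
                    100.0 * reduction(hbaseReal[i], asyncReal[i]),
                    100.0 * reduction(hbaseCpu[i], asyncCpu[i]));
        }
    }
}
```

The wall-clock savings come out noticeably larger than the user+sys
savings in all three tests, which is the gap the paragraph above
attributes to context switches.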

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Re: asynch api mailing list?

Posted by Edward Capriolo <ed...@gmail.com>.

On Sun, Feb 27, 2011 at 1:29 PM, Hiller, Dean  (Contractor)
<de...@broadridge.com> wrote:
> I had a question on the hbase asynch api.

Sorry to hijack the thread but I noticed thrift 5.0 generates both
synchronous and async stubs. Why would someone pick async-hbase vs
thrift-async?