Posted to user@hbase.apache.org by Mi...@high5games.com on 2013/11/04 04:46:49 UTC

HBase Client Performance Bottleneck in a Single Virtual Machine

Hi all; I posted this as a question on StackOverflow as well, but realized I should have gone straight to the horse's mouth with my question. Sorry for the double post!

We are running a series of HBase tests to see if we can migrate one of our existing datasets from an RDBMS to HBase. We are running 15 nodes with 5 ZooKeepers and HBase 0.94.12 for this test.

We have a single table with three column families and a key that distributes very well across the cluster. All of our queries are direct look-ups; no searching or scanning. Since HTablePool is now frowned upon, we are using Apache Commons Pool and a simple connection factory to create a pool of connections and use them in our threads. Each thread creates an HTable instance as needed and closes it when done. There are no leaks we can identify.

If we run a single thread and just do lots of random calls sequentially, the performance is quite good. Everything works great until we start trying to scale. As we add more threads and try to get more work done in a single VM, we start seeing performance degrade quickly. The client code simply attempts to run either one of several gets or a single put at a given frequency, then waits until the next time to run; we use this to simulate the workload from external clients. With a single thread, we see call times in the 2-3 millisecond range, which is acceptable.
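
For reference, each worker thread in our test client looks roughly like this (a trimmed-down sketch rather than our real harness; the table name, row keys, and the connection/timing fields are illustrative):

// Sketch only: "connection", "random", "running" and "intervalMillis" are fields of the
// surrounding test class; classes are from org.apache.hadoop.hbase.client / util.
public void run() {
    try {
        while (running) {
            HTableInterface table = connection.getTable("test_table");
            try {
                if (random.nextBoolean()) {
                    // random-key get
                    table.get(new Get(Bytes.toBytes("row-" + random.nextInt(1000000))));
                } else {
                    // random-key put
                    Put put = new Put(Bytes.toBytes("row-" + random.nextInt(1000000)));
                    put.add(Bytes.toBytes("cf1"), Bytes.toBytes("q"), Bytes.toBytes("value"));
                    table.put(put);
                }
            } finally {
                table.close();
            }
            Thread.sleep(intervalMillis);   // wait until the next scheduled call
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}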

As we add more threads, this call time starts increasing quickly. What is strange is that if we add more VMs, the times hold steady across them all, so clearly the bottleneck is in the running instance and not the cluster. We can get a huge amount of processing happening across the cluster very easily - it just takes a lot of VMs on the client side to do it. We know the contention isn't in the connection pool, as we see the problem even when we have more connections than threads. Unfortunately, the times spiral out of control very quickly. We need it to support at least 128 threads in practice, but most importantly I want to support 500 updates/sec and 250 gets/sec. In theory, this should be a piece of cake for the cluster, as we can do FAR more work than that with a few VMs, but we don't even get close to this with a single VM.

So my question: how do people building high-performance apps with HBase get around this? What approach are others using for connection pooling in a multi-threaded environment? There seems to be surprisingly little info about this on the web considering HBase's popularity. Is there some client setting we need to use to make it perform better in a threaded environment? We are going to try caching HTable instances next, but that's a total guess. There are ways to offload work to other VMs, but we really want to avoid that since the cluster can clearly handle the load and it would dramatically decrease application performance in critical areas.

Any help is greatly appreciated! Thanks!
-Mike

RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Mi...@high5games.com.
I've gone through the code in detail - we are using unmanaged connections and they are not being closed when the table is closed. Thanks!

From: lars hofhansl [mailto:larsh@apache.org]
Sent: Monday, November 04, 2013 12:03 PM
To: Michael Grundvig; user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

HConnectionManager.createConnection is a different API that creates an "unmanaged" connection. If you're not using that, each HTable.close() might close the underlying connection.

-- Lars

________________________________
From: "Michael.Grundvig@high5games.com<ma...@high5games.com>" <Mi...@high5games.com>>
To: user@hbase.apache.org<ma...@hbase.apache.org>; larsh@apache.org<ma...@apache.org>
Sent: Sunday, November 3, 2013 9:36 PM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi Lars, at application startup the pool is created with X number of connections using the first method you indicated: HConnectionManager.createConnection(conf). We store each connection in the pool automatically and serve it up to threads as they request it. When a thread is done using a connection, it returns it to the pool. The connections are not created and closed per thread, but only once for the entire application. We are using the GenericObjectPool from Apache Commons Pool as the foundation of our connection pooling approach. Our entire pool implementation really consists of just a couple of overridden methods to specify how to create a new connection and close it. The GenericObjectPool class does all the rest. See here for details:  http://commons.apache.org/proper/commons-pool/
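
The whole factory is essentially just this (a trimmed-down sketch against commons-pool 1.6 and the 0.94 client API; the class name and the startup snippet are illustrative, not our exact code):

import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;

// Sketch only: pools "unmanaged" HConnections, one per pooled object.
public class HConnectionPoolFactory extends BasePoolableObjectFactory<HConnection> {
    private final Configuration conf = HBaseConfiguration.create();

    @Override
    public HConnection makeObject() throws Exception {
        return HConnectionManager.createConnection(conf);
    }

    @Override
    public void destroyObject(HConnection connection) throws Exception {
        connection.close();
    }
}

// At startup (X = number of pooled connections):
// GenericObjectPool<HConnection> pool = new GenericObjectPool<HConnection>(new HConnectionPoolFactory());
// pool.setMaxActive(X);
// Threads then call pool.borrowObject() / pool.returnObject(connection).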

Each thread gets an HTable instance as needed and then closes it when done. The only thing we are not doing is using the createConnection method that takes an ExecutorService, as that wouldn't work in our model. Our app is like a web application - the thread pool is managed outside the scope of our application code, so we can't assume the service is available at connection creation time. Thanks!
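
For completeness, here is roughly what the variant you describe would look like; the sticking point for us is that the ExecutorService below has to be created and owned by our application, separate from the container-managed thread pool (a sketch only - the pool size and table name are guesses):

// Same HBase client classes as above, plus java.util.concurrent.Executors.
Configuration conf = HBaseConfiguration.create();
ExecutorService hbasePool = Executors.newFixedThreadPool(32);   // app-owned, size is a guess
HConnection con = HConnectionManager.createConnection(conf, hbasePool);

// Shared by all request threads; each operation grabs and closes a lightweight table:
HTableInterface table = con.getTable("test_table");
try {
    // gets/puts here
} finally {
    table.close();   // does not close con or hbasePool
}

// On application shutdown only:
con.close();
hbasePool.shutdown();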

-Mike

-----Original Message-----
From: lars hofhansl [mailto:larsh@apache.org]
Sent: Sunday, November 03, 2013 11:27 PM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi Michael,

can you try to create a single HConnection in your client:
HConnectionManager.createConnection(Configuration conf) or HConnectionManager.createConnection(Configuration conf, ExecutorService pool)

Then use HConnection.getTable(...) each time you need to do an operation.

I.e.
Configuration conf = ...;
ExecutorService pool = ...;
// create a single HConnection for your VM.
HConnection con = HConnectionManager.createConnection(conf, pool);
// reuse the connection for many tables, even in different threads
HTableInterface table = con.getTable(...);
// use the table even for only a few operations.
table.close();
...
table = con.getTable(...);
// use the table even for only a few operations.
table.close();
...
// at the end close the connection
con.close();

-- Lars



________________________________
From: "Michael.Grundvig@high5games.com<ma...@high5games.com>" <Mi...@high5games.com>>
To: user@hbase.apache.org<ma...@hbase.apache.org>
Sent: Sunday, November 3, 2013 7:46 PM
Subject: HBase Client Performance Bottleneck in a Single Virtual Machine


Hi all; I posted this as a question on StackOverflow as well, but realized I should have gone straight to the horse's mouth with my question. Sorry for the double post!

We are running a series of HBase tests to see if we can migrate one of our existing datasets from an RDBMS to HBase. We are running 15 nodes with 5 ZooKeepers and HBase 0.94.12 for this test.

We have a single table with three column families and a key that distributes very well across the cluster. All of our queries are direct look-ups; no searching or scanning. Since HTablePool is now frowned upon, we are using Apache Commons Pool and a simple connection factory to create a pool of connections and use them in our threads. Each thread creates an HTable instance as needed and closes it when done. There are no leaks we can identify.

If we run a single thread and just do lots of random calls sequentially, the performance is quite good. Everything works great until we start trying to scale. As we add more threads and try to get more work done in a single VM, we start seeing performance degrade quickly. The client code simply attempts to run either one of several gets or a single put at a given frequency, then waits until the next time to run; we use this to simulate the workload from external clients. With a single thread, we see call times in the 2-3 millisecond range, which is acceptable.

As we add more threads, this call time starts increasing quickly. What is strange is that if we add more VMs, the times hold steady across them all, so clearly the bottleneck is in the running instance and not the cluster. We can get a huge amount of processing happening across the cluster very easily - it just takes a lot of VMs on the client side to do it. We know the contention isn't in the connection pool, as we see the problem even when we have more connections than threads. Unfortunately, the times spiral out of control very quickly. We need it to support at least 128 threads in practice, but most importantly I want to support 500 updates/sec and 250 gets/sec. In theory, this should be a piece of cake for the cluster, as we can do FAR more work than that with a few VMs, but we don't even get close to this with a single VM.

So my question: how do people building high-performance apps with HBase get around this? What approach are others using for connection pooling in a multi-threaded environment? There seems to be surprisingly little info about this on the web considering HBase's popularity. Is there some client setting we need to use to make it perform better in a threaded environment? We are going to try caching HTable instances next, but that's a total guess. There are ways to offload work to other VMs, but we really want to avoid that since the cluster can clearly handle the load and it would dramatically decrease application performance in critical areas.

Any help is greatly appreciated! Thanks!
-Mike


Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by lars hofhansl <la...@apache.org>.
HConnectionManager.createConnection is a different API that creates an "unmanaged" connection. If you're not using that, each HTable.close() might close the underlying connection.

-- Lars



________________________________
 From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org; larsh@apache.org 
Sent: Sunday, November 3, 2013 9:36 PM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine
 

Hi Lars, at application startup the pool is created with X number of connections using the first method you indicated: HConnectionManager.createConnection(conf). We store each connection in the pool automatically and serve it up to threads as they request it. When a thread is done using a connection, it returns it to the pool. The connections are not created and closed per thread, but only once for the entire application. We are using the GenericObjectPool from Apache Commons Pool as the foundation of our connection pooling approach. Our entire pool implementation really consists of just a couple of overridden methods to specify how to create a new connection and close it. The GenericObjectPool class does all the rest. See here for details:  http://commons.apache.org/proper/commons-pool/

Each thread gets an HTable instance as needed and then closes it when done. The only thing we are not doing is using the createConnection method that takes an ExecutorService, as that wouldn't work in our model. Our app is like a web application - the thread pool is managed outside the scope of our application code, so we can't assume the service is available at connection creation time. Thanks!

-Mike



-----Original Message-----
From: lars hofhansl [mailto:larsh@apache.org] 
Sent: Sunday, November 03, 2013 11:27 PM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi Michael,

can you try to create a single HConnection in your client:
HConnectionManager.createConnection(Configuration conf) or HConnectionManager.createConnection(Configuration conf, ExecutorService pool)

Then use HConnection.getTable(...) each time you need to do an operation.

I.e.
Configuration conf = ...;
ExecutorService pool = ...;
// create a single HConnection for your VM.
HConnection con = HConnectionManager.createConnection(conf, pool);
// reuse the connection for many tables, even in different threads
HTableInterface table = con.getTable(...);
// use the table even for only a few operations.
table.close();
...
table = con.getTable(...);
// use the table even for only a few operations.
table.close();
...
// at the end close the connection
con.close();

-- Lars



________________________________
From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Sunday, November 3, 2013 7:46 PM
Subject: HBase Client Performance Bottleneck in a Single Virtual Machine


Hi all; I posted this as a question on StackOverflow as well, but realized I should have gone straight to the horse's mouth with my question. Sorry for the double post!

We are running a series of HBase tests to see if we can migrate one of our existing datasets from an RDBMS to HBase. We are running 15 nodes with 5 ZooKeepers and HBase 0.94.12 for this test.

We have a single table with three column families and a key that distributes very well across the cluster. All of our queries are direct look-ups; no searching or scanning. Since HTablePool is now frowned upon, we are using Apache Commons Pool and a simple connection factory to create a pool of connections and use them in our threads. Each thread creates an HTable instance as needed and closes it when done. There are no leaks we can identify.

If we run a single thread and just do lots of random calls sequentially, the performance is quite good. Everything works great until we start trying to scale. As we add more threads and try to get more work done in a single VM, we start seeing performance degrade quickly. The client code simply attempts to run either one of several gets or a single put at a given frequency, then waits until the next time to run; we use this to simulate the workload from external clients. With a single thread, we see call times in the 2-3 millisecond range, which is acceptable.

As we add more threads, this call time starts increasing quickly. What is strange is that if we add more VMs, the times hold steady across them all, so clearly the bottleneck is in the running instance and not the cluster. We can get a huge amount of processing happening across the cluster very easily - it just takes a lot of VMs on the client side to do it. We know the contention isn't in the connection pool, as we see the problem even when we have more connections than threads. Unfortunately, the times spiral out of control very quickly. We need it to support at least 128 threads in practice, but most importantly I want to support 500 updates/sec and 250 gets/sec. In theory, this should be a piece of cake for the cluster, as we can do FAR more work than that with a few VMs, but we don't even get close to this with a single VM.

So my question: how do people building high-performance apps with HBase get around this? What approach are others using for connection pooling in a multi-threaded environment? There seems to be surprisingly little info about this on the web considering HBase's popularity. Is there some client setting we need to use to make it perform better in a threaded environment? We are going to try caching HTable instances next, but that's a total guess. There are ways to offload work to other VMs, but we really want to avoid that since the cluster can clearly handle the load and it would dramatically decrease application performance in critical areas.

Any help is greatly appreciated! Thanks!
-Mike

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Anoop John <an...@gmail.com>.
He uses HConnection.getTable(), which in turn uses the HTable constructor


HTable(final byte[] tableName, final HConnection connection, final ExecutorService pool)

So no worry - on HTable#close() the connection won't get closed :)



-Anoop-


On Mon, Nov 4, 2013 at 11:29 AM, Sriram Ramachandrasekaran <
sri.rams85@gmail.com> wrote:

> HTable is the implementation of HTableInterface. I was looking at the code
> and it *indeed* closes the underlying resources on close() unless you
> create it with the ExecutorService and HConnection Option that lars
> suggested. Please do take a look at HTable constructors - that might help.
>
> P.S: I verified this on 0.94.6 code base. Hope things haven't changed b/w
> this and your version (0.94.12)
>
>
>
> On Mon, Nov 4, 2013 at 11:21 AM, <Mi...@high5games.com> wrote:
>
> > Our current usage is how I would do this in a typical database app with
> > table acting like a statement. It looks like this:
> >
> > Connection connection = null;
> > HTableInterface table = null;
> > try {
> >         connection = pool.acquire();
> >         table = connection.getTable(tableName);
> >         // Do work
> > } finally {
> >         table.close();
> >         pool.release(connection);
> > }
> >
> > Is this incorrect? The API docs says close " Releases any resources held
> > or pending changes in internal buffers." I didn't interpret that as
> having
> > it close the underlying connection. Thanks!
> >
> > -Mike
> >
> > -----Original Message-----
> > From: Sriram Ramachandrasekaran [mailto:sri.rams85@gmail.com]
> > Sent: Sunday, November 03, 2013 11:43 PM
> > To: user@hbase.apache.org
> > Cc: larsh@apache.org
> > Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> > Machine
> >
> > Hey Michael - Per API documentation, closing the HTable Instance would
> > close the underlying resources too. Hope you are aware of it.
> >
> >
> > On Mon, Nov 4, 2013 at 11:06 AM, <Mi...@high5games.com>
> wrote:
> >
> > > Hi Lars, at application startup the pool is created with X number of
> > > connections using the first method you indicated:
> > > HConnectionManager.createConnection(conf). We store each connection in
> > > the pool automatically and serve it up to threads as they request it.
> > > When a thread is done using the connection, they return it back to the
> > > pool. The connections are not be created and closed per thread, but
> > > only once for the entire application. We are using the
> > > GenericObjectPool from Apache Commons Pooling as the foundation of our
> > > connection pooling approach. Our entire pool implementation really
> > > consists of just a couple overridden methods to specify how to create
> > > a new connection and close it. The GenericObjectPool class does all the
> > rest. See here for details:
> > > http://commons.apache.org/proper/commons-pool/
> > >
> > > Each thread is getting a HTableInstance as needed and then closing it
> > > when done. The only thing we are not doing is using the
> > > createConnection method that takes in an ExecutorService as that
> > > wouldn't work in our model. Our app is like a web application - the
> > > thread pool is managed outside the scope of our application code so we
> > > can't assume the service is available at connection creation time.
> > Thanks!
> > >
> > > -Mike
> > >
> > >
> > > -----Original Message-----
> > > From: lars hofhansl [mailto:larsh@apache.org]
> > > Sent: Sunday, November 03, 2013 11:27 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> > > Machine
> > >
> > > Hi Micheal,
> > >
> > > can you try to create a single HConnection in your client:
> > > HConnectionManager.createConnection(Configuration conf) or
> > > HConnectionManager.createConnection(Configuration conf,
> > > ExecutorService
> > > pool)
> > >
> > > Then use HConnection.getTable(...) each time you need to do an
> operation.
> > >
> > > I.e.
> > > Configuration conf = ...;
> > > ExecutorService pool = ...;
> > > // create a single HConnection for you vm.
> > > HConnection con = HConnectionManager.createConnection(Configuration
> > > conf, ExecutorService pool); // reuse the connection for many tables,
> > > even in different threads HTableInterface table = con.getTable(...);
> > > // use table even for only a few operation.
> > > table.close();
> > > ...
> > > HTableInterface table = con.getTable(...); // use table even for only
> > > a few operation.
> > > table.close();
> > > ...
> > > // at the end close the connection
> > > con.close();
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > >  From: "Michael.Grundvig@high5games.com"
> > > <Mi...@high5games.com>
> > > To: user@hbase.apache.org
> > > Sent: Sunday, November 3, 2013 7:46 PM
> > > Subject: HBase Client Performance Bottleneck in a Single Virtual
> > > Machine
> > >
> > >
> > > Hi all; I posted this as a question on StackOverflow as well but
> > > realized I should have gone straight ot the horses-mouth with my
> > > question. Sorry for the double post!
> > >
> > > We are running a series of HBase tests to see if we can migrate one of
> > > our existing datasets from a RDBMS to HBase. We are running 15 nodes
> > > with 5 zookeepers and HBase 0.94.12 for this test.
> > >
> > > We have a single table with three column families and a key that is
> > > distributing very well across the cluster. All of our queries are
> > > running a direct look-up; no searching or scanning. Since the
> > > HTablePool is now frowned upon, we are using the Apache commons pool
> > > and a simple connection factory to create a pool of connections and
> > > use them in our threads. Each thread creates an HTableInstance as
> > > needed and closes it when done. There are no leaks we can identify.
> > >
> > > If we run a single thread and just do lots of random calls
> > > sequentially, the performance is quite good. Everything works great
> > > until we start trying to scale the performance. As we add more threads
> > > and try and get more work done in a single VM, we start seeing
> > > performance degrade quickly. The client code is simply attempting to
> > > run either one of several gets or a single put at a given frequency.
> > > It then waits until the next time to run, we use this to simulate the
> > > workload from external clients. With a single thread, we will see call
> > times in the 2-3 milliseconds which is acceptable.
> > >
> > > As we add more threads, this call time starts increasing quickly. What
> > > gets strange is if we add more VMs, the times hold steady across them
> > > all so clearly it's a bottleneck in the running instance and not the
> > cluster.
> > > We can get a huge amount of processing happening across the cluster
> > > very easily - it just has to use a lot of VMs on the client side to do
> > > it. We know the contention isn't in the connection pool as we see the
> > > problem even when we have more connections than threads.
> > > Unfortunately, the times are spiraling out of control very quickly. We
> > > need it to support at least 128 threads in practice, but most
> > > important I want to support 500 updates/sec and 250 gets/sec. In
> > > theory, this should be a piece of cake for the cluster as we can do
> > > FAR more work than that with a few VMs, but we don't even get close to
> > this with a single VM.
> > >
> > > So my question: how do people building high-performance apps with
> > > HBase get around this? What approach are others using for connection
> > > pooling in a multi-threaded environment? There seems to be a
> > > surprisingly little amount of info about this on the web considering
> > > the popularity. Is there some client setting we need to use that makes
> > > it perform better in a threaded environment? We are going to try to
> > > cache HTable instances next but that's a total guess. There are
> > > solutions to offloading work to other VMs but we really want to avoid
> > > this as clearly the cluster can handle the load and it will
> dramatically
> > decrease the application performance in critical areas.
> > >
> > > Any help is greatly appreciated! Thanks!
> > > -Mike
> > >
> >
> >
> >
> > --
> > It's just about how deep your longing is!
> >
>
>
>
> --
> It's just about how deep your longing is!
>

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
HTable is the implementation of HTableInterface. I was looking at the code
and it *indeed* closes the underlying resources on close() unless you
create it with the ExecutorService and HConnection option that Lars
suggested. Please do take a look at the HTable constructors - that might help.

P.S.: I verified this on the 0.94.6 code base. Hope things haven't changed between
this and your version (0.94.12).
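
To make it concrete, the difference is between these two constructors (rough sketch against 0.94; "mytable", conf, connection and pool are placeholders):

// Created this way, the HTable sets up its own resources internally, and
// (as discussed above) close() releases them:
HTable t1 = new HTable(conf, "mytable");
t1.close();

// Created this way, the HTable only borrows the supplied HConnection and
// ExecutorService, so close() leaves them open for reuse by other tables/threads:
HTable t2 = new HTable(Bytes.toBytes("mytable"), connection, pool);
t2.close();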



On Mon, Nov 4, 2013 at 11:21 AM, <Mi...@high5games.com> wrote:

> Our current usage is how I would do this in a typical database app with
> table acting like a statement. It looks like this:
>
> Connection connection = null;
> HTableInterface table = null;
> try {
>         connection = pool.acquire();
>         table = connection.getTable(tableName);
>         // Do work
> } finally {
>         table.close();
>         pool.release(connection);
> }
>
> Is this incorrect? The API docs says close " Releases any resources held
> or pending changes in internal buffers." I didn't interpret that as having
> it close the underlying connection. Thanks!
>
> -Mike
>
> -----Original Message-----
> From: Sriram Ramachandrasekaran [mailto:sri.rams85@gmail.com]
> Sent: Sunday, November 03, 2013 11:43 PM
> To: user@hbase.apache.org
> Cc: larsh@apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Hey Michael - Per API documentation, closing the HTable Instance would
> close the underlying resources too. Hope you are aware of it.
>
>
> On Mon, Nov 4, 2013 at 11:06 AM, <Mi...@high5games.com> wrote:
>
> > Hi Lars, at application startup the pool is created with X number of
> > connections using the first method you indicated:
> > HConnectionManager.createConnection(conf). We store each connection in
> > the pool automatically and serve it up to threads as they request it.
> > When a thread is done using the connection, they return it back to the
> > pool. The connections are not be created and closed per thread, but
> > only once for the entire application. We are using the
> > GenericObjectPool from Apache Commons Pooling as the foundation of our
> > connection pooling approach. Our entire pool implementation really
> > consists of just a couple overridden methods to specify how to create
> > a new connection and close it. The GenericObjectPool class does all the
> rest. See here for details:
> > http://commons.apache.org/proper/commons-pool/
> >
> > Each thread is getting a HTableInstance as needed and then closing it
> > when done. The only thing we are not doing is using the
> > createConnection method that takes in an ExecutorService as that
> > wouldn't work in our model. Our app is like a web application - the
> > thread pool is managed outside the scope of our application code so we
> > can't assume the service is available at connection creation time.
> Thanks!
> >
> > -Mike
> >
> >
> > -----Original Message-----
> > From: lars hofhansl [mailto:larsh@apache.org]
> > Sent: Sunday, November 03, 2013 11:27 PM
> > To: user@hbase.apache.org
> > Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> > Machine
> >
> > Hi Micheal,
> >
> > can you try to create a single HConnection in your client:
> > HConnectionManager.createConnection(Configuration conf) or
> > HConnectionManager.createConnection(Configuration conf,
> > ExecutorService
> > pool)
> >
> > Then use HConnection.getTable(...) each time you need to do an operation.
> >
> > I.e.
> > Configuration conf = ...;
> > ExecutorService pool = ...;
> > // create a single HConnection for you vm.
> > HConnection con = HConnectionManager.createConnection(Configuration
> > conf, ExecutorService pool); // reuse the connection for many tables,
> > even in different threads HTableInterface table = con.getTable(...);
> > // use table even for only a few operation.
> > table.close();
> > ...
> > HTableInterface table = con.getTable(...); // use table even for only
> > a few operation.
> > table.close();
> > ...
> > // at the end close the connection
> > con.close();
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: "Michael.Grundvig@high5games.com"
> > <Mi...@high5games.com>
> > To: user@hbase.apache.org
> > Sent: Sunday, November 3, 2013 7:46 PM
> > Subject: HBase Client Performance Bottleneck in a Single Virtual
> > Machine
> >
> >
> > Hi all; I posted this as a question on StackOverflow as well but
> > realized I should have gone straight ot the horses-mouth with my
> > question. Sorry for the double post!
> >
> > We are running a series of HBase tests to see if we can migrate one of
> > our existing datasets from a RDBMS to HBase. We are running 15 nodes
> > with 5 zookeepers and HBase 0.94.12 for this test.
> >
> > We have a single table with three column families and a key that is
> > distributing very well across the cluster. All of our queries are
> > running a direct look-up; no searching or scanning. Since the
> > HTablePool is now frowned upon, we are using the Apache commons pool
> > and a simple connection factory to create a pool of connections and
> > use them in our threads. Each thread creates an HTableInstance as
> > needed and closes it when done. There are no leaks we can identify.
> >
> > If we run a single thread and just do lots of random calls
> > sequentially, the performance is quite good. Everything works great
> > until we start trying to scale the performance. As we add more threads
> > and try and get more work done in a single VM, we start seeing
> > performance degrade quickly. The client code is simply attempting to
> > run either one of several gets or a single put at a given frequency.
> > It then waits until the next time to run, we use this to simulate the
> > workload from external clients. With a single thread, we will see call
> times in the 2-3 milliseconds which is acceptable.
> >
> > As we add more threads, this call time starts increasing quickly. What
> > gets strange is if we add more VMs, the times hold steady across them
> > all so clearly it's a bottleneck in the running instance and not the
> cluster.
> > We can get a huge amount of processing happening across the cluster
> > very easily - it just has to use a lot of VMs on the client side to do
> > it. We know the contention isn't in the connection pool as we see the
> > problem even when we have more connections than threads.
> > Unfortunately, the times are spiraling out of control very quickly. We
> > need it to support at least 128 threads in practice, but most
> > important I want to support 500 updates/sec and 250 gets/sec. In
> > theory, this should be a piece of cake for the cluster as we can do
> > FAR more work than that with a few VMs, but we don't even get close to
> this with a single VM.
> >
> > So my question: how do people building high-performance apps with
> > HBase get around this? What approach are others using for connection
> > pooling in a multi-threaded environment? There seems to be a
> > surprisingly little amount of info about this on the web considering
> > the popularity. Is there some client setting we need to use that makes
> > it perform better in a threaded environment? We are going to try to
> > cache HTable instances next but that's a total guess. There are
> > solutions to offloading work to other VMs but we really want to avoid
> > this as clearly the cluster can handle the load and it will dramatically
> decrease the application performance in critical areas.
> >
> > Any help is greatly appreciated! Thanks!
> > -Mike
> >
>
>
>
> --
> It's just about how deep your longing is!
>



-- 
It's just about how deep your longing is!

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Anoop John <an...@gmail.com>.
If you have used con.getTable(), then close() on the HTable won't close the
underlying connection.

-Anoop-

On Mon, Nov 4, 2013 at 11:21 AM, <Mi...@high5games.com> wrote:

> Our current usage is how I would do this in a typical database app with
> table acting like a statement. It looks like this:
>
> Connection connection = null;
> HTableInterface table = null;
> try {
>         connection = pool.acquire();
>         table = connection.getTable(tableName);
>         // Do work
> } finally {
>         table.close();
>         pool.release(connection);
> }
>
> Is this incorrect? The API docs says close " Releases any resources held
> or pending changes in internal buffers." I didn't interpret that as having
> it close the underlying connection. Thanks!
>
> -Mike
>
> -----Original Message-----
> From: Sriram Ramachandrasekaran [mailto:sri.rams85@gmail.com]
> Sent: Sunday, November 03, 2013 11:43 PM
> To: user@hbase.apache.org
> Cc: larsh@apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Hey Michael - Per API documentation, closing the HTable Instance would
> close the underlying resources too. Hope you are aware of it.
>
>
> On Mon, Nov 4, 2013 at 11:06 AM, <Mi...@high5games.com> wrote:
>
> > Hi Lars, at application startup the pool is created with X number of
> > connections using the first method you indicated:
> > HConnectionManager.createConnection(conf). We store each connection in
> > the pool automatically and serve it up to threads as they request it.
> > When a thread is done using the connection, they return it back to the
> > pool. The connections are not be created and closed per thread, but
> > only once for the entire application. We are using the
> > GenericObjectPool from Apache Commons Pooling as the foundation of our
> > connection pooling approach. Our entire pool implementation really
> > consists of just a couple overridden methods to specify how to create
> > a new connection and close it. The GenericObjectPool class does all the
> rest. See here for details:
> > http://commons.apache.org/proper/commons-pool/
> >
> > Each thread is getting a HTableInstance as needed and then closing it
> > when done. The only thing we are not doing is using the
> > createConnection method that takes in an ExecutorService as that
> > wouldn't work in our model. Our app is like a web application - the
> > thread pool is managed outside the scope of our application code so we
> > can't assume the service is available at connection creation time.
> Thanks!
> >
> > -Mike
> >
> >
> > -----Original Message-----
> > From: lars hofhansl [mailto:larsh@apache.org]
> > Sent: Sunday, November 03, 2013 11:27 PM
> > To: user@hbase.apache.org
> > Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> > Machine
> >
> > Hi Micheal,
> >
> > can you try to create a single HConnection in your client:
> > HConnectionManager.createConnection(Configuration conf) or
> > HConnectionManager.createConnection(Configuration conf,
> > ExecutorService
> > pool)
> >
> > Then use HConnection.getTable(...) each time you need to do an operation.
> >
> > I.e.
> > Configuration conf = ...;
> > ExecutorService pool = ...;
> > // create a single HConnection for you vm.
> > HConnection con = HConnectionManager.createConnection(Configuration
> > conf, ExecutorService pool); // reuse the connection for many tables,
> > even in different threads HTableInterface table = con.getTable(...);
> > // use table even for only a few operation.
> > table.close();
> > ...
> > HTableInterface table = con.getTable(...); // use table even for only
> > a few operation.
> > table.close();
> > ...
> > // at the end close the connection
> > con.close();
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: "Michael.Grundvig@high5games.com"
> > <Mi...@high5games.com>
> > To: user@hbase.apache.org
> > Sent: Sunday, November 3, 2013 7:46 PM
> > Subject: HBase Client Performance Bottleneck in a Single Virtual
> > Machine
> >
> >
> > Hi all; I posted this as a question on StackOverflow as well but
> > realized I should have gone straight ot the horses-mouth with my
> > question. Sorry for the double post!
> >
> > We are running a series of HBase tests to see if we can migrate one of
> > our existing datasets from a RDBMS to HBase. We are running 15 nodes
> > with 5 zookeepers and HBase 0.94.12 for this test.
> >
> > We have a single table with three column families and a key that is
> > distributing very well across the cluster. All of our queries are
> > running a direct look-up; no searching or scanning. Since the
> > HTablePool is now frowned upon, we are using the Apache commons pool
> > and a simple connection factory to create a pool of connections and
> > use them in our threads. Each thread creates an HTableInstance as
> > needed and closes it when done. There are no leaks we can identify.
> >
> > If we run a single thread and just do lots of random calls
> > sequentially, the performance is quite good. Everything works great
> > until we start trying to scale the performance. As we add more threads
> > and try and get more work done in a single VM, we start seeing
> > performance degrade quickly. The client code is simply attempting to
> > run either one of several gets or a single put at a given frequency.
> > It then waits until the next time to run, we use this to simulate the
> > workload from external clients. With a single thread, we will see call
> times in the 2-3 milliseconds which is acceptable.
> >
> > As we add more threads, this call time starts increasing quickly. What
> > gets strange is if we add more VMs, the times hold steady across them
> > all so clearly it's a bottleneck in the running instance and not the
> cluster.
> > We can get a huge amount of processing happening across the cluster
> > very easily - it just has to use a lot of VMs on the client side to do
> > it. We know the contention isn't in the connection pool as we see the
> > problem even when we have more connections than threads.
> > Unfortunately, the times are spiraling out of control very quickly. We
> > need it to support at least 128 threads in practice, but most
> > important I want to support 500 updates/sec and 250 gets/sec. In
> > theory, this should be a piece of cake for the cluster as we can do
> > FAR more work than that with a few VMs, but we don't even get close to
> this with a single VM.
> >
> > So my question: how do people building high-performance apps with
> > HBase get around this? What approach are others using for connection
> > pooling in a multi-threaded environment? There seems to be a
> > surprisingly little amount of info about this on the web considering
> > the popularity. Is there some client setting we need to use that makes
> > it perform better in a threaded environment? We are going to try to
> > cache HTable instances next but that's a total guess. There are
> > solutions to offloading work to other VMs but we really want to avoid
> > this as clearly the cluster can handle the load and it will dramatically
> decrease the application performance in critical areas.
> >
> > Any help is greatly appreciated! Thanks!
> > -Mike
> >
>
>
>
> --
> It's just about how deep your longing is!
>

RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Mi...@high5games.com.
Our current usage is how I would do this in a typical database app, with the table acting like a statement. It looks like this:

Connection connection = null;
HTableInterface table = null;
try {
	connection = pool.acquire();
	table = connection.getTable(tableName);
	// Do work
} finally {
	table.close();
	pool.release(connection);
}

Is this incorrect? The API docs say close() "Releases any resources held or pending changes in internal buffers." I didn't interpret that as closing the underlying connection. Thanks!

-Mike

-----Original Message-----
From: Sriram Ramachandrasekaran [mailto:sri.rams85@gmail.com] 
Sent: Sunday, November 03, 2013 11:43 PM
To: user@hbase.apache.org
Cc: larsh@apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hey Michael - per the API documentation, closing the HTable instance would close the underlying resources too. Hope you are aware of it.


On Mon, Nov 4, 2013 at 11:06 AM, <Mi...@high5games.com> wrote:

> Hi Lars, at application startup the pool is created with X number of 
> connections using the first method you indicated:
> HConnectionManager.createConnection(conf). We store each connection in 
> the pool automatically and serve it up to threads as they request it. 
> When a thread is done using the connection, they return it back to the 
> pool. The connections are not be created and closed per thread, but 
> only once for the entire application. We are using the 
> GenericObjectPool from Apache Commons Pooling as the foundation of our 
> connection pooling approach. Our entire pool implementation really 
> consists of just a couple overridden methods to specify how to create 
> a new connection and close it. The GenericObjectPool class does all the rest. See here for details:
> http://commons.apache.org/proper/commons-pool/
>
> Each thread is getting a HTableInstance as needed and then closing it 
> when done. The only thing we are not doing is using the 
> createConnection method that takes in an ExecutorService as that 
> wouldn't work in our model. Our app is like a web application - the 
> thread pool is managed outside the scope of our application code so we 
> can't assume the service is available at connection creation time. Thanks!
>
> -Mike
>
>
> -----Original Message-----
> From: lars hofhansl [mailto:larsh@apache.org]
> Sent: Sunday, November 03, 2013 11:27 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual 
> Machine
>
> Hi Micheal,
>
> can you try to create a single HConnection in your client:
> HConnectionManager.createConnection(Configuration conf) or 
> HConnectionManager.createConnection(Configuration conf, 
> ExecutorService
> pool)
>
> Then use HConnection.getTable(...) each time you need to do an operation.
>
> I.e.
> Configuration conf = ...;
> ExecutorService pool = ...;
> // create a single HConnection for you vm.
> HConnection con = HConnectionManager.createConnection(Configuration 
> conf, ExecutorService pool); // reuse the connection for many tables, 
> even in different threads HTableInterface table = con.getTable(...); 
> // use table even for only a few operation.
> table.close();
> ...
> HTableInterface table = con.getTable(...); // use table even for only 
> a few operation.
> table.close();
> ...
> // at the end close the connection
> con.close();
>
> -- Lars
>
>
>
> ________________________________
>  From: "Michael.Grundvig@high5games.com" 
> <Mi...@high5games.com>
> To: user@hbase.apache.org
> Sent: Sunday, November 3, 2013 7:46 PM
> Subject: HBase Client Performance Bottleneck in a Single Virtual 
> Machine
>
>
> Hi all; I posted this as a question on StackOverflow as well but 
> realized I should have gone straight ot the horses-mouth with my 
> question. Sorry for the double post!
>
> We are running a series of HBase tests to see if we can migrate one of 
> our existing datasets from a RDBMS to HBase. We are running 15 nodes 
> with 5 zookeepers and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that is 
> distributing very well across the cluster. All of our queries are 
> running a direct look-up; no searching or scanning. Since the 
> HTablePool is now frowned upon, we are using the Apache commons pool 
> and a simple connection factory to create a pool of connections and 
> use them in our threads. Each thread creates an HTableInstance as 
> needed and closes it when done. There are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls 
> sequentially, the performance is quite good. Everything works great 
> until we start trying to scale the performance. As we add more threads 
> and try and get more work done in a single VM, we start seeing 
> performance degrade quickly. The client code is simply attempting to 
> run either one of several gets or a single put at a given frequency. 
> It then waits until the next time to run, we use this to simulate the 
> workload from external clients. With a single thread, we will see call times in the 2-3 milliseconds which is acceptable.
>
> As we add more threads, this call time starts increasing quickly. What 
> gets strange is if we add more VMs, the times hold steady across them 
> all so clearly it's a bottleneck in the running instance and not the cluster.
> We can get a huge amount of processing happening across the cluster 
> very easily - it just has to use a lot of VMs on the client side to do 
> it. We know the contention isn't in the connection pool as we see the 
> problem even when we have more connections than threads. 
> Unfortunately, the times are spiraling out of control very quickly. We 
> need it to support at least 128 threads in practice, but most 
> important I want to support 500 updates/sec and 250 gets/sec. In 
> theory, this should be a piece of cake for the cluster as we can do 
> FAR more work than that with a few VMs, but we don't even get close to this with a single VM.
>
> So my question: how do people building high-performance apps with 
> HBase get around this? What approach are others using for connection 
> pooling in a multi-threaded environment? There seems to be a 
> surprisingly little amount of info about this on the web considering 
> the popularity. Is there some client setting we need to use that makes 
> it perform better in a threaded environment? We are going to try to 
> cache HTable instances next but that's a total guess. There are 
> solutions to offloading work to other VMs but we really want to avoid 
> this as clearly the cluster can handle the load and it will dramatically decrease the application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike
>



--
It's just about how deep your longing is!

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Hey Michael - per the API documentation, closing the HTable instance would
close the underlying resources too. Hope you are aware of it.


On Mon, Nov 4, 2013 at 11:06 AM, <Mi...@high5games.com> wrote:

> Hi Lars, at application startup the pool is created with X number of
> connections using the first method you indicated:
> HConnectionManager.createConnection(conf). We store each connection in the
> pool automatically and serve it up to threads as they request it. When a
> thread is done using the connection, they return it back to the pool. The
> connections are not be created and closed per thread, but only once for the
> entire application. We are using the GenericObjectPool from Apache Commons
> Pooling as the foundation of our connection pooling approach. Our entire
> pool implementation really consists of just a couple overridden methods to
> specify how to create a new connection and close it. The GenericObjectPool
> class does all the rest. See here for details:
> http://commons.apache.org/proper/commons-pool/
>
> Each thread is getting a HTableInstance as needed and then closing it when
> done. The only thing we are not doing is using the createConnection method
> that takes in an ExecutorService as that wouldn't work in our model. Our
> app is like a web application - the thread pool is managed outside the
> scope of our application code so we can't assume the service is available
> at connection creation time. Thanks!
>
> -Mike
>
>
> -----Original Message-----
> From: lars hofhansl [mailto:larsh@apache.org]
> Sent: Sunday, November 03, 2013 11:27 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Hi Micheal,
>
> can you try to create a single HConnection in your client:
> HConnectionManager.createConnection(Configuration conf) or
> HConnectionManager.createConnection(Configuration conf, ExecutorService
> pool)
>
> Then use HConnection.getTable(...) each time you need to do an operation.
>
> I.e.
> Configuration conf = ...;
> ExecutorService pool = ...;
> // create a single HConnection for you vm.
> HConnection con = HConnectionManager.createConnection(Configuration conf,
> ExecutorService pool); // reuse the connection for many tables, even in
> different threads HTableInterface table = con.getTable(...); // use table
> even for only a few operation.
> table.close();
> ...
> HTableInterface table = con.getTable(...); // use table even for only a
> few operation.
> table.close();
> ...
> // at the end close the connection
> con.close();
>
> -- Lars
>
>
>
> ________________________________
>  From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
> To: user@hbase.apache.org
> Sent: Sunday, November 3, 2013 7:46 PM
> Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
>
>
> Hi all; I posted this as a question on StackOverflow as well but realized
> I should have gone straight ot the horses-mouth with my question. Sorry for
> the double post!
>
> We are running a series of HBase tests to see if we can migrate one of our
> existing datasets from a RDBMS to HBase. We are running 15 nodes with 5
> zookeepers and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that is
> distributing very well across the cluster. All of our queries are running a
> direct look-up; no searching or scanning. Since the HTablePool is now
> frowned upon, we are using the Apache commons pool and a simple connection
> factory to create a pool of connections and use them in our threads. Each
> thread creates an HTableInstance as needed and closes it when done. There
> are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls sequentially,
> the performance is quite good. Everything works great until we start trying
> to scale the performance. As we add more threads and try and get more work
> done in a single VM, we start seeing performance degrade quickly. The
> client code is simply attempting to run either one of several gets or a
> single put at a given frequency. It then waits until the next time to run,
> we use this to simulate the workload from external clients. With a single
> thread, we will see call times in the 2-3 milliseconds which is acceptable.
>
> As we add more threads, this call time starts increasing quickly. What
> gets strange is if we add more VMs, the times hold steady across them all
> so clearly it's a bottleneck in the running instance and not the cluster.
> We can get a huge amount of processing happening across the cluster very
> easily - it just has to use a lot of VMs on the client side to do it. We
> know the contention isn't in the connection pool as we see the problem even
> when we have more connections than threads. Unfortunately, the times are
> spiraling out of control very quickly. We need it to support at least 128
> threads in practice, but most important I want to support 500 updates/sec
> and 250 gets/sec. In theory, this should be a piece of cake for the cluster
> as we can do FAR more work than that with a few VMs, but we don't even get
> close to this with a single VM.
>
> So my question: how do people building high-performance apps with HBase
> get around this? What approach are others using for connection pooling in a
> multi-threaded environment? There seems to be a surprisingly little amount
> of info about this on the web considering the popularity. Is there some
> client setting we need to use that makes it perform better in a threaded
> environment? We are going to try to cache HTable instances next but that's
> a total guess. There are solutions to offloading work to other VMs but we
> really want to avoid this as clearly the cluster can handle the load and it
> will dramatically decrease the application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike
>



-- 
It's just about how deep your longing is!

RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Mi...@high5games.com.
Hi Lars, at application startup the pool is created with X number of connections using the first method you indicated: HConnectionManager.createConnection(conf). We store each connection in the pool automatically and serve it up to threads as they request it. When a thread is done using a connection, it returns it to the pool. The connections are not created and closed per thread, but only once for the entire application. We are using the GenericObjectPool from Apache Commons Pool as the foundation of our connection pooling approach. Our entire pool implementation really consists of just a couple of overridden methods to specify how to create a new connection and close it. The GenericObjectPool class does all the rest. See here for details:  http://commons.apache.org/proper/commons-pool/

Each thread gets an HTable instance as needed and then closes it when done. The only thing we are not doing is using the createConnection method that takes an ExecutorService, as that wouldn't work in our model. Our app is like a web application - the thread pool is managed outside the scope of our application code, so we can't assume the service is available at connection creation time. Thanks!

-Mike


-----Original Message-----
From: lars hofhansl [mailto:larsh@apache.org] 
Sent: Sunday, November 03, 2013 11:27 PM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi Michael,

can you try to create a single HConnection in your client:
HConnectionManager.createConnection(Configuration conf) or HConnectionManager.createConnection(Configuration conf, ExecutorService pool)

Then use HConnection.getTable(...) each time you need to do an operation.

I.e.
Configuration conf = ...;
ExecutorService pool = ...;
// create a single HConnection for your VM.
HConnection con = HConnectionManager.createConnection(conf, pool);
// reuse the connection for many tables, even in different threads
HTableInterface table = con.getTable(...);
// use the table even for only a few operations.
table.close();
...
table = con.getTable(...);
// use the table even for only a few operations.
table.close();
...
// at the end close the connection
con.close();

-- Lars



________________________________
 From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Sunday, November 3, 2013 7:46 PM
Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
 

Hi all; I posted this as a question on StackOverflow as well, but realized I should have gone straight to the horse's mouth with my question. Sorry for the double post!

We are running a series of HBase tests to see if we can migrate one of our existing datasets from an RDBMS to HBase. We are running 15 nodes with 5 ZooKeepers and HBase 0.94.12 for this test.

We have a single table with three column families and a key that distributes very well across the cluster. All of our queries are direct look-ups; no searching or scanning. Since HTablePool is now frowned upon, we are using Apache Commons Pool and a simple connection factory to create a pool of connections and use them in our threads. Each thread creates an HTable instance as needed and closes it when done. There are no leaks we can identify.

If we run a single thread and just do lots of random calls sequentially, the performance is quite good. Everything works great until we start trying to scale. As we add more threads and try to get more work done in a single VM, we start seeing performance degrade quickly. The client code simply attempts to run either one of several gets or a single put at a given frequency, then waits until the next time to run; we use this to simulate the workload from external clients. With a single thread, we see call times in the 2-3 millisecond range, which is acceptable.

As we add more threads, this call time starts increasing quickly. What is strange is that if we add more VMs, the times hold steady across them all, so clearly the bottleneck is in the running instance and not the cluster. We can get a huge amount of processing happening across the cluster very easily - it just takes a lot of VMs on the client side to do it. We know the contention isn't in the connection pool, as we see the problem even when we have more connections than threads. Unfortunately, the times spiral out of control very quickly. We need it to support at least 128 threads in practice, but most importantly I want to support 500 updates/sec and 250 gets/sec. In theory, this should be a piece of cake for the cluster, as we can do FAR more work than that with a few VMs, but we don't even get close to this with a single VM.

So my question: how do people building high-performance apps with HBase get around this? What approach are others using for connection pooling in a multi-threaded environment? There seems to be surprisingly little info about this on the web considering HBase's popularity. Is there some client setting we need to use to make it perform better in a threaded environment? We are going to try caching HTable instances next, but that's a total guess. There are ways to offload work to other VMs, but we really want to avoid that since the cluster can clearly handle the load and it would dramatically decrease application performance in critical areas.

Any help is greatly appreciated! Thanks!
-Mike

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by lars hofhansl <la...@apache.org>.
Hi Michael,

can you try to create a single HConnection in your client:
HConnectionManager.createConnection(Configuration conf) or
HConnectionManager.createConnection(Configuration conf, ExecutorService pool)

Then use HConnection.getTable(...) each time you need to do an operation.

I.e.
Configuration conf = ...;
ExecutorService pool = ...;
// create a single HConnection for your VM.
HConnection con = HConnectionManager.createConnection(conf, pool);
// reuse the connection for many tables, even in different threads
HTableInterface table = con.getTable(...);
// use the table even for only a few operations.
table.close();
...
table = con.getTable(...);
// use the table even for only a few operations.
table.close();
...
// at the end, close the connection
con.close();
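
For illustration, a slightly fuller sketch of the same pattern under
concurrency - one HConnection shared by many worker threads, each taking a
table handle per task; the class name, table name, row keys, and pool sizes
are made up:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        ExecutorService hbasePool = Executors.newFixedThreadPool(16);
        // one connection for the whole VM, shared by all worker threads
        final HConnection con = HConnectionManager.createConnection(conf, hbasePool);

        ExecutorService workers = Executors.newFixedThreadPool(32);
        for (int i = 0; i < 1000; i++) {
            final byte[] row = Bytes.toBytes("row-" + i);
            workers.submit(new Runnable() {
                public void run() {
                    try {
                        // table handles are cheap: take one per task, close it after
                        HTableInterface table = con.getTable("my_table");
                        try {
                            Result r = table.get(new Get(row));
                        } finally {
                            table.close();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.MINUTES);
        con.close();
        hbasePool.shutdown();
    }
}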

-- Lars



________________________________
 From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org 
Sent: Sunday, November 3, 2013 7:46 PM
Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
 

Hi all; I posted this as a question on StackOverflow as well but realized I should have gone straight ot the horses-mouth with my question. Sorry for the double post!

We are running a series of HBase tests to see if we can migrate one of our existing datasets from a RDBMS to HBase. We are running 15 nodes with 5 zookeepers and HBase 0.94.12 for this test.

We have a single table with three column families and a key that is distributing very well across the cluster. All of our queries are running a direct look-up; no searching or scanning. Since the HTablePool is now frowned upon, we are using the Apache commons pool and a simple connection factory to create a pool of connections and use them in our threads. Each thread creates an HTableInstance as needed and closes it when done. There are no leaks we can identify.

If we run a single thread and just do lots of random calls sequentially, the performance is quite good. Everything works great until we start trying to scale the performance. As we add more threads and try and get more work done in a single VM, we start seeing performance degrade quickly. The client code is simply attempting to run either one of several gets or a single put at a given frequency. It then waits until the next time to run, we use this to simulate the workload from external clients. With a single thread, we will see call times in the 2-3 milliseconds which is acceptable.

As we add more threads, this call time starts increasing quickly. What gets strange is if we add more VMs, the times hold steady across them all so clearly it's a bottleneck in the running instance and not the cluster. We can get a huge amount of processing happening across the cluster very easily - it just has to use a lot of VMs on the client side to do it. We know the contention isn't in the connection pool as we see the problem even when we have more connections than threads. Unfortunately, the times are spiraling out of control very quickly. We need it to support at least 128 threads in practice, but most important I want to support 500 updates/sec and 250 gets/sec. In theory, this should be a piece of cake for the cluster as we can do FAR more work than that with a few VMs, but we don't even get close to this with a single VM.

So my question: how do people building high-performance apps with HBase get around this? What approach are others using for connection pooling in a multi-threaded environment? There seems to be a surprisingly little amount of info about this on the web considering the popularity. Is there some client setting we need to use that makes it perform better in a threaded environment? We are going to try to cache HTable instances next but that's a total guess. There are solutions to offloading work to other VMs but we really want to avoid this as clearly the cluster can handle the load and it will dramatically decrease the application performance in critical areas.

Any help is greatly appreciated! Thanks!
-Mike

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Responses inline.


On Mon, Nov 4, 2013 at 11:13 AM, <Mi...@high5games.com> wrote:

> Thanks for the input and things to look at. To respond:
>
> 1) I don't quite understand the question. When we increase threads we are
> not changing the work each thread is doing, just expecting more to be done
> concurrently. The machine we are testing on has effectively 16 cores
> available and is just idling away. The test harness we are using runs the
> same process whether it's using 1 thread or 100.
>
   [S] I understand the work doesn't change. What I meant was: are these
threads primarily waiting on I/O? I suspect they are, since you mentioned
the CPU is idling away. I just wanted to make sure the client doesn't have
any obvious problems.

> 2) We have done 1 to 1 for all of our tests thus far. We started there and
> were planning to back it off once we saw how it worked.
> 3) No, the requests spread very nicely across all the servers. We spent a
> good bit of time designing a key that would distribute almost perfectly
> across the entire cluster and it appears to be working great.
> 4) hbase.regionserver.handler.count is currently set to 600 when I look at
> the master configuration. Is that what you are referencing or should I look
> at something else?
>
   [S] Yes, that's correct. Can you check what happens to the RPC handler
threads on the HMaster when you run these tests - whether all the handlers
are busy processing requests is the first thing to check. Again, as
mentioned in the earlier email, if you are closing HTable instances,
chances are you are creating a new connection every time. Please check it
out. By the way, can you tell us how many requests your clients are pumping
in, how many threads you use, and what the payload sizes of the gets and
puts are, so that others have more info to help you out.

>
> As for memory, we've increased that to the point that any single region
> server could cache the entire dataset 100% in memory and didn't see any
> performance improvement at all.
>
> Thanks!
>
> -Mike
>
> -----Original Message-----
> From: Sriram Ramachandrasekaran [mailto:sri.rams85@gmail.com]
> Sent: Sunday, November 03, 2013 10:12 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Hey Michael,
> I am relatively new to HBase, so do take my response with a grain of salt.
> I think, definitely your requirements are something that HBase should be
> able to handle easily(assuming you are not pulling inordinate amounts of
> data(payload) from HBase).
> Few things that you should look for to understand this better is, 1. What
> are your clients doing when you increase the number of threads?
> 2. How is the thread-connection mapping - 1 to 1? Are you creating a
> connection every time in your threads?
> 3. Do you see any one region server unduly getting more requests than the
> rest of them (region hotspot)?
> 4. What is your number of request handler
> count(hbase.regionserver.handler.count) on HBase? If it's too low, then,
> your connections on the client would wait before actually getting into the
> application layer(here, RS).
>
> This is assuming you've given enough memory to your Region servers and
> your HDFS layer is stable.
> Hope this helps.
>
>
>
>
>
>
>
> On Mon, Nov 4, 2013 at 9:16 AM, <Mi...@high5games.com> wrote:
>
> > Hi all; I posted this as a question on StackOverflow as well but
> > realized I should have gone straight ot the horses-mouth with my
> > question. Sorry for the double post!
> >
> > We are running a series of HBase tests to see if we can migrate one of
> > our existing datasets from a RDBMS to HBase. We are running 15 nodes
> > with 5 zookeepers and HBase 0.94.12 for this test.
> >
> > We have a single table with three column families and a key that is
> > distributing very well across the cluster. All of our queries are
> > running a direct look-up; no searching or scanning. Since the
> > HTablePool is now frowned upon, we are using the Apache commons pool
> > and a simple connection factory to create a pool of connections and
> > use them in our threads. Each thread creates an HTableInstance as
> > needed and closes it when done. There are no leaks we can identify.
> >
> > If we run a single thread and just do lots of random calls
> > sequentially, the performance is quite good. Everything works great
> > until we start trying to scale the performance. As we add more threads
> > and try and get more work done in a single VM, we start seeing
> > performance degrade quickly. The client code is simply attempting to
> > run either one of several gets or a single put at a given frequency.
> > It then waits until the next time to run, we use this to simulate the
> > workload from external clients. With a single thread, we will see call
> times in the 2-3 milliseconds which is acceptable.
> >
> > As we add more threads, this call time starts increasing quickly. What
> > gets strange is if we add more VMs, the times hold steady across them
> > all so clearly it's a bottleneck in the running instance and not the
> cluster.
> > We can get a huge amount of processing happening across the cluster
> > very easily - it just has to use a lot of VMs on the client side to do
> > it. We know the contention isn't in the connection pool as we see the
> > problem even when we have more connections than threads.
> > Unfortunately, the times are spiraling out of control very quickly. We
> > need it to support at least 128 threads in practice, but most
> > important I want to support 500 updates/sec and 250 gets/sec. In
> > theory, this should be a piece of cake for the cluster as we can do
> > FAR more work than that with a few VMs, but we don't even get close to
> this with a single VM.
> >
> > So my question: how do people building high-performance apps with
> > HBase get around this? What approach are others using for connection
> > pooling in a multi-threaded environment? There seems to be a
> > surprisingly little amount of info about this on the web considering
> > the popularity. Is there some client setting we need to use that makes
> > it perform better in a threaded environment? We are going to try to
> > cache HTable instances next but that's a total guess. There are
> > solutions to offloading work to other VMs but we really want to avoid
> > this as clearly the cluster can handle the load and it will dramatically
> decrease the application performance in critical areas.
> >
> > Any help is greatly appreciated! Thanks!
> > -Mike
> >
>
>
>
> --
> It's just about how deep your longing is!
>



-- 
It's just about how deep your longing is!

RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Mi...@high5games.com.
Thanks for the input and things to look at. To respond:

1) I don't quite understand the question. When we increase threads we are not changing the work each thread is doing, just expecting more to be done concurrently. The machine we are testing on has effectively 16 cores available and is just idling away. The test harness we are using runs the same process whether it's using 1 thread or 100. 
2) We have done 1 to 1 for all of our tests thus far. We started there and were planning to back it off once we saw how it worked.
3) No, the requests spread very nicely across all the servers. We spent a good bit of time designing a key that would distribute almost perfectly across the entire cluster and it appears to be working great. 
4) hbase.regionserver.handler.count is currently set to 600 when I look at the master configuration. Is that what you are referencing or should I look at something else? 

As for memory, we've increased that to the point where any single region server could cache the entire dataset 100% in memory, and we didn't see any performance improvement at all.

Thanks!

-Mike

-----Original Message-----
From: Sriram Ramachandrasekaran [mailto:sri.rams85@gmail.com] 
Sent: Sunday, November 03, 2013 10:12 PM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hey Michael,
I am relatively new to HBase, so do take my response with a grain of salt.
I think, definitely your requirements are something that HBase should be able to handle easily(assuming you are not pulling inordinate amounts of
data(payload) from HBase).
Few things that you should look for to understand this better is, 1. What are your clients doing when you increase the number of threads?
2. How is the thread-connection mapping - 1 to 1? Are you creating a connection every time in your threads?
3. Do you see any one region server unduly getting more requests than the rest of them (region hotspot)?
4. What is your number of request handler
count(hbase.regionserver.handler.count) on HBase? If it's too low, then, your connections on the client would wait before actually getting into the application layer(here, RS).

This is assuming you've given enough memory to your Region servers and your HDFS layer is stable.
Hope this helps.







On Mon, Nov 4, 2013 at 9:16 AM, <Mi...@high5games.com> wrote:

> Hi all; I posted this as a question on StackOverflow as well but 
> realized I should have gone straight ot the horses-mouth with my 
> question. Sorry for the double post!
>
> We are running a series of HBase tests to see if we can migrate one of 
> our existing datasets from a RDBMS to HBase. We are running 15 nodes 
> with 5 zookeepers and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that is 
> distributing very well across the cluster. All of our queries are 
> running a direct look-up; no searching or scanning. Since the 
> HTablePool is now frowned upon, we are using the Apache commons pool 
> and a simple connection factory to create a pool of connections and 
> use them in our threads. Each thread creates an HTableInstance as 
> needed and closes it when done. There are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls 
> sequentially, the performance is quite good. Everything works great 
> until we start trying to scale the performance. As we add more threads 
> and try and get more work done in a single VM, we start seeing 
> performance degrade quickly. The client code is simply attempting to 
> run either one of several gets or a single put at a given frequency. 
> It then waits until the next time to run, we use this to simulate the 
> workload from external clients. With a single thread, we will see call times in the 2-3 milliseconds which is acceptable.
>
> As we add more threads, this call time starts increasing quickly. What 
> gets strange is if we add more VMs, the times hold steady across them 
> all so clearly it's a bottleneck in the running instance and not the cluster.
> We can get a huge amount of processing happening across the cluster 
> very easily - it just has to use a lot of VMs on the client side to do 
> it. We know the contention isn't in the connection pool as we see the 
> problem even when we have more connections than threads. 
> Unfortunately, the times are spiraling out of control very quickly. We 
> need it to support at least 128 threads in practice, but most 
> important I want to support 500 updates/sec and 250 gets/sec. In 
> theory, this should be a piece of cake for the cluster as we can do 
> FAR more work than that with a few VMs, but we don't even get close to this with a single VM.
>
> So my question: how do people building high-performance apps with 
> HBase get around this? What approach are others using for connection 
> pooling in a multi-threaded environment? There seems to be a 
> surprisingly little amount of info about this on the web considering 
> the popularity. Is there some client setting we need to use that makes 
> it perform better in a threaded environment? We are going to try to 
> cache HTable instances next but that's a total guess. There are 
> solutions to offloading work to other VMs but we really want to avoid 
> this as clearly the cluster can handle the load and it will dramatically decrease the application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike
>



--
It's just about how deep your longing is!

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Hey Michael,
I am relatively new to HBase, so do take my response with a grain of salt.
I definitely think your requirements are something that HBase should be
able to handle easily (assuming you are not pulling inordinate amounts of
data (payload) from HBase).
A few things you should look at to understand this better:
1. What are your clients doing when you increase the number of threads?
2. How is the thread-connection mapping - 1 to 1? Are you creating a
connection every time in your threads?
3. Do you see any one region server unduly getting more requests than the
rest of them (region hotspot)?
4. What is your request handler count (hbase.regionserver.handler.count)
on HBase? If it's too low, your connections on the client would wait
before actually getting into the application layer (here, the RS).

This is assuming you've given enough memory to your Region servers and your
HDFS layer is stable.
Hope this helps.







On Mon, Nov 4, 2013 at 9:16 AM, <Mi...@high5games.com> wrote:

> Hi all; I posted this as a question on StackOverflow as well but realized
> I should have gone straight ot the horses-mouth with my question. Sorry for
> the double post!
>
> We are running a series of HBase tests to see if we can migrate one of our
> existing datasets from a RDBMS to HBase. We are running 15 nodes with 5
> zookeepers and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that is
> distributing very well across the cluster. All of our queries are running a
> direct look-up; no searching or scanning. Since the HTablePool is now
> frowned upon, we are using the Apache commons pool and a simple connection
> factory to create a pool of connections and use them in our threads. Each
> thread creates an HTableInstance as needed and closes it when done. There
> are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls sequentially,
> the performance is quite good. Everything works great until we start trying
> to scale the performance. As we add more threads and try and get more work
> done in a single VM, we start seeing performance degrade quickly. The
> client code is simply attempting to run either one of several gets or a
> single put at a given frequency. It then waits until the next time to run,
> we use this to simulate the workload from external clients. With a single
> thread, we will see call times in the 2-3 milliseconds which is acceptable.
>
> As we add more threads, this call time starts increasing quickly. What
> gets strange is if we add more VMs, the times hold steady across them all
> so clearly it's a bottleneck in the running instance and not the cluster.
> We can get a huge amount of processing happening across the cluster very
> easily - it just has to use a lot of VMs on the client side to do it. We
> know the contention isn't in the connection pool as we see the problem even
> when we have more connections than threads. Unfortunately, the times are
> spiraling out of control very quickly. We need it to support at least 128
> threads in practice, but most important I want to support 500 updates/sec
> and 250 gets/sec. In theory, this should be a piece of cake for the cluster
> as we can do FAR more work than that with a few VMs, but we don't even get
> close to this with a single VM.
>
> So my question: how do people building high-performance apps with HBase
> get around this? What approach are others using for connection pooling in a
> multi-threaded environment? There seems to be a surprisingly little amount
> of info about this on the web considering the popularity. Is there some
> client setting we need to use that makes it perform better in a threaded
> environment? We are going to try to cache HTable instances next but that's
> a total guess. There are solutions to offloading work to other VMs but we
> really want to avoid this as clearly the cluster can handle the load and it
> will dramatically decrease the application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike
>



-- 
It's just about how deep your longing is!

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Stack <st...@duboce.net>.
Usually in hbase-site.xml.  When the lads say 'on the client', they usually
mean the hbase-site.xml the client reads when it starts up (the
hbase-site.xml that is in the conf directory that it is pointing to on
startup).
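
For a client that has no HBase conf directory on its classpath, a rough
sketch of loading the file explicitly and reading a value back to confirm
the client sees it; the class name and file path below are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientConfigCheck {
    public static void main(String[] args) {
        // create() already picks up any hbase-site.xml found on the classpath
        Configuration conf = HBaseConfiguration.create();
        // otherwise the file can be added as an explicit resource (hypothetical path)
        conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
        // read a value back to confirm the client actually sees it
        System.out.println("pool size = " + conf.get("hbase.client.ipc.pool.size"));
    }
}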

St.Ack


On Tue, Nov 5, 2013 at 6:43 AM, <Mi...@high5games.com> wrote:

> Thanks for the input, we'll do some more testing on this today. Where do
> these settings get made? In the configuration used to create the pool or
> somewhere else? It appears hbase-site.xml "on the client" but we have
> nothing like that so I think I'm misunderstanding something. Thanks!
>
> -Mike
>
>
> -----Original Message-----
> From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com]
> Sent: Monday, November 04, 2013 8:41 PM
> To: user@hbase.apache.org; lars hofhansl
> Subject: RE: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> One more: "hbase.ipc.client.tcpnodelay" set to true. It is worth trying as
> well.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: lars hofhansl [larsh@apache.org]
> Sent: Monday, November 04, 2013 5:55 PM
> To: user@hbase.apache.org; lars hofhansl
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Here're one more thing to try. By default each HConnection will use a
> single TCP connection to multiplex traffic to each region server.
>
> Try setting hbase.client.ipc.pool.size on the client to something > 1.
>
> -- Lars
>
>
>
> ________________________________
>  From: lars hofhansl <la...@apache.org>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> Sent: Monday, November 4, 2013 5:16 PM
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
>
> No. This is terrible.
> If you can, please send a jstack and do some profiling. Is there an easy
> way to reproduce this with just a single RegionServer?
> If so, I'd offer to do some profiling.
>
> Thanks.
>
>
> -- Lars
>
>
>
> ________________________________
>
> From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
> To: user@hbase.apache.org
> Sent: Monday, November 4, 2013 11:00 AM
> Subject: RE: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
>
> Not yet, this is just a load test client. It literally does nothing but
> create threads to talk to HBase and run 4 different calls. Nothing else is
> done in the app at all.
>
> To eliminate even more of our code from the loop, we just tried removing
> our connection pool entirely and just using a single connection per thread
> - no improvement. Then we tried creating the HTableInterface (all calls are
> against the same table) at the time of connection creation. The means
> thread to connection to table interface were all at 1 to 1 and not being
> passed around. No performance improvement.
>
> Long story short, running a single thread it's fast. Start multithreading,
> it starts slowing down. CPU usage, memory usage, etc. are all negligible.
> The performance isn't terrible - it's probably good enough for the vast
> majority of users, but it's not good enough for our app. With one thread,
> it might take 5 milliseconds. With 10 threads all spinning more quickly (40
> milliseconds delay), the call time increases to 15-30 milliseconds. The
> problem is that at our throughput rates, that's a serious concern.
>
> We are going to fire up a profiler next to see what we can find.
>
> -Mike
>
>
>
> -----Original Message-----
> From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com]
> Sent: Monday, November 04, 2013 12:50 PM
> To: user@hbase.apache.org
> Subject: RE: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Michael, have you tried jstack on your client application?
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Michael.Grundvig@high5games.com [Michael.Grundvig@high5games.com]
> Sent: Sunday, November 03, 2013 7:46 PM
> To: user@hbase.apache.org
> Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
>
> Hi all; I posted this as a question on StackOverflow as well but realized
> I should have gone straight ot the horses-mouth with my question. Sorry for
> the double post!
>
> We are running a series of HBase tests to see if we can migrate one of our
> existing datasets from a RDBMS to HBase. We are running 15 nodes with 5
> zookeepers and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that is
> distributing very well across the cluster. All of our queries are running a
> direct look-up; no searching or scanning. Since the HTablePool is now
> frowned upon, we are using the Apache commons pool and a simple connection
> factory to create a pool of connections and use them in our threads. Each
> thread creates an HTableInstance as needed and closes it when done. There
> are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls sequentially,
> the performance is quite good. Everything works great until we start trying
> to scale the performance. As we add more threads and try and get more work
> done in a single VM, we start seeing performance degrade quickly. The
> client code is simply attempting to run either one of several gets or a
> single put at a given frequency. It then waits until the next time to run,
> we use this to simulate the workload from external clients. With a single
> thread, we will see call times in the 2-3 milliseconds which is acceptable.
>
> As we add more threads, this call time starts increasing quickly. What
> gets strange is if we add more VMs, the times hold steady across them all
> so clearly it's a bottleneck in the running instance and not the cluster.
> We can get a huge amount of processing happening across the cluster very
> easily - it just has to use a lot of VMs on the client side to do it. We
> know the contention isn't in the connection pool as we see the problem even
> when we have more connections than threads. Unfortunately, the times are
> spiraling out of control very quickly. We need it to support at least 128
> threads in practice, but most important I want to support 500 updates/sec
> and 250 gets/sec. In theory, this should be a piece of cake for the cluster
> as we can do FAR more work than that with a few VMs, but we don't even get
> close to this with a single VM.
>
> So my question: how do people building high-performance apps with HBase
> get around this? What approach are others using for connection pooling in a
> multi-threaded environment? There seems to be a surprisingly little amount
> of info about this on the web considering the popularity. Is there some
> client setting we need to use that makes it perform better in a threaded
> environment? We are going to try to cache HTable instances next but that's
> a total guess. There are solutions to offloading work to other VMs but we
> really want to avoid this as clearly the cluster can handle the load and it
> will dramatically decrease the application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike
>
> Confidentiality Notice:  The information contained in this message,
> including any attachments hereto, may be confidential and is intended to be
> read only by the individual or entity to whom this message is addressed. If
> the reader of this message is not the intended recipient or an agent or
> designee of the intended recipient, please note that any review, use,
> disclosure or distribution of this message or its attachments, in any form,
> is strictly prohibited.  If you have received this message in error, please
> immediately notify the sender and/or Notifications@carrieriq.com and
> delete or destroy any copy of this message and its attachments.
>

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by lars hofhansl <la...@apache.org>.
Looks like it's time for some profiling.



________________________________
 From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org 
Sent: Tuesday, November 5, 2013 8:35 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine
 

I'm finding this configuration process very confusing. This is what we are doing in Java code:

        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", props.getProperty(HBASE_NODES));
        configuration.set("zookeeper.znode.parent", props.getProperty(HBASE_PARAMS));
        configuration.setBoolean(" hbase.ipc.client.tcpnodelay", true);
        configuration.setInt("hbase.client.ipc.pool.size", 10);

Adding those last two lines has made no improvement. Likewise, updating hbase-site.xml on the cluster to include these settings has made no improvement. The problem is that I have zero confidence it's actually taking the settings in the first place. 
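
One quick sanity check (just a sketch, continuing the snippet above) is to read the values straight back from the Configuration object before creating the connection; if the printed values are not the ones just set, the keys do not match (a typo or stray whitespace in a property name would do it):

        // read the values back to confirm the client-side Configuration carries them
        System.out.println(configuration.getBoolean("hbase.ipc.client.tcpnodelay", false));
        System.out.println(configuration.getInt("hbase.client.ipc.pool.size", 1));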

-Mike



-----Original Message-----
From: lars hofhansl [mailto:larsh@apache.org] 
Sent: Tuesday, November 05, 2013 9:47 AM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi Mike,

hbase-site.xml has both client and server side settings.

In your client, where do you set the ZK quorum of the cluster? At that same spot you'd set these options.

Assuming you're running the clients with classes from a normal HBase install, you would add these setting to the hbase-site.xml there.

-- Lars



________________________________
From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org 
Sent: Tuesday, November 5, 2013 6:43 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine


Thanks for the input, we'll do some more testing on this today. Where do these settings get made? In the configuration used to create the pool or somewhere else? It appears hbase-site.xml "on the client" but we have nothing like that so I think I'm misunderstanding something. Thanks!

-Mike



-----Original Message-----
From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com] 
Sent: Monday, November 04, 2013 8:41 PM
To: user@hbase.apache.org; lars hofhansl
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine

One more: "hbase.ipc.client.tcpnodelay" set to true. It is worth trying as well.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: lars hofhansl [larsh@apache.org]
Sent: Monday, November 04, 2013 5:55 PM
To: user@hbase.apache.org; lars hofhansl
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Here're one more thing to try. By default each HConnection will use a single TCP connection to multiplex traffic to each region server.

Try setting hbase.client.ipc.pool.size on the client to something > 1.

-- Lars



________________________________
From: lars hofhansl <la...@apache.org>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Sent: Monday, November 4, 2013 5:16 PM
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine


No. This is terrible.
If you can, please send a jstack and do some profiling. Is there an easy way to reproduce this with just a single RegionServer?
If so, I'd offer to do some profiling.

Thanks.


-- Lars



________________________________

From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Monday, November 4, 2013 11:00 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine


Not yet, this is just a load test client. It literally does nothing but create threads to talk to HBase and run 4 different calls. Nothing else is done in the app at all.

To eliminate even more of our code from the loop, we just tried removing our connection pool entirely and just using a single connection per thread - no improvement. Then we tried creating the HTableInterface (all calls are against the same table) at the time of connection creation. The means thread to connection to table interface were all at 1 to 1 and not being passed around. No performance improvement.

Long story short, running a single thread it's fast. Start multithreading, it starts slowing down. CPU usage, memory usage, etc. are all negligible. The performance isn't terrible - it's probably good enough for the vast majority of users, but it's not good enough for our app. With one thread, it might take 5 milliseconds. With 10 threads all spinning more quickly (40 milliseconds delay), the call time increases to 15-30 milliseconds. The problem is that at our throughput rates, that's a serious concern.

We are going to fire up a profiler next to see what we can find.

-Mike



-----Original Message-----
From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com]
Sent: Monday, November 04, 2013 12:50 PM
To: user@hbase.apache.org
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Michael, have you tried jstack on your client application?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Michael.Grundvig@high5games.com [Michael.Grundvig@high5games.com]
Sent: Sunday, November 03, 2013 7:46 PM
To: user@hbase.apache.org
Subject: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi all; I posted this as a question on StackOverflow as well but realized I should have gone straight ot the horses-mouth with my question. Sorry for the double post!

We are running a series of HBase tests to see if we can migrate one of our existing datasets from a RDBMS to HBase. We are running 15 nodes with 5 zookeepers and HBase 0.94.12 for this test.

We have a single table with three column families and a key that is distributing very well across the cluster. All of our queries are running a direct look-up; no searching or scanning. Since the HTablePool is now frowned upon, we are using the Apache commons pool and a simple connection factory to create a pool of connections and use them in our threads. Each thread creates an HTableInstance as needed and closes it when done. There are no leaks we can identify.

If we run a single thread and just do lots of random calls sequentially, the performance is quite good. Everything works great until we start trying to scale the performance. As we add more threads and try and get more work done in a single VM, we start seeing performance degrade quickly. The client code is simply attempting to run either one of several gets or a single put at a given frequency. It then waits until the next time to run, we use this to simulate the workload from external clients. With a single thread, we will see call times in the 2-3 milliseconds which is acceptable.

As we add more threads, this call time starts increasing quickly. What gets strange is if we add more VMs, the times hold steady across them all so clearly it's a bottleneck in the running instance and not the cluster. We can get a huge amount of processing happening across the cluster very easily - it just has to use a lot of VMs on the client side to do it. We know the contention isn't in the connection pool as we see the problem even when we have more connections than threads. Unfortunately, the times are spiraling out of control very quickly. We need it to support at least 128 threads in practice, but most important I want to support 500 updates/sec and 250 gets/sec. In theory, this should be a piece of cake for the cluster as we can do FAR more work than that with a few VMs, but we don't even get close to this with a single VM.

So my question: how do people building high-performance apps with HBase get around this? What approach are others using for connection pooling in a multi-threaded environment? There seems to be a surprisingly little amount of info about this on the web considering the popularity. Is there some client setting we need to use that makes it perform better in a threaded environment? We are going to try to cache HTable instances next but that's a total guess. There are solutions to offloading work to other VMs but we really want to avoid this as clearly the cluster can handle the load and it will dramatically decrease the application performance in critical areas.

Any help is greatly appreciated! Thanks!
-Mike

Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
This thread describes my similar findings:

http://mail-archives.apache.org/mod_mbox/hbase-dev/201307.mbox/%3CDC5EBE7F3610EB4CA5C7E92D76873E1517ECF1DCCE%40exchange2007.carrieriq.com%3E

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Wednesday, November 06, 2013 11:43 AM
To: user@hbase.apache.org; lars hofhansl
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Yes, this issue exists definitely. I do remember my own thread on dev-list a while back when I tried to saturate on RS with all data cached. It took me 10 client JVM and a couple hundreds threads.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: lars hofhansl [larsh@apache.org]
Sent: Wednesday, November 06, 2013 11:09 AM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

If you get the chance, please send a jstack of the client when it is slow, or some profile data if you have, or just some code reproducing the problem.
We should track this down.

-- Lars



________________________________
 From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Tuesday, November 5, 2013 8:35 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine


I'm finding this configuration process very confusing. This is what we are doing in Java code:

        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", props.getProperty(HBASE_NODES));
        configuration.set("zookeeper.znode.parent", props.getProperty(HBASE_PARAMS));
        configuration.setBoolean(" hbase.ipc.client.tcpnodelay", true);
        configuration.setInt("hbase.client.ipc.pool.size", 10);

Adding those last two lines has made no improvement. Likewise, updating hbase-site.xml on the cluster to include these settings has made no improvement. The problem is that I have zero confidence it's actually taking the settings in the first place.

-Mike



-----Original Message-----
From: lars hofhansl [mailto:larsh@apache.org]
Sent: Tuesday, November 05, 2013 9:47 AM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi Mike,

hbase-site.xml has both client and server side settings.

In your client, where do you set the ZK quorum of the cluster? At that same spot you'd set these options.

Assuming you're running the clients with classes from a normal HBase install, you would add these setting to the hbase-site.xml there.

-- Lars



________________________________
From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Tuesday, November 5, 2013 6:43 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine


Thanks for the input, we'll do some more testing on this today. Where do these settings get made? In the configuration used to create the pool or somewhere else? It appears hbase-site.xml "on the client" but we have nothing like that so I think I'm misunderstanding something. Thanks!

-Mike



-----Original Message-----
From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com]
Sent: Monday, November 04, 2013 8:41 PM
To: user@hbase.apache.org; lars hofhansl
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine

One more: "hbase.ipc.client.tcpnodelay" set to true. It is worth trying as well.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: lars hofhansl [larsh@apache.org]
Sent: Monday, November 04, 2013 5:55 PM
To: user@hbase.apache.org; lars hofhansl
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Here're one more thing to try. By default each HConnection will use a single TCP connection to multiplex traffic to each region server.

Try setting hbase.client.ipc.pool.size on the client to something > 1.

-- Lars



________________________________
From: lars hofhansl <la...@apache.org>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Sent: Monday, November 4, 2013 5:16 PM
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine


No. This is terrible.
If you can, please send a jstack and do some profiling. Is there an easy way to reproduce this with just a single RegionServer?
If so, I'd offer to do some profiling.

Thanks.


-- Lars



________________________________

From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Monday, November 4, 2013 11:00 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine


Not yet, this is just a load test client. It literally does nothing but create threads to talk to HBase and run 4 different calls. Nothing else is done in the app at all.

To eliminate even more of our code from the loop, we just tried removing our connection pool entirely and just using a single connection per thread - no improvement. Then we tried creating the HTableInterface (all calls are against the same table) at the time of connection creation. The means thread to connection to table interface were all at 1 to 1 and not being passed around. No performance improvement.

Long story short, running a single thread it's fast. Start multithreading, it starts slowing down. CPU usage, memory usage, etc. are all negligible. The performance isn't terrible - it's probably good enough for the vast majority of users, but it's not good enough for our app. With one thread, it might take 5 milliseconds. With 10 threads all spinning more quickly (40 milliseconds delay), the call time increases to 15-30 milliseconds. The problem is that at our throughput rates, that's a serious concern.

We are going to fire up a profiler next to see what we can find.

-Mike



-----Original Message-----
From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com]
Sent: Monday, November 04, 2013 12:50 PM
To: user@hbase.apache.org
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Michael, have you tried jstack on your client application?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Michael.Grundvig@high5games.com [Michael.Grundvig@high5games.com]
Sent: Sunday, November 03, 2013 7:46 PM
To: user@hbase.apache.org
Subject: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi all; I posted this as a question on StackOverflow as well but realized I should have gone straight ot the horses-mouth with my question. Sorry for the double post!

We are running a series of HBase tests to see if we can migrate one of our existing datasets from a RDBMS to HBase. We are running 15 nodes with 5 zookeepers and HBase 0.94.12 for this test.

We have a single table with three column families and a key that is distributing very well across the cluster. All of our queries are running a direct look-up; no searching or scanning. Since the HTablePool is now frowned upon, we are using the Apache commons pool and a simple connection factory to create a pool of connections and use them in our threads. Each thread creates an HTableInstance as needed and closes it when done. There are no leaks we can identify.

If we run a single thread and just do lots of random calls sequentially, the performance is quite good. Everything works great until we start trying to scale the performance. As we add more threads and try and get more work done in a single VM, we start seeing performance degrade quickly. The client code is simply attempting to run either one of several gets or a single put at a given frequency. It then waits until the next time to run, we use this to simulate the workload from external clients. With a single thread, we will see call times in the 2-3 milliseconds which is acceptable.

As we add more threads, this call time starts increasing quickly. What gets strange is if we add more VMs, the times hold steady across them all so clearly it's a bottleneck in the running instance and not the cluster. We can get a huge amount of processing happening across the cluster very easily - it just has to use a lot of VMs on the client side to do it. We know the contention isn't in the connection pool as we see the problem even when we have more connections than threads. Unfortunately, the times are spiraling out of control very quickly. We need it to support at least 128 threads in practice, but most important I want to support 500 updates/sec and 250 gets/sec. In theory, this should be a piece of cake for the cluster as we can do FAR more work than that with a few VMs, but we don't even get close to this with a single VM.

So my question: how do people building high-performance apps with HBase get around this? What approach are others using for connection pooling in a multi-threaded environment? There seems to be a surprisingly little amount of info about this on the web considering the popularity. Is there some client setting we need to use that makes it perform better in a threaded environment? We are going to try to cache HTable instances next but that's a total guess. There are solutions to offloading work to other VMs but we really want to avoid this as clearly the cluster can handle the load and it will dramatically decrease the application performance in critical areas.

Any help is greatly appreciated! Thanks!
-Mike

Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Yes, this issue definitely exists. I remember my own thread on the dev list a while back when I tried to saturate one RS with all data cached. It took me 10 client JVMs and a couple hundred threads.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: lars hofhansl [larsh@apache.org]
Sent: Wednesday, November 06, 2013 11:09 AM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

If you get the chance, please send a jstack of the client when it is slow, or some profile data if you have, or just some code reproducing the problem.
We should track this down.

-- Lars



________________________________
 From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Tuesday, November 5, 2013 8:35 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine


I'm finding this configuration process very confusing. This is what we are doing in Java code:

        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", props.getProperty(HBASE_NODES));
        configuration.set("zookeeper.znode.parent", props.getProperty(HBASE_PARAMS));
        configuration.setBoolean(" hbase.ipc.client.tcpnodelay", true);
        configuration.setInt("hbase.client.ipc.pool.size", 10);

Adding those last two lines has made no improvement. Likewise, updating hbase-site.xml on the cluster to include these settings has made no improvement. The problem is that I have zero confidence it's actually taking the settings in the first place.

-Mike



-----Original Message-----
From: lars hofhansl [mailto:larsh@apache.org]
Sent: Tuesday, November 05, 2013 9:47 AM
To: user@hbase.apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hi Mike,

hbase-site.xml has both client and server side settings.

In your client, where do you set the ZK quorum of the cluster? At that same spot you'd set these options.

Assuming you're running the clients with classes from a normal HBase install, you would add these setting to the hbase-site.xml there.

-- Lars



________________________________
From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Tuesday, November 5, 2013 6:43 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine


Thanks for the input, we'll do some more testing on this today. Where do these settings get made? In the configuration used to create the pool or somewhere else? It appears hbase-site.xml "on the client" but we have nothing like that so I think I'm misunderstanding something. Thanks!

-Mike



-----Original Message-----
From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com]
Sent: Monday, November 04, 2013 8:41 PM
To: user@hbase.apache.org; lars hofhansl
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine

One more: "hbase.ipc.client.tcpnodelay" set to true. It is worth trying as well.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: lars hofhansl [larsh@apache.org]
Sent: Monday, November 04, 2013 5:55 PM
To: user@hbase.apache.org; lars hofhansl
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Here're one more thing to try. By default each HConnection will use a single TCP connection to multiplex traffic to each region server.

Try setting hbase.client.ipc.pool.size on the client to something > 1.

-- Lars



________________________________
From: lars hofhansl <la...@apache.org>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Sent: Monday, November 4, 2013 5:16 PM
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine


No. This is terrible.
If you can, please send a jstack and do some profiling. Is there an easy way to reproduce this with just a single RegionServer?
If so, I'd offer to do some profiling.

Thanks.


-- Lars



________________________________

From: "Michael.Grundvig@high5games.com" <Mi...@high5games.com>
To: user@hbase.apache.org
Sent: Monday, November 4, 2013 11:00 AM
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine


Not yet, this is just a load test client. It literally does nothing but create threads to talk to HBase and run 4 different calls. Nothing else is done in the app at all.

To eliminate even more of our code from the loop, we just tried removing our connection pool entirely and just using a single connection per thread - no improvement. Then we tried creating the HTableInterface (all calls are against the same table) at the time of connection creation. The means thread to connection to table interface were all at 1 to 1 and not being passed around. No performance improvement.

Long story short, running a single thread it's fast. Start multithreading, it starts slowing down. CPU usage, memory usage, etc. are all negligible. The performance isn't terrible - it's probably good enough for the vast majority of users, but it's not good enough for our app. With one thread, it might take 5 milliseconds. With 10 threads all spinning more quickly (40 milliseconds delay), the call time increases to 15-30 milliseconds. The problem is that at our throughput rates, that's a serious concern.

We are going to fire up a profiler next to see what we can find.

-Mike



-----Original Message-----
From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com]
Sent: Monday, November 04, 2013 12:50 PM
To: user@hbase.apache.org
Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Michael, have you tried jstack on your client application?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by lars hofhansl <la...@apache.org>.
If you get the chance, please send a jstack of the client when it is slow, some profile data if you have it, or just some code reproducing the problem.
We should track this down.

-- Lars




RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Mi...@high5games.com.
I'm finding this configuration process very confusing. This is what we are doing in Java code:

        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", props.getProperty(HBASE_NODES));
        configuration.set("zookeeper.znode.parent", props.getProperty(HBASE_PARAMS));
        configuration.setBoolean(" hbase.ipc.client.tcpnodelay", true);
        configuration.setInt("hbase.client.ipc.pool.size", 10);

Adding those last two lines has made no improvement. Likewise, updating hbase-site.xml on the cluster to include these settings has made no improvement. The problem is that I have zero confidence it's actually taking the settings in the first place. 
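
One quick way to build that confidence is to read the values back out of the same Configuration object right before the connections are created, continuing the snippet above:

    // Sanity check: print the effective values from the Configuration the
    // connections are actually built with.
    System.out.println("hbase.zookeeper.quorum = "
            + configuration.get("hbase.zookeeper.quorum"));
    System.out.println("hbase.ipc.client.tcpnodelay = "
            + configuration.getBoolean("hbase.ipc.client.tcpnodelay", false));
    System.out.println("hbase.client.ipc.pool.size = "
            + configuration.getInt("hbase.client.ipc.pool.size", -1));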

-Mike



Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by lars hofhansl <la...@apache.org>.
Hi Mike,

hbase-site.xml has both client and server side settings.

In your client, where do you set the ZK quorum of the cluster? At that same spot you'd set these options.

Assuming you're running the clients with classes from a normal HBase install, you would add these settings to the hbase-site.xml there.
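
If the client app has no hbase-site.xml on its classpath at all, a fallback (a sketch; the path below is a placeholder) is to load one into the same Configuration explicitly before creating any connections:

    Configuration configuration = HBaseConfiguration.create();
    // HBaseConfiguration.create() picks up hbase-site.xml automatically when it
    // is on the classpath; this loads one from an explicit location instead.
    // Path here is org.apache.hadoop.fs.Path.
    configuration.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));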

-- Lars




RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Mi...@high5games.com.
Thanks for the input, we'll do some more testing on this today. Where do these settings get made? In the configuration used to create the pool, or somewhere else? It sounds like they belong in hbase-site.xml "on the client", but we have nothing like that, so I think I'm misunderstanding something. Thanks!

-Mike



RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
One more: "hbase.ipc.client.tcpnodelay" set to true. It is worth trying as well.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by lars hofhansl <la...@apache.org>.
Here's one more thing to try. By default each HConnection will use a single TCP connection to multiplex traffic to each region server.

Try setting hbase.client.ipc.pool.size on the client to something > 1.
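
In code that is just (a sketch; this is the same Configuration that carries the ZK quorum):

    Configuration conf = HBaseConfiguration.create();
    // Use several sockets per region server instead of the single default connection.
    conf.setInt("hbase.client.ipc.pool.size", 10);
    // Assumption, not verified here: there is also an "hbase.client.ipc.pool.type"
    // knob (e.g. RoundRobin) that controls how those sockets are handed out.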

-- Lars




Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by lars hofhansl <la...@apache.org>.
No. This is terrible.
If you can, please send a jstack and do some profiling. Is there an easy way to reproduce this with just a single RegionServer?
If so, I'd offer to do some profiling.

Thanks.


-- Lars




Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Stack <st...@duboce.net>.
You might try asynchbase, Michael.
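
For anyone curious, a rough sketch of a blocking get with asynchbase follows; class and method names are from the asynchbase client as best I recall, and real code would chain callbacks on the returned Deferred rather than calling join():

    import java.util.ArrayList;
    import org.hbase.async.GetRequest;
    import org.hbase.async.HBaseClient;
    import org.hbase.async.KeyValue;

    public class AsyncHBaseExample {
        public static void main(String[] args) throws Exception {
            // A single HBaseClient is shared by every thread in the process; it is
            // fully asynchronous and manages its own connections internally.
            HBaseClient client = new HBaseClient("zk1,zk2,zk3"); // placeholder quorum
            try {
                GetRequest get = new GetRequest("testtable".getBytes(), "row-0001".getBytes());
                ArrayList<KeyValue> row = client.get(get).join(); // join() blocks for the result
                System.out.println("cells returned: " + row.size());
            } finally {
                client.shutdown().join();
            }
        }
    }
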
St.Ack


RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
It seems there is a serious contention point inside the standard HBase client API; that is why your application does not
scale with multiple threads but does scale with multiple JVMs. I would start by analyzing the thread stack traces of your client application - you will easily spot the excessive locking.
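
If jstack output is awkward to capture under load, a rough in-process equivalent using java.lang.management is sketched below (the class and method names here are only illustrative); it reports the same monitor contention a jstack dump would show:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public final class ContentionSnapshot {
      // Print every thread currently BLOCKED on a monitor and which thread
      // holds that monitor. Calling this periodically from the load-test
      // client shows the same contention a "jstack <pid>" dump would.
      public static void dumpBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
          if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
            System.out.println(info.getThreadName()
                + " blocked on " + info.getLockName()
                + " held by " + info.getLockOwnerName());
          }
        }
      }
    }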
 
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Mi...@high5games.com.
Not yet, this is just a load test client. It literally does nothing but create threads to talk to HBase and run 4 different calls. Nothing else is done in the app at all. 

To eliminate even more of our code from the loop, we tried removing our connection pool entirely and using a single connection per thread - no improvement. Then we tried creating the HTableInterface (all calls are against the same table) at the time of connection creation, so that thread, connection, and table interface were all 1:1 and nothing was being passed around. Still no performance improvement.
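
In code, that 1:1 setup looks roughly like the sketch below (assuming HConnectionManager.createConnection and HConnection#getTable are available in the 0.94 client jar in use - they may not be in every 0.94 release - and the table and row key are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseWorker implements Runnable {
      private final Configuration conf = HBaseConfiguration.create();

      public void run() {
        HConnection connection = null;
        HTableInterface table = null;
        try {
          // One unmanaged connection and one table handle per thread,
          // created up front and never shared or passed around.
          connection = HConnectionManager.createConnection(conf);
          table = connection.getTable("mytable");
          Result result = table.get(new Get(Bytes.toBytes("somerowkey")));
          System.out.println("cells returned: " + result.size());
          // ... loop here issuing gets/puts on a timer ...
        } catch (IOException e) {
          e.printStackTrace();
        } finally {
          try {
            if (table != null) table.close();
            if (connection != null) connection.close();
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      }
    }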

Long story short: with a single thread it's fast; start multithreading and it slows down. CPU usage, memory usage, etc. are all negligible. The performance isn't terrible - it's probably good enough for the vast majority of users - but it's not good enough for our app. With one thread a call might take 5 milliseconds; with 10 threads each spinning more quickly (a 40-millisecond delay between calls), call times increase to 15-30 milliseconds. At our throughput rates, that's a serious concern.

We are going to fire up a profiler next to see what we can find. 

-Mike


RE: HBase Client Performance Bottleneck in a Single Virtual Machine

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Michael, have you tried jstack on your client application?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com
