Posted to user@hbase.apache.org by Ashish Shinde <as...@strandls.com> on 2011/04/04 09:45:01 UTC

Is HTable threadsafe and cachable?

Hi,

We are using HBase to power a web application. In the current
implementation, the data access classes maintain a static HTable
instance for reads and writes, because getting hold of an HTable
instance looks costly.

In this scenario the HTable instances could more or less be perpetually
cached. Is it reasonable to assume that HTables do not have some
inherent timeout and are threadsafe across gets and puts?

Thanks and regards,
- Ashish



Re: Is HTable threadsafe and cachable?

Posted by tsuna <ts...@gmail.com>.

Hi Ashish,
if you're interested in a thread-safe, scalable HBase client, take a
look at asynchbase: https://github.com/stumbleupon/asynchbase
It was designed from the ground up to be thread-safe, you only need
one instance of HBaseClient per cluster, regardless of how many tables
you're going to interact with.  It also greatly outperforms HTable,
especially in write-heavy workloads, because it uses far fewer threads
and has less lock contention.

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Re: Is HTable threadsafe and cachable?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
From http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html

"Instances of HTable passed the same Configuration instance will share
connections to servers out on the cluster and to the zookeeper
ensemble as well as caches of region locations. This is usually a
*good* thing. This happens because they will all share the same
underlying HConnection instance. See HConnectionManager for more on
how this mechanism works."

I'll add that another good thing is that you don't run out of connections.

J-D
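To make the connection sharing described in the javadoc concrete, here is a minimal, stdlib-only Java model; it is not HBase code. Config, Connection, ConnectionManager and Table are hypothetical stand-ins for Configuration, HConnection, HConnectionManager and HTable, and keying the cache on the configuration object's identity is an assumption consistent with Ryan's remark in this thread that the Config object is used to tell when two HTables access the same cluster. The model shows cheap table handles sharing one connection when given the same configuration object, and a new connection opening for every fresh configuration object, which is how ZooKeeper "too many connections" errors can pile up.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, NOT HBase code: Config, Connection, ConnectionManager
// and Table stand in for Configuration, HConnection, HConnectionManager and HTable.
public class SharedConnectionModel {

    /** Stand-in for a Configuration object; compared by identity only. */
    static final class Config { }

    /** Stand-in for the expensive shared connection (sockets, ZooKeeper session). */
    static final class Connection {
        static final AtomicInteger OPENED = new AtomicInteger();

        Connection() {
            OPENED.incrementAndGet();
        }
    }

    /** One Connection per distinct Config instance (assumed identity keying). */
    static final class ConnectionManager {
        private static final Map<Config, Connection> CACHE = new IdentityHashMap<>();

        static synchronized Connection getConnection(Config conf) {
            return CACHE.computeIfAbsent(conf, c -> new Connection());
        }
    }

    /** Cheap handle: creating one only looks up the shared connection. */
    static final class Table {
        final String name;
        final Connection connection;

        Table(Config conf, String name) {
            this.name = name;
            this.connection = ConnectionManager.getConnection(conf);
        }
    }

    public static void main(String[] args) {
        Config shared = new Config();
        Table users = new Table(shared, "users");
        Table events = new Table(shared, "events");
        System.out.println(users.connection == events.connection); // true
        System.out.println(Connection.OPENED.get());               // 1

        // Anti-pattern: a fresh Config per table means a fresh connection each time.
        for (int i = 0; i < 3; i++) {
            new Table(new Config(), "users");
        }
        System.out.println(Connection.OPENED.get());               // 4
    }
}
```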

Re: Is HTable threadsafe and cachable?

Posted by Ashish Shinde <as...@strandls.com>.
Hi Ryan,

Thanks, HTablePool fits the bill. Will start using it.

I kinda discovered the need to re-use the Configuration object after
hitting ZooKeeper "too many connections" errors, although I could not
find it documented anywhere. Had to dig into the HTable code to figure it out.

Thanks and regards,
- Ashish



Re: Is HTable threadsafe and cachable?

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

HTable instances are not really thread safe at this time.  You can
cache them; check out HTablePool.  But the creation cost of an HTable
instance isn't that high: the actual TCP socket creation and management
is done at a lower level, and all HTable instances share these common
caches and sockets. So you can create a number of HTable instances
without creating a large number of sockets.

Oh and be sure to re-use the same Configuration object, or else you'll
end up with multiple sockets.  This is because we use the Config
object to know when two HTables are accessing the same cluster.

-ryan
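The pooling idea can be sketched without HBase on the classpath. The Java class below is a hypothetical model of what HTablePool provides, not its actual implementation: idle, non-thread-safe handles are kept in a queue per table name, each thread borrows one with getTable, uses it exclusively, and gives it back with putTable, and handles beyond a per-table cap are dropped. In the usage example a StringBuilder stands in for an HTable, since it is likewise not safe to share between threads.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Hypothetical sketch of the HTablePool idea, NOT the HBase class: a per-table
// queue of idle handles, each of which is used by one thread at a time.
public class TablePoolSketch<T> {
    private final int maxIdlePerTable;
    private final Function<String, T> factory;
    private final Map<String, Queue<T>> idle = new ConcurrentHashMap<>();

    public TablePoolSketch(int maxIdlePerTable, Function<String, T> factory) {
        this.maxIdlePerTable = maxIdlePerTable;
        this.factory = factory;
    }

    /** Reuse an idle handle if one is available, otherwise create a new one. */
    public T getTable(String tableName) {
        T handle = queueFor(tableName).poll(); // poll() hands a handle to at most one caller
        return handle != null ? handle : factory.apply(tableName);
    }

    /** Give a handle back; handles beyond the idle cap are dropped. */
    public void putTable(String tableName, T handle) {
        Queue<T> queue = queueFor(tableName);
        // The size check and offer are not atomic, so the cap can be exceeded slightly.
        if (queue.size() < maxIdlePerTable) {
            queue.offer(handle);
        }
    }

    public int idleCount(String tableName) {
        return queueFor(tableName).size();
    }

    private Queue<T> queueFor(String tableName) {
        return idle.computeIfAbsent(tableName, name -> new ConcurrentLinkedQueue<>());
    }

    public static void main(String[] args) throws InterruptedException {
        // StringBuilder stands in for a non-thread-safe HTable handle.
        TablePoolSketch<StringBuilder> pool =
            new TablePoolSketch<>(2, name -> new StringBuilder());
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                StringBuilder table = pool.getTable("users");
                try {
                    table.append('x'); // only this thread touches the handle right now
                } finally {
                    pool.putTable("users", table); // always hand it back
                }
            }
        };
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("idle handles kept: " + pool.idleCount("users"));
    }
}
```

The try/finally around each use reflects the usual discipline with a pool: a handle that is never returned is simply lost to the pool, while one returned while still in use could be handed to a second thread.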
