You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hbase.apache.org by Vaibhav Puranik <vp...@gmail.com> on 2009/07/10 03:48:14 UTC

HTablePool question

Hi,

It looks like HTablePool is designed to have one instance of HTablePool per
table.

I am confused by the static map inside HTablePool class. If we can
instantiate one HTablePool per table, what's the use of the map?
Furthermore, the map is static and there is no way to add multiple tables to
it.

Regards,
Vaibhav

Re: HTablePool question

Posted by Jonathan Gray <jl...@streamy.com>.

Excellent, Ken.  Good stuff.

It's byte[] because everything is generally binary in HBase, not Strings.

Given that tables are practically almost always String, you could add 
additional override methods that took String and just called 
Bytes.toBytes() underneath if you'd like.

Tree -> Hash is fine with me, shouldn't really make much of a difference 
in this case, Trees are just more memory efficient.

Everything else sounds good.

Thanks Ken.

JG

Ken Weiner wrote:
> Hey Jon.  I already started writing a simple patch to make the constructors
> private and add the static getters/setters that don't require the client to
> deal with the pool instance.  I'll submit it later today once I make sure it
> conforms to HBase coding standards.
> 
> I'll leave the pool fixed size for now, but I'll add a comment saying that
> when the pool is exhausted, new HTable instances will be created for each
> request (which is what currently happens now).
> 
> I was also going to change the internal map of pools from TreeMap to HashMap
> since there was no apparent need to maintain the pools in sorted order.
> 
> Is there a reason that tableName is represented as byte[] rather than a
> simple String throughout the class?
> 
> On Tue, Jul 14, 2009 at 10:15 AM, Jonathan Gray <jl...@streamy.com> wrote:
> 
>> It's overkill IMO.
>>
>> The interface we would probably need is the KeyedPoolableObjectFactory (key
>> being the table name), defined as:
>>
>> public interface KeyedPoolableObjectFactory {
>>    Object makeObject(Object key);
>>    void activateObject(Object key, Object obj);
>>    void passivateObject(Object key, Object obj);
>>    boolean validateObject(Object key, Object obj);
>>    void destroyObject(Object key, Object obj);
>> }
>>
>>
>> We don't have activate/passivate/destroy semantics, questionable whether we
>> have a validate.  We really just have get/put.
>>
>> HTables are not very stateful (the internal read/write buffers for
>> scans/puts is the exception).  The purpose of the pooling is not for
>> long-running usage but rather a lot of short usages.  Instantiation of an
>> HTable is not free from a client perspective (can take 100ms or so, though
>> I've not tested this much on 0.20), though there is not really a server-side
>> overhead to it.
>>
>> I think the confusion could be settled by adding static getters/putters of
>> HTables that don't require you the intermediate step of getting the pool.
>>  Perhaps rather than using a fixed-size queue, we could just dynamically
>> grow it; could still have some limit to how big the queue itself could grow,
>> spitting out unpooled HTables eventually.
>>
>> If no one else wants to make the patch I will, let me know.
>>
>> JG
>>
>>
>>
>> stack wrote:
>>
>>> I'm an ex-user.  Commons pool is great.  IMO, it may be overkill in this
>>> case.  You make the call Ken.
>>>
>>> Thanks,
>>> St.Ack
>>>
>>> On Mon, Jul 13, 2009 at 6:02 PM, Ken Weiner <ke...@gumgum.com> wrote:
>>>
>>>  I guess the main advantage would be that the interface
>>>>
>>>> http://commons.apache.org/pool/apidocs/org/apache/commons/pool/ObjectPool.htmlis
>>>> simple and well-understood by a lot of people, and also
>>>>
>>>>
>>>> http://commons.apache.org/pool/apidocs/org/apache/commons/pool/impl/GenericObjectPool.htmlmakes
>>>> a lot of things configurable and allows a built-in way to optionally
>>>> verify that each HTable is working before handing it out.  If you think
>>>> it
>>>> is overkill to have more pooling functionality, than I'll stick to making
>>>> simple modifications to the current tiny class.  I just thought that
>>>> Commons
>>>> Pool was a good fit for this functionality in HBase.
>>>>
>>>> On Mon, Jul 13, 2009 at 5:34 PM, Erik Holstad <er...@gmail.com>
>>>> wrote:
>>>>
>>>>  Hi Ken!
>>>>> I don't think that the person that wrote the code for the pooler looked
>>>>>
>>>> at
>>>>
>>>>> the code you mentioned.
>>>>> I looked at it and it looked pretty good, but what would the advantage
>>>>> of
>>>>> using that interface with it's
>>>>> packages be instead of just using the tiny class we have now?
>>>>>
>>>>> Regards Erik
>>>>>
>>>>>
>

Re: HTablePool question

Posted by Ken Weiner <ke...@gumgum.com>.

Hey Jon.  I already started writing a simple patch to make the constructors
private and add the static getters/setters that don't require the client to
deal with the pool instance.  I'll submit it later today once I make sure it
conforms to HBase coding standards.

I'll leave the pool fixed size for now, but I'll add a comment saying that
when the pool is exhausted, new HTable instances will be created for each
request (which is what currently happens now).

I was also going to change the internal map of pools from TreeMap to HashMap
since there was no apparent need to maintain the pools in sorted order.

Is there a reason that tableName is represented as byte[] rather than a
simple String throughout the class?

On Tue, Jul 14, 2009 at 10:15 AM, Jonathan Gray <jl...@streamy.com> wrote:

> It's overkill IMO.
>
> The interface we would probably need is the KeyedPoolableObjectFactory (key
> being the table name), defined as:
>
> public interface KeyedPoolableObjectFactory {
>    Object makeObject(Object key);
>    void activateObject(Object key, Object obj);
>    void passivateObject(Object key, Object obj);
>    boolean validateObject(Object key, Object obj);
>    void destroyObject(Object key, Object obj);
> }
>
>
> We don't have activate/passivate/destroy semantics, questionable whether we
> have a validate.  We really just have get/put.
>
> HTables are not very stateful (the internal read/write buffers for
> scans/puts is the exception).  The purpose of the pooling is not for
> long-running usage but rather a lot of short usages.  Instantiation of an
> HTable is not free from a client perspective (can take 100ms or so, though
> I've not tested this much on 0.20), though there is not really a server-side
> overhead to it.
>
> I think the confusion could be settled by adding static getters/putters of
> HTables that don't require you the intermediate step of getting the pool.
>  Perhaps rather than using a fixed-size queue, we could just dynamically
> grow it; could still have some limit to how big the queue itself could grow,
> spitting out unpooled HTables eventually.
>
> If no one else wants to make the patch I will, let me know.
>
> JG
>
>
>
> stack wrote:
>
>> I'm an ex-user.  Commons pool is great.  IMO, it may be overkill in this
>> case.  You make the call Ken.
>>
>> Thanks,
>> St.Ack
>>
>> On Mon, Jul 13, 2009 at 6:02 PM, Ken Weiner <ke...@gumgum.com> wrote:
>>
>>  I guess the main advantage would be that the interface
>>>
>>>
>>> http://commons.apache.org/pool/apidocs/org/apache/commons/pool/ObjectPool.htmlis
>>> simple and well-understood by a lot of people, and also
>>>
>>>
>>> http://commons.apache.org/pool/apidocs/org/apache/commons/pool/impl/GenericObjectPool.htmlmakes
>>> a lot of things configurable and allows a built-in way to optionally
>>> verify that each HTable is working before handing it out.  If you think
>>> it
>>> is overkill to have more pooling functionality, than I'll stick to making
>>> simple modifications to the current tiny class.  I just thought that
>>> Commons
>>> Pool was a good fit for this functionality in HBase.
>>>
>>> On Mon, Jul 13, 2009 at 5:34 PM, Erik Holstad <er...@gmail.com>
>>> wrote:
>>>
>>>  Hi Ken!
>>>> I don't think that the person that wrote the code for the pooler looked
>>>>
>>> at
>>>
>>>> the code you mentioned.
>>>> I looked at it and it looked pretty good, but what would the advantage
>>>> of
>>>> using that interface with it's
>>>> packages be instead of just using the tiny class we have now?
>>>>
>>>> Regards Erik
>>>>
>>>>
>>

Re: HTablePool question

Posted by Jonathan Gray <jl...@streamy.com>.

It's overkill IMO.

The interface we would probably need is the KeyedPoolableObjectFactory 
(key being the table name), defined as:

public interface KeyedPoolableObjectFactory {
     Object makeObject(Object key);
     void activateObject(Object key, Object obj);
     void passivateObject(Object key, Object obj);
     boolean validateObject(Object key, Object obj);
     void destroyObject(Object key, Object obj);
}

We don't have activate/passivate/destroy semantics, questionable whether 
we have a validate.  We really just have get/put.

HTables are not very stateful (the internal read/write buffers for 
scans/puts is the exception).  The purpose of the pooling is not for 
long-running usage but rather a lot of short usages.  Instantiation of 
an HTable is not free from a client perspective (can take 100ms or so, 
though I've not tested this much on 0.20), though there is not really a 
server-side overhead to it.

I think the confusion could be settled by adding static getters/putters 
of HTables that don't require you the intermediate step of getting the 
pool.  Perhaps rather than using a fixed-size queue, we could just 
dynamically grow it; could still have some limit to how big the queue 
itself could grow, spitting out unpooled HTables eventually.

If no one else wants to make the patch I will, let me know.

JG

stack wrote:
> I'm an ex-user.  Commons pool is great.  IMO, it may be overkill in this
> case.  You make the call Ken.
> 
> Thanks,
> St.Ack
> 
> On Mon, Jul 13, 2009 at 6:02 PM, Ken Weiner <ke...@gumgum.com> wrote:
> 
>> I guess the main advantage would be that the interface
>>
>> http://commons.apache.org/pool/apidocs/org/apache/commons/pool/ObjectPool.htmlis
>> simple and well-understood by a lot of people, and also
>>
>> http://commons.apache.org/pool/apidocs/org/apache/commons/pool/impl/GenericObjectPool.htmlmakes
>> a lot of things configurable and allows a built-in way to optionally
>> verify that each HTable is working before handing it out.  If you think it
>> is overkill to have more pooling functionality, than I'll stick to making
>> simple modifications to the current tiny class.  I just thought that
>> Commons
>> Pool was a good fit for this functionality in HBase.
>>
>> On Mon, Jul 13, 2009 at 5:34 PM, Erik Holstad <er...@gmail.com>
>> wrote:
>>
>>> Hi Ken!
>>> I don't think that the person that wrote the code for the pooler looked
>> at
>>> the code you mentioned.
>>> I looked at it and it looked pretty good, but what would the advantage of
>>> using that interface with it's
>>> packages be instead of just using the tiny class we have now?
>>>
>>> Regards Erik
>>>
>

Re: HTablePool question

Posted by stack <st...@duboce.net>.

I'm an ex-user.  Commons pool is great.  IMO, it may be overkill in this
case.  You make the call Ken.

Thanks,
St.Ack

On Mon, Jul 13, 2009 at 6:02 PM, Ken Weiner <ke...@gumgum.com> wrote:

> I guess the main advantage would be that the interface
>
> http://commons.apache.org/pool/apidocs/org/apache/commons/pool/ObjectPool.htmlis
> simple and well-understood by a lot of people, and also
>
> http://commons.apache.org/pool/apidocs/org/apache/commons/pool/impl/GenericObjectPool.htmlmakes
> a lot of things configurable and allows a built-in way to optionally
> verify that each HTable is working before handing it out.  If you think it
> is overkill to have more pooling functionality, than I'll stick to making
> simple modifications to the current tiny class.  I just thought that
> Commons
> Pool was a good fit for this functionality in HBase.
>
> On Mon, Jul 13, 2009 at 5:34 PM, Erik Holstad <er...@gmail.com>
> wrote:
>
> > Hi Ken!
> > I don't think that the person that wrote the code for the pooler looked
> at
> > the code you mentioned.
> > I looked at it and it looked pretty good, but what would the advantage of
> > using that interface with it's
> > packages be instead of just using the tiny class we have now?
> >
> > Regards Erik
> >
>

Re: HTablePool question

Posted by Ken Weiner <ke...@gumgum.com>.

I guess the main advantage would be that the interface
http://commons.apache.org/pool/apidocs/org/apache/commons/pool/ObjectPool.htmlis
simple and well-understood by a lot of people, and also
http://commons.apache.org/pool/apidocs/org/apache/commons/pool/impl/GenericObjectPool.htmlmakes
a lot of things configurable and allows a built-in way to optionally
verify that each HTable is working before handing it out.  If you think it
is overkill to have more pooling functionality, than I'll stick to making
simple modifications to the current tiny class.  I just thought that Commons
Pool was a good fit for this functionality in HBase.

On Mon, Jul 13, 2009 at 5:34 PM, Erik Holstad <er...@gmail.com> wrote:

> Hi Ken!
> I don't think that the person that wrote the code for the pooler looked at
> the code you mentioned.
> I looked at it and it looked pretty good, but what would the advantage of
> using that interface with it's
> packages be instead of just using the tiny class we have now?
>
> Regards Erik
>

Re: HTablePool question

Posted by Erik Holstad <er...@gmail.com>.

Hi Ken!
I don't think that the person that wrote the code for the pooler looked at
the code you mentioned.
I looked at it and it looked pretty good, but what would the advantage of
using that interface with it's
packages be instead of just using the tiny class we have now?

Regards Erik

Re: HTablePool question

Posted by Ken Weiner <ke...@gumgum.com>.

Thanks for the responses Erik and Andrew.  I may submit a patch to make the
constructor protected and add methods to get/put the HTable based on the
table name, but before I do, I was wondering if you'd consider basing this
class on Apache Commons Pool http://commons.apache.org/pool/ instead.  Let
me know if this is a possibility, and if so, I'll look into writing a new
HTablePool based on it.

On Mon, Jul 13, 2009 at 3:49 PM, Andrew Purtell <ap...@apache.org> wrote:

> The current semantics are to use the factory method HTablePool.getPool()
> to create one pool per table. There are a couple of improvements which
> could be made here:
>   - Use a factory class to make things clearer
>   - Make the HTablePool constructor protected
> Also, this is a very simple class which may not match up with anyone's
> use case. As I recall this is a quickly constructed answer to a
> suggestion made on IRC one day. Patches are always welcome.
>
>   - Andy
>
>
>
>
>
>
> ________________________________
> From: Ken Weiner <ke...@gumgum.com>
> To: hbase-user@hadoop.apache.org
> Sent: Monday, July 13, 2009 12:37:38 PM
> Subject: Re: HTablePool question
>
> I was just looking at the same code and I am wondering what the intended
> usage pattern is.  HTablePool has both a static factory-like method
> HTablePool.getPool() and a public constructor.  Should a client be using
> the
> static method to get pools for each table of interest or should the client
> instantiate a pool per table on its own and manage those references itself?
>
> On Fri, Jul 10, 2009 at 5:39 PM, Erik Holstad <er...@gmail.com>
> wrote:
>
> > Hey Vaibhav!
> > I haven't used HTablePool myself, but we used to have something very
> > similar
> > that we used internally.
> > But just looked at the code and will try to answer your questions.
> >
> >
> > On Thu, Jul 9, 2009 at 6:48 PM, Vaibhav Puranik <vp...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > It looks like HTablePool is designed to have one instance of HTablePool
> > per
> > > table.
> >
> > Yes, it looks that way.
> >
> >
> > >
> > >
> > > I am confused by the static map inside HTablePool class. If we can
> > > instantiate one HTablePool per table, what's the use of the map?
> >
> > The reason I think you are a little bit confused is because the classes
> > mixes a static getter with a non static class, so what you do is that you
> > use the static getPool to get you HTablePool and then you use the non
> > static
> > methods to work on it.
> >
> >
> > >
> > > Furthermore, the map is static and there is no way to add multiple
> tables
> > > to
> > > it.
> >
> > Yes, it is static, but not final, so every time  you ask for a pool for a
> > table that is not in the static map, it creates that key value pair. What
> > you cannot do though is to change the number of HTable instances after
> you
> > called it ones, not sure if you need this, but should be to hard to add
> if
> > you do.
> >
> > >
> > >
> > > Regards,
> > > Vaibhav
> > >
> >
> > Regards Erik
> >
>
>
>
>
>

Re: HTablePool question

Posted by Andrew Purtell <ap...@apache.org>.

The current semantics are to use the factory method HTablePool.getPool()
to create one pool per table. There are a couple of improvements which
could be made here:
   - Use a factory class to make things clearer
   - Make the HTablePool constructor protected
Also, this is a very simple class which may not match up with anyone's
use case. As I recall this is a quickly constructed answer to a
suggestion made on IRC one day. Patches are always welcome.

   - Andy






________________________________
From: Ken Weiner <ke...@gumgum.com>
To: hbase-user@hadoop.apache.org
Sent: Monday, July 13, 2009 12:37:38 PM
Subject: Re: HTablePool question

I was just looking at the same code and I am wondering what the intended
usage pattern is.  HTablePool has both a static factory-like method
HTablePool.getPool() and a public constructor.  Should a client be using the
static method to get pools for each table of interest or should the client
instantiate a pool per table on its own and manage those references itself?

On Fri, Jul 10, 2009 at 5:39 PM, Erik Holstad <er...@gmail.com> wrote:

> Hey Vaibhav!
> I haven't used HTablePool myself, but we used to have something very
> similar
> that we used internally.
> But just looked at the code and will try to answer your questions.
>
>
> On Thu, Jul 9, 2009 at 6:48 PM, Vaibhav Puranik <vp...@gmail.com>
> wrote:
>
> > Hi,
> >
> > It looks like HTablePool is designed to have one instance of HTablePool
> per
> > table.
>
> Yes, it looks that way.
>
>
> >
> >
> > I am confused by the static map inside HTablePool class. If we can
> > instantiate one HTablePool per table, what's the use of the map?
>
> The reason I think you are a little bit confused is because the classes
> mixes a static getter with a non static class, so what you do is that you
> use the static getPool to get you HTablePool and then you use the non
> static
> methods to work on it.
>
>
> >
> > Furthermore, the map is static and there is no way to add multiple tables
> > to
> > it.
>
> Yes, it is static, but not final, so every time  you ask for a pool for a
> table that is not in the static map, it creates that key value pair. What
> you cannot do though is to change the number of HTable instances after you
> called it ones, not sure if you need this, but should be to hard to add if
> you do.
>
> >
> >
> > Regards,
> > Vaibhav
> >
>
> Regards Erik
>

Re: HTablePool question

Posted by Erik Holstad <er...@gmail.com>.

Like said earlier, I haven't used the HTablePool class as it is now written,
but looking at the
code , to me it seems like the intended way to use it is to use the static
method to instantiate and
get the pool for one table and then get single tables from the pool. This is
mostly due to the fact that
there is that static map up to, but I might be wrong, so someone with more
insight please correct me
here.

Regards Erik

Re: HTablePool question

Posted by Ken Weiner <ke...@gumgum.com>.

I was just looking at the same code and I am wondering what the intended
usage pattern is.  HTablePool has both a static factory-like method
HTablePool.getPool() and a public constructor.  Should a client be using the
static method to get pools for each table of interest or should the client
instantiate a pool per table on its own and manage those references itself?

On Fri, Jul 10, 2009 at 5:39 PM, Erik Holstad <er...@gmail.com> wrote:

> Hey Vaibhav!
> I haven't used HTablePool myself, but we used to have something very
> similar
> that we used internally.
> But just looked at the code and will try to answer your questions.
>
>
> On Thu, Jul 9, 2009 at 6:48 PM, Vaibhav Puranik <vp...@gmail.com>
> wrote:
>
> > Hi,
> >
> > It looks like HTablePool is designed to have one instance of HTablePool
> per
> > table.
>
> Yes, it looks that way.
>
>
> >
> >
> > I am confused by the static map inside HTablePool class. If we can
> > instantiate one HTablePool per table, what's the use of the map?
>
> The reason I think you are a little bit confused is because the classes
> mixes a static getter with a non static class, so what you do is that you
> use the static getPool to get you HTablePool and then you use the non
> static
> methods to work on it.
>
>
> >
> > Furthermore, the map is static and there is no way to add multiple tables
> > to
> > it.
>
> Yes, it is static, but not final, so every time  you ask for a pool for a
> table that is not in the static map, it creates that key value pair. What
> you cannot do though is to change the number of HTable instances after you
> called it ones, not sure if you need this, but should be to hard to add if
> you do.
>
> >
> >
> > Regards,
> > Vaibhav
> >
>
> Regards Erik
>

Re: HTablePool question

Posted by Erik Holstad <er...@gmail.com>.

Hey Vaibhav!
I haven't used HTablePool myself, but we used to have something very similar
that we used internally.
But just looked at the code and will try to answer your questions.

On Thu, Jul 9, 2009 at 6:48 PM, Vaibhav Puranik <vp...@gmail.com> wrote:

> Hi,
>
> It looks like HTablePool is designed to have one instance of HTablePool per
> table.

Yes, it looks that way.

>
>
> I am confused by the static map inside HTablePool class. If we can
> instantiate one HTablePool per table, what's the use of the map?

The reason I think you are a little bit confused is because the classes
mixes a static getter with a non static class, so what you do is that you
use the static getPool to get you HTablePool and then you use the non static
methods to work on it.

>
> Furthermore, the map is static and there is no way to add multiple tables
> to
> it.

Yes, it is static, but not final, so every time  you ask for a pool for a
table that is not in the static map, it creates that key value pair. What
you cannot do though is to change the number of HTable instances after you
called it ones, not sure if you need this, but should be to hard to add if
you do.

>
>
> Regards,
> Vaibhav
>

Regards Erik