You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by lars <lh...@yahoo.com> on 2011/07/29 06:12:41 UTC

Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Looking at HBaseClient.getConnection(...) I see this:
...
       synchronized (connections) {
         connection = connections.get(remoteId);
         if (connection == null) {
           connection = new Connection(remoteId);
           connections.put(remoteId, connection);
         }
       }
...

At the same time PoolMap.ThreadLocalPool.put is defined like this:
     public R put(R resource) {
       R previousResource = get();
       if (previousResource == null) {
...
         if (poolSize.intValue() >= maxSize) {
           return null;
         }
...
     }

So... If the ThreadLocalPool reaches its capacity it always returns null 
and hence all new threads will create a
new connection every time getConnection is called!

I have also verified with a test program that works fine as long as the 
number of client threads (which include
the threads in HTable's threadpool of course) is < poolsize. Once that 
is no longer the case the number of
connections "explodes" and the program dies with OOMEs (mostly because 
each Connection is associated with
yet another thread).

It's not clear what should happen, though. Maybe (1) the ThreadLocalPool 
should not have a limit, or maybe
(2) allocations past the pool size should throw an exception (i.e. 
there's a hard limit), or maybe (3) in that case
a single connection is returned for all threads while the pool it over 
its limit or (4) we start round robin with the other
connection in the other thread locals.

For #1 means that the number of client threads needs to be more 
carefully managed by the client app.
In this case it would also be somewhat pointless that Connection have 
their own threads, we just pass stuff
between threads.
#2 would work, but puts more logic in the client.
#3 would lead to hard to debug performance issues.
And #4 is messy :)

Any other options?

Maybe for now just do #2 (or #1)?

-- Lars


Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
If you mean Karthick Sankarachary, he does not hang around mailing list much
. Let me go and get him.

On Tue, Aug 2, 2011 at 3:06 PM, Ted Yu <yu...@gmail.com> wrote:

> I created https://issues.apache.org/jira/browse/HBASE-4150
>
> On Tue, Aug 2, 2011 at 3:02 PM, lars hofhansl <lh...@yahoo.com> wrote:
>
> > Yep, the RR pool has a similar issue. Maybe the maximum number of
> > connection should be related to the number of cores on the client.
> > Something like 2 x #cores?
> >
> > For the threadLocal case that I mentioned below it would be hard to find
> a
> > useful hard limit, since it is hard to foresee the number of threads that
> > will be accessing a region (because of the threadpool in HTable).
> >
> > More generally: Should HTable have a threadpool at all? Or should the
> > pooling be happening on the Connection level?
> >
> > Lastly should I create an issue? :)
> >
> > ------------------------------
> > *From:* Ted Yu <yu...@gmail.com>
> > *To:* user@hbase.apache.org
> > *Cc:* lars hofhansl <lh...@yahoo.com>
> > *Sent:* Friday, July 29, 2011 9:52 AM
> >
> > *Subject:* Re: Problem with hbase.client.ipc.pool.type=threadlocal in
> > trunk
> >
> > I am waiting for Karthick to shed some light on this.
> >
> > On Fri, Jul 29, 2011 at 9:47 AM, Stack <st...@duboce.net> wrote:
> >
> > Sounds good.  Issue?
> > St.Ack
> >
> >
> > On Thu, Jul 28, 2011 at 10:46 PM, Ted Yu <yu...@gmail.com> wrote:
> > > For HBaseClient, at least the javadoc doesn't match:
> > >
> > >   * @param config configuration
> > >   * @return either a {@link PoolType#Reusable} or {@link
> > > PoolType#ThreadLocal}
> > >   */
> > >  private static PoolType getPoolType(Configuration config) {
> > >    return
> > > PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
> > >        PoolType.RoundRobin, PoolType.ThreadLocal);
> > >
> > > I think for RoundRobinPool, we shouldn't allow maxSize to be
> > > Integer#MAX_VALUE. Otherwise connection explosion described by Lars may
> > > incur.
> > >
> > > Cheers
> > >
> > > On Thu, Jul 28, 2011 at 10:27 PM, lars hofhansl <lh...@yahoo.com>
> > wrote:
> > >
> > >> Yeah I looked at that one too, but I assumed that the rationale was to
> > keep
> > >> creating connectionsuntil the pool is full, which makes
> > >> some sense for a round robin pool. Otherwise it is not clear to me how
> > the
> > >> pool grow at all?
> > >>
> > >>
> > >> (And at least it can be usefully controlled with the pool.size
> > parameter.)
> > >>
> > >>
> > >> Cheers indeed
> > >>
> > >>
> > >> ----- Original Message -----
> > >> From: Ted Yu <yu...@gmail.com>
> > >> To: user@hbase.apache.org
> > >> Cc:
> > >> Sent: Thursday, July 28, 2011 10:07 PM
> > >> Subject: Re: Problem with hbase.client.ipc.pool.type=threadlocal in
> > trunk
> > >>
> > >> Looking at PoolMap, I have some question as well.
> > >> For RoundRobinPool, I don't understand the following method:
> > >>     public R get() {
> > >>       if (size() < maxSize) {
> > >>         return null;
> > >>       }
> > >> Should the above condition be?
> > >>       if (size() <= 0) {
> > >>
> > >> For the issue Lars raised, I prefer solution number 2.
> > >>
> > >> Cheers
> > >>
> > >> On Thu, Jul 28, 2011 at 9:12 PM, lars <lh...@yahoo.com> wrote:
> > >>
> > >> > Looking at HBaseClient.getConnection(...) I see this:
> > >> > ...
> > >> >      synchronized (connections) {
> > >> >        connection = connections.get(remoteId);
> > >> >        if (connection == null) {
> > >> >          connection = new Connection(remoteId);
> > >> >          connections.put(remoteId, connection);
> > >> >        }
> > >> >      }
> > >> > ...
> > >> >
> > >> > At the same time PoolMap.ThreadLocalPool.put is defined like this:
> > >> >    public R put(R resource) {
> > >> >      R previousResource = get();
> > >> >      if (previousResource == null) {
> > >> > ...
> > >> >        if (poolSize.intValue() >= maxSize) {
> > >> >          return null;
> > >> >        }
> > >> > ...
> > >> >    }
> > >> >
> > >> > So... If the ThreadLocalPool reaches its capacity it always returns
> > null
> > >> > and hence all new threads will create a
> > >> > new connection every time getConnection is called!
> > >> >
> > >> > I have also verified with a test program that works fine as long as
> > the
> > >> > number of client threads (which include
> > >> > the threads in HTable's threadpool of course) is < poolsize. Once
> that
> > is
> > >> > no longer the case the number of
> > >> > connections "explodes" and the program dies with OOMEs (mostly
> because
> > >> each
> > >> > Connection is associated with
> > >> > yet another thread).
> > >> >
> > >> > It's not clear what should happen, though. Maybe (1) the
> > ThreadLocalPool
> > >> > should not have a limit, or maybe
> > >> > (2) allocations past the pool size should throw an exception (i.e.
> > >> there's
> > >> > a hard limit), or maybe (3) in that case
> > >> > a single connection is returned for all threads while the pool it
> over
> > >> its
> > >> > limit or (4) we start round robin with the other
> > >> > connection in the other thread locals.
> > >> >
> > >> > For #1 means that the number of client threads needs to be more
> > carefully
> > >> > managed by the client app.
> > >> > In this case it would also be somewhat pointless that Connection
> have
> > >> their
> > >> > own threads, we just pass stuff
> > >> > between threads.
> > >> > #2 would work, but puts more logic in the client.
> > >> > #3 would lead to hard to debug performance issues.
> > >> > And #4 is messy :)
> > >> >
> > >> > Any other options?
> > >> >
> > >> > Maybe for now just do #2 (or #1)?
> > >> >
> > >> > -- Lars
> > >> >
> > >> >
> > >>
> > >>
> > >
> >
> >
> >
> >
> >
>

Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Posted by Ted Yu <yu...@gmail.com>.
I created https://issues.apache.org/jira/browse/HBASE-4150

On Tue, Aug 2, 2011 at 3:02 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Yep, the RR pool has a similar issue. Maybe the maximum number of
> connection should be related to the number of cores on the client.
> Something like 2 x #cores?
>
> For the threadLocal case that I mentioned below it would be hard to find a
> useful hard limit, since it is hard to foresee the number of threads that
> will be accessing a region (because of the threadpool in HTable).
>
> More generally: Should HTable have a threadpool at all? Or should the
> pooling be happening on the Connection level?
>
> Lastly should I create an issue? :)
>
> ------------------------------
> *From:* Ted Yu <yu...@gmail.com>
> *To:* user@hbase.apache.org
> *Cc:* lars hofhansl <lh...@yahoo.com>
> *Sent:* Friday, July 29, 2011 9:52 AM
>
> *Subject:* Re: Problem with hbase.client.ipc.pool.type=threadlocal in
> trunk
>
> I am waiting for Karthick to shed some light on this.
>
> On Fri, Jul 29, 2011 at 9:47 AM, Stack <st...@duboce.net> wrote:
>
> Sounds good.  Issue?
> St.Ack
>
>
> On Thu, Jul 28, 2011 at 10:46 PM, Ted Yu <yu...@gmail.com> wrote:
> > For HBaseClient, at least the javadoc doesn't match:
> >
> >   * @param config configuration
> >   * @return either a {@link PoolType#Reusable} or {@link
> > PoolType#ThreadLocal}
> >   */
> >  private static PoolType getPoolType(Configuration config) {
> >    return
> > PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
> >        PoolType.RoundRobin, PoolType.ThreadLocal);
> >
> > I think for RoundRobinPool, we shouldn't allow maxSize to be
> > Integer#MAX_VALUE. Otherwise connection explosion described by Lars may
> > incur.
> >
> > Cheers
> >
> > On Thu, Jul 28, 2011 at 10:27 PM, lars hofhansl <lh...@yahoo.com>
> wrote:
> >
> >> Yeah I looked at that one too, but I assumed that the rationale was to
> keep
> >> creating connectionsuntil the pool is full, which makes
> >> some sense for a round robin pool. Otherwise it is not clear to me how
> the
> >> pool grow at all?
> >>
> >>
> >> (And at least it can be usefully controlled with the pool.size
> parameter.)
> >>
> >>
> >> Cheers indeed
> >>
> >>
> >> ----- Original Message -----
> >> From: Ted Yu <yu...@gmail.com>
> >> To: user@hbase.apache.org
> >> Cc:
> >> Sent: Thursday, July 28, 2011 10:07 PM
> >> Subject: Re: Problem with hbase.client.ipc.pool.type=threadlocal in
> trunk
> >>
> >> Looking at PoolMap, I have some question as well.
> >> For RoundRobinPool, I don't understand the following method:
> >>     public R get() {
> >>       if (size() < maxSize) {
> >>         return null;
> >>       }
> >> Should the above condition be?
> >>       if (size() <= 0) {
> >>
> >> For the issue Lars raised, I prefer solution number 2.
> >>
> >> Cheers
> >>
> >> On Thu, Jul 28, 2011 at 9:12 PM, lars <lh...@yahoo.com> wrote:
> >>
> >> > Looking at HBaseClient.getConnection(...) I see this:
> >> > ...
> >> >      synchronized (connections) {
> >> >        connection = connections.get(remoteId);
> >> >        if (connection == null) {
> >> >          connection = new Connection(remoteId);
> >> >          connections.put(remoteId, connection);
> >> >        }
> >> >      }
> >> > ...
> >> >
> >> > At the same time PoolMap.ThreadLocalPool.put is defined like this:
> >> >    public R put(R resource) {
> >> >      R previousResource = get();
> >> >      if (previousResource == null) {
> >> > ...
> >> >        if (poolSize.intValue() >= maxSize) {
> >> >          return null;
> >> >        }
> >> > ...
> >> >    }
> >> >
> >> > So... If the ThreadLocalPool reaches its capacity it always returns
> null
> >> > and hence all new threads will create a
> >> > new connection every time getConnection is called!
> >> >
> >> > I have also verified with a test program that works fine as long as
> the
> >> > number of client threads (which include
> >> > the threads in HTable's threadpool of course) is < poolsize. Once that
> is
> >> > no longer the case the number of
> >> > connections "explodes" and the program dies with OOMEs (mostly because
> >> each
> >> > Connection is associated with
> >> > yet another thread).
> >> >
> >> > It's not clear what should happen, though. Maybe (1) the
> ThreadLocalPool
> >> > should not have a limit, or maybe
> >> > (2) allocations past the pool size should throw an exception (i.e.
> >> there's
> >> > a hard limit), or maybe (3) in that case
> >> > a single connection is returned for all threads while the pool it over
> >> its
> >> > limit or (4) we start round robin with the other
> >> > connection in the other thread locals.
> >> >
> >> > For #1 means that the number of client threads needs to be more
> carefully
> >> > managed by the client app.
> >> > In this case it would also be somewhat pointless that Connection have
> >> their
> >> > own threads, we just pass stuff
> >> > between threads.
> >> > #2 would work, but puts more logic in the client.
> >> > #3 would lead to hard to debug performance issues.
> >> > And #4 is messy :)
> >> >
> >> > Any other options?
> >> >
> >> > Maybe for now just do #2 (or #1)?
> >> >
> >> > -- Lars
> >> >
> >> >
> >>
> >>
> >
>
>
>
>
>

Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Posted by lars hofhansl <lh...@yahoo.com>.
Yep, the RR pool has a similar issue. Maybe the maximum number of connection should be related to the number of cores on the client.
Something like 2 x #cores?


For the threadLocal case that I mentioned below it would be hard to find a useful hard limit, since it is hard to foresee the number of threads that will be accessing a region (because of the threadpool in HTable).


More generally: Should HTable have a threadpool at all? Or should the pooling be happening on the Connection level?


Lastly should I create an issue? :)



________________________________
From: Ted Yu <yu...@gmail.com>
To: user@hbase.apache.org
Cc: lars hofhansl <lh...@yahoo.com>
Sent: Friday, July 29, 2011 9:52 AM
Subject: Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk


I am waiting for Karthick to shed some light on this.


On Fri, Jul 29, 2011 at 9:47 AM, Stack <st...@duboce.net> wrote:

Sounds good.  Issue?
>St.Ack
>
>
>On Thu, Jul 28, 2011 at 10:46 PM, Ted Yu <yu...@gmail.com> wrote:
>> For HBaseClient, at least the javadoc doesn't match:
>>
>>   * @param config configuration
>>   * @return either a {@link PoolType#Reusable} or {@link
>> PoolType#ThreadLocal}
>>   */
>>  private static PoolType getPoolType(Configuration config) {
>>    return
>> PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
>>        PoolType.RoundRobin, PoolType.ThreadLocal);
>>
>> I think for RoundRobinPool, we shouldn't allow maxSize to be
>> Integer#MAX_VALUE. Otherwise connection explosion described by Lars may
>> incur.
>>
>> Cheers
>>
>> On Thu, Jul 28, 2011 at 10:27 PM, lars hofhansl <lh...@yahoo.com> wrote:
>>
>>> Yeah I looked at that one too, but I assumed that the rationale was to keep
>>> creating connectionsuntil the pool is full, which makes
>>> some sense for a round robin pool. Otherwise it is not clear to me how the
>>> pool grow at all?
>>>
>>>
>>> (And at least it can be usefully controlled with the pool.size parameter.)
>>>
>>>
>>> Cheers indeed
>>>
>>>
>>> ----- Original Message -----
>>> From: Ted Yu <yu...@gmail.com>
>>> To: user@hbase.apache.org
>>> Cc:
>>> Sent: Thursday, July 28, 2011 10:07 PM
>>> Subject: Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk
>>>
>>> Looking at PoolMap, I have some question as well.
>>> For RoundRobinPool, I don't understand the following method:
>>>     public R get() {
>>>       if (size() < maxSize) {
>>>         return null;
>>>       }
>>> Should the above condition be?
>>>       if (size() <= 0) {
>>>
>>> For the issue Lars raised, I prefer solution number 2.
>>>
>>> Cheers
>>>
>>> On Thu, Jul 28, 2011 at 9:12 PM, lars <lh...@yahoo.com> wrote:
>>>
>>> > Looking at HBaseClient.getConnection(...) I see this:
>>> > ...
>>> >      synchronized (connections) {
>>> >        connection = connections.get(remoteId);
>>> >        if (connection == null) {
>>> >          connection = new Connection(remoteId);
>>> >          connections.put(remoteId, connection);
>>> >        }
>>> >      }
>>> > ...
>>> >
>>> > At the same time PoolMap.ThreadLocalPool.put is defined like this:
>>> >    public R put(R resource) {
>>> >      R previousResource = get();
>>> >      if (previousResource == null) {
>>> > ...
>>> >        if (poolSize.intValue() >= maxSize) {
>>> >          return null;
>>> >        }
>>> > ...
>>> >    }
>>> >
>>> > So... If the ThreadLocalPool reaches its capacity it always returns null
>>> > and hence all new threads will create a
>>> > new connection every time getConnection is called!
>>> >
>>> > I have also verified with a test program that works fine as long as the
>>> > number of client threads (which include
>>> > the threads in HTable's threadpool of course) is < poolsize. Once that is
>>> > no longer the case the number of
>>> > connections "explodes" and the program dies with OOMEs (mostly because
>>> each
>>> > Connection is associated with
>>> > yet another thread).
>>> >
>>> > It's not clear what should happen, though. Maybe (1) the ThreadLocalPool
>>> > should not have a limit, or maybe
>>> > (2) allocations past the pool size should throw an exception (i.e.
>>> there's
>>> > a hard limit), or maybe (3) in that case
>>> > a single connection is returned for all threads while the pool it over
>>> its
>>> > limit or (4) we start round robin with the other
>>> > connection in the other thread locals.
>>> >
>>> > For #1 means that the number of client threads needs to be more carefully
>>> > managed by the client app.
>>> > In this case it would also be somewhat pointless that Connection have
>>> their
>>> > own threads, we just pass stuff
>>> > between threads.
>>> > #2 would work, but puts more logic in the client.
>>> > #3 would lead to hard to debug performance issues.
>>> > And #4 is messy :)
>>> >
>>> > Any other options?
>>> >
>>> > Maybe for now just do #2 (or #1)?
>>> >
>>> > -- Lars
>>> >
>>> >
>>>
>>>
>>
>

Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Posted by Ted Yu <yu...@gmail.com>.
I am waiting for Karthick to shed some light on this.

On Fri, Jul 29, 2011 at 9:47 AM, Stack <st...@duboce.net> wrote:

> Sounds good.  Issue?
> St.Ack
>
> On Thu, Jul 28, 2011 at 10:46 PM, Ted Yu <yu...@gmail.com> wrote:
> > For HBaseClient, at least the javadoc doesn't match:
> >
> >   * @param config configuration
> >   * @return either a {@link PoolType#Reusable} or {@link
> > PoolType#ThreadLocal}
> >   */
> >  private static PoolType getPoolType(Configuration config) {
> >    return
> > PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
> >        PoolType.RoundRobin, PoolType.ThreadLocal);
> >
> > I think for RoundRobinPool, we shouldn't allow maxSize to be
> > Integer#MAX_VALUE. Otherwise connection explosion described by Lars may
> > incur.
> >
> > Cheers
> >
> > On Thu, Jul 28, 2011 at 10:27 PM, lars hofhansl <lh...@yahoo.com>
> wrote:
> >
> >> Yeah I looked at that one too, but I assumed that the rationale was to
> keep
> >> creating connectionsuntil the pool is full, which makes
> >> some sense for a round robin pool. Otherwise it is not clear to me how
> the
> >> pool grow at all?
> >>
> >>
> >> (And at least it can be usefully controlled with the pool.size
> parameter.)
> >>
> >>
> >> Cheers indeed
> >>
> >>
> >> ----- Original Message -----
> >> From: Ted Yu <yu...@gmail.com>
> >> To: user@hbase.apache.org
> >> Cc:
> >> Sent: Thursday, July 28, 2011 10:07 PM
> >> Subject: Re: Problem with hbase.client.ipc.pool.type=threadlocal in
> trunk
> >>
> >> Looking at PoolMap, I have some question as well.
> >> For RoundRobinPool, I don't understand the following method:
> >>     public R get() {
> >>       if (size() < maxSize) {
> >>         return null;
> >>       }
> >> Should the above condition be?
> >>       if (size() <= 0) {
> >>
> >> For the issue Lars raised, I prefer solution number 2.
> >>
> >> Cheers
> >>
> >> On Thu, Jul 28, 2011 at 9:12 PM, lars <lh...@yahoo.com> wrote:
> >>
> >> > Looking at HBaseClient.getConnection(...) I see this:
> >> > ...
> >> >      synchronized (connections) {
> >> >        connection = connections.get(remoteId);
> >> >        if (connection == null) {
> >> >          connection = new Connection(remoteId);
> >> >          connections.put(remoteId, connection);
> >> >        }
> >> >      }
> >> > ...
> >> >
> >> > At the same time PoolMap.ThreadLocalPool.put is defined like this:
> >> >    public R put(R resource) {
> >> >      R previousResource = get();
> >> >      if (previousResource == null) {
> >> > ...
> >> >        if (poolSize.intValue() >= maxSize) {
> >> >          return null;
> >> >        }
> >> > ...
> >> >    }
> >> >
> >> > So... If the ThreadLocalPool reaches its capacity it always returns
> null
> >> > and hence all new threads will create a
> >> > new connection every time getConnection is called!
> >> >
> >> > I have also verified with a test program that works fine as long as
> the
> >> > number of client threads (which include
> >> > the threads in HTable's threadpool of course) is < poolsize. Once that
> is
> >> > no longer the case the number of
> >> > connections "explodes" and the program dies with OOMEs (mostly because
> >> each
> >> > Connection is associated with
> >> > yet another thread).
> >> >
> >> > It's not clear what should happen, though. Maybe (1) the
> ThreadLocalPool
> >> > should not have a limit, or maybe
> >> > (2) allocations past the pool size should throw an exception (i.e.
> >> there's
> >> > a hard limit), or maybe (3) in that case
> >> > a single connection is returned for all threads while the pool it over
> >> its
> >> > limit or (4) we start round robin with the other
> >> > connection in the other thread locals.
> >> >
> >> > For #1 means that the number of client threads needs to be more
> carefully
> >> > managed by the client app.
> >> > In this case it would also be somewhat pointless that Connection have
> >> their
> >> > own threads, we just pass stuff
> >> > between threads.
> >> > #2 would work, but puts more logic in the client.
> >> > #3 would lead to hard to debug performance issues.
> >> > And #4 is messy :)
> >> >
> >> > Any other options?
> >> >
> >> > Maybe for now just do #2 (or #1)?
> >> >
> >> > -- Lars
> >> >
> >> >
> >>
> >>
> >
>

Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Posted by Stack <st...@duboce.net>.
Sounds good.  Issue?
St.Ack

On Thu, Jul 28, 2011 at 10:46 PM, Ted Yu <yu...@gmail.com> wrote:
> For HBaseClient, at least the javadoc doesn't match:
>
>   * @param config configuration
>   * @return either a {@link PoolType#Reusable} or {@link
> PoolType#ThreadLocal}
>   */
>  private static PoolType getPoolType(Configuration config) {
>    return
> PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
>        PoolType.RoundRobin, PoolType.ThreadLocal);
>
> I think for RoundRobinPool, we shouldn't allow maxSize to be
> Integer#MAX_VALUE. Otherwise connection explosion described by Lars may
> incur.
>
> Cheers
>
> On Thu, Jul 28, 2011 at 10:27 PM, lars hofhansl <lh...@yahoo.com> wrote:
>
>> Yeah I looked at that one too, but I assumed that the rationale was to keep
>> creating connectionsuntil the pool is full, which makes
>> some sense for a round robin pool. Otherwise it is not clear to me how the
>> pool grow at all?
>>
>>
>> (And at least it can be usefully controlled with the pool.size parameter.)
>>
>>
>> Cheers indeed
>>
>>
>> ----- Original Message -----
>> From: Ted Yu <yu...@gmail.com>
>> To: user@hbase.apache.org
>> Cc:
>> Sent: Thursday, July 28, 2011 10:07 PM
>> Subject: Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk
>>
>> Looking at PoolMap, I have some question as well.
>> For RoundRobinPool, I don't understand the following method:
>>     public R get() {
>>       if (size() < maxSize) {
>>         return null;
>>       }
>> Should the above condition be?
>>       if (size() <= 0) {
>>
>> For the issue Lars raised, I prefer solution number 2.
>>
>> Cheers
>>
>> On Thu, Jul 28, 2011 at 9:12 PM, lars <lh...@yahoo.com> wrote:
>>
>> > Looking at HBaseClient.getConnection(...) I see this:
>> > ...
>> >      synchronized (connections) {
>> >        connection = connections.get(remoteId);
>> >        if (connection == null) {
>> >          connection = new Connection(remoteId);
>> >          connections.put(remoteId, connection);
>> >        }
>> >      }
>> > ...
>> >
>> > At the same time PoolMap.ThreadLocalPool.put is defined like this:
>> >    public R put(R resource) {
>> >      R previousResource = get();
>> >      if (previousResource == null) {
>> > ...
>> >        if (poolSize.intValue() >= maxSize) {
>> >          return null;
>> >        }
>> > ...
>> >    }
>> >
>> > So... If the ThreadLocalPool reaches its capacity it always returns null
>> > and hence all new threads will create a
>> > new connection every time getConnection is called!
>> >
>> > I have also verified with a test program that works fine as long as the
>> > number of client threads (which include
>> > the threads in HTable's threadpool of course) is < poolsize. Once that is
>> > no longer the case the number of
>> > connections "explodes" and the program dies with OOMEs (mostly because
>> each
>> > Connection is associated with
>> > yet another thread).
>> >
>> > It's not clear what should happen, though. Maybe (1) the ThreadLocalPool
>> > should not have a limit, or maybe
>> > (2) allocations past the pool size should throw an exception (i.e.
>> there's
>> > a hard limit), or maybe (3) in that case
>> > a single connection is returned for all threads while the pool it over
>> its
>> > limit or (4) we start round robin with the other
>> > connection in the other thread locals.
>> >
>> > For #1 means that the number of client threads needs to be more carefully
>> > managed by the client app.
>> > In this case it would also be somewhat pointless that Connection have
>> their
>> > own threads, we just pass stuff
>> > between threads.
>> > #2 would work, but puts more logic in the client.
>> > #3 would lead to hard to debug performance issues.
>> > And #4 is messy :)
>> >
>> > Any other options?
>> >
>> > Maybe for now just do #2 (or #1)?
>> >
>> > -- Lars
>> >
>> >
>>
>>
>

Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Posted by Ted Yu <yu...@gmail.com>.
For HBaseClient, at least the javadoc doesn't match:

   * @param config configuration
   * @return either a {@link PoolType#Reusable} or {@link
PoolType#ThreadLocal}
   */
  private static PoolType getPoolType(Configuration config) {
    return
PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
        PoolType.RoundRobin, PoolType.ThreadLocal);

I think for RoundRobinPool, we shouldn't allow maxSize to be
Integer#MAX_VALUE. Otherwise connection explosion described by Lars may
incur.

Cheers

On Thu, Jul 28, 2011 at 10:27 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Yeah I looked at that one too, but I assumed that the rationale was to keep
> creating connectionsuntil the pool is full, which makes
> some sense for a round robin pool. Otherwise it is not clear to me how the
> pool grow at all?
>
>
> (And at least it can be usefully controlled with the pool.size parameter.)
>
>
> Cheers indeed
>
>
> ----- Original Message -----
> From: Ted Yu <yu...@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Thursday, July 28, 2011 10:07 PM
> Subject: Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk
>
> Looking at PoolMap, I have some question as well.
> For RoundRobinPool, I don't understand the following method:
>     public R get() {
>       if (size() < maxSize) {
>         return null;
>       }
> Should the above condition be?
>       if (size() <= 0) {
>
> For the issue Lars raised, I prefer solution number 2.
>
> Cheers
>
> On Thu, Jul 28, 2011 at 9:12 PM, lars <lh...@yahoo.com> wrote:
>
> > Looking at HBaseClient.getConnection(...) I see this:
> > ...
> >      synchronized (connections) {
> >        connection = connections.get(remoteId);
> >        if (connection == null) {
> >          connection = new Connection(remoteId);
> >          connections.put(remoteId, connection);
> >        }
> >      }
> > ...
> >
> > At the same time PoolMap.ThreadLocalPool.put is defined like this:
> >    public R put(R resource) {
> >      R previousResource = get();
> >      if (previousResource == null) {
> > ...
> >        if (poolSize.intValue() >= maxSize) {
> >          return null;
> >        }
> > ...
> >    }
> >
> > So... If the ThreadLocalPool reaches its capacity it always returns null
> > and hence all new threads will create a
> > new connection every time getConnection is called!
> >
> > I have also verified with a test program that works fine as long as the
> > number of client threads (which include
> > the threads in HTable's threadpool of course) is < poolsize. Once that is
> > no longer the case the number of
> > connections "explodes" and the program dies with OOMEs (mostly because
> each
> > Connection is associated with
> > yet another thread).
> >
> > It's not clear what should happen, though. Maybe (1) the ThreadLocalPool
> > should not have a limit, or maybe
> > (2) allocations past the pool size should throw an exception (i.e.
> there's
> > a hard limit), or maybe (3) in that case
> > a single connection is returned for all threads while the pool it over
> its
> > limit or (4) we start round robin with the other
> > connection in the other thread locals.
> >
> > For #1 means that the number of client threads needs to be more carefully
> > managed by the client app.
> > In this case it would also be somewhat pointless that Connection have
> their
> > own threads, we just pass stuff
> > between threads.
> > #2 would work, but puts more logic in the client.
> > #3 would lead to hard to debug performance issues.
> > And #4 is messy :)
> >
> > Any other options?
> >
> > Maybe for now just do #2 (or #1)?
> >
> > -- Lars
> >
> >
>
>

Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Posted by lars hofhansl <lh...@yahoo.com>.
Yeah I looked at that one too, but I assumed that the rationale was to keep creating connectionsuntil the pool is full, which makes
some sense for a round robin pool. Otherwise it is not clear to me how the pool grow at all?


(And at least it can be usefully controlled with the pool.size parameter.)


Cheers indeed


----- Original Message -----
From: Ted Yu <yu...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Thursday, July 28, 2011 10:07 PM
Subject: Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Looking at PoolMap, I have some question as well.
For RoundRobinPool, I don't understand the following method:
    public R get() {
      if (size() < maxSize) {
        return null;
      }
Should the above condition be?
      if (size() <= 0) {

For the issue Lars raised, I prefer solution number 2.

Cheers

On Thu, Jul 28, 2011 at 9:12 PM, lars <lh...@yahoo.com> wrote:

> Looking at HBaseClient.getConnection(...) I see this:
> ...
>      synchronized (connections) {
>        connection = connections.get(remoteId);
>        if (connection == null) {
>          connection = new Connection(remoteId);
>          connections.put(remoteId, connection);
>        }
>      }
> ...
>
> At the same time PoolMap.ThreadLocalPool.put is defined like this:
>    public R put(R resource) {
>      R previousResource = get();
>      if (previousResource == null) {
> ...
>        if (poolSize.intValue() >= maxSize) {
>          return null;
>        }
> ...
>    }
>
> So... If the ThreadLocalPool reaches its capacity it always returns null
> and hence all new threads will create a
> new connection every time getConnection is called!
>
> I have also verified with a test program that works fine as long as the
> number of client threads (which include
> the threads in HTable's threadpool of course) is < poolsize. Once that is
> no longer the case the number of
> connections "explodes" and the program dies with OOMEs (mostly because each
> Connection is associated with
> yet another thread).
>
> It's not clear what should happen, though. Maybe (1) the ThreadLocalPool
> should not have a limit, or maybe
> (2) allocations past the pool size should throw an exception (i.e. there's
> a hard limit), or maybe (3) in that case
> a single connection is returned for all threads while the pool it over its
> limit or (4) we start round robin with the other
> connection in the other thread locals.
>
> For #1 means that the number of client threads needs to be more carefully
> managed by the client app.
> In this case it would also be somewhat pointless that Connection have their
> own threads, we just pass stuff
> between threads.
> #2 would work, but puts more logic in the client.
> #3 would lead to hard to debug performance issues.
> And #4 is messy :)
>
> Any other options?
>
> Maybe for now just do #2 (or #1)?
>
> -- Lars
>
>


Re: Problem with hbase.client.ipc.pool.type=threadlocal in trunk

Posted by Ted Yu <yu...@gmail.com>.
Looking at PoolMap, I have some question as well.
For RoundRobinPool, I don't understand the following method:
    public R get() {
      if (size() < maxSize) {
        return null;
      }
Should the above condition be?
      if (size() <= 0) {

For the issue Lars raised, I prefer solution number 2.

Cheers

On Thu, Jul 28, 2011 at 9:12 PM, lars <lh...@yahoo.com> wrote:

> Looking at HBaseClient.getConnection(...) I see this:
> ...
>      synchronized (connections) {
>        connection = connections.get(remoteId);
>        if (connection == null) {
>          connection = new Connection(remoteId);
>          connections.put(remoteId, connection);
>        }
>      }
> ...
>
> At the same time PoolMap.ThreadLocalPool.put is defined like this:
>    public R put(R resource) {
>      R previousResource = get();
>      if (previousResource == null) {
> ...
>        if (poolSize.intValue() >= maxSize) {
>          return null;
>        }
> ...
>    }
>
> So... If the ThreadLocalPool reaches its capacity it always returns null
> and hence all new threads will create a
> new connection every time getConnection is called!
>
> I have also verified with a test program that works fine as long as the
> number of client threads (which include
> the threads in HTable's threadpool of course) is < poolsize. Once that is
> no longer the case the number of
> connections "explodes" and the program dies with OOMEs (mostly because each
> Connection is associated with
> yet another thread).
>
> It's not clear what should happen, though. Maybe (1) the ThreadLocalPool
> should not have a limit, or maybe
> (2) allocations past the pool size should throw an exception (i.e. there's
> a hard limit), or maybe (3) in that case
> a single connection is returned for all threads while the pool it over its
> limit or (4) we start round robin with the other
> connection in the other thread locals.
>
> For #1 means that the number of client threads needs to be more carefully
> managed by the client app.
> In this case it would also be somewhat pointless that Connection have their
> own threads, we just pass stuff
> between threads.
> #2 would work, but puts more logic in the client.
> #3 would lead to hard to debug performance issues.
> And #4 is messy :)
>
> Any other options?
>
> Maybe for now just do #2 (or #1)?
>
> -- Lars
>
>