Posted to user@hbase.apache.org by Joe Pallas <pa...@cs.stanford.edu> on 2011/03/30 00:49:31 UTC

HTable and threads

Trying to understand why our test program was generating so many threads (HBase 0.90.0), I discovered that every time we instantiate HTable we get a new thread pool (ThreadPoolExecutor).  This seems a bit odd.  HTable is not thread safe, so every thread needs to have its own HTable, but every HTable creates its own thread pool, and the threads are not shared by the different HTables.  (Isn't sharing what would make sense if the HTables are on the same table?)

Is this the way things are supposed to work?  What should I be doing to avoid creating lots of threads, which seem to hang around for a long time even though the HTables are discarded?  (I'm seeing client GC happen, but the thread count doesn't seem to decrease.)  There are some ThreadPoolExecutor parameters that seem like they could be relevant.
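
To make the symptom concrete, here is a minimal stdlib-only sketch of the pattern described above. It does not use HBase at all: TableLike is a hypothetical stand-in for an object that owns a private ThreadPoolExecutor, and the pool parameters (core size equal to max size, 60 second keep-alive, unbounded queue) are assumptions chosen for illustration, not a statement of what HTable in 0.90.0 actually passes. With those assumptions, core worker threads stay parked after their first task, so the thread count only drops when shutdown() is called explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PerInstancePoolDemo {

    /** Hypothetical stand-in for a client object that owns a private pool. */
    static final class TableLike {
        private final ThreadPoolExecutor pool;

        TableLike(int maxThreads) {
            // Assumed parameters: core == max, 60 s keep-alive, unbounded queue.
            pool = new ThreadPoolExecutor(maxThreads, maxThreads, 60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>());
        }

        /** Runs one trivial task so the pool starts a worker thread. */
        void doBatch() {
            try {
                pool.submit(() -> { }).get();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }

        int liveWorkers() {
            return pool.getPoolSize();
        }

        /** Explicit shutdown: the only step in this sketch that releases workers. */
        void close() {
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Returns {workers alive after use, workers alive after close()}. */
    static int[] run(int tables) {
        List<TableLike> created = new ArrayList<>();
        for (int i = 0; i < tables; i++) {
            TableLike t = new TableLike(4);
            t.doBatch();
            created.add(t);
        }
        int before = 0;
        for (TableLike t : created) {
            before += t.liveWorkers();
        }
        int after = 0;
        for (TableLike t : created) {
            t.close();
            after += t.liveWorkers();
        }
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] r = run(5);
        System.out.println("workers after use: " + r[0] + ", after close: " + r[1]);
    }
}
```

Each object that has run at least one task keeps a worker alive, so the total grows with the number of objects rather than with the amount of work. The parked workers also keep their pool reachable, which would be consistent with GC not reducing the thread count.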

I found HBASE-3553, which seems related, but only insofar as the fix looks like it might make this problem worse.

So: what should a multithreaded client do to be safe and not generate more threads than are needed?

Thanks.
joe


Re: HTable and threads

Posted by Stack <st...@duboce.net>.
Deprecating in favor of HBaseAdmin would work.  Would you mind making
a patch, David?
St.Ack

On Tue, Mar 29, 2011 at 6:05 PM, Buttler, David <bu...@llnl.gov> wrote:
> Thanks.
>
> I agree HBaseAdmin is probably the way to go.  I guess what was unexpected about this was that the static method HTable.isTableEnabled(tableName) really creates a configuration object under the hood and uses that configuration object to manage the connections.  Maybe this method should be deprecated, and instead point people to the HBaseAdmin (or at least the method with a configuration as a parameter)?
>
> Dave

RE: HTable and threads

Posted by "Buttler, David" <bu...@llnl.gov>.
Thanks.

I agree HBaseAdmin is probably the way to go.  I guess what was unexpected about this was that the static method HTable.isTableEnabled(tableName) really creates a configuration object under the hood and uses that configuration object to manage the connections.  Maybe this method should be deprecated, and instead point people to the HBaseAdmin (or at least the method with a configuration as a parameter)?

Dave

Re: HTable and threads

Posted by Jean-Daniel Cryans <jd...@apache.org>.
> Do the static methods on HTable (like isTableEnabled) also have this problem? From the code it looks like if you naively call the static method without a Configuration object it will create a configuration and put it into a HashMap where it will live forever.

Good point. In that case the better thing to do is to use HBaseAdmin.isTableEnabled.

> This really bit me recently.  Using the HTablePool doesn't really solve this because for some reason there are no meta operations on the HTableInterface object itself -- I can't ask if it is enabled.  Is there any particular reason that this method only lives on an HConnectionManager?

I think my previous answer will help you out.

J-D

RE: HTable and threads

Posted by "Buttler, David" <bu...@llnl.gov>.
Do the static methods on HTable (like isTableEnabled) also have this problem? From the code it looks like if you naively call the static method without a Configuration object it will create a configuration and put it into a HashMap where it will live forever.

This really bit me recently.  Using the HTablePool doesn't really solve this because for some reason there are no meta operations on the HTableInterface object itself -- I can't ask if it is enabled.  Is there any particular reason that this method only lives on an HConnectionManager?

Dave


Re: HTable and threads

Posted by Ted Yu <yu...@gmail.com>.
See https://issues.apache.org/jira/browse/HBASE-3712

On Tue, Mar 29, 2011 at 3:58 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> Yeah after flushing the remaining edits.
>
> On Tue, Mar 29, 2011 at 3:56 PM, Ted Yu <yu...@gmail.com> wrote:
> > Are you suggesting that the thread pool be shut down in this method?
> >  public void close() throws IOException {
> >    flushCommits();
> >  }

Re: HTable and threads

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Yeah, after flushing the remaining edits.
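
A rough sketch of that ordering, flush first and then shut the pool down, is below. BufferedTableSketch is a hypothetical class written only for this illustration; it is not the real HTable, its buffer holds plain strings, and nothing here reflects how the actual fix was implemented. The try/finally is an extra choice in the sketch so that the workers are released even if the flush fails.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CloseShutsDownPool {

    /** Hypothetical table-like class; not the real HTable. */
    static final class BufferedTableSketch implements Closeable {
        private final ExecutorService pool = Executors.newFixedThreadPool(2);
        private final List<String> writeBuffer = new ArrayList<>();
        private final List<String> sent = new ArrayList<>();

        void put(String edit) {
            writeBuffer.add(edit);
        }

        /** Pushes buffered edits through the pool and waits for completion. */
        void flushCommits() throws IOException {
            final List<String> batch = new ArrayList<>(writeBuffer);
            writeBuffer.clear();
            try {
                pool.submit(() -> { sent.addAll(batch); }).get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            } catch (ExecutionException e) {
                throw new IOException(e.getCause());
            }
        }

        @Override
        public void close() throws IOException {
            try {
                flushCommits();   // first drain the remaining edits
            } finally {
                pool.shutdown();  // then release the worker threads
            }
        }

        int sentCount() {
            return sent.size();
        }

        boolean poolShutDown() {
            return pool.isShutdown();
        }
    }

    /** Returns true if close() both flushed the edits and shut the pool down. */
    static boolean demo() {
        BufferedTableSketch table = new BufferedTableSketch();
        table.put("row1");
        table.put("row2");
        try {
            table.close();
        } catch (IOException e) {
            return false;
        }
        return table.sentCount() == 2 && table.poolShutDown();
    }

    public static void main(String[] args) {
        System.out.println("flushed and shut down: " + demo());
    }
}
```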

On Tue, Mar 29, 2011 at 3:56 PM, Ted Yu <yu...@gmail.com> wrote:
> Are you suggesting that the thread pool be shut down in this method?
>  public void close() throws IOException {
>    flushCommits();
>  }

Re: HTable and threads

Posted by Ted Yu <yu...@gmail.com>.
Are you suggesting that the thread pool be shut down in this method?
  public void close() throws IOException {
    flushCommits();
  }


Re: HTable and threads

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hey Joe,

That TPE (ThreadPoolExecutor) is used to do batch operations from a single
HTable, but those pools cannot be shared the way the code works right now.
If you don't need batch operations, you can set hbase.htable.threads.max
to 1.

It seems that when you call htable.close() it doesn't shut down the TPE,
which is a bug IMO. Can you open a JIRA?

A multi-threaded client should use HTablePool.
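
The pooling idea can be sketched without HBase as follows. This is only an illustration of the borrow-and-return pattern, not HTablePool's implementation or API: Handle, HandlePool, borrow and giveBack are hypothetical names. The point of the sketch is that non-thread-safe handles are reused instead of being created per operation, so the number of handles (and of whatever threads each one owns) is bounded by the number of concurrent callers.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class HandlePoolSketch {

    /** Hypothetical non-thread-safe handle, standing in for a table object. */
    static final class Handle {
        private int operations; // deliberately unsynchronized

        void doOperation() {
            operations++;
        }
    }

    /** Hands out idle handles and creates a new one only when none is idle. */
    static final class HandlePool {
        private final ConcurrentLinkedQueue<Handle> idle = new ConcurrentLinkedQueue<>();
        private final AtomicInteger created = new AtomicInteger();

        Handle borrow() {
            Handle h = idle.poll();
            if (h == null) {
                created.incrementAndGet();
                h = new Handle();
            }
            return h;
        }

        void giveBack(Handle h) {
            idle.offer(h);
        }

        int createdCount() {
            return created.get();
        }
    }

    /** Runs the workers and returns how many handles were ever created. */
    static int run(int workers, int opsPerWorker) {
        HandlePool pool = new HandlePool();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < opsPerWorker; n++) {
                    Handle h = pool.borrow(); // exclusive use while borrowed
                    try {
                        h.doOperation();
                    } finally {
                        pool.giveBack(h);
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return pool.createdCount();
    }

    public static void main(String[] args) {
        System.out.println("handles created: " + run(4, 100));
    }
}
```

With 4 workers doing 100 operations each, at most 4 handles are created, because a new handle is made only when every existing one is currently borrowed.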

J-D

Re: HTable and threads

Posted by tsuna <ts...@gmail.com>.
On Tue, Mar 29, 2011 at 3:49 PM, Joe Pallas <pa...@cs.stanford.edu> wrote:
> So: what should a multithreaded client do to be safe and not generate more threads than are needed?

Consider trying asynchbase instead, if you didn't write too much code
tied to the HTable stuff.  https://github.com/stumbleupon/asynchbase

asynchbase only creates 2*N threads, where N is the number of hardware
threads on the machine.  So on a 4 core machine, it'll create 8
threads.  You only need one HBaseClient instance per HBase cluster you
wanna interact with, regardless of how many tables you're gonna use.
asynchbase is built from the ground up to be thread-safe and highly
scalable, especially for applications with high write throughput
requirements.  I originally wrote it for OpenTSDB
(http://opentsdb.net) and I saw very significant throughput
improvements after switching from HTable to asynchbase.
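
As a loose stdlib-only analogue of that sizing rule (one shared client, a fixed budget of 2*N worker threads, any number of tables), consider the sketch below. It is not asynchbase code and does not use its API: SharedClient, threadBudget and put are hypothetical names, and a blocking executor is used here purely to show the thread budget, whereas asynchbase itself is asynchronous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedClientSketch {

    /** The 2*N rule from the message above: N hardware threads, 2*N workers. */
    static int threadBudget(int hardwareThreads) {
        return 2 * hardwareThreads;
    }

    /** Hypothetical thread-safe client shared by every caller and every table. */
    static final class SharedClient {
        private final ExecutorService workers;

        SharedClient(int hardwareThreads) {
            workers = Executors.newFixedThreadPool(threadBudget(hardwareThreads));
        }

        /** Pretends to write a row; the table name is just a parameter. */
        Future<String> put(String table, String row) {
            return workers.submit(() -> table + "/" + row);
        }

        void shutdown() {
            workers.shutdown();
        }
    }

    /** Issues the requests against several tables and returns how many completed. */
    static int run(int requests) {
        SharedClient client = new SharedClient(Runtime.getRuntime().availableProcessors());
        List<Future<String>> pending = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            pending.add(client.put("table" + (i % 3), "row" + i));
        }
        int completed = 0;
        try {
            for (Future<String> f : pending) {
                f.get();
                completed++;
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            client.shutdown();
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println("budget on 4 hardware threads: " + threadBudget(4));
        System.out.println("completed requests: " + run(20));
    }
}
```

The thread count here depends only on the machine, not on how many tables or caller threads there are, which is the contrast with one pool per HTable.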

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com