Posted to user@hbase.apache.org by Michael Dagaev <mi...@gmail.com> on 2008/10/13 11:54:16 UTC

Questions on client API

Hi All

    As I understand it, the HTable class uses the HConnectionManager class,
which holds connections to the master and region servers.
The connections are pooled as entries in a thread-safe static table
(map). Thus, a client application does not need to care about connection
pooling. Is that correct?

   May several threads share the same instance of HBaseConfiguration? Of HTable?

Thank you for your cooperation,
M.

Re: Questions on client API

Posted by stack <st...@duboce.net>.
Michael Dagaev wrote:
>    May several threads share the same instance of HbaseConfiguration ? HTable?
>
>   
If you can stomach it -- the commentary wanders -- see HBASE-576 
starting at about "stack - 03/Oct/08 11:45 PM".  It might be of interest 
if you are trying to write a multithreaded hbase client.
St.Ack

RE: Questions on client API

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
With respect to Configuration,
all gets go through:
  private synchronized Properties getProps()

all sets go through:
  private synchronized Properties getOverlay()
  private synchronized Properties getProps()

and all addResource calls go through:
  private synchronized void addResource(ArrayList<Object> resources, Object resource)

So I think Configuration is thread safe.
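
The pattern described above, where every read and write funnels through a private synchronized accessor, can be sketched as follows. This is a minimal illustration and not the Hadoop Configuration source: the class name SharedSettings, its methods, and the demo helper are all hypothetical. It also relies on java.util.Properties being internally thread safe, which holds for the JDK class but says nothing about other collections a real Configuration may use.

```java
import java.util.Properties;

/** Hypothetical stand-in for a Configuration-like class; not the Hadoop source. */
class SharedSettings {
    // Lazily created; the field itself is only touched inside getProps().
    private Properties props;

    // Every get and set funnels through this synchronized accessor,
    // so lazy initialization happens exactly once.
    private synchronized Properties getProps() {
        if (props == null) {
            props = new Properties();
        }
        return props;
    }

    public String get(String name) {
        return getProps().getProperty(name);
    }

    public void set(String name, String value) {
        // Properties synchronizes its own mutations.
        getProps().setProperty(name, value);
    }

    /** Shares one instance across several writer threads and returns the key count. */
    static int demo(int threads, int keysPerThread) {
        SharedSettings shared = new SharedSettings();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < keysPerThread; k++) {
                    shared.set("thread" + id + ".key" + k, "value" + k);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.getProps().size();
    }
}
```

Calling SharedSettings.demo(4, 100) writes 400 distinct keys from four threads into one shared instance. Note that the synchronized accessor only guards the lazy initialization; the safety of the individual reads and writes comes from the backing collection.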

I think that sharing scanners across threads is probably not a good idea
in the first place, but you are correct that the cache is not synchronized.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@yahoo.com]
> Sent: Monday, October 13, 2008 11:14 AM
> To: hbase-user@hadoop.apache.org
> Subject: RE: Questions on client API
>
> Configuration uses unsynchronized lists and hash set. On the other hand if
> it is used in a read-only manner after initialization that would be ok.
>
> I don't think sharing a scanner across threads would be ok because of the
> cache of RowResults as an unsynchronized linked list. Otherwise HTable
> looks ok to me.
>
> Am I being overly conservative?


RE: Questions on client API

Posted by Andrew Purtell <ap...@yahoo.com>.
Configuration uses unsynchronized lists and an unsynchronized hash set. On the other hand, if it is used in a read-only manner after initialization, that would be OK.

I don't think sharing a scanner across threads would be OK, because its cache of RowResults is an unsynchronized linked list. Otherwise HTable looks OK to me.
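
One way to sidestep the unsynchronized cache is to give each thread its own scanner rather than sharing one. The sketch below illustrates that idea with a hypothetical RowCursor class that, like the scanner described above, buffers rows in a plain LinkedList. It is not an HBase class, and the row names and batch size are made up for the example.

```java
import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadCursorDemo {
    /** Hypothetical scanner-like class with an unsynchronized cache; not an HBase class. */
    static final class RowCursor {
        private final LinkedList<String> cache = new LinkedList<>();
        private int nextRow;
        private final int endRow;

        RowCursor(int startRow, int endRow) {
            this.nextRow = startRow;
            this.endRow = endRow;
        }

        /** Returns the next row, refilling the cache in batches of 10; null when exhausted. */
        String next() {
            if (cache.isEmpty()) {
                for (int i = 0; i < 10 && nextRow < endRow; i++) {
                    cache.add("row-" + nextRow++);
                }
            }
            return cache.poll();
        }
    }

    /** Each worker creates and drains its own cursor; only the counter is shared. */
    static int countRows(int workers, int rowsPerWorker) {
        AtomicInteger total = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int start = w * rowsPerWorker;
            threads[w] = new Thread(() -> {
                RowCursor cursor = new RowCursor(start, start + rowsPerWorker);
                while (cursor.next() != null) {
                    total.incrementAndGet();
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }
}
```

Because no cursor is ever visible to more than one thread, its cache needs no locking; the only shared state is the thread-safe counter.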

Am I being overly conservative?

--- On Mon, 10/13/08, Jim Kellerman (POWERSET) <Ji...@microsoft.com> wrote:

> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
> Subject: RE: Questions on client API
> To: "hbase-user@hadoop.apache.org" <hb...@hadoop.apache.org>
> Date: Monday, October 13, 2008, 10:48 AM
> Andrew,
> 
> What methods in HBaseConfiguration and HTable do you think
> are not re-entrant?
> 
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft
> Corporation)


      

RE: Questions on client API

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
Andrew,

What methods in HBaseConfiguration and HTable do you think
are not re-entrant?

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: Michael Dagaev [mailto:michael.dagaev@gmail.com]
> Sent: Monday, October 13, 2008 10:24 AM
> To: hbase-user@hadoop.apache.org; apurtell@apache.org
> Subject: Re: Questions on client API
>
> Hi Andrew
>
>      Hmmm ...I would not like to instantiate HBaseConfiguration per
> thread. I would prefer to create it once per application so many
> threads will use it concurrently.
>
> Thank you for pointing out this issue. I will check the code.
> M.


Re: Questions on client API

Posted by Michael Dagaev <mi...@gmail.com>.
Hi Andrew

     Hmmm... I would rather not instantiate HBaseConfiguration per
thread. I would prefer to create it once per application so that many
threads can use it concurrently.

Thank you for pointing out this issue. I will check the code.
M.

On Mon, Oct 13, 2008 at 6:54 PM, Andrew Purtell <ap...@yahoo.com> wrote:
> Hello Michael,
>
> Your understanding regarding connection pooling is correct.
>
> Looking at the code, I see that some methods of
> HBaseConfiguration and HTable are not fully reentrant, so I
> would not share them across multiple threads, or at least I
> would explicitly synchronize access to them.
>
>    - Andy

Re: Questions on client API

Posted by Andrew Purtell <ap...@yahoo.com>.
Hello Michael,

Your understanding regarding connection pooling is correct.

Looking at the code, I see that some methods of 
HBaseConfiguration and HTable are not fully reentrant, so I
would not share them across multiple threads, or at least I
would explicitly synchronize access to them.
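
As a rough illustration of what explicitly synchronizing access could look like, the sketch below has every thread take the shared object's monitor before touching it. TableHandle is a hypothetical stand-in for any shared client object that is not thread safe; it is not the HTable API.

```java
import java.util.ArrayList;
import java.util.List;

class SynchronizedAccessDemo {
    /** Hypothetical non-thread-safe handle; stands in for any shared client object. */
    static final class TableHandle {
        private final List<String> pending = new ArrayList<>();

        void put(String row) {
            pending.add(row); // ArrayList is not safe for concurrent writers
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Several threads write through one handle, each call guarded by the handle's monitor. */
    static int writeFromThreads(int threads, int putsPerThread) {
        final TableHandle table = new TableHandle();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    synchronized (table) {
                        table.put("row-" + id + "-" + i);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (table) {
            return table.pendingCount();
        }
    }
}
```

The approach only works if every caller uses the same lock for every access; a single unguarded call from another thread reintroduces the race.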

    - Andy


> From: Michael Dagaev <mi...@gmail.com>
> Subject: Questions on client API
> To: hbase-user@hadoop.apache.org
> Date: Monday, October 13, 2008, 2:54 AM
> Hi All
> 
>     As I understand, the HTable class uses
> HConnectionManager class, which holds connections to the
> master and region servers. The connections are pooled as
> entries in a thread-safe static table (map). Thus, a
> client application should not care about connection
> pooling. Is it correct?
> 
>    May several threads share the same instance of
> HbaseConfiguration ? HTable?
> 
> Thank you for your cooperation,
> M.