Posted to user@hbase.apache.org by Sujee Maniyam <su...@sujee.net> on 2011/08/24 01:34:32 UTC

question on HTablePool and threads

Hi all,

Right now I have a Java client program that accesses HBase from multiple
threads for reads and writes.  Each thread creates its own instance of HTable
for the _same_ table.

I am looking into the HTablePool class.  It is not clear to me whether it is a
correct/better choice for accessing the _same_ table from multiple threads.

Is this valid / thread-safe?

create HTablePool in 'main'
pass HTablePool instance to threads
each thread does a 'htablepool.get(table)'
    read / write to table
    'htablepool.put(table)' when done
(all this is done within a single JVM)

thanks
Sujee Maniyam
http://sujee.net

Re: question on HTablePool and threads

Posted by Joe Pallas <pa...@cs.stanford.edu>.
On Aug 24, 2011, at 11:37 AM, Sujee Maniyam wrote:

> Sounds like even if I created an HTablePool and shared it among threads (which
> seems safe to do, as pointed out here), I won't see much improvement when
> accessing the SAME table from multiple threads.
>
> Correct?

That depends on how many region servers your table is split across.  The unit of sharing, as I understand it, is the client connection to the region server.  If your threads use keys that fall into the same region, they share a connection.  If your threads use keys that fall into different regions on the same region server, they still share a connection.  If your threads use keys that fall into different regions that are distributed across different region servers, they do not share a connection and you get maximum concurrency.

joe
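
As a rough illustration of the sharing described above, here is a toy model in plain Java. It is not HBase API: the class name, the region start keys and the server names rs1/rs2 are all made up, and the only assumption carried over from the message is "one client connection per region server".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Toy model: one client connection per region server, located by row key. */
class ConnectionSharingSketch {
    // Region start key -> hosting region server (hypothetical layout).
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    /** A region holds every key from its start key up to the next region's start key. */
    String serverFor(String rowKey) {
        return regionStartToServer.floorEntry(rowKey).getValue();
    }

    /** Number of distinct region-server connections the given keys would use. */
    int connectionsUsed(List<String> rowKeys) {
        Set<String> servers = new HashSet<>();
        for (String key : rowKeys) {
            servers.add(serverFor(key));
        }
        return servers.size();
    }

    /** Three regions on two servers; the first region starts at the empty key. */
    static ConnectionSharingSketch example() {
        ConnectionSharingSketch table = new ConnectionSharingSketch();
        table.addRegion("", "rs1");   // keys before "h"
        table.addRegion("h", "rs1");  // keys from "h" up to "p", same server
        table.addRegion("p", "rs2");  // keys from "p" onward, different server
        return table;
    }

    public static void main(String[] args) {
        ConnectionSharingSketch table = example();
        // Same region: one shared connection.
        System.out.println(table.connectionsUsed(Arrays.asList("apple", "avocado")));
        // Different regions, same server: still one shared connection.
        System.out.println(table.connectionsUsed(Arrays.asList("apple", "kiwi")));
        // Regions on different servers: two connections, so more concurrency.
        System.out.println(table.connectionsUsed(Arrays.asList("apple", "zucchini")));
    }
}
```

In this model, the concurrency available to the threads is bounded by the number of distinct servers their keys touch, not by the number of HTable instances.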


Re: question on HTablePool and threads

Posted by Sujee Maniyam <su...@sujee.net>.
Sounds like even if I created an HTablePool and shared it among threads (which
seems safe to do, as pointed out here), I won't see much improvement when
accessing the SAME table from multiple threads.

Correct?

http://sujee.net

RE: question on HTablePool and threads

Posted by "Zhong, Andy" <Sh...@searshc.com>.
Ben, apart from the HBase version, which license does the StumbleUpon client use? Is it a BSD license? Thanks, Andy

This message, including any attachments, is the property of Sears Holdings Corporation and/or one of its subsidiaries. It is confidential and may contain proprietary or legally privileged information. If you are not the intended recipient, please delete it without reading the contents. Thank you.

Re: question on HTablePool and threads

Posted by Frédéric Fondement <fr...@uha.fr>.
Hi there,

Which versions of HBase does asynchbase work with? I can't find that on
the website.

Does it support external schema changes (a table or column family
created or deleted by an external thread/process)?
What about HTablePool regarding this last question?

Is it really enough to use an HTablePool like this in a multithreaded
environment?
e.g. (unchecked code follows)
thread1 and thread2:
while (!shouldStop) {
   HTableInterface t = pool.getTable("sometable");
   try {
     // no need for synchronized(t) here, as both threads receive
     // different instances at the same time, right?
     // work with t
   } finally {
     t.close();
   }
}

What should be done when the application ends? Is it mandatory to call
closeTablePool?

It would be great if HTableInterface could extend java.io.Closeable so
that one could write
try (HTableInterface t = pool.getTable("sometable")) {
   // work with t
}

Sorry if questions look simple, but I'm not sure just from the Javadoc.

Cheers !
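
To make the Closeable idea above concrete, here is a minimal sketch in plain Java. It does not use HBase: PooledHandleSketch and Handle are hypothetical stand-ins, and the assumption that close() returns the handle to the pool (rather than destroying it) is part of the sketch, not a statement about what HTablePool does in any given release.

```java
import java.io.Closeable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy pool whose handles go back to the pool on close(); NOT the HBase API. */
class PooledHandleSketch {

    /** Hypothetical handle standing in for a pooled table. */
    static final class Handle implements Closeable {
        private final ConcurrentLinkedQueue<Handle> home;
        final AtomicInteger activeUsers = new AtomicInteger();

        Handle(ConcurrentLinkedQueue<Handle> home) {
            this.home = home;
        }

        @Override
        public void close() {            // narrows Closeable.close(): no IOException
            activeUsers.decrementAndGet();
            home.offer(this);            // hand back to the pool instead of destroying
        }
    }

    private final ConcurrentLinkedQueue<Handle> idle = new ConcurrentLinkedQueue<>();
    final AtomicInteger created = new AtomicInteger();

    /** Reuse an idle handle if one exists, otherwise create a new one. */
    Handle get() {
        Handle h = idle.poll();
        if (h == null) {
            h = new Handle(idle);
            created.incrementAndGet();
        }
        h.activeUsers.incrementAndGet();
        return h;
    }

    /** Runs two workers; returns how often a handle was seen in use by two threads at once. */
    static int countSharedUses(PooledHandleSketch pool, int iterations) {
        AtomicInteger violations = new AtomicInteger();
        Runnable worker = () -> {
            for (int i = 0; i < iterations; i++) {
                try (Handle t = pool.get()) {   // returned to the pool even on exception
                    if (t.activeUsers.get() != 1) {
                        violations.incrementAndGet();
                    }
                }
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations.get();
    }

    public static void main(String[] args) {
        PooledHandleSketch pool = new PooledHandleSketch();
        System.out.println("shared uses: " + countSharedUses(pool, 10_000));
        System.out.println("handles created: " + pool.created.get());
    }
}
```

Because a handle is removed from the idle queue before it is handed out, two threads never hold the same instance at the same time, so no synchronized(t) is needed inside the try block of this sketch.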



Re: question on HTablePool and threads

Posted by Ben Cuthbert <be...@ymail.com>.
Hi Andy

We are using the StumbleUpon async client. Very fast and good.


RE: question on HTablePool and threads

Posted by "Zhong, Andy" <Sh...@searshc.com>.
Hey Michael,
I am also looking into the performance gain from using HTablePool instead of
creating HTable instances from a singleton HBaseConfiguration. If the
use case is a web service handling multi-threaded reads and writes
against a single HBase table, do you suggest using HTablePool to
pre-create a pool of HTable instances?

But the two comments below concern me:
1. Concern about restarting the Hadoop/HBase cluster:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/13128
"The problem with the HTablePool is that it does not "ride over restart"
meaning that if you need to restart your cluster, HtablePool will still
be pointing at the old ports and not realize the cluster is back-up."
2. There seems to be no performance gain, and it may even be worse, if the
number of concurrent puts is < 10000:
http://www.srikanthps.com/2011/06/hbase-benchmarking-for-multi-threaded.html
3. Does anyone use the GitHub project below? It claims to be a fully
asynchronous, non-blocking, thread-safe, high-performance HBase client:
http://github.com/stumbleupon/asynchbase

What do you think of it? Any advice or comments on this are welcome.

Thanks,
Andy Zhong
 


RE: question on HTablePool and threads

Posted by Michael Segel <mi...@hotmail.com>.
Sujee,

You are correct in creating a separate HTable instance in each thread. (HTable isn't thread-safe, but since the scope is within the thread, it works.)

You could use the HTablePool class, but I don't think it's a better solution for what you are doing.

In your example it sounds like you're creating the connection in each thread and you're using it for the life of the thread/application.  So there's no real benefit in trying to create a pool of threads and then request a thread from the pool.

JMHO

-Mike
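
The one-instance-per-thread approach described above can be sketched with a ThreadLocal. This is plain Java with a made-up FakeTable class standing in for a non-thread-safe client object; it illustrates the scoping argument only and does not use HBase.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** One client object per thread via ThreadLocal; FakeTable is a stand-in, NOT HTable. */
class PerThreadTableSketch {

    /** Hypothetical non-thread-safe client object. */
    static final class FakeTable {
        private int writes;   // plain field: safe only because a single thread owns the instance

        void put() {
            writes++;
        }

        int writes() {
            return writes;
        }
    }

    final AtomicInteger instancesCreated = new AtomicInteger();

    // Each thread lazily gets its own FakeTable and keeps it for its whole life.
    private final ThreadLocal<FakeTable> table = ThreadLocal.withInitial(() -> {
        instancesCreated.incrementAndGet();
        return new FakeTable();
    });

    /** Starts the given number of threads, each doing putsPerThread writes; returns total writes. */
    int run(int threads, int putsPerThread) {
        AtomicInteger total = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < putsPerThread; j++) {
                    table.get().put();               // always this thread's own instance
                }
                total.addAndGet(table.get().writes());
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        PerThreadTableSketch sketch = new PerThreadTableSketch();
        System.out.println("total writes: " + sketch.run(4, 1000));
        System.out.println("instances created: " + sketch.instancesCreated.get());
    }
}
```

With long-lived threads that each keep their instance, nothing is ever handed back or borrowed, so a pool adds no benefit here.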



Re: question on HTablePool and threads

Posted by Vaibhav Puranik <vp...@gmail.com>.
Sujee,

HTablePool uses a ConcurrentHashMap to store HTable(s), hence it is
thread-safe.
Read the following docs for more information:

http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentHashMap.html

Regards,
Vaibhav
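
For readers who want to see why a concurrent map makes the get/put pattern from the original question safe, here is a minimal sketch. It is a toy pool written for this illustration, not the real HTablePool source: the class names, the use of ConcurrentLinkedQueue for idle instances, and the get/put method names are all assumptions.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy pool keyed by table name; a stand-in, NOT the real HTablePool. */
class MapBackedPoolSketch {

    /** Hypothetical non-thread-safe table handle. */
    static final class Table {
        final String name;
        int ops;                       // unsynchronized on purpose

        Table(String name) {
            this.name = name;
        }
    }

    private final ConcurrentMap<String, Queue<Table>> idle = new ConcurrentHashMap<>();
    final AtomicInteger created = new AtomicInteger();

    /** Check out an idle handle for the table, creating one if none is available. */
    Table get(String name) {
        Queue<Table> queue = idle.computeIfAbsent(name, n -> new ConcurrentLinkedQueue<>());
        Table t = queue.poll();
        if (t == null) {
            created.incrementAndGet();
            t = new Table(name);
        }
        return t;
    }

    /** Return a handle so that another thread can reuse it. */
    void put(Table t) {
        idle.computeIfAbsent(t.name, n -> new ConcurrentLinkedQueue<>()).offer(t);
    }

    /** Sum of operations recorded on the idle handles of one table. */
    int totalOps(String name) {
        Queue<Table> queue = idle.get(name);
        if (queue == null) {
            return 0;
        }
        int sum = 0;
        for (Table t : queue) {
            sum += t.ops;
        }
        return sum;
    }

    /** The pool is created once and shared; each task checks out, works, and returns. */
    static int run(MapBackedPoolSketch pool, int threads, int opsPerThread) {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            exec.execute(() -> {
                for (int j = 0; j < opsPerThread; j++) {
                    Table t = pool.get("sometable");
                    try {
                        t.ops++;               // only this thread holds t right now
                    } finally {
                        pool.put(t);
                    }
                }
            });
        }
        exec.shutdown();
        try {
            if (!exec.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool.totalOps("sometable");
    }

    public static void main(String[] args) {
        MapBackedPoolSketch pool = new MapBackedPoolSketch();
        System.out.println("ops recorded: " + run(pool, 4, 1000));
        System.out.println("handles created: " + pool.created.get());
    }
}
```

Note what the concurrent collections guarantee in this sketch: checking handles out and in is safe, and a handle is held by one thread at a time. The handle itself is still not thread-safe, so a thread must not keep using it after returning it to the pool.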
