Posted to user@hbase.apache.org by Han Liu <ha...@andrew.cmu.edu> on 2010/08/09 19:46:39 UTC

Server-side write buffer configuration

Hi Guys,

I know that on the client side of HBase there's a configuration property, "hbase.client.write.buffer". Is there a similar setting on the region server side that I can tweak to adjust performance?

Also, as of right now I have managed to insert 15 GB of data into a 6-regionserver HBase cluster in roughly 26 minutes using the "table.put(Put p)" approach. Is this generally decent performance?
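
For reference, the average rate implied by those figures can be worked out directly. This is a quick sketch in Python, used here only for illustration; it assumes 1 GB = 1024 MB and an even spread of writes over the 6 region servers, and the function name is made up for the example.

```python
def throughput_mb_per_s(gigabytes, minutes):
    """Average insert rate in MB/s, assuming 1 GB = 1024 MB."""
    return gigabytes * 1024 / (minutes * 60)

total = throughput_mb_per_s(15, 26)   # whole cluster
per_server = total / 6                # assumes an even spread over 6 region servers

print(f"cluster: {total:.1f} MB/s, per region server: {per_server:.1f} MB/s")
```

That works out to a little under 10 MB/s for the cluster as a whole, or well under 2 MB/s per region server.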

Any advice would be appreciated. Thanks a lot in advance. 
--
Han Liu
SCS & HCI Institute
Undergrad. Class of 2012 
Carnegie Mellon University





Re: Server-side write buffer configuration

Posted by Jean-Daniel Cryans <jd...@apache.org>.
By multiple JVMs I mean multiple HBase clients; it's hard to get more
basic than that. Were you planning on doing that 100TB upload from a
single client? If so, you should revise your plans. Do split the input
between many inserting processes. Use MapReduce if you can, as it will
do that for you, and you will leverage the parallelism offered by
HBase/Hadoop.
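
As a rough illustration of "split the input between many inserting processes", the sketch below deals records round-robin into one shard per client. It is written in Python purely for illustration; `split_for_clients` and the row names are hypothetical, and a MapReduce job would do the equivalent splitting on its own.

```python
def split_for_clients(records, num_clients):
    """Deal records round-robin into one list per client process."""
    return [records[i::num_clients] for i in range(num_clients)]

rows = [f"row-{n:04d}" for n in range(10)]
shards = split_for_clients(rows, 3)

for client_id, shard in enumerate(shards):
    # In a real load, each shard would be handed to a separate client process (JVM).
    print(client_id, len(shard), shard[0])
```

Every record lands in exactly one shard, so the clients never insert the same row twice and can run fully in parallel.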

For the rest, remember that Google is your friend.

J-D

On Tue, Aug 10, 2010 at 10:42 PM, Han Liu <ha...@andrew.cmu.edu> wrote:
> Hi J-D,
>
> Can you explain a bit more about multiple JVMs? For example, how would I use them with HBase clients? Or maybe point me to a reference on such topics, since I am not really an expert on Java. :p
>
> Thanks again for your reply.
>
> Han

Re: Server-side write buffer configuration

Posted by Han Liu <ha...@andrew.cmu.edu>.
Hi J-D,

Can you explain a bit more about multiple JVMs? For example, how would I use them with HBase clients? Or maybe point me to a reference on such topics, since I am not really an expert on Java. :p

Thanks again for your reply. 

Han


On Aug 9, 2010, at 4:13 PM, Jean-Daniel Cryans wrote:

>> I see. 0.89 is still a developer release and I hear that it is not stable. But it sounds really tempting because it boosts performance by a lot. Can I trust it if my final goal is to insert about 100TB of data? What could be the possible issues? Also, when should I expect to see a stable release?
> 
> The problem will be your 6 machines, not the software. And if you need
> to insert that much data, please use the bulk uploader as it will be
> much faster: http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> 
>> 
>> So in order to do multiple-client insertion I basically just need to create multiple HTable objects to handle the insertions? Do I need to do multi-threading manually?
> 
> By multiple clients I mean multiple JVMs; you can also use multiple
> threads inside each client, each with its own HTable (since HTable is
> not thread-safe). But please use the bulk loader.
> 
>> 
>> And finally a possibly stupid question: how do I check what is the total number of regions?
> 
> Check the "status" command in the shell, or check out the master web UI
> on port 60010 like it says here
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#runandconfirm
> 
> J-D
> 

--
Han Liu
SCS & HCI Institute
Undergrad. Class of 2012 
Carnegie Mellon University





Re: Server-side write buffer configuration

Posted by Jean-Daniel Cryans <jd...@apache.org>.
> I see. 0.89 is still a developer release and I hear that it is not stable. But it sounds really tempting because it boosts performance by a lot. Can I trust it if my final goal is to insert about 100TB of data? What could be the possible issues? Also, when should I expect to see a stable release?

The problem will be your 6 machines, not the software. And if you need
to insert that much data, please use the bulk uploader as it will be
much faster: http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
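
J-D does not spell out why the 6 machines are the problem, but a back-of-the-envelope calculation suggests one plausible reading. The sketch below (Python, for illustration only) uses the 15 GB in 26 minutes figure and the hardware specs posted elsewhere in this thread. It assumes 1 TB = 1024 GB, a constant insert rate, and HDFS's default replication factor of 3, and it ignores compression and HBase storage overhead.

```python
OBSERVED_GB = 15          # from the first message in this thread
OBSERVED_MINUTES = 26
TARGET_TB = 100

def days_to_load(target_tb, gb_done, minutes_taken):
    """Days needed at a constant rate, assuming 1 TB = 1024 GB."""
    minutes = target_tb * 1024 / gb_done * minutes_taken
    return minutes / (60 * 24)

days = days_to_load(TARGET_TB, OBSERVED_GB, OBSERVED_MINUTES)

# Raw disk per the specs posted in this thread: 6 servers x 4 drives x 1 TB.
raw_disk_tb = 6 * 4 * 1
# Assumption: HDFS stores 3 copies of each block (its default replication factor).
needed_tb = TARGET_TB * 3

print(f"{days:.0f} days at the observed rate")
print(f"{raw_disk_tb} TB raw disk vs {needed_tb} TB needed with 3x replication")
```

At the observed single-client rate the load would take about four months, and the cluster's 24 TB of raw disk is far short of what 100 TB of replicated data would need.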

>
> So in order to do multiple-client insertion I basically just need to create multiple HTable objects to handle the insertions? Do I need to do multi-threading manually?

By multiple clients I mean multiple JVMs; you can also use multiple
threads inside each client, each with its own HTable (since HTable is
not thread-safe). But please use the bulk loader.
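
The "one handle per thread" pattern can be sketched as follows. This is Python for illustration only: `FakeTable` is a hypothetical stand-in for a non-thread-safe client handle and is not an HBase class, and the shard contents are made up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeTable:
    """Hypothetical stand-in for a non-thread-safe client handle (not an HBase class)."""
    def __init__(self):
        self.owner = threading.get_ident()
        self.rows = []

    def put(self, row):
        # Fail loudly if a thread other than the creator uses this handle.
        assert threading.get_ident() == self.owner, "handle shared across threads"
        self.rows.append(row)

def load_shard(shard):
    # Created inside the worker, so it is only ever used by the thread that made it.
    table = FakeTable()
    for row in shard:
        table.put(row)
    return len(table.rows)

shards = [[f"row-{i}-{n}" for n in range(100)] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    written = list(pool.map(load_shard, shards))

print(written)
```

The key design choice is that the handle is constructed inside the worker function rather than shared from the main thread, so no locking is needed around it.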

>
> And finally a possibly stupid question: how do I check what is the total number of regions?

Check the "status" command in the shell, or check out the master web UI
on port 60010 like it says here
http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#runandconfirm

J-D

Re: Server-side write buffer configuration

Posted by Han Liu <ha...@andrew.cmu.edu>.
I see. 0.89 is still a developer release and I hear that it is not stable. But it sounds really tempting because it boosts performance by a lot. Can I trust it if my final goal is to insert about 100TB of data? What could be the possible issues? Also, when should I expect to see a stable release?

So in order to do multiple-client insertion I basically just need to create multiple HTable objects to handle the insertions? Do I need to do multi-threading manually? 

And finally, a possibly stupid question: how do I check the total number of regions?

Thanks. :)


On Aug 9, 2010, at 3:51 PM, Jean-Daniel Cryans wrote:

> HBASE-2066 was committed, and it takes effect automatically when you
> use the write buffer, starting with version 0.89; e.g. this release contains it:
> http://hbase.apache.org/docs/r0.89.20100621/
> 
> Using more than one client basically means starting more of them, the same
> way you started the first one. Your input data can then be split
> between the clients, using either MapReduce or your homegrown solution
> (we imported the stumbles with an MR job).
> 
> J-D
> 

--
Han Liu
SCS & HCI Institute
Undergrad. Class of 2012 
Carnegie Mellon University





Re: Server-side write buffer configuration

Posted by Jean-Daniel Cryans <jd...@apache.org>.
HBASE-2066 was committed, and it takes effect automatically when you
use the write buffer, starting with version 0.89; e.g. this release contains it:
http://hbase.apache.org/docs/r0.89.20100621/

Using more than one client basically means starting more of them, the same
way you started the first one. Your input data can then be split
between the clients, using either MapReduce or your homegrown solution
(we imported the stumbles with an MR job).

J-D

On Mon, Aug 9, 2010 at 12:44 PM, Han Liu <ha...@andrew.cmu.edu> wrote:
> Thanks for your reply J-D!
>
> Could you explain more about the HBASE-2066 scheme? For example, how did you do the first 3 steps described on that page?
>
> Also, is there any documentation that describes using multiple clients with HBase?
>

Re: Server-side write buffer configuration

Posted by Han Liu <ha...@andrew.cmu.edu>.
Thanks for your reply J-D!

Could you explain more about the HBASE-2066 scheme? For example, how did you do the first 3 steps described on that page?

Also, is there any documentation that describes using multiple clients with HBase?


On Aug 9, 2010, at 2:37 PM, Jean-Daniel Cryans wrote:

> Those are pretty powerful machines; I would expect more performance. You
> could try using the same settings that we do here. Check out Ryan's
> presentation, page 16:
> http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf
> 
> Google "IO wait" to learn about it.
> 
> Multi-clients will be faster unless you are already maxing out the
> machines (betting $100 you're not); it's like asking whether parallel
> processing will be faster than sequential processing.
> 
> J-D

--
Han Liu
SCS & HCI Institute
Undergrad. Class of 2012 
Carnegie Mellon University





Re: Server-side write buffer configuration

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Those are pretty powerful machines; I would expect more performance. You
could try using the same settings that we do here. Check out Ryan's
presentation, page 16:
http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf

Google "IO wait" to learn about it.
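
In short, IO wait is the share of time the CPUs sit idle while waiting for disk (or other block I/O) to finish; a high value during a load suggests the disks, not the CPUs, are the bottleneck. Tools such as top, vmstat and iostat report it. As a rough sketch of where the number comes from on Linux, the Python snippet below compares two "cpu" lines of the kind found in /proc/stat. The function name and the sample values are made up for illustration.

```python
def iowait_share(before, after):
    """Share of CPU time spent in iowait between two 'cpu' lines from /proc/stat."""
    # Use the first 8 counters only: user nice system idle iowait irq softirq steal.
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    # iowait is the fifth counter, index 4.
    return deltas[4] / total if total else 0.0

# Made-up samples for illustration; real ones come from reading /proc/stat twice.
sample_1 = "cpu 1000 0 500 8000 500 0 0 0"
sample_2 = "cpu 1100 0 550 8500 850 0 0 0"

print(f"{iowait_share(sample_1, sample_2):.0%}")
```

With these sample values the CPUs spent 35% of the interval waiting on I/O, which would be a strong hint that the disks are saturated.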

Multi-clients will be faster unless you are already maxing out the
machines (betting $100 you're not); it's like asking whether parallel
processing will be faster than sequential processing.

J-D

On Mon, Aug 9, 2010 at 11:21 AM, Han Liu <ha...@andrew.cmu.edu> wrote:
> Thanks for the reply J-D.
> In-lined.  :)

Re: Server-side write buffer configuration

Posted by Han Liu <ha...@andrew.cmu.edu>.
Thanks for the reply J-D. 
In-lined.  :)


On Aug 9, 2010, at 1:57 PM, Jean-Daniel Cryans wrote:

> Hard to tell if it's decent performance. How do you define "decent"?
I consider it decent if it is roughly the best performance one can get using my schema on my machines
> What kind of hardware are we talking about?
One machine for HBase master and 6 regionservers. Specs of each of these machines:
16 GB Ram
4 1TB 7200RPM SATA Drives
10 Gb Network: 1x Qlogic QLE3142-Cu-CK
CPU: 2x quad-core E5440 (2.83GHz, 12MB L2 cache, 1333 MHz FSB)
> Which version are you
> using?
0.20.4
> How much memory was given to HBase?
6 GB
> 
> Also did you set the write buffer on the client side on HTable?
Yes I set it to be "1024*1024*12" bytes
> Did
> you also turn off auto-flushing?
Yes it's turned off
> Do you monitor your cluster? If so,
> do you see lots of IO wait?
I didn't. What does IO wait indicate?
> 
> And finally, do you use a single client or multiple ones?
> 
Single client. Will multiple clients boost performance?
> :)
> 
:) :) 

Thanks a lot. 
> J-D
> 

--
Han Liu
SCS & HCI Institute
Undergrad. Class of 2012 
Carnegie Mellon University





Re: Server-side write buffer configuration

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hard to tell if it's decent performance. How do you define "decent"?
What kind of hardware are we talking about? Which version are you
using? How much memory was given to HBase?

Also did you set the write buffer on the client side on HTable? Did
you also turn off auto-flushing? Do you monitor your cluster? If so,
do you see lots of IO wait?
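
The two client settings asked about here work together: with auto-flushing turned off, puts accumulate in the client-side write buffer and are sent to the servers in batches once the buffer fills. The toy model below shows the effect on the number of round trips. It is a Python sketch for illustration only; `BufferedWriter` and its fields are hypothetical and are not the HBase client API.

```python
class BufferedWriter:
    """Toy model of a client-side write buffer; not the HBase client itself."""
    def __init__(self, buffer_bytes, send):
        self.buffer_bytes = buffer_bytes
        self.send = send            # callable that ships one batch to the server
        self.pending = []
        self.pending_bytes = 0
        self.round_trips = 0

    def put(self, payload):
        self.pending.append(payload)
        self.pending_bytes += len(payload)
        # Ship the batch once the buffered bytes reach the configured size.
        if self.pending_bytes >= self.buffer_bytes:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(self.pending)
            self.round_trips += 1
            self.pending, self.pending_bytes = [], 0

received = []
writer = BufferedWriter(buffer_bytes=1024, send=received.extend)
for n in range(100):
    writer.put(b"x" * 100)      # 100 puts of 100 bytes each
writer.flush()                  # ship whatever is still buffered

print(writer.round_trips, len(received))
```

In this model 100 puts reach the server in 10 round trips instead of 100. The final explicit flush matters: without it, the last partially filled batch would never be sent.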

And finally, do you use a single client or multiple ones?

:)

J-D

On Mon, Aug 9, 2010 at 10:46 AM, Han Liu <ha...@andrew.cmu.edu> wrote:
> Hi Guys,
>
> I know that on the client side of HBase there's a configuration property, "hbase.client.write.buffer". Is there a similar setting on the region server side that I can tweak to adjust performance?
>
> Also, as of right now I have managed to insert 15 GB of data into a 6-regionserver HBase cluster in roughly 26 minutes using the "table.put(Put p)" approach. Is this generally decent performance?
>
> Any advice would be appreciated. Thanks a lot in advance.
> --
> Han Liu
> SCS & HCI Institute
> Undergrad. Class of 2012
> Carnegie Mellon University
>
>
>
>
>