Posted to user@hbase.apache.org by Lin Ma <li...@gmail.com> on 2012/09/02 11:13:25 UTC

batch update question

Hello guys,

I am reading the book "HBase: The Definitive Guide". At the beginning of
chapter 3, it mentions that batch updates are preferred in order to reduce
the performance impact of clients updating the same row (lock contention
issues for automatic writes). My question is: for an MR job, what batch
update methods could we leverage to resolve the issue? And for an API
client, what batch update methods could we leverage to resolve the issue?

thanks in advance,
Lin

Re: batch update question

Posted by Lin Ma <li...@gmail.com>.
Thank you Doug.

I still have one point of confusion. My original question is why batch
updates would resolve (or reduce) the performance problem caused by
multiple clients contending to update the same row. Do you have any ideas
or comments?

regards,
Lin

Re: batch update question

Posted by Doug Meil <do...@explorysmedical.com>.
For the 2nd part of the question, if you have 10 Puts it's more efficient to send a single RegionServer (RS) message with 10 Puts than to send 10 RS messages with 1 Put apiece.  There are 2 words to be careful with, and those are "always" and "never", because there is an exception: if you are using the client writeBuffer and each of those 10 Puts is going to a different RegionServer, then you haven't really gained much.

To answer the next question of how you know where the Puts are going, see this method…

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29

Because the HBase client talks directly to each RS, it has to know the region boundaries.
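
A minimal sketch of the point above, written as a toy model rather than HBase client code. It assumes a hypothetical map from region start keys to server names (the names BatchRoutingSketch, addRegion, serverFor and the servers "rs1"/"rs2" are made up for illustration) and counts how many messages a batch needs: one per distinct destination server, versus one per Put when nothing is batched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these names come from the HBase client API.
class BatchRoutingSketch {
    // Region start key -> hosting server. The first region starts at "".
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    BatchRoutingSketch addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
        return this;
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionStartToServer.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return region.getValue();
    }

    // Group row keys by destination server: one group per server message.
    Map<String, List<String>> groupByServer(List<String> rowKeys) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (String rowKey : rowKeys) {
            byServer.computeIfAbsent(serverFor(rowKey), s -> new ArrayList<>()).add(rowKey);
        }
        return byServer;
    }

    // Batched: one message per distinct server.
    int batchedMessages(List<String> rowKeys) {
        return groupByServer(rowKeys).size();
    }

    // Unbatched: one message per Put.
    static int unbatchedMessages(List<String> rowKeys) {
        return rowKeys.size();
    }

    public static void main(String[] args) {
        BatchRoutingSketch cluster = new BatchRoutingSketch()
                .addRegion("", "rs1")
                .addRegion("m", "rs2");
        List<String> sameServer = Arrays.asList("a0", "a1", "a2", "a3", "a4");
        List<String> spread = Arrays.asList("a0", "z0");
        System.out.println("same server: " + cluster.batchedMessages(sameServer)
                + " batched vs " + unbatchedMessages(sameServer) + " unbatched");
        System.out.println("spread out:  " + cluster.batchedMessages(spread)
                + " batched vs " + unbatchedMessages(spread) + " unbatched");
    }
}
```

When all rows land on one server the batch collapses to a single message; when every row lands on a different server the batched and unbatched counts are the same, which is the exception described above.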



Re: batch update question

Posted by Lin Ma <li...@gmail.com>.
Thank you Doug,

Very effective reply. :-)

- Why would batch updates resolve the contention issue on the same row?
Could you elaborate a bit more or show me an example?
- Do batch updates always perform better than single updates (when we
measure total throughput)?

regards,
Lin

Re: batch update question

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there, for more information about the hbase client, see…

http://hbase.apache.org/book.html#client





Re: batch update question

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there, if you look in the source code for HTable there is a list of Put
objects.  That's the buffer, and it's a client-side buffer.
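
To make the buffering idea concrete, here is a small toy model, not the HTable source. It assumes a hypothetical class WriteBufferSketch that holds pending payloads on the client and "flushes" (here just a counter standing in for sending them to the RegionServers) once the buffered bytes exceed a limit. In the HTable API of this era the corresponding knobs are, to the best of my knowledge, setAutoFlush(false), setWriteBufferSize(long) and flushCommits(); treat the details below as illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a stand-in for a client-side write buffer, not HBase code.
class WriteBufferSketch {
    private final long bufferLimitBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0;

    WriteBufferSketch(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    // Buffer the payload locally; flush only when the limit is exceeded.
    void put(byte[] payload) {
        pending.add(payload);
        pendingBytes += payload.length;
        if (pendingBytes > bufferLimitBytes) {
            flush();
        }
    }

    // Stands in for sending everything that is buffered in one go.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushCount++;
        pending.clear();
        pendingBytes = 0;
    }

    int flushCount() {
        return flushCount;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        WriteBufferSketch buffer = new WriteBufferSketch(100);
        for (int i = 0; i < 10; i++) {
            buffer.put(new byte[30]);
        }
        System.out.println("flushes so far: " + buffer.flushCount()
                + ", still buffered: " + buffer.pendingCount());
        buffer.flush(); // an explicit final flush sends the leftovers
        System.out.println("flushes after final flush: " + buffer.flushCount());
    }
}
```

The explicit final flush matters: anything still sitting in a client-side buffer has not been sent yet, so a client that exits without flushing loses those updates.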





Re: batch update question

Posted by Lin Ma <li...@gmail.com>.
Thank you, Stack, for the detailed directions!

1. You are right, I have not run into any real row contention issues. My
purpose is to understand the issue in advance, and to use it to understand
HBase in general better.
2. The API page you referred to says: "If isAutoFlush
<http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush%28%29>
is false, the update is buffered until the internal buffer is full." I am
confused about what the buffer is. Is it a buffer on the client side or in
the region server? Is there a way to configure how much it holds before
flushing?
3. Why, in theory, would batching resolve contention on the same row,
compared to non-batch operations? Besides preparing a solution in advance,
I want to learn a bit about why. :-)

regards,
Lin

Re: batch update question

Posted by Stack <st...@duboce.net>.
Do you actually have a problem where there is contention on a single row?

Use methods like
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put(java.util.List)
or the batch methods listed earlier in the API.  You should set
autoflush to false too:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush()

Even with batching, a highly contended row might hold up inserts... but are
you sure you actually have this problem in the first place?

St.Ack