Posted to dev@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2009/12/04 00:24:31 UTC

Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

I have the feeling that this discussion isn't over since there's no
consensus yet, so I did some tests to get some numbers.

PE sequentialWrite 1 with the write buffer disabled (with it enabled I get
the same numbers on every config) on a standalone setup. I stopped HBase
and deleted the data dir between each run.

- hbase.regionserver.flushlogentries=1 and
hbase.regionserver.optionallogflushinterval=1000
 ran in 354765ms

- hbase.regionserver.flushlogentries=100 and
hbase.regionserver.optionallogflushinterval=1000
 run #1 in 333972ms
 run #2 in 331943ms

- hbase.regionserver.flushlogentries=1,
hbase.regionserver.optionallogflushinterval=1000 and deferred flush
enabled on TestTable
 run #1 in 309857ms
 run #2 in 311440ms

So 100 entries per flush takes ~7% less time, and deferred flush takes 14% less.
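
For reference, here is a rough sketch of how the three setups above map onto
the configuration and table attributes named in this thread; the property and
attribute names come from the thread, but the Java calls (these properties
normally live in hbase-site.xml on the regionservers) are illustrative
assumptions and may differ between versions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;

    public class FlushSettingsSketch {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Setup 1: sync the WAL on every edit, optional flush every second.
        conf.setInt("hbase.regionserver.flushlogentries", 1);
        conf.setLong("hbase.regionserver.optionallogflushinterval", 1000);
        // Setup 2: same interval, but only sync every 100 edits.
        //conf.setInt("hbase.regionserver.flushlogentries", 100);
        // Setup 3: like setup 1, plus deferred log flush on the test table
        // (attribute name assumed from HBASE-1944).
        HTableDescriptor desc = new HTableDescriptor("TestTable");
        desc.setValue("DEFERRED_LOG_FLUSH", "true");
        System.out.println(desc);
      }
    }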

I therefore think that not only should we set flushlogentries=1 in 0.21,
but we should also enable deferred log flush by default with a lower
optional log flush interval. It would be nearly as safe as, but much
faster than, the previous option. I would even get rid of the
hbase.regionserver.flushlogentries config.

J-D

On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Well it's even better than that ;) We have optional log flushing, which
> by default happens every 10 secs. Make that 100 milliseconds and that's the
> most data you can lose. If any other table syncs, then this table's edits
> are also synced.
>
> J-D
>
>
> On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
>> Thoughts on a client-facing call to explicitly sync the WAL?  So I could
>> turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of
>> my inserts, and then run an explicit flush/sync.  The return of that
>> call would guarantee to the client that the data up to that point is safe.
>>
>> JG
>>
>> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
>>> I added a new feature for tables called "deferred flush", see
>>> https://issues.apache.org/jira/browse/HBASE-1944
>>>
>>>
>>> My opinion is that the default should be paranoid enough to not lose
>>> any user data. If we can change a table's attribute without taking it down
>>> (there's a jira on that), wouldn't that solve the import problem?
>>>
>>>
>>> For example, say you have a table that needs fast insertion via MR.
>>> During the creation of the job, you change the table's
>>> DEFERRED_LOG_FLUSH to "true", then run the job, and finally set the
>>> value back to false when the job is done.
>>>
>>> This way you still pass the responsibility to the user but for
>>> performance reasons.
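
A sketch of the toggle described above, using the Java admin API; the
DEFERRED_LOG_FLUSH attribute key comes from HBASE-1944, while the table name
and the disable/modify/enable sequence are assumptions for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeferredFlushToggle {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] table = Bytes.toBytes("import_table");   // hypothetical table

        // Turn deferred log flush on for the duration of the import.
        HTableDescriptor desc = admin.getTableDescriptor(table);
        desc.setValue("DEFERRED_LOG_FLUSH", "true");
        admin.disableTable(table);
        admin.modifyTable(table, desc);
        admin.enableTable(table);

        // ... run the MR import job here ...

        // Restore the safe setting once the job is done.
        desc.setValue("DEFERRED_LOG_FLUSH", "false");
        admin.disableTable(table);
        admin.modifyTable(table, desc);
        admin.enableTable(table);
      }
    }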
>>>
>>> J-D
>>>
>>>
>>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cl...@adobe.com> wrote:
>>>
>>>> We could have a speedy default and an extra parameter for puts that
>>>> would specify a flush is needed. This way you pass the responsibility to
>>>> the user and he can decide if he needs to be paranoid or not. This could
>>>> be part of Put and even specify granularity of the flush if needed.
>>>>
>>>>
>>>> Cosmin
>>>>
>>>>
>>>>
>>>> On 11/15/09 6:59 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>>>>
>>>>
>>>>> I agree with this.
>>>>>
>>>>>
>>>>> I also think we should leave the default as is with the caveat that
>>>>> we call out the durability versus write performance tradeoff in the
>>>>> flushlogentries description and up on the wiki somewhere, maybe on
>>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
>>>>> provide two example configurations, one for performance (reasonable
>>>>> tradeoffs), one for paranoia. I put up an issue:
>>>>> https://issues.apache.org/jira/browse/HBASE-1984
>>>>>
>>>>>
>>>>>     - Andy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Ryan Rawson <ry...@gmail.com>
>>>>> To: hbase-dev@hadoop.apache.org
>>>>> Sent: Sat, November 14, 2009 11:22:13 PM
>>>>> Subject: Re: Should we change the default value of
>>>>> hbase.regionserver.flushlogentries  for 0.21?
>>>>>
>>>>> That sync at the end of an RPC is my doing. You don't want to sync
>>>>> every _EDIT_; after all, the previous definition of the word "edit"
>>>>> was each KeyValue.  So we could be calling sync for every single
>>>>> column in a row. Bad stuff.
>>>>>
>>>>> In the end, if the regionserver crashes during a batch put, we will
>>>>> never know how much of the batch was flushed to the WAL. Thus it makes
>>>>> sense to only do it once and get a massive, massive speedup.
>>>>>
>>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
>>>>>
>>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
>>>>>> edits? Speed stays as it was.  We used to lose MBs.  By default,
>>>>>> we'll now lose 99 or 9 edits max.
>>>>>>
>>>>>> We need to do some work bringing folks along regardless of what we
>>>>>> decide. Flush happens at the end of the put up in the regionserver.
>>>>>>  If you are
>>>>>> doing a batch of commits -- e.g. using a big write buffer over on
>>>>>> your client -- the puts will only be flushed on the way out after
>>>>>> the batch put completes EVEN if you have configured hbase to sync
>>>>>> every edit (I ran into this this evening.  J-D sorted me out).  We
>>>>>> need to make sure folks are up on this.
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
>>>>>> <jd...@apache.org> wrote:
>>>>>>
>>>>>>
>>>>>>> Hi dev!
>>>>>>>
>>>>>>>
>>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
>>>>>>> gives us the opportunity to review some assumptions. The current
>>>>>>> situation:
>>>>>>>
>>>>>>>
>>>>>>> - Every edit going to a catalog table is flushed, so there's no
>>>>>>> data loss.
>>>>>>> - The user tables' edits are flushed every
>>>>>>> hbase.regionserver.flushlogentries, which by default is 100.
>>>>>>>
>>>>>>> Should we now set this value to 1 in order to have more durable
>>>>>>> but slower inserts by default? Please speak up.
>>>>>>>
>>>>>>> Thx,
>>>>>>>
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>

Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

Posted by stack <st...@duboce.net>.
Sounds good.
St.Ack

Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Ok to make sure I get this right:

- we enable deferred log flush by default
- we set flushlogentries=1

Also, since 10 seconds is kind of a huge window, I propose that:

- we set optionalLogFlush=1000

which is the MySQL default. We also have to update the wiki (there's
already an entry on deferred log flush) to also document the
flushlogentries configuration.
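
If deferred log flush does become the default, a table that must not widen its
loss window could presumably opt out at creation time; a minimal sketch,
assuming the HBASE-1944 attribute key and an illustrative table and family:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class StrictDurabilityTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTableDescriptor desc = new HTableDescriptor("billing");  // hypothetical
        desc.addFamily(new HColumnDescriptor("d"));
        // Keep per-edit WAL sync for this table even if the cluster default
        // becomes deferred log flush.
        desc.setValue("DEFERRED_LOG_FLUSH", "false");
        new HBaseAdmin(conf).createTable(desc);
      }
    }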

I'll open a jira.

J-D

Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

Posted by stack <st...@duboce.net>.
Yeah, +1 on deferred log flush.  Good man J-D.

Can we also update the performance wiki page to list how to increase your write
speed at the cost of possibly losing more edits?

St.Ack

Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.
Looks like deferred log flush is the clear winner here, and probably
has a smaller chance of loss than the 100 logflushentries.

I dare say we should ship with that as the default...

-ryan

Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
So to satisfy Ryan's thirst for cluster numbers, here they are:

Default (with write buffer)
65 060ms

The rest are without the write buffer (which is so well optimized that
we only sync once per 2MB batch). I ran the entries=1 case only once
because it takes so long.

1 logflushentries
2 188 737ms

100 logflushentries
697 590ms
698 082ms

deferred log flush
545 836ms
532 788ms

The cluster is composed of 15 i7s (a bit overkill), but it shows that the
same workload runs much slower because of network, replication, etc.

Also on another cluster (same hardware) I did some 0.20 testing:

With write buffer:
131 811ms

Without:
602 842ms

Keep in mind that the sync we call isn't HDFS-265.
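
For context on the "with write buffer" runs above, this is roughly the
client-side setup that the buffering implies; a sketch only, with an
illustrative table, column family, and row layout, and the 2MB figure taken
from the batch size mentioned above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WriteBufferSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "TestTable");
        // Buffer puts client-side and send them out in large batches.
        table.setAutoFlush(false);
        table.setWriteBufferSize(2 * 1024 * 1024);  // 2MB batches, as above
        for (int i = 0; i < 10000; i++) {
          Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
          put.add(Bytes.toBytes("info"), Bytes.toBytes("data"),
              Bytes.toBytes("value"));
          table.put(put);
        }
        table.flushCommits();  // push out anything still buffered
        table.close();
      }
    }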

J-D

Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

Posted by stack <st...@duboce.net>.
Thanks for picking up this discussion again J-D.

See below.

On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> I have the feeling that this discussion isn't over, there's no
> consensus yet, so I did some tests to get some numbers.
>
> PE sequentialWrite 1 with the write buffer disabled (I get the same
> numbers on every different config with it) on a standalone setup.


The write buffer is disabled because otherwise it will get in the way of the
hbase.regionserver.flushlogentries=1?

It would be interesting to get a baseline for 0.20 which IMO would be
settings we had in 0.19 w/ write buffer.  Would be good for comparison.

You like the idea of the sync being time-based rather than some number of
edits?  I can see fellas wanting both.

stack


I
> stopped HBase and deleted the data dir between each run.
>
> - hbase.regionserver.flushlogentries=1 and
> hbase.regionserver.optionallogflushinterval=1000
>  ran in 354765ms
>
> - hbase.regionserver.flushlogentries=100 and
> hbase.regionserver.optionallogflushinterval=1000
>  run #1 in 333972ms
>  run #2 in 331943ms
>
> - hbase.regionserver.flushlogentries=1,
> hbase.regionserver.optionallogflushinterval=1000 and deferred flush
> enabled on TestTable
>  run #1 in 309857ms
>  run #2 in 311440ms
>
> So 100 entries per flush takes ~7% less time, deferred flush takes 14%
> less.
>
> I thereby think that not only should we set flushlogentries=1 in 0.21,
> but also we should enable deferred log flush by default with a lower
> optional log flush interval. It will be a nearly as safe but much
> faster alternative to the previous option. I would even get rid of the
> hbase.regionserver.flushlogentries config.
>
> J-D
>
> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jd...@apache.org>
> wrote:
> > Well it's even better than that ;) We have optional log flushing which
> > by default is 10 secs. Make that 100 milliseconds and that's as much
> > data you can lose. If any other table syncs then this table's edits
> > are also synced.
> >
> > J-D
> >
> >
> > On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com>
> wrote:
> >> Thoughts on a client-facing call to explicit call a WAL sync?  So I
> could
> >> turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of
> >> my inserts, and then run an explicit flush/sync.  The returning of that
> >> call would guarantee to the client that the data up to that point is
> safe.
> >>
> >> JG
> >>
> >> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
> >>> I added a new feature for tables called "deferred flush", see
> >>> https://issues.apache.org/jira/browse/HBASE-1944
> >>>
> >>>
> >>> My opinion is that the default should be paranoid enough to not lose
> >>> any user data. If we can change a table's attribute without taking it
> down
> >>> (there's a jira on that), wouldn't that solve the import problem?
> >>>
> >>>
> >>> For example: have some table that needs to have fast insertion via MR.
> >>> During the creation of the job, you change the table's
> >>> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
> >>> value to false when the job is done.
> >>>
> >>> This way you still pass the responsibility to the user but for
> >>> performance reasons.
> >>>
> >>> J-D
> >>>
> >>>
> >>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cl...@adobe.com>
> wrote:
> >>>
> >>>> We could have a speedy default and an extra parameter for puts that
> >>>> would specify a flush is needed. This way you pass the responsibility
> to
> >>>> the user and he can decide if he needs to be paranoid or not. This
> could
> >>>> be part of Put and even specify granularity of the flush if needed.
> >>>>
> >>>>
> >>>> Cosmin
> >>>>
> >>>>
> >>>>
> >>>> On 11/15/09 6:59 PM, "Andrew Purtell" <ap...@apache.org> wrote:
> >>>>
> >>>>
> >>>>> I agree with this.
> >>>>>
> >>>>>
> >>>>> I also think we should leave the default as is with the caveat that
> >>>>> we call out the durability versus write performance tradeoff in the
> >>>>> flushlogentries description and up on the wiki somewhere, maybe on
> >>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
> >>>>> provide two example configurations, one for performance (reasonable
> >>>>> tradeoffs), one for paranoia. I put up an issue:
> >>>>> https://issues.apache.org/jira/browse/HBASE-1984
> >>>>>
> >>>>>
> >>>>>     - Andy
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> ________________________________
> >>>>> From: Ryan Rawson <ry...@gmail.com>
> >>>>> To: hbase-dev@hadoop.apache.org
> >>>>> Sent: Sat, November 14, 2009 11:22:13 PM
> >>>>> Subject: Re: Should we change the default value of
> >>>>> hbase.regionserver.flushlogentries  for 0.21?
> >>>>>
> >>>>> That sync at the end of an RPC is my doing. You don't want to sync
> >>>>> every _EDIT_, after all, the previous definition of the word "edit"
> >>>>> was each KeyValue.  So we could be calling sync for every single
> >>>>> column in a row. Bad stuff.
> >>>>>
> >>>>> In the end, if the regionserver crashes during a batch put, we will
> >>>>> never know how much of the batch was flushed to the WAL. Thus it
> makes
> >>>>>  sense to only do it once and get a massive, massive, speedup.
> >>>>>
> >>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
> >>>>>
> >>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
> >>>>>> edits? Speed stays as it was.  We used to lose MBs.  By default,
> >>>>>> we'll now lose 99 or 9 edits max.
> >>>>>>
> >>>>>> We need to do some work bringing folks along regardless of what we
> >>>>>> decide. Flush happens at the end of the put up in the regionserver.
> >>>>>>  If you are
> >>>>>> doing a batch of commits -- e.g. using a big write buffer over on
> >>>>>> your client -- the puts will only be flushed on the way out after
> >>>>>> the batch put completes EVEN if you have configured hbase to sync
> >>>>>> every edit (I ran into this this evening.  J-D sorted me out).  We
> >>>>>> need to make sure folks are up on this.
> >>>>>>
> >>>>>> St.Ack
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
> >>>>>> <jd...@apache.org>wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Hi dev!
> >>>>>>>
> >>>>>>>
> >>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
> >>>>>>> gives us the opportunity to review some assumptions. The current
> >>>>>>> situation:
> >>>>>>>
> >>>>>>>
> >>>>>>> - Every edit going to a catalog table is flushed so there's no
> >>>>>>> data loss. - The user tables edits are flushed every
> >>>>>>> hbase.regionserver.flushlogentries which by default is 100.
> >>>>>>>
> >>>>>>> Should we now set this value to 1 in order to have more durable
> >>>>>>> but slower inserts by default? Please speak up.
> >>>>>>>
> >>>>>>> Thx,
> >>>>>>>
> >>>>>>>
> >>>>>>> J-D
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
>
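
As a rough sketch of the per-table approach J-D describes above (turn
DEFERRED_LOG_FLUSH on for the duration of a bulk MR import, then turn it back
off), something along these lines should work. It assumes the HTableDescriptor
setter introduced with HBASE-1944 and the disable/modify/enable cycle that
altering a table still requires; the class and method below are illustrative,
not lifted from the patch.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  public class DeferredFlushToggle {
    // Flip DEFERRED_LOG_FLUSH on a table: call with true before kicking off the
    // import job, and with false once the job has finished.
    static void setDeferredFlush(Configuration conf, String table, boolean on)
        throws Exception {
      HBaseAdmin admin = new HBaseAdmin(conf);
      byte[] name = Bytes.toBytes(table);
      HTableDescriptor htd = admin.getTableDescriptor(name);
      htd.setDeferredLogFlush(on);   // table attribute added by HBASE-1944
      admin.disableTable(name);      // online alter is not available yet
      admin.modifyTable(name, htd);
      admin.enableTable(name);
    }
  }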

Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.
Thanks for collecting this data. I think the expectation is that HBase
is both fast and reliable, so picking an option that ensures both is
tricky.

I generally support flushlogentries=1, but I think we need a clustered
test before we can say.  The performance is substantially different on
HDFS across multiple hosts.

On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> I have the feeling that this discussion isn't over, there's no
> consensus yet, so I did some tests to get some numbers.
>
> PE sequentialWrite 1 with the write buffer disabled (I get the same
> numbers on every different config with it) on a standalone setup. I
> stopped HBase and deleted the data dir between each run.
>
> - hbase.regionserver.flushlogentries=1 and
> hbase.regionserver.optionallogflushinterval=1000
>  ran in 354765ms
>
> - hbase.regionserver.flushlogentries=100 and
> hbase.regionserver.optionallogflushinterval=1000
>  run #1 in 333972ms
>  run #2 in 331943ms
>
> - hbase.regionserver.flushlogentries=1,
> hbase.regionserver.optionallogflushinterval=1000 and deferred flush
> enabled on TestTable
>  run #1 in 309857ms
>  run #2 in 311440ms
>
> So 100 entries per flush takes ~7% less time, deferred flush takes 14% less.
>
> I thereby think that not only should we set flushlogentries=1 in 0.21,
> but also we should enable deferred log flush by default with a lower
> optional log flush interval. It will be a nearly as safe but much
> faster alternative to the previous option. I would even get rid of the
> hbase.regionserver.flushlogentries config.
>
> J-D
>
> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>> Well it's even better than that ;) We have optional log flushing which
>> by default is 10 secs. Make that 100 milliseconds and that's as much
>> data you can lose. If any other table syncs then this table's edits
>> are also synced.
>>
>> J-D
>>
>>
>> On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
>>> Thoughts on a client-facing call to explicitly call a WAL sync?  So I could
>>> turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of
>>> my inserts, and then run an explicit flush/sync.  The returning of that
>>> call would guarantee to the client that the data up to that point is safe.
>>>
>>> JG
>>>
>>> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
>>>> I added a new feature for tables called "deferred flush", see
>>>> https://issues.apache.org/jira/browse/HBASE-1944
>>>>
>>>>
>>>> My opinion is that the default should be paranoid enough to not lose
>>>> any user data. If we can change a table's attribute without taking it down
>>>> (there's a jira on that), wouldn't that solve the import problem?
>>>>
>>>>
>>>> For example: have some table that needs to have fast insertion via MR.
>>>> During the creation of the job, you change the table's
>>>> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
>>>> value to false when the job is done.
>>>>
>>>> This way you still pass the responsibility to the user but for
>>>> performance reasons.
>>>>
>>>> J-D
>>>>
>>>>
>>>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cl...@adobe.com> wrote:
>>>>
>>>>> We could have a speedy default and an extra parameter for puts that
>>>>> would specify a flush is needed. This way you pass the responsibility to
>>>>> the user and he can decide if he needs to be paranoid or not. This could
>>>>> be part of Put and even specify granularity of the flush if needed.
>>>>>
>>>>>
>>>>> Cosmin
>>>>>
>>>>>
>>>>>
>>>>> On 11/15/09 6:59 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>>>>>
>>>>>
>>>>>> I agree with this.
>>>>>>
>>>>>>
>>>>>> I also think we should leave the default as is with the caveat that
>>>>>> we call out the durability versus write performance tradeoff in the
>>>>>> flushlogentries description and up on the wiki somewhere, maybe on
>>>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
>>>>>> provide two example configurations, one for performance (reasonable
>>>>>> tradeoffs), one for paranoia. I put up an issue:
>>>>>> https://issues.apache.org/jira/browse/HBASE-1984
>>>>>>
>>>>>>
>>>>>>     - Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Ryan Rawson <ry...@gmail.com>
>>>>>> To: hbase-dev@hadoop.apache.org
>>>>>> Sent: Sat, November 14, 2009 11:22:13 PM
>>>>>> Subject: Re: Should we change the default value of
>>>>>> hbase.regionserver.flushlogentries  for 0.21?
>>>>>>
>>>>>> That sync at the end of an RPC is my doing. You don't want to sync
>>>>>> every _EDIT_, after all, the previous definition of the word "edit"
>>>>>> was each KeyValue.  So we could be calling sync for every single
>>>>>> column in a row. Bad stuff.
>>>>>>
>>>>>> In the end, if the regionserver crashes during a batch put, we will
>>>>>> never know how much of the batch was flushed to the WAL. Thus it makes
>>>>>>  sense to only do it once and get a massive, massive, speedup.
>>>>>>
>>>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
>>>>>>
>>>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
>>>>>>> edits? Speed stays as it was.  We used to lose MBs.  By default,
>>>>>>> we'll now lose 99 or 9 edits max.
>>>>>>>
>>>>>>> We need to do some work bringing folks along regardless of what we
>>>>>>> decide. Flush happens at the end of the put up in the regionserver.
>>>>>>>  If you are
>>>>>>> doing a batch of commits -- e.g. using a big write buffer over on
>>>>>>> your client -- the puts will only be flushed on the way out after
>>>>>>> the batch put completes EVEN if you have configured hbase to sync
>>>>>>> every edit (I ran into this this evening.  J-D sorted me out).  We
>>>>>>> need to make sure folks are up on this.
>>>>>>>
>>>>>>> St.Ack
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
>>>>>>> <jd...@apache.org>wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hi dev!
>>>>>>>>
>>>>>>>>
>>>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
>>>>>>>> gives us the opportunity to review some assumptions. The current
>>>>>>>> situation:
>>>>>>>>
>>>>>>>>
>>>>>>>> - Every edit going to a catalog table is flushed so there's no
>>>>>>>> data loss. - The user tables edits are flushed every
>>>>>>>> hbase.regionserver.flushlogentries which by default is 100.
>>>>>>>>
>>>>>>>> Should we now set this value to 1 in order to have more durable
>>>>>>>> but slower inserts by default? Please speak up.
>>>>>>>>
>>>>>>>> Thx,
>>>>>>>>
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
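
To close the loop on stack's point about batched commits quoted above (and on
why J-D disabled the write buffer for his test), here is a rough sketch against
the client API of that era. With autoflush off, edits sit in the client-side
write buffer and only reach the region server, and hence the WAL sync path
being debated here, when the buffer fills or flushCommits() is called. Table
and column names are made up for illustration.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BufferedClientSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();  // stand-in for a real HBase config
      HTable table = new HTable(conf, "TestTable");
      // Buffer puts client-side instead of sending one RPC per Put.
      table.setAutoFlush(false);
      table.setWriteBufferSize(12 * 1024 * 1024);
      Put p = new Put(Bytes.toBytes("row1"));
      p.add(Bytes.toBytes("info"), Bytes.toBytes("col"), Bytes.toBytes("val"));
      table.put(p);          // queued locally, nothing hits the WAL yet
      table.flushCommits();  // this is where the server-side sync settings apply
      table.close();
    }
  }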