Posted to dev@hbase.apache.org by Eshcar Hillel <es...@yahoo-inc.com.INVALID> on 2017/07/05 13:30:07 UTC

[DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Hi All,
I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to discuss this question.
Flush decisions are taken at the region level and also at the region server level. There is the question of when to trigger a flush, and then which region/store to flush. Regions track both their data size (key-value size only) and their total heap occupancy (including index and additional metadata). One option (which was the past policy) is to trigger flushes and choose flush subjects based on the regions' heap size; this gives sysadmins a better estimate of how many regions an RS can carry. Another option (which is the current policy) is to look at the data size; this gives a better estimate of the size of the files created by the flush.
I see this as critical to HBase performance and usability, namely meeting users' expectations of the system, hence I would like to hear as many voices as possible. Please join the discussion in the Jira and let us know what you think.
Thanks,
Eshcar
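
To make the two options concrete, here is a minimal sketch of a per-region flush trigger that can be driven by either size. All class, field and method names are hypothetical and chosen for illustration only; this is not the HBase implementation.

```java
// Hypothetical sketch; names are illustrative and not taken from HBase.
final class RegionSizes {
    final long dataSize;  // key-value bytes only
    final long heapSize;  // key-value bytes plus index and metadata overhead

    RegionSizes(long dataSize, long heapSize) {
        this.dataSize = dataSize;
        this.heapSize = heapSize;
    }
}

final class FlushTrigger {
    enum Policy { DATA_SIZE, HEAP_SIZE }

    // Returns true when the size selected by the policy reaches the threshold.
    static boolean shouldFlush(RegionSizes sizes, long flushThreshold, Policy policy) {
        long measured = (policy == Policy.DATA_SIZE) ? sizes.dataSize : sizes.heapSize;
        return measured >= flushThreshold;
    }
}
```

With the same threshold, a region holding 100 MB of data and 140 MB of heap would flush under HEAP_SIZE but not yet under DATA_SIZE, which is the behavior difference under discussion.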

Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by Eshcar Hillel <es...@yahoo-inc.com.INVALID>.

We can change the AtomicLong to an AtomicReference<MemStoreSize>, which is updated atomically. MemStoreSize can be changed to hold on-heap memory, off-heap memory and data size, or just on-heap and off-heap memory if data size is no longer required.
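
A rough sketch of what this could look like, assuming an immutable value class with three counters. The class below only borrows the MemStoreSize name from the proposal; its fields and methods are hypothetical and do not reflect the actual HBase class.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable value class following the proposal above;
// NOT the actual HBase MemStoreSize.
final class MemStoreSize {
    final long dataSize;
    final long heapSize;
    final long offHeapSize;

    MemStoreSize(long dataSize, long heapSize, long offHeapSize) {
        this.dataSize = dataSize;
        this.heapSize = heapSize;
        this.offHeapSize = offHeapSize;
    }

    // Returns a new instance; the receiver is never mutated.
    MemStoreSize plus(long dataDelta, long heapDelta, long offHeapDelta) {
        return new MemStoreSize(dataSize + dataDelta, heapSize + heapDelta,
                offHeapSize + offHeapDelta);
    }
}

// Hypothetical per-region accounting holder.
final class RegionMemStoreAccounting {
    private final AtomicReference<MemStoreSize> size =
            new AtomicReference<>(new MemStoreSize(0, 0, 0));

    // updateAndGet retries the compare-and-set until it succeeds, so all
    // counters move together and readers never see a half-applied update.
    MemStoreSize add(long dataDelta, long heapDelta, long offHeapDelta) {
        return size.updateAndGet(s -> s.plus(dataDelta, heapDelta, offHeapDelta));
    }

    MemStoreSize get() {
        return size.get();
    }
}
```

The trade-off in this sketch is one small allocation per update in exchange for a single atomic variable per region instead of several AtomicLongs.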



On Monday, August 7, 2017, 10:23:12 AM GMT+3, Anoop John <an...@gmail.com> wrote:

Sorry for the late reply.

So you mean we should track both sizes even at the region level? This was
considered at the time, but we did not do it because it would add more
overhead: we would have to deal with two AtomicLongs in every region. Right
now we do this double check at the RS level only, which added just one more
variable to handle.

-Anoop-

On Mon, Jul 10, 2017 at 7:34 PM, Eshcar Hillel
<es...@yahoo-inc.com.invalid> wrote:
> Here is a suggestion: We can track both heap and off-heap sizes and have two thresholds, one limiting heap size and one limiting off-heap size. At every decision-making junction we check whether either threshold is exceeded, and if so we trigger a flush. We can choose which entity to flush based on the cause. For example, if we decide to flush because the heap size exceeds the heap threshold, then we flush the region/store with the greatest heap size, and likewise for an off-heap flush.
>
> I can prepare a patch.
>
> This is not rolling back HBASE-18294, simply refining it to make different decisions for the on-heap and off-heap cases.
>
> On Monday, July 10, 2017, 8:25:12 AM GMT+3, Anoop John <an...@gmail.com> wrote:
>
> Stack and others,
> This will not cause any OOM or full GC issues, because globally at the RS
> level we track both the data size (of all the memstores) and the heap
> size. The decision there accounts for both. In fact, in the case of normal
> on-heap memstores, the accounting is like the old heap-size-based way.
>
> At the region level (and at the segment level) we track data size only. The
> decisions are based on data size.
>
> So in the past, a region flush size of 128 MB meant we would flush when
> the heap size of that region crossed 128 MB. Now it is data size
> alone. I feel that this is closer to how a normal user thinks: a user
> who sets a flush size of 128 MB is probably thinking of 128 MB of data.
>
> The background of this change is the off-heap memstores, where we need
> separate tracking of data and heap overhead sizes. But at the region
> level this behavior change was made on the thinking that it is more
> user oriented.
>
> I agree with Yu that it is a surprising behavior change. Yes, if not tuned
> accordingly one might see more blocked writes, because the per-region
> flushes are more delayed now and so the chances of reaching the global
> memstore upper barrier are higher. Then we will block
> writes and force flushes. (But off-heap memstores will do a better job
> here.) But this would NOT cause any OOME or full GC.
>
> I guess we should have reduced the 128 MB default flush size then? I
> asked this question in that Jira and we did not discuss it further.
>
> I hope I have explained the background, the change and the impacts. Thanks.
>
> -Anoop-
>
> On Thu, Jul 6, 2017 at 11:43 AM, 宾莉金（binlijin） <bi...@gmail.com> wrote:
>> I prefer the former, heap occupancy, so that we need not worry about
>> OOM and full GC, or change configuration to adapt to the new policy.
>>
>> 2017-07-06 14:03 GMT+08:00 Stack <st...@duboce.net>:
>>
>>> On Wed, Jul 5, 2017 at 9:59 PM, ramkrishna vasudevan <
>>> ramkrishna.s.vasudevan@gmail.com> wrote:
>>>
>>> >
>>> > >>Sounds like we should be doing the former, heap occupancy
>>> > Stack, so do you mean we need to roll back this new change in trunk? The
>>> > background is https://issues.apache.org/jira/browse/HBASE-16747.
>>> >
>>> >
>>> I remember that issue. It seems good to me (as it did then) where we have
>>> the global tracking in RS of all data and overhead so we shouldn't OOME and
>>> we keep accounting of overhead and data distinct because now data can be
>>> onheap or offheap.
>>>
>>> We shouldn't be doing blocking updates -- not when there is probably loads
>>> of memory still available -- but that is a different (critical) issue.
>>> Sounds like current configs can 'surprise' -- see Yu Li note -- given the
>>> new accounting.
>>>
>>> Looks like I need to read HBASE-18294
>>> <https://issues.apache.org/jira/browse/HBASE-18294> to figure what the
>>> pivot/problem w/ the new policy is.....
>>>
>>> Thanks,
>>> St.Ack
>>>
>>>
>>>
>>>
>>>
>>> > Regards
>>> > Ram
>>> >
>>> >
>>> > On Thu, Jul 6, 2017 at 8:40 AM, Yu Li <ca...@gmail.com> wrote:
>>> >
>>> > > We've also observed more blocking updates happening with the new policy
>>> > > (flush decision made on data size), but could work around it by reducing
>>> > > the hbase.hregion.memstore.flush.size setting. The advantage of the current
>>> > > policy is that we can control the flushed file size more accurately, but
>>> > > meanwhile we lose some "compatibility" (it requires configuration updates
>>> > > during rolling upgrade).
>>> > >
>>> > > I'm not sure whether we should roll back, but if we stick with the current
>>> > > policy there should be more documentation, metrics (monitoring heap/data
>>> > > occupancy separately), log message refinements, etc. Attaching some of the
>>> > > logs we observed, which are pretty confusing without knowing the details of
>>> > > the implementation:
>>> > >
>>> > > 2017-07-03 16:11:54,724 INFO [B.defaultRpcServer.handler=182,queue=11,port=16020] regionserver.MemStoreFlusher: Blocking updates on hadoop0528.et2.tbsite.net,16020,1497336978160: global memstore heapsize 7.2 G is >= than blocking 7.2 G size
>>> > > 2017-07-03 16:11:54,754 INFO [B.defaultRpcServer.handler=186,queue=15,port=16020] regionserver.MemStoreFlusher: Blocking updates on hadoop0528.et2.tbsite.net,16020,1497336978160: global memstore heapsize 7.2 G is >= than blocking 7.2 G size
>>> > > 2017-07-03 16:11:57,571 INFO [MemStoreFlusher.0] regionserver.MemStoreFlusher: Flush of region mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a. due to global heap pressure. Total Memstore size=3.2 G, Region memstore size=331.4 M
>>> > > 2017-07-03 16:11:57,571 WARN [B.defaultRpcServer.handler=49,queue=11,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 2892ms
>>> > >
>>> > > Best Regards,
>>> > > Yu
>>> > >
>>> > > On 6 July 2017 at 00:56, Stack <st...@duboce.net> wrote:
>>> > >
>>> > > > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel <eshcar@yahoo-inc.com.invalid> wrote:
>>> > > >
>>> > > > > Hi All,
>>> > > > > I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294
>>> > > > > to discuss this question.
>>> > > > > Flush decisions are taken at the region level and also at the region
>>> > > > > server level. There is the question of when to trigger a flush, and then
>>> > > > > which region/store to flush. Regions track both their data size (key-value
>>> > > > > size only) and their total heap occupancy (including index and additional
>>> > > > > metadata). One option (which was the past policy) is to trigger flushes and
>>> > > > > choose flush subjects based on the regions' heap size; this gives sysadmins
>>> > > > > a better estimate of how many regions an RS can carry. Another option
>>> > > > > (which is the current policy) is to look at the data size; this gives a
>>> > > > > better estimate of the size of the files created by the flush.
>>> > > > >
>>> > > >
>>> > > > Sounds like we should be doing the former, heap occupancy. An
>>> > > > OutOfMemoryException puts a nail in any benefit other accountings might
>>> > > > have.
>>> > > >
>>> > > > St.Ack
>>> > > >
>>> > > > > I see this as critical to HBase performance and usability, namely
>>> > > > > meeting users' expectations of the system, hence I would like to hear as
>>> > > > > many voices as possible. Please join the discussion in the Jira and let us
>>> > > > > know what you think.
>>> > > > > Thanks, Eshcar
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>>
>> --
>> *Best Regards,*
>>  lijin bin

Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by Anoop John <an...@gmail.com>.

Sorry for the late reply.

So you mean we should track both sizes even at the region level? This was
considered at the time, but we did not do it because it would add more
overhead: we would have to deal with two AtomicLongs in every region. Right
now we do this double check at the RS level only, which added just one more
variable to handle.

-Anoop-


Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by Eshcar Hillel <es...@yahoo-inc.com.INVALID>.

Here is a suggestion: We can track both heap and off-heap sizes and have two thresholds, one limiting heap size and one limiting off-heap size. At every decision-making junction we check whether either threshold is exceeded, and if so we trigger a flush. We can choose which entity to flush based on the cause. For example, if we decide to flush because the heap size exceeds the heap threshold, then we flush the region/store with the greatest heap size, and likewise for an off-heap flush.

I can prepare a patch.

This is not rolling back HBASE-18294, simply refining it to make different decisions for the on-heap and off-heap cases.
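
A minimal sketch of the two-threshold idea, under stated assumptions: the names below are hypothetical, and checking the heap threshold first when both are exceeded is an arbitrary choice made for this sketch, not something decided in this thread.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical per-store (or per-region) usage record.
final class StoreUsage {
    final String name;
    final long heapSize;
    final long offHeapSize;

    StoreUsage(String name, long heapSize, long offHeapSize) {
        this.name = name;
        this.heapSize = heapSize;
        this.offHeapSize = offHeapSize;
    }
}

final class TwoThresholdFlushPolicy {
    enum Cause { NONE, HEAP, OFF_HEAP }

    // Reports which threshold, if any, has been reached.
    // Assumption: heap pressure takes precedence when both limits are hit.
    static Cause checkCause(long totalHeap, long heapLimit,
                            long totalOffHeap, long offHeapLimit) {
        if (totalHeap >= heapLimit) {
            return Cause.HEAP;
        }
        if (totalOffHeap >= offHeapLimit) {
            return Cause.OFF_HEAP;
        }
        return Cause.NONE;
    }

    // Picks the store that is largest in the dimension that caused the flush.
    static Optional<StoreUsage> pickVictim(List<StoreUsage> stores, Cause cause) {
        switch (cause) {
            case HEAP:
                return stores.stream()
                        .max(Comparator.comparingLong((StoreUsage s) -> s.heapSize));
            case OFF_HEAP:
                return stores.stream()
                        .max(Comparator.comparingLong((StoreUsage s) -> s.offHeapSize));
            default:
                return Optional.empty();
        }
    }
}
```

The caller would first compute the cause from the aggregate sizes and then pass it to pickVictim, so the store chosen for flushing is the one that relieves the resource actually under pressure.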


Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by Anoop John <an...@gmail.com>.

Stack and others,
This will not cause any OOM or full GC issues, because globally at the RS
level we track both the data size (of all the memstores) and the heap
size. The decision there accounts for both. In fact, in the case of normal
on-heap memstores, the accounting is like the old heap-size-based way.

At the region level (and at the segment level) we track data size only. The
decisions are based on data size.

So in the past, a region flush size of 128 MB meant we would flush when
the heap size of that region crossed 128 MB. Now it is data size
alone. I feel that this is closer to how a normal user thinks: a user
who sets a flush size of 128 MB is probably thinking of 128 MB of data.

The background of this change is the off-heap memstores, where we need
separate tracking of data and heap overhead sizes. But at the region
level this behavior change was made on the thinking that it is more
user oriented.

I agree with Yu that it is a surprising behavior change. Yes, if not tuned
accordingly one might see more blocked writes, because the per-region
flushes are more delayed now and so the chances of reaching the global
memstore upper barrier are higher. Then we will block
writes and force flushes. (But off-heap memstores will do a better job
here.) But this would NOT cause any OOME or full GC.
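
The interaction between the two levels can be sketched roughly as below. This is a simplified illustration for the on-heap case only; the names, the single global heap check and the order of the checks are assumptions of the sketch, not the actual MemStoreFlusher logic.

```java
// Hypothetical sketch of the two-level check; not the HBase code path.
final class GlobalMemStoreCheck {
    enum Action { NONE, FLUSH_REGION, BLOCK_AND_FORCE_FLUSH }

    // The region-level decision looks at data size only, while the global
    // (region server) decision looks at heap size, which includes overhead.
    static Action onWrite(long regionDataSize, long regionFlushSize,
                          long globalHeapSize, long globalUpperLimit) {
        if (globalHeapSize >= globalUpperLimit) {
            // Global barrier reached: writes block until flushes free memory.
            return Action.BLOCK_AND_FORCE_FLUSH;
        }
        if (regionDataSize >= regionFlushSize) {
            return Action.FLUSH_REGION;
        }
        return Action.NONE;
    }
}
```

Because the region check ignores overhead, a region can hold more heap than its flush size before it flushes, so with many regions the global barrier may be hit first. That would show up as blocked writes rather than an OOME, since the global check still accounts for heap.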

I guess we should have reduced the 128 MB default flush size then? I
asked this question in that Jira and we did not discuss it further.
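
As a back-of-the-envelope illustration of such a reduction: if a fraction of a region's heap footprint is overhead, a data-size threshold that keeps roughly the old heap footprint can be derived as below. The overhead fraction is workload dependent and no figure is given in this thread, so the 25% used in the usage note is purely an assumed number, and the helper name is hypothetical.

```java
// Hypothetical helper; the overhead fraction is an assumed input.
final class FlushSizeTuning {
    // overheadFraction = (heapSize - dataSize) / heapSize, expected in [0, 1).
    static long dataSizeThreshold(long targetHeapBytes, double overheadFraction) {
        if (overheadFraction < 0.0 || overheadFraction >= 1.0) {
            throw new IllegalArgumentException("overheadFraction must be in [0, 1)");
        }
        return Math.round(targetHeapBytes * (1.0 - overheadFraction));
    }
}
```

For example, with an assumed overhead fraction of 0.25, a 128 MB heap target corresponds to a 96 MB data-size threshold, which is the kind of value one might then set for hbase.hregion.memstore.flush.size.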

I hope I have explained the background, the change and the impacts. Thanks.

-Anoop-


Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by 宾莉金（binlijin） <bi...@gmail.com>.

I would prefer the former, heap occupancy, so that we do not need to worry
about OOM and full GC, or change configuration to adapt to the new policy.

2017-07-06 14:03 GMT+08:00 Stack <st...@duboce.net>:

> On Wed, Jul 5, 2017 at 9:59 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> >
> > >>Sounds like we should be doing the former, heap occupancy
> > Stack, so do you mean we need to roll back this new change in trunk? The
> > background is https://issues.apache.org/jira/browse/HBASE-16747.
> >
> >
> I remember that issue. It seems good to me (as it did then) where we have
> the global tracking in RS of all data and overhead so we shouldn't OOME and
> we keep accounting of overhead and data distinct because now data can be
> onheap or offheap.
>
> We shouldn't be doing blocking updates -- not when there is probably loads
> of memory still available -- but that is a different (critical) issue.
> Sounds like current configs can 'surprise' -- see Yu Li note -- given the
> new accounting.
>
> Looks like I need to read HBASE-18294
> <https://issues.apache.org/jira/browse/HBASE-18294> to figure what the
> pivot/problem w/ the new policy is.....
>
> Thanks,
> St.Ack
>
>
>
>
>
> > Regards
> > Ram
> >
> >
> > On Thu, Jul 6, 2017 at 8:40 AM, Yu Li <ca...@gmail.com> wrote:
> >
> > > We've also observed more blocking updates happening with the new policy
> > > (flush decision made on data size), but could work-around it by
> reducing
> > > the hbase.hregion.memstore.flush.size setting. The advantage of
> current
> > > policy is we could control the flushed file size more accurately, but
> > > meanwhile losing some "compatibility" (requires configuration updating
> > > during rolling upgrade).
> > >
> > > I'm not sure whether we should rollback, but if stick on current policy
> > > there should be more documents, metrics (monitoring heap/data occupancy
> > > separately) and log message refinements, etc. Attaching some of the
> logs
> > we
> > > observed, which is pretty confusing w/o knowing the details of
> > > implementation:
> > >
> > > 2017-07-03 16:11:54,724 INFO
> > >  [B.defaultRpcServer.handler=182,queue=11,port=16020]
> > > regionserver.MemStoreFlusher: Blocking updates on
> > > hadoop0528.et2.tbsite.net,16020,1497336978160:
> > > global memstore heapsize 7.2 G is >= than blocking 7.2 G size
> > > 2017-07-03 16:11:54,754 INFO
> > >  [B.defaultRpcServer.handler=186,queue=15,port=16020]
> > > regionserver.MemStoreFlusher: Blocking updates on
> > > hadoop0528.et2.tbsite.net,16020,1497336978160:
> > > global memstore heapsize 7.2 G is >= than blocking 7.2 G size
> > > 2017-07-03 16:11:57,571 INFO  [MemStoreFlusher.0]
> > > regionserver.MemStoreFlusher: Flush of region
> > > > mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a.
> > > due to global heap pressure. Total Memstore size=3.2 G, Region memstore
> > > size=331.4 M
> > > 2017-07-03 16:11:57,571 WARN
> > >  [B.defaultRpcServer.handler=49,queue=11,port=16020]
> > > regionserver.MemStoreFlusher: Memstore is above high water mark and
> block
> > > 2892ms
> > >
> > > Best Regards,
> > > Yu
> > >
> > > On 6 July 2017 at 00:56, Stack <st...@duboce.net> wrote:
> > >
> > > > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel
> > > <eshcar@yahoo-inc.com.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > > > I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to
> > > > > > discuss this question.
> > > > > Flush decisions are taken at the region level and also at the
> region
> > > > > server level - there is the question of when to trigger a flush and
> > > then
> > > > > which region/store to flush.Regions track both their data size
> > > (key-value
> > > > > size only) and their total heap occupancy (including index and
> > > additional
> > > > > metadata).One option (which was the past policy) is to trigger
> > flushes
> > > > and
> > > > > choose flush subjects based on regions heap size - this gives a
> > better
> > > > > estimation for sysadmin of how many regions can a RS carry.Another
> > > option
> > > > > (which is the current policy) is to look at the data size - this
> > gives
> > > a
> > > > > better estimation of the size of the files that are created by the
> > > flush.
> > > > >
> > > >
> > > >
> > > > Sounds like we should be doing the former, heap occupancy. An
> > > > OutOfMemoryException puts a nail in any benefit other accountings
> might
> > > > have.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > > I see this is as critical to HBase performance and usability,
> namely
> > > > > meeting the user expectation from the system, hence I would like to
> > > hear
> > > > as
> > > > > many voices as possible.Please join the discussion in the Jira and
> > let
> > > us
> > > > > know what you think.
> > > > > Thanks,Eshcar
> > > > >
> > > > >
> > > >
> > >
> >
>



-- 
*Best Regards,*
 lijin bin

Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by Stack <st...@duboce.net>.

On Wed, Jul 5, 2017 at 9:59 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

>
> >>Sounds like we should be doing the former, heap occupancy
> Stack, so do you mean we need to roll back this new change in trunk? The
> background is https://issues.apache.org/jira/browse/HBASE-16747.
>
>
I remember that issue. It still seems good to me (as it did then): we have
global tracking in the RS of all data and overhead, so we shouldn't OOME, and
we keep the accounting of overhead and data distinct because data can now be
on-heap or off-heap.

We shouldn't be doing blocking updates -- not when there is probably loads
of memory still available -- but that is a different (critical) issue.
Sounds like current configs can 'surprise' -- see Yu Li note -- given the
new accounting.

Looks like I need to read HBASE-18294
<https://issues.apache.org/jira/browse/HBASE-18294> to figure out what the
pivot/problem with the new policy is.

Thanks,
St.Ack





> Regards
> Ram
>
>
> On Thu, Jul 6, 2017 at 8:40 AM, Yu Li <ca...@gmail.com> wrote:
>
> > We've also observed more blocking updates happening with the new policy
> > (flush decision made on data size), but could work-around it by reducing
> > the hbase.hregion.memstore.flush.size setting. The advantage of current
> > policy is we could control the flushed file size more accurately, but
> > meanwhile losing some "compatibility" (requires configuration updating
> > during rolling upgrade).
> >
> > I'm not sure whether we should rollback, but if stick on current policy
> > there should be more documents, metrics (monitoring heap/data occupancy
> > separately) and log message refinements, etc. Attaching some of the logs
> we
> > observed, which is pretty confusing w/o knowing the details of
> > implementation:
> >
> > 2017-07-03 16:11:54,724 INFO
> >  [B.defaultRpcServer.handler=182,queue=11,port=16020]
> > regionserver.MemStoreFlusher: Blocking updates on
> > hadoop0528.et2.tbsite.net,16020,1497336978160:
> > global memstore heapsize 7.2 G is >= than blocking 7.2 G size
> > 2017-07-03 16:11:54,754 INFO
> >  [B.defaultRpcServer.handler=186,queue=15,port=16020]
> > regionserver.MemStoreFlusher: Blocking updates on
> > hadoop0528.et2.tbsite.net,16020,1497336978160:
> > global memstore heapsize 7.2 G is >= than blocking 7.2 G size
> > 2017-07-03 16:11:57,571 INFO  [MemStoreFlusher.0]
> > regionserver.MemStoreFlusher: Flush of region
> > > mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a.
> > due to global heap pressure. Total Memstore size=3.2 G, Region memstore
> > size=331.4 M
> > 2017-07-03 16:11:57,571 WARN
> >  [B.defaultRpcServer.handler=49,queue=11,port=16020]
> > regionserver.MemStoreFlusher: Memstore is above high water mark and block
> > 2892ms
> >
> > Best Regards,
> > Yu
> >
> > On 6 July 2017 at 00:56, Stack <st...@duboce.net> wrote:
> >
> > > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel
> > <eshcar@yahoo-inc.com.invalid
> > > >
> > > wrote:
> > >
> > > > Hi All,
> > > > > I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to
> > > > > discuss this question.
> > > > Flush decisions are taken at the region level and also at the region
> > > > server level - there is the question of when to trigger a flush and
> > then
> > > > which region/store to flush.Regions track both their data size
> > (key-value
> > > > size only) and their total heap occupancy (including index and
> > additional
> > > > metadata).One option (which was the past policy) is to trigger
> flushes
> > > and
> > > > choose flush subjects based on regions heap size - this gives a
> better
> > > > estimation for sysadmin of how many regions can a RS carry.Another
> > option
> > > > (which is the current policy) is to look at the data size - this
> gives
> > a
> > > > better estimation of the size of the files that are created by the
> > flush.
> > > >
> > >
> > >
> > > Sounds like we should be doing the former, heap occupancy. An
> > > OutOfMemoryException puts a nail in any benefit other accountings might
> > > have.
> > >
> > > St.Ack
> > >
> > >
> > >
> > > > I see this is as critical to HBase performance and usability, namely
> > > > meeting the user expectation from the system, hence I would like to
> > hear
> > > as
> > > > many voices as possible.Please join the discussion in the Jira and
> let
> > us
> > > > know what you think.
> > > > Thanks,Eshcar
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by ramkrishna vasudevan <ra...@gmail.com>.

> I'm not sure whether we should rollback, but if stick on current policy
> there should be more documents, metrics (monitoring heap/data occupancy
> separately) and log message refinements, etc.

I agree with this.

>>Sounds like we should be doing the former, heap occupancy
Stack, so do you mean we need to roll back this new change in trunk? The
background is https://issues.apache.org/jira/browse/HBASE-16747.

Regards
Ram


On Thu, Jul 6, 2017 at 8:40 AM, Yu Li <ca...@gmail.com> wrote:

> We've also observed more blocking updates happening with the new policy
> (flush decision made on data size), but could work-around it by reducing
> the hbase.hregion.memstore.flush.size setting. The advantage of current
> policy is we could control the flushed file size more accurately, but
> meanwhile losing some "compatibility" (requires configuration updating
> during rolling upgrade).
>
> I'm not sure whether we should rollback, but if stick on current policy
> there should be more documents, metrics (monitoring heap/data occupancy
> separately) and log message refinements, etc. Attaching some of the logs we
> observed, which is pretty confusing w/o knowing the details of
> implementation:
>
> 2017-07-03 16:11:54,724 INFO
>  [B.defaultRpcServer.handler=182,queue=11,port=16020]
> regionserver.MemStoreFlusher: Blocking updates on
> hadoop0528.et2.tbsite.net,16020,1497336978160:
> global memstore heapsize 7.2 G is >= than blocking 7.2 G size
> 2017-07-03 16:11:54,754 INFO
>  [B.defaultRpcServer.handler=186,queue=15,port=16020]
> regionserver.MemStoreFlusher: Blocking updates on
> hadoop0528.et2.tbsite.net,16020,1497336978160:
> global memstore heapsize 7.2 G is >= than blocking 7.2 G size
> 2017-07-03 16:11:57,571 INFO  [MemStoreFlusher.0]
> regionserver.MemStoreFlusher: Flush of region
> mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a.
> due to global heap pressure. Total Memstore size=3.2 G, Region memstore
> size=331.4 M
> 2017-07-03 16:11:57,571 WARN
>  [B.defaultRpcServer.handler=49,queue=11,port=16020]
> regionserver.MemStoreFlusher: Memstore is above high water mark and block
> 2892ms
>
> Best Regards,
> Yu
>
> On 6 July 2017 at 00:56, Stack <st...@duboce.net> wrote:
>
> > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel
> <eshcar@yahoo-inc.com.invalid
> > >
> > wrote:
> >
> > > Hi All,
> > > I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294
> to
> > > discuss this question.
> > > Flush decisions are taken at the region level and also at the region
> > > server level - there is the question of when to trigger a flush and
> then
> > > which region/store to flush.Regions track both their data size
> (key-value
> > > size only) and their total heap occupancy (including index and
> additional
> > > metadata).One option (which was the past policy) is to trigger flushes
> > and
> > > choose flush subjects based on regions heap size - this gives a better
> > > estimation for sysadmin of how many regions can a RS carry.Another
> option
> > > (which is the current policy) is to look at the data size - this gives
> a
> > > better estimation of the size of the files that are created by the
> flush.
> > >
> >
> >
> > Sounds like we should be doing the former, heap occupancy. An
> > OutOfMemoryException puts a nail in any benefit other accountings might
> > have.
> >
> > St.Ack
> >
> >
> >
> > > I see this is as critical to HBase performance and usability, namely
> > > meeting the user expectation from the system, hence I would like to
> hear
> > as
> > > many voices as possible.Please join the discussion in the Jira and let
> us
> > > know what you think.
> > > Thanks,Eshcar
> > >
> > >
> >
>

Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by Yu Li <ca...@gmail.com>.

We've also observed more blocking updates with the new policy
(flush decisions made on data size), but could work around it by reducing
the hbase.hregion.memstore.flush.size setting. The advantage of the current
policy is that we can control the flushed file size more accurately, but
at the cost of some "compatibility" (it requires a configuration update
during a rolling upgrade).
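
As a rough illustration of the workaround, the Java sketch below estimates how
far hbase.hregion.memstore.flush.size would need to drop so that heap occupancy
at flush time stays near its old level. The overhead ratio (heap overhead per
byte of key-value data) and the 128 MB target are assumptions chosen for
illustration, not measurements from this cluster, and the class and method
names are hypothetical rather than HBase APIs.

```java
public class FlushSizeWorkaroundSketch {

    /**
     * Estimated heap bytes held by a memstore at the moment its key-value
     * data size reaches flushSizeBytes. overheadRatio is an assumed figure:
     * bytes of index/metadata overhead per byte of key-value data.
     */
    static long heapAtFlush(long flushSizeBytes, double overheadRatio) {
        return Math.round(flushSizeBytes * (1.0 + overheadRatio));
    }

    /**
     * Data-size flush threshold that keeps the estimated heap occupancy at
     * flush time close to targetHeapBytes.
     */
    static long adjustedFlushSize(long targetHeapBytes, double overheadRatio) {
        return Math.round(targetHeapBytes / (1.0 + overheadRatio));
    }

    public static void main(String[] args) {
        long oldFlushSize = 128L * 1024 * 1024; // illustrative setting
        double overheadRatio = 0.5;             // hypothetical, workload dependent

        // Under the data-size policy the same setting lets more heap build up.
        System.out.println("heap at flush, unchanged setting: "
                + heapAtFlush(oldFlushSize, overheadRatio) + " bytes");

        // Lowering the setting brings heap occupancy at flush time back down.
        long newFlushSize = adjustedFlushSize(oldFlushSize, overheadRatio);
        System.out.println("reduced flush size: " + newFlushSize + " bytes");
    }
}
```

The real overhead depends on cell size and memstore implementation, so any such
ratio would have to be measured per workload before changing the setting.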

I'm not sure whether we should roll back, but if we stick with the current
policy there should be more documentation, metrics (monitoring heap and data
occupancy separately), log message refinements, etc. Attaching some of the
logs we observed, which are pretty confusing without knowing the
implementation details:

2017-07-03 16:11:54,724 INFO  [B.defaultRpcServer.handler=182,queue=11,port=16020] regionserver.MemStoreFlusher: Blocking updates on hadoop0528.et2.tbsite.net,16020,1497336978160: global memstore heapsize 7.2 G is >= than blocking 7.2 G size
2017-07-03 16:11:54,754 INFO  [B.defaultRpcServer.handler=186,queue=15,port=16020] regionserver.MemStoreFlusher: Blocking updates on hadoop0528.et2.tbsite.net,16020,1497336978160: global memstore heapsize 7.2 G is >= than blocking 7.2 G size
2017-07-03 16:11:57,571 INFO  [MemStoreFlusher.0] regionserver.MemStoreFlusher: Flush of region mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a. due to global heap pressure. Total Memstore size=3.2 G, Region memstore size=331.4 M
2017-07-03 16:11:57,571 WARN  [B.defaultRpcServer.handler=49,queue=11,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 2892ms

Best Regards,
Yu

On 6 July 2017 at 00:56, Stack <st...@duboce.net> wrote:

> On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel <eshcar@yahoo-inc.com.invalid
> >
> wrote:
>
> > Hi All,
> > I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to
> > discuss this question.
> > Flush decisions are taken at the region level and also at the region
> > server level - there is the question of when to trigger a flush and then
> > which region/store to flush. Regions track both their data size (key-value
> > size only) and their total heap occupancy (including index and additional
> > metadata). One option (which was the past policy) is to trigger flushes and
> > choose flush subjects based on regions heap size - this gives a better
> > estimation for sysadmin of how many regions can a RS carry. Another option
> > (which is the current policy) is to look at the data size - this gives a
> > better estimation of the size of the files that are created by the flush.
> >
>
>
> Sounds like we should be doing the former, heap occupancy. An
> OutOfMemoryException puts a nail in any benefit other accountings might
> have.
>
> St.Ack
>
>
>
> > I see this as critical to HBase performance and usability, namely
> > meeting the user expectation from the system, hence I would like to hear as
> > many voices as possible. Please join the discussion in the Jira and let us
> > know what you think.
> > Thanks, Eshcar
> >
> >
>

Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Posted by Stack <st...@duboce.net>.

On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel <es...@yahoo-inc.com.invalid>
wrote:

> Hi All,
> I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to
> discuss this question.
> Flush decisions are taken at the region level and also at the region
> server level - there is the question of when to trigger a flush and then
> which region/store to flush. Regions track both their data size (key-value
> size only) and their total heap occupancy (including index and additional
> metadata). One option (which was the past policy) is to trigger flushes and
> choose flush subjects based on regions heap size - this gives a better
> estimation for sysadmin of how many regions can a RS carry. Another option
> (which is the current policy) is to look at the data size - this gives a
> better estimation of the size of the files that are created by the flush.
>


Sounds like we should be doing the former, heap occupancy. An
OutOfMemoryError puts a nail in any benefit other accountings might
have.
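
To make the two options concrete, here is a minimal Java sketch of the two
triggers. The RegionSize class, the method names, and the sizes are
hypothetical and only illustrate the distinction; they are not HBase's actual
implementation.

```java
public class FlushPolicySketch {

    /** Hypothetical per-region counters; not an HBase class. */
    static final class RegionSize {
        final long dataSize; // key-value bytes only
        final long heapSize; // key-value bytes plus index and metadata overhead

        RegionSize(long dataSize, long heapSize) {
            this.dataSize = dataSize;
            this.heapSize = heapSize;
        }
    }

    /** Past policy: flush once heap occupancy reaches the threshold. */
    static boolean shouldFlushByHeapSize(RegionSize r, long flushThreshold) {
        return r.heapSize >= flushThreshold;
    }

    /** Current policy: flush once key-value data size reaches the threshold. */
    static boolean shouldFlushByDataSize(RegionSize r, long flushThreshold) {
        return r.dataSize >= flushThreshold;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long threshold = 128 * mb; // illustrative flush threshold

        // Illustrative region: 100 MB of key-values occupying 150 MB of heap.
        RegionSize region = new RegionSize(100 * mb, 150 * mb);

        System.out.println("heap-size policy flushes: "
                + shouldFlushByHeapSize(region, threshold));
        System.out.println("data-size policy flushes: "
                + shouldFlushByDataSize(region, threshold));
    }
}
```

With these illustrative numbers the heap-size trigger fires while the data-size
trigger does not, which is the gap that lets heap occupancy run ahead of the
configured flush size.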

St.Ack



> I see this as critical to HBase performance and usability, namely
> meeting the user expectation from the system, hence I would like to hear as
> many voices as possible. Please join the discussion in the Jira and let us
> know what you think.
> Thanks, Eshcar
>
>