Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2014/08/07 09:23:12 UTC
[jira] [Comment Edited] (HBASE-11695) PeriodicFlusher and WakeFrequency issues

    [ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088938#comment-14088938 ] 

Lars Hofhansl edited comment on HBASE-11695 at 8/7/14 7:21 AM:
---------------------------------------------------------------

We can do what the CompactionChecker does and add a multiplier.
(As an aside, the default multiplier for the CompactionChecker is 1000, so it would only check every 10000s = 2h 46m. Isn't that too infrequent?)

Another option is to set the period like this: max(wakeFrequency, 2*jitter, flushInterval/10). I.e. we
# do not wake up more often than wakeFrequency
# do not wake up so often that we would request a flush of the same region multiple times (2*jitter)
# only wake up often enough to satisfy the flushInterval with an accuracy of 10% (flushInterval/10)
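
To illustrate the second point, here is a rough sketch of a deliberately simplified model. The class and method names are hypothetical and not taken from the HBase source. It assumes a region stays eligible for flushing until its delayed flush actually starts, and it ignores the time the flush itself takes. It counts how many wake-ups would request a flush of the same region for the 13449 ms delay seen in the log of the issue description, once with a 10s period and once with a 40s period (2*jitter with a 20s jitter).

```java
/** Simplified, hypothetical model; not taken from the HBase source. */
final class DuplicateFlushRequestSketch {

    /**
     * Counts the wake-ups at t = 0, period, 2*period, ... that happen strictly
     * before the delayed flush requested at t = 0 starts at t = delayMs.
     * Assumption: the region stays eligible for flushing until then.
     */
    static long requestsBeforeFlushStarts(long periodMs, long delayMs) {
        if (periodMs <= 0 || delayMs <= 0) {
            throw new IllegalArgumentException("periodMs and delayMs must be positive");
        }
        // Integer form of ceil(delayMs / periodMs).
        return (delayMs + periodMs - 1) / periodMs;
    }

    public static void main(String[] args) {
        long delayMs = 13_449; // delay reported in the first log line of the issue
        System.out.println(requestsBeforeFlushStarts(10_000, delayMs)); // 2
        System.out.println(requestsBeforeFlushStarts(40_000, delayMs)); // 1
    }
}
```

Under this model, any period of at least the maximum delay yields a single request per region, which is the intent behind the 2*jitter term.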

The jitter is hardcoded to 20s. wakeFrequency defaults to 10s (it's not actually a frequency, btw), and flushInterval defaults to 1h. So with these defaults we'd wake up to check every 360s, which seems more like it.

Or maybe just max(wakeFrequency, 2*jitter)... I.e. every 40s with default settings.
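
As a minimal sketch of the arithmetic behind both variants: the helper below is hypothetical (the class, method, and constant names are illustrative, not existing HBase code) and simply plugs in the default values quoted above, expressed in milliseconds.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; names are illustrative and not taken from the HBase source. */
final class FlushPeriodSketch {

    // Defaults as quoted in this comment, converted to milliseconds.
    static final long WAKE_FREQUENCY_MS = TimeUnit.SECONDS.toMillis(10);
    static final long JITTER_MS = TimeUnit.SECONDS.toMillis(20);
    static final long FLUSH_INTERVAL_MS = TimeUnit.HOURS.toMillis(1);

    /** max(wakeFrequency, 2*jitter, flushInterval/10) */
    static long periodWithAccuracy(long wakeFrequencyMs, long jitterMs, long flushIntervalMs) {
        return Math.max(wakeFrequencyMs, Math.max(2 * jitterMs, flushIntervalMs / 10));
    }

    /** max(wakeFrequency, 2*jitter) */
    static long periodSimple(long wakeFrequencyMs, long jitterMs) {
        return Math.max(wakeFrequencyMs, 2 * jitterMs);
    }

    public static void main(String[] args) {
        long full = periodWithAccuracy(WAKE_FREQUENCY_MS, JITTER_MS, FLUSH_INTERVAL_MS);
        long simple = periodSimple(WAKE_FREQUENCY_MS, JITTER_MS);
        System.out.println(TimeUnit.MILLISECONDS.toSeconds(full) + "s");   // 360s
        System.out.println(TimeUnit.MILLISECONDS.toSeconds(simple) + "s"); // 40s
    }
}
```

With the defaults, the flushInterval/10 term dominates the first variant, so the 2*jitter term only matters when flushInterval is configured below 400s.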

But maybe that's too complicated and we should just define another multiplier, or a completely new setting, though that means another config option.



was (Author: lhofhansl):
We can do what we do what the CompactionChecker does and add a multiplier.
(As an aside the default multiplier for the CompactionChecker is 1000, so it would only check every 10000s = 2h 46m, isn't that too rarely?)

Another option is to set the period like this: max(wakeFrequency, 2*jitter, flushInteral/10). I.e. we
# do not wake up more often that wakeFrequency
# do not wake up such that we would request flush of the same region multiple times (2*jitter)
# only wakeup often enough to satisfy the flushInterval with an accuracy of 10%

The jitter is hardcoded to 20s. wakeFrequency defaults to 10s (it's not actually a frequency, btw), and flushInterval defaults to 1h. So with these defaults we'd wake up to check every 360s, which seems more like it.

Or maybe just max(wakeFrequency, 2*jitter)... I.e. every 40s with default settings.

But maybe that's too complicate and we just define another multiplier, or a complete new setting - mean another config option, though.


> PeriodicFlusher and WakeFrequency issues
> ----------------------------------------
>
>                 Key: HBASE-11695
>                 URL: https://issues.apache.org/jira/browse/HBASE-11695
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.21
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Critical
>
> We just ran into a flush storm caused by the PeriodicFlusher.
> Many memstores became eligible for flushing at exactly the same time. The effect we've seen is that the exact same region was flushed multiple times, because the flusher wakes up too often (every 10s). The jitter of 20s is larger than that, and it takes some time to actually flush the memstore.
> Here's one example. We've seen hundreds of these, monopolizing the flush queue and preventing "important" flushes from happening.
> {code}
> 06-Aug-2014 20:11:56  [regionserver60020.periodicFlusher] INFO  org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 13449
> 06-Aug-2014 20:12:06  [regionserver60020.periodicFlusher] INFO  org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060
> {code}
> So we need to increase the period of the PeriodicFlusher to at least the random jitter, and also increase the default random jitter (20s does not help with many regions).


