Posted to issues@hbase.apache.org by "Esteban Gutierrez (JIRA)" <ji...@apache.org> on 2015/04/04 07:27:33 UTC

[jira] [Commented] (HBASE-13407) Add a configurable jitter to MemStoreFlusher#FlushHandler in order to smooth write latency

    [ https://issues.apache.org/jira/browse/HBASE-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395556#comment-14395556 ] 

Esteban Gutierrez commented on HBASE-13407:
-------------------------------------------

This is a good example of what happens while running YCSB:

{code}
2015-04-03 21:26:01,985 INFO org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user7375,1428051592143.73d40c07586ba526791a44f87c2765bf., current region memstore size 256.20 MB, and 1/1 column families' memstores are being flushed.
2015-04-03 21:26:05,598 INFO org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user6850,1428051592142.a4fd5b0bac8ec25bd72da26914be61b7., current region memstore size 256.12 MB, and 1/1 column families' memstores are being flushed.
2015-04-03 21:26:07,624 INFO org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user7450,1428051592143.bb73df854c86eeb3afb8d2269a85734a., current region memstore size 256.05 MB, and 1/1 column families' memstores are being flushed.
2015-04-03 21:26:12,208 INFO org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user8575,1428051592144.e198c3e0dffbeadb0b20d020d2a5424e., current region memstore size 256.09 MB, and 1/1 column families' memstores are being flushed.
2015-04-03 21:26:20,740 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~256.20 MB/268642496, currentsize=22.81 MB/23913248 for region usertable,user7375,1428051592143.73d40c07586ba526791a44f87c2765bf. in 18755ms, sequenceid=473396, compaction requested=true
2015-04-03 21:26:23,356 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~256.15 MB/268595112, currentsize=24.09 MB/25261824 for region usertable,user6850,1428051592142.a4fd5b0bac8ec25bd72da26914be61b7. in 17758ms, sequenceid=473170, compaction requested=true
2015-04-03 21:26:26,825 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~256.19 MB/268636560, currentsize=27.52 MB/28854080 for region usertable,user7450,1428051592143.bb73df854c86eeb3afb8d2269a85734a. in 19201ms, sequenceid=473529, compaction requested=true
2015-04-03 21:26:30,940 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~256.33 MB/268781008, currentsize=29.82 MB/31269440 for region usertable,user8575,1428051592144.e198c3e0dffbeadb0b20d020d2a5424e. in 18732ms, sequenceid=474463, compaction requested=true
{code}

The 4 flushes started at about the same time and each took nearly 20 seconds; during that window writes dropped from 37K req/sec to 30K req/sec (see attached chart).




> Add a configurable jitter to MemStoreFlusher#FlushHandler in order to smooth write latency
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-13407
>                 URL: https://issues.apache.org/jira/browse/HBASE-13407
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Esteban Gutierrez
>            Assignee: Esteban Gutierrez
>         Attachments: memstoreflush.png
>
>
> There is a very interesting behavior that I can reproduce consistently with many workloads from HBase 0.98 to HBase 1.0 since hbase.hstore.flusher.count was set to 2 by default: when writes are evenly distributed across regions, memstores grow and flush at about the same rate, causing spikes in IO and CPU. The side effect of those spikes is a loss in throughput, which in some cases can be above 10%, impacting write metrics. When the flushes get out of sync, the spikes lower and throughput is very stable. Reverting hbase.hstore.flusher.count to 1 doesn't help much with write-heavy workloads, since we end up with a large flush queue that can eventually block writers.
> Adding a small configurable jitter, hbase.server.thread.wakefrequency.jitter.pct (a percentage of the hbase.server.thread.wakefrequency interval), can help stagger the writes from FlushHandler to HDFS and smooth write latencies when the memstores are flushed by multiple threads.
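The proposed percentage-based jitter could be sketched roughly as follows. This is a minimal illustration, not the actual HBase patch: the method name jitteredWakeFrequency and the 20% jitter value are hypothetical, and only the config names hbase.server.thread.wakefrequency and hbase.server.thread.wakefrequency.jitter.pct come from the proposal above.

{code}
import java.util.concurrent.ThreadLocalRandom;

public class FlushJitter {
    // Hypothetical helper: returns the wake frequency perturbed by a random
    // amount in [-jitterPct, +jitterPct] of the base interval, so that
    // flusher threads wake (and flush) at slightly different times.
    static long jitteredWakeFrequency(long wakeFrequencyMs, double jitterPct) {
        if (jitterPct <= 0) {
            return wakeFrequencyMs; // no jitter configured
        }
        double jitter = ThreadLocalRandom.current().nextDouble(-jitterPct, jitterPct);
        return Math.max(0L, (long) (wakeFrequencyMs * (1.0 + jitter)));
    }

    public static void main(String[] args) {
        long base = 10_000L;  // hbase.server.thread.wakefrequency default is 10s
        double pct = 0.2;     // hypothetical 20% jitter
        // Each flusher thread would compute its own wait, staggering the flushes.
        System.out.println(jitteredWakeFrequency(base, pct));
    }
}
{code}

With a 20% jitter and a 10s base frequency, each flusher would wake somewhere in the 8s–12s range instead of all waking together, which is the staggering effect described above.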



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)