Posted to issues@hbase.apache.org by "Himanshu Vashishtha (JIRA)" <ji...@apache.org> on 2014/03/07 17:50:47 UTC

[jira] [Commented] (HBASE-10278) Provide better write predictability

    [ https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924043#comment-13924043 ] 

Himanshu Vashishtha commented on HBASE-10278:
---------------------------------------------

Attached is a trunk-based first cut of the writer-switch functionality.

Here is a brief description of what this patch adds:
a) An additional writer, used in case the current writer becomes slow and the WALSwitchPolicy agrees to kick off a switch.
b) A WALSwitchPolicy interface. A concrete policy decides when to do the switch, based on the parameters passed to it. For a start, there is one implementation, AggressiveWALSwitchPolicy, which switches when even a single sync op takes longer than the threshold. I find it very useful for testing this feature (it effectively acts as a "chaos monkey" for the feature, since it switches a lot). I plan to add a less aggressive one that also takes into account the time of the last switch and how many recent ops exceeded the threshold since then.
c) A thread pool for sync ops. Each SyncRunner submits a callable for the sync call and waits on the returned Future.
d) A SyncLatencyWatcher thread that monitors sync-op latency and feeds it to the WALSwitchPolicy to make the switch decision.
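The interaction between items (b)-(d) can be sketched roughly as follows. This is a minimal, self-contained sketch, not the actual patch: the class names, the shouldSwitch signature, and the 50 ms threshold are all assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of items (b)-(d): a SyncRunner submits the sync call
// to a pool, the call's duration is measured, and the switch policy decides.
public class SyncPoolSketch {

  /** Decides whether the WAL writer should be switched (hypothetical API). */
  interface WALSwitchPolicy {
    boolean shouldSwitch(long syncNanos);
  }

  /** Switches when even one sync op exceeds the threshold. */
  static class AggressiveWALSwitchPolicy implements WALSwitchPolicy {
    private final long thresholdNanos;
    AggressiveWALSwitchPolicy(long thresholdNanos) {
      this.thresholdNanos = thresholdNanos;
    }
    @Override
    public boolean shouldSwitch(long syncNanos) {
      return syncNanos > thresholdNanos;
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService syncPool = Executors.newFixedThreadPool(2);
    WALSwitchPolicy policy =
        new AggressiveWALSwitchPolicy(TimeUnit.MILLISECONDS.toNanos(50));

    // SyncRunner side: submit the sync as a callable and block on the Future.
    long start = System.nanoTime();
    Future<?> sync = syncPool.submit(() -> {
      Thread.sleep(100); // stand-in for a slow writer.sync()
      return null;
    });
    sync.get();
    long elapsed = System.nanoTime() - start;

    // SyncLatencyWatcher side: feed the measured latency to the policy.
    System.out.println(policy.shouldSwitch(elapsed));
    syncPool.shutdown();
  }
}
```

Submitting the sync through a pool is what lets a watcher thread observe a hung sync from the outside instead of being blocked inside it.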

h4. How does it work?
A SyncRunner submits a sync call to the writer. The SyncLatencyWatcher monitors the call duration and feeds it to the WALSwitchPolicy. If the latter decides to make the switch, the following sequence of events happens:
1) Set FSHLog#switching true. This blocks the RingBufferEventHandler thread in its onEvent method.
2) Interrupt the SyncRunner threads to unblock them from their current sync call, and wait till they reach a safe point.
3) Grab their append lists (i.e., whatever they were trying to sync). Consolidate and sort them. These are the "in-flight" edits we need to append to the new writer.
4) Get the max SyncFuture object and note its sequenceId. After the switch, we must unblock all handlers waiting on sequences <= this max synced sequenceId.
5) Take the "other" writer, and append-sync these "in-flight" edits. Set the current writer to this writer.
6) Tell the SyncRunners that the switch is done, and let them take new writes (complete the latch).
7) Set FSHLog#switching back to false.
8) Roll the old writer.
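Steps 3-5 above can be sketched as a simplified simulation. All names here are hypothetical, and the real patch's coordination with SyncRunners, latches, and the ring buffer is elided; this only shows the consolidate/sort/re-append handoff between the two writers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of the writer switch: in-flight edits (whatever the
// SyncRunners were trying to sync) are consolidated, sorted by sequence id,
// and re-appended to the standby writer, which then becomes the current one.
public class WriterSwitchSketch {
  static List<Long> currentWriter = new ArrayList<>();
  static List<Long> standbyWriter = new ArrayList<>();

  /** Steps 3-5: consolidate, sort, append to the other writer, swap. */
  static long switchWriters(List<List<Long>> inFlightPerSyncRunner) {
    List<Long> inFlight = new ArrayList<>();
    for (List<Long> edits : inFlightPerSyncRunner) {
      inFlight.addAll(edits);                // step 3: consolidate
    }
    Collections.sort(inFlight);              // step 3: sort by sequence id
    long maxSeqId =                          // step 4: max synced sequence id
        inFlight.isEmpty() ? -1 : inFlight.get(inFlight.size() - 1);
    standbyWriter.addAll(inFlight);          // step 5: append-sync to the other writer
    List<Long> old = currentWriter;
    currentWriter = standbyWriter;           // step 5: standby becomes current
    standbyWriter = old;                     // step 8 would roll this old writer
    return maxSeqId;                         // unblock handlers waiting on <= maxSeqId
  }

  public static void main(String[] args) {
    List<List<Long>> pending = new ArrayList<>();
    pending.add(List.of(7L, 3L));            // edits stuck in SyncRunner 1
    pending.add(List.of(5L));                // edits stuck in SyncRunner 2
    long maxSeqId = switchWriters(pending);
    System.out.println(maxSeqId);
    System.out.println(currentWriter);
  }
}
```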

It is worth noting that when the sync-op delay is caused by a concurrent log roll, the switch is skipped. This avoids unnecessary switches.
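A minimal sketch of that guard, assuming a hypothetical rolling flag (the patch's actual field and method names may differ):

```java
// Hypothetical guard: skip the switch while a log roll is in progress, since
// roll-induced sync latency does not indicate a bad writer pipeline.
public class RollAwareSwitchGuard {
  private volatile boolean rolling = false;

  void setRolling(boolean rolling) {
    this.rolling = rolling;
  }

  /** Honor the policy's decision only when no roll is underway. */
  boolean maySwitch(boolean policyWantsSwitch) {
    return policyWantsSwitch && !rolling;
  }

  public static void main(String[] args) {
    RollAwareSwitchGuard guard = new RollAwareSwitchGuard();
    guard.setRolling(true);
    System.out.println(guard.maySwitch(true));  // roll in progress: no switch
    guard.setRolling(false);
    System.out.println(guard.maySwitch(true));  // no roll: switch allowed
  }
}
```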

I intend to add metrics for the number of in-flight edits, etc., but the patch above is good for giving a sense of how it looks.

h4. Testing:
I tested it on trunk and compared runs with WAL switching enabled and disabled. I also tested by introducing hiccups (a similar approach to the one used in the doc above).
h5. No hiccups:
1. Trunk:
2014-03-07 06:37:45,049 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 212.120s 47143.129ops/s
2014-03-07 06:42:38,271 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 214.548s 46609.617ops/s
2014-03-07 06:47:43,457 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 223.635s 44715.723ops/s

2. Trunk + patch, but with switch disabled:
2014-03-07 04:54:50,451 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 218.036s 45863.988ops/s
2014-03-07 04:59:55,640 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 223.940s 44654.816ops/s
2014-03-07 05:04:56,496 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 219.976s 45459.504ops/s

3. Trunk + patch, switch enabled:
2014-03-07 06:12:04,946 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 214.938s 46525.043ops/s
2014-03-07 06:16:59,603 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 217.718s 45930.973ops/s
2014-03-07 06:21:48,768 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 216.949s 46093.781ops/s

h5. With a sleep of 2 sec after every 2k sync ops (this involved some instrumentation in ProtobufLogWriter):
1. Trunk:
2014-03-06 20:52:03,212 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 406.600s 24594.195ops/s
2014-03-06 21:00:03,974 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 398.953s 25065.609ops/s
2014-03-06 21:08:21,323 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 416.390s 24015.945ops/s


2. Trunk + patch:
2014-03-06 21:15:53,566 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 307.909s 32477.129ops/s
2014-03-06 21:22:13,517 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 303.185s 32983.164ops/s
2014-03-06 21:28:42,993 INFO  wal.HLogPerformanceEvaluation (HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, iterations=1000000, syncInterval=10 took 314.436s 31802.975ops/s

Note: it is this latter set I wanted to improve, i.e., how we perform when sync performance is non-optimal.
Also note that these numbers are from the AggressiveWALSwitchPolicy, which involves a large number of switches (a switch is a costly affair).
I think these numbers would be better with a somewhat less aggressive policy; that will come later.

Attaching the patch for your review. Thanks.


> Provide better write predictability
> -----------------------------------
>
>                 Key: HBASE-10278
>                 URL: https://issues.apache.org/jira/browse/HBASE-10278
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
>         Attachments: Multiwaldesigndoc.pdf
>
>
> Currently, HBase has one WAL per region server. 
> Whenever there is any latency in the write pipeline (due to whatever reasons such as n/w blip, a node in the pipeline having a bad disk, etc), the overall write latency suffers. 
> Jonathan Hsieh and I analyzed various approaches to tackle this issue. We also looked at HBASE-5699, which talks about adding concurrent multi WALs. Along with performance numbers, we also focused on design simplicity, minimum impact on MTTR & Replication, and compatibility with 0.96 and 0.98. Considering all these parameters, we propose a new HLog implementation with WAL Switching functionality.
> Please find attached the design doc for the same. It introduces the WAL Switching feature, and experiments/results of a prototype implementation, showing the benefits of this feature.
> The second goal of this work is to serve as a building block for concurrent multiple WALs feature.
> Please review the doc.


