Posted to user@hbase.apache.org by Kahlil Oppenheimer <ka...@gmail.com> on 2017/10/26 19:31:44 UTC

Lowering HBase Replication Latency

Hi all,

I'm running HBase replication on CDH 5.9.0 and am wondering if there are
known configurations/methods to decrease the replication lag/latency. I am
monitoring replication latency via two separate methods:

1) The JMX metric 'replication.source.ageOfLastShippedOp' exposed by the
region server. The 99th percentile latency (assuming I'm constantly writing
data), according to this metric, averages roughly 480-500ms.

2) A worker constantly writing data to the source cluster (2,000
writes/sec) and constantly reading data from the sink cluster. It tries to
read the data it just wrote and reports latency as `currentTime -
resultTimestamp`. The 99th percentile latency, according to this metric,
averages roughly 1,470-1,500ms.

As expected, (1) is a lower bound of (2).
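
To make the calculation in (2) concrete, here is a minimal sketch of how the
worker's numbers could be reduced to a 99th percentile. It is not the actual
worker code: the function names are hypothetical, the nearest-rank percentile
definition is an assumption, and it assumes the worker's clock and the clock
that stamps the cell are reasonably in sync.

```python
import math


def replication_latencies_ms(observations):
    """Turn (current_time_ms, result_timestamp_ms) pairs into latencies.

    Each pair is one successful read from the sink cluster: the wall-clock
    time at which the read returned and the timestamp carried by the cell.
    """
    return [current_time - result_ts for current_time, result_ts in observations]


def percentile(samples, pct):
    """Nearest-rank percentile (an assumed definition, others exist).

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Made-up observations, purely for illustration.
    observations = [(1500, 1000), (2000, 1900), (3100, 2800)]
    latencies = replication_latencies_ms(observations)
    print("p99 latency (ms):", percentile(latencies, 99))
```

With only a handful of samples the 99th percentile is simply the maximum, so
a percentile like this only becomes meaningful over a window of many reads.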

I'm just curious as to whether or not anyone has figured out ways to reduce
the replication latency so that the 99th percentile latency could hover
closer to the 300-400ms range.

I have tried changing `hbase.replication.handler.count` on the sink cluster
from 3 to 15, but did not observe a significant difference.

I looked through some HBaseCon 2014 slides and saw that Flurry achieved
85ms latency between DCs (
https://www.slideshare.net/HBaseCon/operations-session-2-35938496). Any
thoughts on how something like this might be possible?