Posted to issues@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2020/05/26 17:28:00 UTC

[jira] [Created] (HBASE-24439) Replication queue recovery tool for rescuing deep queues

Andrew Kyle Purtell created HBASE-24439:
-------------------------------------------

             Summary: Replication queue recovery tool for rescuing deep queues
                 Key: HBASE-24439
                 URL: https://issues.apache.org/jira/browse/HBASE-24439
             Project: HBase
          Issue Type: Improvement
          Components: Replication
            Reporter: Andrew Kyle Purtell


In HBase cross-site replication, on the source side every regionserver places its WALs into a replication queue and drains that queue to the remote sink cluster, so every regionserver in the source cluster participates as a source. At the sink cluster, a configurable subset of regionservers volunteers to process inbound replication RPCs.
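
For concreteness, with the ZooKeeper-backed queue storage used in branch-1 (and the ZK implementation in later branches), each source regionserver's bookkeeping looks roughly like the sketch below. The znode parent, server name, peer id and WAL names are illustrative only; the data stored at each WAL znode is the replicated offset into that file.

    /hbase/replication/rs/
      src-rs1.example.com,16020,1590000000000/
        1/                 <- one queue per replication peer id
          wal.00001        <- enqueued WAL (znode data = replicated offset)
          wal.00002
          wal.00003
          ...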

When data is highly skewed we can take steps to mitigate, such as pre-splitting, manual splitting, and rebalancing. These are most effective at the sink, because replication RPCs are randomly distributed over the set of receiving regionservers, so splitting on the sink side redistributes the resulting writes there. On the source side we are more limited.

If writes are deeply unbalanced, a regionserver's source replication queue may become very deep. Hotspotting can happen despite mitigations, and unlike on the sink side, once it has happened at the source it is not possible to increase parallelism or redistribute work among sources for WALs that have already been enqueued. Increasing parallelism on the sink side will not help if there is a big rock at the source, and source-side mitigations like splitting and rebalancing cannot help queues that have already accumulated.

Can we redistribute source work? Yes and no. If a source regionserver fails, its queues will be recovered by other regionservers. However, the recovering regionserver must still serve the recovered queue as an atomic unit. We can move a deep queue, but we cannot break it up.
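
To illustrate, under the ZK-based queue storage the claimed queue simply moves, whole, under the recovering regionserver's znode, with the dead server's name appended to the queue id. Names below are illustrative:

    /hbase/replication/rs/
      src-rs2.example.com,16020,1590000000123/
        1/                                            <- src-rs2's own queue for peer 1
        1-src-rs1.example.com,16020,1590000000000/    <- recovered queue, claimed as one unit
          wal.00001
          wal.00002
          ...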

Where time is of the essence, and ordering semantics can be allowed to break, operators should have available to them a recovery tool that rescues production from the consequences of deep source queues. A very large replication queue can be split into many smaller queues, perhaps even one new queue per WAL file. These new synthetic queues can then be distributed to any and all source regionservers through the normal recovered queue assignment protocol, increasing parallelism at the source.
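
As a rough illustration of what such a tool might do against ZooKeeper-backed queues, the sketch below reads the WAL list of one deep queue and parks each WAL, with its replicated offset, under its own synthetic "dead server" znode, on the assumption that the normal recovered queue assignment path would then hand those queues out to live regionservers. Everything here is hypothetical: the synthetic server names, the peer id, the paths, and the assumption about how the parked queues get claimed; a real tool would also need to coordinate with (or stop) the source regionserver that owns the queue.

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical sketch only, not the proposed tool: split one deep ZK-based
    // queue into one synthetic queue per WAL so they can be claimed independently.
    public class SplitDeepQueue {
      public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> {});
        String rsRoot = "/hbase/replication/rs";
        String deepRs = rsRoot + "/hot-rs.example.com,16020,1590000000000"; // illustrative
        String peer = "1";                                                  // illustrative
        String queue = deepRs + "/" + peer;

        List<String> wals = zk.getChildren(queue, false);
        int i = 0;
        for (String wal : wals) {
          byte[] offset = zk.getData(queue + "/" + wal, false, null);
          // Park each WAL under its own synthetic "dead" server znode so the
          // recovered queue assignment protocol can distribute them (assumption).
          String fakeRs = rsRoot + "/split-" + (i++) + ".synthetic,16020,0";
          createIfAbsent(zk, fakeRs);
          createIfAbsent(zk, fakeRs + "/" + peer);
          zk.create(fakeRs + "/" + peer + "/" + wal, offset,
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          zk.delete(queue + "/" + wal, -1); // remove from the original deep queue
        }
        zk.close();
      }

      private static void createIfAbsent(ZooKeeper zk, String path) throws Exception {
        if (zk.exists(path, false) == null) {
          zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
      }
    }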

Of course this would break serial replication semantics and sync replication semantics, and even in branch-1, which has neither of these features, it would greatly increase the probability of reordering of edits. That is an unavoidable consequence of breaking up the queue for more parallelism, but as long as this is done by a separate tool, invoked by operators, it is a valid option for an emergency drain of backed-up replication queues. Every cell in the WAL entries carries a timestamp assigned at the source and will be applied on the sink with that timestamp, so once the queue is drained and all edits have been persisted at the target, there will be a complete and correct temporal ordering of the data. An operator must be prepared to handle intermediate mis-ordered or re-ordered states if they intend to invoke this tool; in many use cases the interim states are not important. The final state, after all edits have transferred cross-cluster and been persisted at the sink following invocation of the recovery tool, is the point where the operator would transition back into service.
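
As a small illustration of that timestamp argument (class name and values are hypothetical): two edits to the same cell carry their source-assigned timestamps, so whichever one a sink applies last, the newest version of the cell once both have landed is the one written at the later source timestamp.

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Both Puts carry explicit source timestamps; applying them in either order
    // on the sink converges to the same newest version (f:q = "v2" at t=2000).
    public class TimestampOrdering {
      public static void main(String[] args) {
        Put older = new Put(Bytes.toBytes("row1"))
            .addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), 1000L, Bytes.toBytes("v1"));
        Put newer = new Put(Bytes.toBytes("row1"))
            .addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), 2000L, Bytes.toBytes("v2"));
        // table.put(newer); table.put(older);  // applied out of order on the sink
        // A read of the latest version of row1 f:q still returns "v2".
        System.out.println(older + "\n" + newer);
      }
    }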



--
This message was sent by Atlassian Jira
(v8.3.4#803005)