Posted to issues@hbase.apache.org by "yuqi (Jira)" <ji...@apache.org> on 2020/11/17 09:04:00 UTC
[jira] [Assigned] (HBASE-25295) Refactor the locate WAL logic in ReplicationSource
[ https://issues.apache.org/jira/browse/HBASE-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
yuqi reassigned HBASE-25295:
----------------------------
Assignee: yuqi
> Refactor the locate WAL logic in ReplicationSource
> --------------------------------------------------
>
> Key: HBASE-25295
> URL: https://issues.apache.org/jira/browse/HBASE-25295
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Guanghao Zhang
> Assignee: yuqi
> Priority: Major
>
> When cluster replication is enabled and one RegionServer crashes, its WALs are moved from the WALs dir to the oldWALs dir, and its replication queue is moved to another RegionServer's replication queue.
>
> HDFS layout (WAL Storage)
> /hbase/WALs/RS1/1.log
> /hbase/WALs/RS1/2.log
> /hbase/WALs/RS1/3.log
> ZooKeeper layout (Replication queue storage)
> /hbase/replication/rs/RS1/peerId/1.log
> /hbase/replication/rs/RS1/peerId/2.log
> /hbase/replication/rs/RS1/peerId/3.log
>
> After failover finishes:
> HDFS layout (WAL Storage)
> /hbase/.oldWALs/1.log
> /hbase/.oldWALs/2.log
> /hbase/.oldWALs/3.log
> ZooKeeper layout (Replication queue storage)
> /hbase/replication/rs/RS2/peerId-RS1/1.log
> /hbase/replication/rs/RS2/peerId-RS1/2.log
> /hbase/replication/rs/RS2/peerId-RS1/3.log
>
> If hbase.separate.oldlogdir.by.regionserver is enabled, the HDFS layout may instead be:
> HDFS layout (WAL Storage)
> /hbase/.oldWALs/RS1/1.log
> /hbase/.oldWALs/RS1/2.log
> /hbase/.oldWALs/RS1/3.log
>
> If RS2 then crashes, the HDFS layout does not change, but the ZooKeeper layout may change.
> ZooKeeper layout (Replication queue storage)
> /hbase/replication/rs/RS3/peerId-RS1-RS2/1.log
> /hbase/replication/rs/RS3/peerId-RS1-RS2/2.log
> /hbase/replication/rs/RS3/peerId-RS1-RS2/3.log
>
> So even if the replication queue is transferred many times, the HDFS layout never changes.
>
> Another case is a disaster on the master cluster in which the failover work has not finished. The ReplicationSyncUp tool can then start a replication source to replicate the WAL data. Here the HDFS layout has two more cases to consider:
> /hbase/WALs/RS1/1.log
> /hbase/WALs/RS1/2.log
> /hbase/WALs/RS1/3.log
> or
> /hbase/WALs/RS1-splitting/1.log
> /hbase/WALs/RS1-splitting/2.log
> /hbase/WALs/RS1-splitting/3.log
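
The layouts above suggest that a WAL can live in one of four places, all derived from the RegionServer that originally wrote it. The following is a minimal sketch of that lookup in Python, not HBase code. The function names, the `exists` callback, and the order in which candidates are probed are all hypothetical. It also assumes the simplified names used in this issue (no "-" inside a peer id or server name), which does not hold for real ServerNames.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Callable

# Root directory taken from the layouts in the description above.
ROOT = PurePosixPath("/hbase")


def dead_servers_of(queue_id: str) -> list[str]:
    """'peerId-RS1-RS2' -> ['RS1', 'RS2']; a plain 'peerId' -> [].

    Assumption: '-' never appears inside a peer id or a server name.
    """
    return queue_id.split("-")[1:]


def candidate_wal_paths(wal_name: str, origin_server: str) -> list[PurePosixPath]:
    """All locations from the description where the WAL may be found.

    The probing order is an assumption of this sketch.
    """
    return [
        ROOT / "WALs" / origin_server / wal_name,                   # server still alive
        ROOT / "WALs" / f"{origin_server}-splitting" / wal_name,    # failover unfinished
        ROOT / ".oldWALs" / wal_name,                               # archived
        ROOT / ".oldWALs" / origin_server / wal_name,               # archived per server
    ]


def locate_wal(
    wal_name: str,
    queue_id: str,
    current_server: str,
    exists: Callable[[PurePosixPath], bool],
) -> PurePosixPath | None:
    """Return the first candidate path for which exists() is true, else None.

    `exists` is a hypothetical stand-in for a filesystem check.
    """
    dead = dead_servers_of(queue_id)
    # In the example, the first server after the peer id (RS1) wrote the WAL;
    # later transfers only extend the queue id and never move the files.
    origin = dead[0] if dead else current_server
    for path in candidate_wal_paths(wal_name, origin):
        if exists(path):
            return path
    return None
```

With the queue id peerId-RS1-RS2 held by RS3, the sketch probes only directories derived from RS1, which matches the observation that the HDFS layout does not change however many times the queue is transferred.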
--
This message was sent by Atlassian Jira
(v8.3.4#803005)