Posted to issues@hbase.apache.org by "yuqi (Jira)" <ji...@apache.org> on 2020/11/17 09:04:00 UTC
[jira] [Assigned] (HBASE-25295) Refactor the locate WAL logic in ReplicationSource

     [ https://issues.apache.org/jira/browse/HBASE-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuqi reassigned HBASE-25295:
----------------------------

    Assignee: yuqi

> Refactor the locate WAL logic in ReplicationSource
> --------------------------------------------------
>
>                 Key: HBASE-25295
>                 URL: https://issues.apache.org/jira/browse/HBASE-25295
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Guanghao Zhang
>            Assignee: yuqi
>            Priority: Major
>
> When cluster replication is enabled and a RegionServer crashes, its WALs are moved from the WALs dir to the oldWALs dir, and its replication queue is moved into another RegionServer's replication queues.
>  
> Before failover:
> HDFS layout (WAL Storage)
> /hbase/WALs/RS1/1.log
> /hbase/WALs/RS1/2.log
> /hbase/WALs/RS1/3.log
> ZooKeeper layout (Replication queue storage)
> /hbase/replication/rs/RS1/peerId/1.log
> /hbase/replication/rs/RS1/peerId/2.log
> /hbase/replication/rs/RS1/peerId/3.log
>  
> After failover finishes:
> HDFS layout (WAL Storage)
> /hbase/.oldWALs/1.log
> /hbase/.oldWALs/2.log
> /hbase/.oldWALs/3.log
> ZooKeeper layout (Replication queue storage)
> /hbase/replication/rs/RS2/peerId-RS1/1.log
> /hbase/replication/rs/RS2/peerId-RS1/2.log
> /hbase/replication/rs/RS2/peerId-RS1/3.log
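The queue hand-off above can be modelled in a few lines. This is a minimal Python sketch, not HBase's Java implementation: the functions `znode_path` and `claim_queues` and the dict-based queue store are hypothetical, and the only rule taken from the issue is that the claiming server appends the dead server's name to the queue id while the WAL names stay the same.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model: (owning server, queue id) -> WAL names to replicate
Queues = Dict[Tuple[str, str], List[str]]


def znode_path(server: str, queue_id: str, wal: str,
               base: str = "/hbase/replication/rs") -> str:
    """ZooKeeper node for one WAL entry, using the layout shown in the issue."""
    return f"{base}/{server}/{queue_id}/{wal}"


def claim_queues(queues: Queues, dead_server: str, new_server: str) -> Queues:
    """Hand every queue owned by dead_server over to new_server.

    The claimed queue id gets the dead server's name appended, so 'peerId'
    becomes 'peerId-RS1'. The WAL names themselves are carried over unchanged.
    """
    claimed: Queues = {}
    for (server, queue_id), wals in queues.items():
        if server == dead_server:
            claimed[(new_server, f"{queue_id}-{dead_server}")] = list(wals)
        else:
            claimed[(server, queue_id)] = list(wals)
    return claimed


if __name__ == "__main__":
    before = {("RS1", "peerId"): ["1.log", "2.log", "3.log"]}
    for (server, queue_id), wals in claim_queues(before, "RS1", "RS2").items():
        for wal in wals:
            print(znode_path(server, queue_id, wal))
    # /hbase/replication/rs/RS2/peerId-RS1/1.log  (then 2.log and 3.log)
```

Note that only the queue's location and name change. Nothing in the queue records where the WAL files now live on HDFS, which is why the source has to work that out separately.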
>  
> If hbase.separate.oldlogdir.by.regionserver is enabled, the HDFS layout may instead be:
> HDFS layout (WAL Storage)
> /hbase/.oldWALs/RS1/1.log
> /hbase/.oldWALs/RS1/2.log
> /hbase/.oldWALs/RS1/3.log
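The effect of that setting on the archive location can be sketched as a pure path function. This is an illustrative Python sketch, not HBase code: `archived_wal_path` is a hypothetical helper, its `separate_by_server` argument merely stands in for hbase.separate.oldlogdir.by.regionserver, and the directory names are copied from the layouts written in this issue.

```python
def archived_wal_path(wal_path: str, separate_by_server: bool,
                      root: str = "/hbase") -> str:
    """Where a WAL under <root>/WALs/<server>/ lands once it is archived.

    separate_by_server stands in for hbase.separate.oldlogdir.by.regionserver:
    when true, archived WALs keep a per-server subdirectory.
    """
    server, file_name = wal_path.rstrip("/").split("/")[-2:]
    if separate_by_server:
        return f"{root}/.oldWALs/{server}/{file_name}"
    return f"{root}/.oldWALs/{file_name}"


print(archived_wal_path("/hbase/WALs/RS1/1.log", separate_by_server=False))
# /hbase/.oldWALs/1.log
print(archived_wal_path("/hbase/WALs/RS1/1.log", separate_by_server=True))
# /hbase/.oldWALs/RS1/1.log
```

Because the archive path can be either form depending on configuration, a reader that has only the WAL name has to know, or probe for, both.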
>  
> Then, if RS2 crashes, the HDFS layout does not change, but the ZooKeeper layout may change:
> ZooKeeper layout (Replication queue storage)
> /hbase/replication/rs/RS3/peerId-RS1-RS2/1.log
> /hbase/replication/rs/RS3/peerId-RS1-RS2/2.log
> /hbase/replication/rs/RS3/peerId-RS1-RS2/3.log
>  
> So even if the replication queue is transferred many times, the HDFS layout never changes.
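Since the HDFS location depends only on the server that originally wrote the WALs, the source can recover that server from the queue id. Here is a minimal Python sketch of the idea. `QueueInfo`, `parse_queue_id` and `wal_owner` are hypothetical names, and the sketch assumes, for simplicity, that neither the peer id nor the server names contain a '-'. Real HBase server names can contain hyphens, so a real parser needs to be more careful.

```python
from typing import List, NamedTuple


class QueueInfo(NamedTuple):
    peer_id: str
    dead_servers: List[str]  # in claim order; the first entry wrote the WALs


def parse_queue_id(queue_id: str) -> QueueInfo:
    """Split 'peerId-RS1-RS2' into the peer id and the chain of dead servers.

    Simplifying assumption: no '-' inside the peer id or any server name.
    """
    peer_id, *dead = queue_id.split("-")
    return QueueInfo(peer_id, dead)


def wal_owner(queue_id: str, current_server: str) -> str:
    """Server whose WAL directory the queued files came from.

    A queue that was never claimed belongs to the server holding it;
    a claimed queue belongs to the first dead server in the chain.
    """
    info = parse_queue_id(queue_id)
    return info.dead_servers[0] if info.dead_servers else current_server


print(parse_queue_id("peerId-RS1-RS2"))
# QueueInfo(peer_id='peerId', dead_servers=['RS1', 'RS2'])
print(wal_owner("peerId-RS1-RS2", current_server="RS3"))
# RS1
```

Usage: however many hops the queue makes (RS1 to RS2 to RS3), `wal_owner` still returns RS1, matching the unchanged HDFS layout.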
>  
> Another case is a master-cluster disaster in which the failover work has not finished. The ReplicationSyncUp tool can then start a replication source to replicate the WAL data. In that case, locating a WAL must consider two more HDFS layouts:
> /hbase/WALs/RS1/1.log
> /hbase/WALs/RS1/2.log
> /hbase/WALs/RS1/3.log
> or
> /hbase/WALs/RS1-splitting/1.log
> /hbase/WALs/RS1-splitting/2.log
> /hbase/WALs/RS1-splitting/3.log
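Taken together, the layouts in this issue give four places where a WAL written by a given server may live. Below is a minimal Python sketch of a locate step that probes them in turn. `candidate_paths`, `locate_wal` and the injected `exists` predicate are hypothetical, as is the probe order. A real implementation would check the filesystem (for example through Hadoop's FileSystem API) instead of calling a callback.

```python
from typing import Callable, List, Optional


def candidate_paths(owner: str, wal_name: str, root: str = "/hbase") -> List[str]:
    """Every location the issue lists for a WAL written by `owner`."""
    return [
        f"{root}/WALs/{owner}/{wal_name}",            # failover not started
        f"{root}/WALs/{owner}-splitting/{wal_name}",  # failover started, unfinished
        f"{root}/.oldWALs/{owner}/{wal_name}",        # archived, per-server oldWALs dir
        f"{root}/.oldWALs/{wal_name}",                # archived, default layout
    ]


def locate_wal(owner: str, wal_name: str, exists: Callable[[str], bool],
               root: str = "/hbase") -> Optional[str]:
    """Return the first candidate path for which exists() is true, else None."""
    for path in candidate_paths(owner, wal_name, root):
        if exists(path):
            return path
    return None


# A set of paths stands in for the filesystem in this example.
fs = {"/hbase/WALs/RS1-splitting/2.log"}
print(locate_wal("RS1", "2.log", fs.__contains__))
# /hbase/WALs/RS1-splitting/2.log
```

A design caveat: a WAL can be moved between the existence check and the open, so a reader built this way should be ready to run the lookup again when the open fails.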



--
This message was sent by Atlassian Jira
(v8.3.4#803005)