Posted to dev@hbase.apache.org by 何良均 <20...@163.com> on 2022/09/01 15:32:49 UTC

Re: [DISCUSS] Move replication queue storage from zookeeper to a separated HBase table

Meeting notes:

Attendees: Duo Zhang, Yu Li, Xin Sun, Tianhang Tang, Liangjun He

First, Liangjun introduced the old implementations of ReplicationSyncUp and DumpReplicationQueues, as well as the existing problems and the preliminary solutions under the new replication implementation. Then we discussed those solutions; the discussion is summarized below:

ReplicationSyncUp tool

1. The ReplicationSyncUp tool replicates the remaining data to the backup cluster after the master cluster crashes. But once the master cluster is down, the tool can no longer access the HBase table. Is it possible to also copy the replication queue info to ZK whenever it is written to the HBase table, and then implement the ReplicationSyncUp tool on top of ZK again?

Since our goal is to reduce the reliance on ZK for storing replication queue info, this approach would defeat that goal. Perhaps we could use HMaster maintenance mode to bring up hbase:meta and then perform additional repair operations, but HMaster maintenance mode only allows the HMaster itself to access hbase:meta internally; external clients cannot access it, so this approach cannot be used.

After discussion, we all agreed that it is hard to solve this problem under the new replication implementation without relying on external storage. If ZK (or another third-party storage system) is used, we will have a data sync problem: how to sync-write the replication queue info to ZK (with real-time sync-writing, how do we ensure consistency between the writes to the HBase table and to ZK; with timed sync-writing, some redundant data will be replicated when ReplicationSyncUp is executed, though partially redundant data may be acceptable), and, once the master cluster is recovered, how to sync the replication queue info from ZK back to the hbase:replication table.
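To make the timed sync-writing alternative a bit more concrete, here is a minimal sketch of a periodic mirror task. It is not part of any agreed design: the helper that reads the queue offsets out of the hbase:replication table and the znode layout are both hypothetical, only the ZooKeeper client calls are real.

    import java.util.Map;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Periodically mirror queue offsets (read elsewhere from the hbase:replication
    // table) into ZK so that ReplicationSyncUp could still see them after the
    // master cluster goes down. Illustration only.
    public class QueueInfoMirror {
      private static final String BASE = "/hbase-replication-mirror"; // made-up znode layout
      private final ZooKeeper zk;

      public QueueInfoMirror(ZooKeeper zk) {
        this.zk = zk;
      }

      // One round of mirroring; called from a timer. Keys could be something like
      // "<peerId>-<serverName>", values the serialized wal name + offset.
      public void mirrorOnce(Map<String, byte[]> queueOffsets)
          throws KeeperException, InterruptedException {
        ensureNode(BASE, new byte[0]);
        for (Map.Entry<String, byte[]> e : queueOffsets.entrySet()) {
          ensureNode(BASE + "/" + e.getKey(), e.getValue());
        }
      }

      private void ensureNode(String path, byte[] data)
          throws KeeperException, InterruptedException {
        if (zk.exists(path, false) == null) {
          zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
          zk.setData(path, data, -1); // -1 = any version
        }
      }
    }

The real-time variant would replace the timer with a write-through on every queue update, which is exactly where the consistency question above comes from.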

Further, we could also let ReplicationSyncUp access the replication queue info through a snapshot of the hbase:replication table. For example, a snapshot of the hbase:replication table is generated periodically; when the ReplicationSyncUp tool is executed, it loads the snapshot into memory. After the tool has finished and the remaining data has been replicated completely, it regenerates a new snapshot from the in-memory info and writes it to the file system. When the master cluster is recovered, the HMaster restores the hbase:replication table from the new snapshot.
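A minimal sketch of just the snapshot bookkeeping around this idea, using the public Admin snapshot API. The snapshot name and scheduling are assumptions, and the interesting parts (loading the snapshot into ReplicationSyncUp's memory and regenerating it after the sync) are not shown.

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;

    public class ReplicationTableSnapshots {
      private static final TableName REPLICATION_TABLE = TableName.valueOf("hbase:replication");
      private static final String SNAPSHOT_NAME = "replication-queue-snapshot"; // placeholder name

      // Called periodically while the master cluster is healthy.
      public static void takeSnapshot(Connection conn) throws IOException {
        try (Admin admin = conn.getAdmin()) {
          try {
            admin.deleteSnapshot(SNAPSHOT_NAME); // drop the previous one, if any
          } catch (IOException ignored) {
            // no previous snapshot yet
          }
          admin.snapshot(SNAPSHOT_NAME, REPLICATION_TABLE);
        }
      }

      // Run by the HMaster on recovery, before replication resumes.
      public static void restoreSnapshot(Connection conn) throws IOException {
        try (Admin admin = conn.getAdmin()) {
          // restoreSnapshot requires the target table to be disabled first.
          if (admin.isTableEnabled(REPLICATION_TABLE)) {
            admin.disableTable(REPLICATION_TABLE);
          }
          admin.restoreSnapshot(SNAPSHOT_NAME);
          admin.enableTable(REPLICATION_TABLE);
        }
      }
    }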

2. If the ReplicationSyncUp tool is implemented on top of the hbase:replication snapshot, then after ReplicationSyncUp has run and the data has been replicated completely, the HMaster must start first when the master cluster is recovered, so that the snapshot is restored to the hbase:replication table before anything else happens. This avoids the situation where a RegionServer starts first and replicates redundant data to the backup cluster. However, the master cluster cannot guarantee that the HMaster starts before the RegionServers during recovery, so how do we ensure that the HMaster restores the snapshot to the hbase:replication table first?

Option 1: If a RegionServer starts first and finds a corresponding snapshot of the hbase:replication table, it holds off its replication-related operations until the HMaster has started and restored the snapshot to the hbase:replication table, and only then continues to replicate data. The advantage of this approach is that it is transparent to the user; the disadvantage is that the implementation is complicated.
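A minimal sketch of Option 1's waiting behaviour. Because a RegionServer that starts before the HMaster cannot ask the master anything, the sketch checks the snapshot directory on the file system directly, and it assumes (this is not in the notes above) that the HMaster deletes the snapshot once it has restored it; the snapshot name, directory layout and polling interval are all placeholders.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationStartupGate {
      private static final String SNAPSHOT_NAME = "replication-queue-snapshot"; // placeholder

      // Called by a RegionServer before it starts shipping edits to any peer.
      public static void awaitSnapshotRestored(Configuration conf)
          throws IOException, InterruptedException {
        Path root = new Path(conf.get("hbase.rootdir"));
        // Completed snapshots normally live under <hbase.rootdir>/.hbase-snapshot/<name>.
        Path snapshotDir = new Path(new Path(root, ".hbase-snapshot"), SNAPSHOT_NAME);
        FileSystem fs = snapshotDir.getFileSystem(conf);
        while (fs.exists(snapshotDir)) {
          // Snapshot still present: the HMaster has not restored it yet, so starting
          // replication now could ship redundant edits to the backup cluster.
          Thread.sleep(5_000L);
        }
      }
    }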

Option 2: After ReplicationSyncUp is executed, we disable the peer. Even if a RegionServer starts first, replication will not run until the HMaster has started and restored the snapshot to the hbase:replication table, after which the peer is enabled again. The disadvantage of this approach is that it can confuse users, because the peer gets disabled without the user knowing; the advantage is that the implementation is simple.
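A minimal sketch of Option 2's disable/enable handshake. Since ReplicationSyncUp runs while the master cluster is down, in reality it would have to flip the peer state in the peer storage directly rather than through a live cluster; the Admin calls below are just the simplest way to show the two ends of the handshake, and the peer id is a placeholder.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;

    public class PeerGate {

      // Last step of ReplicationSyncUp under this option: leave the peer disabled so
      // that RegionServers which start before the HMaster do not ship redundant edits.
      public static void disableAfterSyncUp(Connection conn, String peerId) throws IOException {
        try (Admin admin = conn.getAdmin()) {
          admin.disableReplicationPeer(peerId);
        }
      }

      // Run by the HMaster after it has restored the hbase:replication snapshot.
      public static void enableAfterRestore(Connection conn, String peerId) throws IOException {
        try (Admin admin = conn.getAdmin()) {
          admin.enableReplicationPeer(peerId);
        }
      }
    }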

At present, it seems that no solution solves this problem perfectly.

DumpReplicationQueues tool

1. Under the new replication implementation, most of the info output by the DumpReplicationQueues tool can be obtained through the new interface, consistent with the old implementation. The difference is that in the new implementation each queue only stores one wal file and the corresponding offset, while the old implementation stored all wal files and offsets under the queue, so the old DumpReplicationQueues tool included all wal files and offsets when outputting queue info. In the new implementation we can also access the file system directly to get all the wal files corresponding to a queue, which would keep the output completely consistent with the old DumpReplicationQueues tool and would not cost too much, but is it necessary?
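A minimal sketch of the "read the wal list from the file system" idea mentioned above. It assumes the usual <hbase.rootdir>/WALs/<server-name> layout (archived files under oldWALs would need the same treatment), and it does not filter the list against the queue's recorded wal and offset.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WalLister {

      // List the wal files of one region server straight from the file system, so
      // DumpReplicationQueues could still print the full list even though the new
      // queue storage only records one wal + offset per queue.
      public static List<String> listWals(Configuration conf, String serverName) throws IOException {
        Path root = new Path(conf.get("hbase.rootdir"));
        Path walDir = new Path(new Path(root, "WALs"), serverName);
        FileSystem fs = walDir.getFileSystem(conf);
        List<String> wals = new ArrayList<>();
        if (fs.exists(walDir)) {
          for (FileStatus st : fs.listStatus(walDir)) {
            if (st.isFile()) {
              wals.add(st.getPath().getName());
            }
          }
        }
        return wals;
      }
    }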

It is recommended that the output of the wal files and offset info stay consistent with the old version, to avoid users being unable to upgrade HBase because they depend on the output of the DumpReplicationQueues tool.

Thanks.




On 2022-08-31 00:01:22, "何良均" <20...@163.com> wrote:

Last time we discussed the design doc "Move replication queue storage from zookeeper to a separated HBase table", but the replication tools part was not discussed.
This time we decided to discuss that part.


We plan to hold an online meeting from 2 PM to 3 PM, 31 Aug, GMT+8, using Tencent Meeting.


何良均 invites you to a Tencent Meeting
Meeting topic: replication tool discussion
Meeting time: 2022/08/31 14:00-15:00 (GMT+08:00) China Standard Time - Beijing


Click the link to join the meeting, or add it to your meeting list:
https://meeting.tencent.com/dm/norZvACxGtya


#Tencent Meeting ID: 982-412-761
Meeting password: 210189


More attendees are always welcome.


Re:Re: [DISCUSS] Move replication queue storage from zookeeper to a separated HBase table

Posted by zhengsicheng <zh...@163.com>.


The idea of using an HBase table to replace the replication queue storage is very good.





At 2022-09-02 14:08:30, "张铎(Duo Zhang)" <pa...@gmail.com> wrote:
>Thanks for the detailed write up!

Re: [DISCUSS] Move replication queue storage from zookeeper to a separated HBase table

Posted by "张铎(Duo Zhang)" <pa...@gmail.com>.
Thanks for the detailed write up!
