Posted to issues@hbase.apache.org by "Jeffrey Zhong (JIRA)" <ji...@apache.org> on 2013/03/08 20:16:13 UTC
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication

    [ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597418#comment-13597418 ] 

Jeffrey Zhong commented on HBASE-7709:
--------------------------------------

I have another idea which IMHO is better. The basic idea is as follows:

1) We maintain a counter called RD ("replication distance"), which represents how many hops a WAL edit has traveled from its source cluster to the current cluster, like the hop counter mentioned in option 3.
2) Each replaying and receiving region server maintains an in-memory ClusterDistanceMap <clusterId, MIN(RD)>. Whenever it sees a WAL edit with an RD smaller than the one it has recorded for that cluster, it updates the map with the smaller RD value.
3) Drop every WAL edit from a cluster whose RD is greater than the one the current region server has in the ClusterDistanceMap for that cluster.
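
To make the three steps above concrete, here is a minimal Java sketch of the per-region-server check. It is only an illustration of the idea: ReplicationDistanceFilter and accept() are hypothetical names, not existing HBase classes, and it is written against a current JDK for brevity.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the ClusterDistanceMap check; not HBase API.
final class ReplicationDistanceFilter {
    // origin clusterId -> smallest RD seen so far (kept in memory only)
    private final ConcurrentMap<UUID, Integer> clusterDistanceMap =
        new ConcurrentHashMap<>();

    /**
     * Records the RD of an incoming WAL edit and decides its fate.
     * Returns true if the edit should be applied, false if it should be dropped.
     */
    boolean accept(UUID originClusterId, int rd) {
        // Step 2: atomically keep the minimum of the stored and the incoming RD.
        int min = clusterDistanceMap.merge(originClusterId, rd, Integer::min);
        // Step 3: drop the edit if it traveled farther than the known minimum.
        return rd <= min;
    }
}
```

With this sketch, an edit arriving with RD 3 is accepted until an edit from the same origin arrives with RD 1; after that, RD 3 edits from that origin are dropped. Because the map starts empty after a restart, a few duplicates may get through until the minimum is relearned.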

Initially we could duplicate data for the first several WAL edits, but that is corrected soon, so we don't need to persist any data for the failover scenario.

The above idea is similar to option 3, but it avoids always double-replicating data on some clusters. Maintaining a max hop count is also prone to human error: we may forget to bump up the max hop-count value when more clusters join the replication cycle.

Why does it work? Loop detection: a quick walker will catch up with a slow walker but travels farther.
When we have infinite loop replication as described in the JIRA, the data from a source must reach the destination over multiple paths with different RDs. Because loops are involved, the RDs cannot all be the same; otherwise there would be no loop. Since the RDs differ, we just need to keep the data from the source with the minimum distance.

You may ask about the diamond situation, like the following.

a->b->d
a->c->d

where the data from a will be replicated to d twice. This is because we configured d to receive a's data twice. If a loop is involved, the looped-back data will be dropped as described above.
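
Both cases can be checked with a small simulation. The sketch below is hypothetical (RdLoopSimulation and replicate() are made-up names) and makes two assumptions: edits are delivered in FIFO order, and the origin cluster treats itself as being at distance 0, so edits that loop back to it are dropped.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of one edit traveling through a replication topology.
final class RdLoopSimulation {
    private static final class Hop {
        final String cluster;
        final int rd;
        Hop(String cluster, int rd) { this.cluster = cluster; this.rd = rd; }
    }

    /**
     * peers maps each cluster to the clusters it replicates to.
     * Returns how many times each cluster applied the edit from origin.
     */
    static Map<String, Integer> replicate(Map<String, List<String>> peers, String origin) {
        Map<String, Integer> minRd = new HashMap<>();   // per cluster: min RD seen for origin
        Map<String, Integer> applied = new HashMap<>();
        Deque<Hop> queue = new ArrayDeque<>();
        minRd.put(origin, 0);                           // assumption: origin is at distance 0
        for (String peer : peers.getOrDefault(origin, Collections.<String>emptyList())) {
            queue.add(new Hop(peer, 1));
        }
        while (!queue.isEmpty()) {
            Hop hop = queue.poll();
            Integer best = minRd.get(hop.cluster);
            if (best != null && hop.rd > best) {
                continue;                               // traveled farther than the minimum: drop
            }
            minRd.put(hop.cluster, hop.rd);
            applied.merge(hop.cluster, 1, Integer::sum);
            // Forward to every peer with the RD increased by one hop.
            for (String peer : peers.getOrDefault(hop.cluster, Collections.<String>emptyList())) {
                queue.add(new Hop(peer, hop.rd + 1));
            }
        }
        return applied;
    }
}
```

For the topology from the issue (A and B replicate to each other, C replicates to A) an edit from C is applied once on A and once on B, and the copy that B sends back to A arrives with RD 3 and is dropped. For the diamond above, d applies a's edit twice, since both copies arrive with RD 2.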


This is a general loop detection strategy, so we can implement it in 0.96 or above. For 0.94:

1) We can introduce a new version (3) in HLogKey.
2) Use the top two bytes of the UUID to store the RD value and the remaining 14 bytes for a hash of the original 16-byte UUID. This does not compromise uniqueness in practice, because in most cases only tens of clusters are involved in replication and the collision probability is less than 10^-18.
OR
use Ted's suggestion to overload the boolean byte.
3) We can introduce a configuration setting that defaults to true. When we want to revert the new behavior, we can turn it off.
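
The byte layout in step 2 could look roughly like the sketch below. RdClusterId and its methods are hypothetical names, and the choice of a truncated SHA-256 as the 14-byte hash is only an assumption; the proposal does not name a hash function.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical sketch: pack RD plus a hash of the origin cluster UUID into 16 bytes.
final class RdClusterId {
    /** Bytes 0-1 hold the RD (big-endian), bytes 2-15 hold a 14-byte hash of the UUID. */
    static byte[] encode(UUID origin, int rd) {
        if (rd < 0 || rd > 0xFFFF) {
            throw new IllegalArgumentException("RD must fit in two bytes: " + rd);
        }
        byte[] raw = ByteBuffer.allocate(16)
            .putLong(origin.getMostSignificantBits())
            .putLong(origin.getLeastSignificantBits())
            .array();
        byte[] digest = sha256(raw);            // assumption: any well-mixed hash would do
        byte[] out = new byte[16];
        out[0] = (byte) (rd >>> 8);
        out[1] = (byte) rd;
        System.arraycopy(digest, 0, out, 2, 14);
        return out;
    }

    /** Reads the RD back from the top two bytes. */
    static int decodeRd(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    /** Returns the 14-byte origin hash, which is the same for every RD value. */
    static byte[] originHash(byte[] encoded) {
        return Arrays.copyOfRange(encoded, 2, 16);
    }

    private static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

The origin hash stays the same on every hop, so it can serve as the key of the ClusterDistanceMap, while only the two RD bytes change as the edit travels.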

Please let me know what you think. Please assign the ticket to me first if we agree to implement it the way I'm proposing.

Thanks,
-Jeffrey


  
 
                
> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.95.0, 0.94.6
>            Reporter: Lars Hofhansl
>             Fix For: 0.95.0, 0.94.7
>
>
> We just discovered the following scenario:
> # Cluster A and B are set up in master/master replication
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edit comes in from C the cluster ID is already set and won't be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that now cycles > 2 will have the data cycle forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster id per edit maintain an (unordered) set of cluster ids that have seen this edit. Then in ReplicationSource we drop any edit that the sink has seen already. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to support. Could default to 10 (even maybe even just). Store a hop-count in the WAL and the ReplicationSource increases that hop-count on each hop. If we're over the max, just drop the edit.
