Posted to dev@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2022/06/16 17:56:00 UTC

[jira] [Resolved] (HBASE-9220) An API(and shell command) to list tables replicated TO the current cluster

     [ https://issues.apache.org/jira/browse/HBASE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-9220.
----------------------------------------
    Resolution: Incomplete

> An API(and shell command) to list tables replicated TO the current cluster 
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-9220
>                 URL: https://issues.apache.org/jira/browse/HBASE-9220
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication, shell
>         Environment: clusters setup as Master and Slave for replication of tables
>            Reporter: Demai Ni
>            Priority: Major
>
> This JIRA tracks the continuing discussion from HBASE-8663, in the hope of surfacing a better way to handle the following use case: 
> an administrator or developer who has 'list table' access to a cluster would like to know which tables/families are replicated to that cluster (i.e. the slave), so that he/she won't mess things up.
> HBASE-8663 covered the API to get the list of tables and families replicated from the current cluster (i.e. the Master), but there was no conclusion on how to do the same for tables replicated TO the current cluster (i.e. the slave). Several ideas were entertained during HBASE-8663's discussion and are summarized here: 
> * *Idea 1*: on the Slave cluster, add a new String attribute REPLICATION_MASTER to HColumnDescriptor to indicate which master this column family is replicated from. A check can be added to ensure the value of REPLICATION_MASTER is valid at the time it is set. 
> ** problem 1) a slave can have more than one master (a minor one); 
> ** problem 2) consistency is broken if the Master cluster runs 'remove_peer' (a major problem, which requires a synchronous call to the remote master/peer cluster)
> * *Idea 2*: reuse REPLICATION_SCOPE, and give a new meaning to the value '-1'. If a table is replicated to this cluster, its REPLICATION_SCOPE must be set to -1 before replication can occur
> ** problem 1) incompatible change. Currently a slave-side table looks just like a normal table; the new change would require users to explicitly flag REPLICATION_SCOPE = -1
> ** problem 2) incompatible change. Currently any non-zero value of REPLICATION_SCOPE is treated as if it were 1 (global replication), so the change would impact existing tables
> ** problem 3) the value '-1' only tells the user that the table is replicated to the current cluster; it cannot indicate the source/Master cluster
> * *Idea 3*: invent a new HColumnDescriptor attribute 'replication_peers', an array of IDs. We can use positive IDs for target clusters and negative IDs for source clusters, for example 
> {code}
> hbase(main):004:0> list_peers
>  PEER_ID CLUSTER_KEY STATE
>  1 Slave_A.hbase.com:2181:/hbase ENABLED
>  2 Slave_B.hbase.com:2181:/hbase ENABLED
>  3 Slave_Master_C.hbase.com:2181:/hbase ENABLED
> -1 Master_A.hbase.com:2181:/hbase ENABLED
> -2 Master_B.hbase.com:2181:/hbase ENABLED
> -3 Slave_Master_C.hbase.com:2181:/hbase ENABLED
> >describe table
> 't1_dn', {NAME => 'cf1', REPLICATION_PEERS => '1,2,3', ..}
> 't2_dn', {NAME => 'cf1', REPLICATION_PEERS => '-1,-2',..}
> 't3_dn', {NAME => 'cf1', REPLICATION_PEERS => '3,-3',..}
> t1_dn#cf1 is replicated from this cluster, and its slave clusters are Slave_A, Slave_B and Slave_Master_C
> t2_dn#cf1 is replicated to this cluster, and its master clusters are Master_A and Master_B
> t3_dn#cf1 is set up as Master_Slave replication with Slave_Master_C.hbase.com (though the source and target don't have to be the same cluster) 
> {code}
> ** problem: similar to idea 1, though an improved version. A synchronous call can be implemented through the peer ID
> * *Idea 4*: a central replication controller that resides outside of all the clusters. The controller would communicate with all clusters and keep the info consistent. It could be a very good operational manager for users who have 10+ clusters to oversee, and other features (such as backup/restore) could leverage the framework
> ** problem: well, not really a problem per se, except that the effort for the whole solution is pretty large and needs some clean-up work. For example, currently 'add_peer' doesn't check the value, and we need to fix that first; and replication setup relies on manually creating the table on the peer slave, whereas we may like to ensure the same schema and do it automatically from the Master cluster. 
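
As a rough illustration of Idea 3's sign convention only, the Python sketch below splits a REPLICATION_PEERS value into target (positive) and source (negative) peer IDs and lists the families replicated TO the current cluster. REPLICATION_PEERS is the attribute proposed in this issue and is not an existing HBase attribute; the function names and the FAMILIES dictionary are hypothetical, with the sample values copied from the 'describe' example above. It does not call any HBase API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeerRoles:
    """Peer IDs of one column family, split by direction (hypothetical type)."""
    targets: List[int] = field(default_factory=list)  # positive IDs: replicated FROM this cluster
    sources: List[int] = field(default_factory=list)  # negative IDs: replicated TO this cluster


def parse_replication_peers(value: str) -> PeerRoles:
    """Parse a comma-separated REPLICATION_PEERS string such as '3,-3'."""
    roles = PeerRoles()
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        peer_id = int(token)
        if peer_id == 0:
            # Assumption: 0 carries no direction under the proposed convention.
            raise ValueError("peer ID 0 has no direction")
        if peer_id > 0:
            roles.targets.append(peer_id)
        else:
            roles.sources.append(peer_id)
    return roles


def replicated_to_this_cluster(families: Dict[str, str]) -> List[str]:
    """Return the 'table#family' names that have at least one source peer."""
    return sorted(
        name for name, value in families.items()
        if parse_replication_peers(value).sources
    )


# Sample values taken from the 'describe' example in the issue description.
FAMILIES = {
    "t1_dn#cf1": "1,2,3",
    "t2_dn#cf1": "-1,-2",
    "t3_dn#cf1": "3,-3",
}

if __name__ == "__main__":
    for name in replicated_to_this_cluster(FAMILIES):
        print(name)
```

With the sample values, the families reported as replicated to the current cluster are t2_dn#cf1 and t3_dn#cf1, matching the explanation in the example. The sketch leaves the consistency problem noted for Ideas 1 and 3 untouched: nothing here keeps the attribute in sync when a peer is removed on the remote cluster.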


