Posted to user@hbase.apache.org by Hamado Dene <ha...@yahoo.com.INVALID> on 2022/01/23 17:46:38 UTC

Sync two cluster

Hi HBase community,

In our production environment we have a main cluster and a second one, in another datacenter, used as a replica. We have found that the disk space used on the primary is much greater than on the replica, so we think the replica is far behind the primary. Is there a way to synchronize the two clusters without impacting the primary cluster too much, while still keeping replication on?

HBase version: 2.2.
Hadoop version: 2.8.5
Thanks
Hamado Dene 

Re: Sync two cluster

Posted by Hamado Dene <ha...@yahoo.com.INVALID>.
Thanks for the info. The snapshot method is surely what we will do.
However, we have a cluster with hundreds of tables. Is a script the fastest way to take the snapshots and export them to the secondary cluster, or is there a more optimized way to do it?
 


Re: Sync two cluster

Posted by Mallikarjun <ma...@gmail.com>.
You can do this in several ways. I recommend the following:

1. Stop and remove the existing replication.
2. Set up replication again and disable it temporarily.
3. Export a snapshot to the secondary cluster.
4. Delete the table there and restore it from the exported snapshot.
5. Enable replication.
6. Check the replication tab of the HMaster UI for status.

This is recommended because it uses the least resources. Disk and network are
the major resources used, and they can be controlled via the bandwidth
parameter of ExportSnapshot.
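
For a cluster with many tables, the snapshot and export steps can be scripted. The following is a minimal Python sketch, not a tested procedure: it only builds the HBase shell statement and the ExportSnapshot command line for each table and prints them, leaving it to the operator to run them. The helper names, the snapshot naming scheme, the table names, the destination URL and the default mapper and bandwidth values are all assumptions made for illustration; `-snapshot`, `-copy-to`, `-mappers` and `-bandwidth` are options of the ExportSnapshot tool.

```python
from datetime import date

EXPORT_SNAPSHOT = "org.apache.hadoop.hbase.snapshot.ExportSnapshot"


def snapshot_name(table: str, day: date) -> str:
    """Derive a snapshot name from a table name (hypothetical naming scheme).

    The namespace separator ':' is replaced because it is assumed not to be
    allowed in snapshot names.
    """
    return f"{table.replace(':', '_')}-sync-{day:%Y%m%d}"


def shell_snapshot_statement(table: str, snap: str) -> str:
    """Statement to feed to the HBase shell to take the snapshot."""
    return f"snapshot '{table}', '{snap}'"


def export_snapshot_argv(snap: str, copy_to: str,
                         mappers: int = 4, bandwidth_mb: int = 50) -> list[str]:
    """Command line that copies one snapshot to the secondary cluster.

    mappers and bandwidth_mb (MB/second) are placeholder defaults; lower them
    to reduce the load on the primary cluster.
    """
    return [
        "hbase", EXPORT_SNAPSHOT,
        "-snapshot", snap,
        "-copy-to", copy_to,
        "-mappers", str(mappers),
        "-bandwidth", str(bandwidth_mb),
    ]


if __name__ == "__main__":
    # Hypothetical table list and destination; replace with real values.
    tables = ["ns1:orders", "ns1:customers"]
    destination = "hdfs://replica-namenode:8020/hbase"
    for table in tables:
        snap = snapshot_name(table, date.today())
        print(shell_snapshot_statement(table, snap))
        print(" ".join(export_snapshot_argv(snap, destination)))
```

Printing the commands first, rather than executing them, makes it easy to review the list and run the exports a few tables at a time, so that the copy never saturates the link between the two datacenters.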

Alternatively:

1. Disable replication.
2. Run the HashTable and SyncTable utilities.
3. Enable replication.

This is resource intensive if the difference is huge, because it works at the
HBase layer: it scans the whole table and ships a batch of rows at a time.
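
The HashTable and SyncTable step can be sketched in the same spirit. This is only an illustration under stated assumptions: the helper names, the hash directory, the table name and the ZooKeeper cluster key are hypothetical, and the sketch assumes HashTable is run against the primary cluster while SyncTable is run on the replica with `--sourcezkcluster` pointing back at the primary. The code only builds the two command lines; `--dryrun=true` asks SyncTable to report differences without writing anything.

```python
HASH_TABLE = "org.apache.hadoop.hbase.mapreduce.HashTable"
SYNC_TABLE = "org.apache.hadoop.hbase.mapreduce.SyncTable"


def hash_table_argv(table: str, hash_dir: str) -> list[str]:
    """Command line that hashes `table` and writes the hashes to `hash_dir`."""
    return ["hbase", HASH_TABLE, table, hash_dir]


def sync_table_argv(hash_dir: str, source_table: str, target_table: str,
                    source_zk: str, dry_run: bool = True) -> list[str]:
    """Command line that compares the target table with the source hashes.

    dry_run defaults to True so that a first pass only counts differences.
    """
    return [
        "hbase", SYNC_TABLE,
        f"--dryrun={'true' if dry_run else 'false'}",
        f"--sourcezkcluster={source_zk}",
        hash_dir, source_table, target_table,
    ]


if __name__ == "__main__":
    # Hypothetical values; replace with the real paths and cluster key.
    hash_dir = "hdfs://primary-namenode:8020/hashes/orders"
    source_zk = "zk1,zk2,zk3:2181:/hbase"
    print(" ".join(hash_table_argv("orders", hash_dir)))
    print(" ".join(sync_table_argv(hash_dir, "orders", "orders", source_zk)))
```

A dry run on one or two tables gives a rough idea of how far the replica has drifted before committing to a full pass over hundreds of tables.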

