Posted to users@kafka.apache.org by Rijo Roy <rj...@yahoo.com.INVALID> on 2021/10/20 18:56:54 UTC

Help needed to migrate from one infra to another without downtime

Hi, 

Hope you are safe and well!

Here is a brief overview of my environment:

OS: Ubuntu 18.04
Kafka version: Confluent Kafka v5.5.1
ZooKeeper version: 3.5.8
No. of Kafka brokers: 3
No. of ZooKeeper nodes: 3

I am working on a project to move from our existing infrastructure (let's call it A), where the Kafka and ZooKeeper clusters are hosted, to a better infrastructure (let's call it B) with no or minimal downtime. Once the cutover is done, we would like to terminate the old infrastructure A.

I was able to use kafka-reassign-partitions.sh, following the steps in https://kafka.apache.org/documentation/#basic_ops_cluster_expansion, to move the topic partitions to the Kafka brokers I created in B. Please note that I had added the 3 ZooKeeper nodes running in B to the ZooKeeper cluster in A, so they were following the ZK leader in A.
I was under the impression that, since I had 6 nodes in the ZooKeeper ensemble, stopping the ZooKeeper nodes on the A side would not cause an issue, but I was wrong. As soon as I stopped the ZK process on the A nodes, the B ZK nodes stopped accepting connections from Kafka. I assume this is because ZK leadership did not transfer to the B nodes and the ensemble lost quorum. To recover, I had to remove the version-2 folder on the B ZK nodes, remove the details of the A ZK nodes from zookeeper.properties, and then start the B nodes one by one; after that the cluster ran on infrastructure B. I know I failed miserably, but this was a sandbox where I could afford the downtime, which I cannot in a production setup. I would appreciate your help and guidance to get this right. Please help!
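
For reference, the partition move described above can be sketched as follows. This is only a minimal illustration, assuming hypothetical topic names and broker IDs (1-3 in A, 4-6 in B); none of these values come from the actual cluster. It builds a plan in the JSON layout that kafka-reassign-partitions.sh reads, keeping each partition's replication factor and placing all replicas on the B brokers.

```python
import json


def build_reassignment(current, target_brokers):
    """Build a reassignment plan in the JSON layout kafka-reassign-partitions.sh reads.

    current: dict mapping (topic, partition) -> list of current replica broker IDs
    target_brokers: broker IDs on infrastructure B (hypothetical values below)
    """
    partitions = []
    for index, ((topic, partition), replicas) in enumerate(sorted(current.items())):
        rf = len(replicas)  # keep the existing replication factor
        if rf > len(target_brokers):
            raise ValueError("not enough target brokers for the replication factor")
        # Rotate the starting broker so preferred leaders are spread across B.
        new_replicas = [
            target_brokers[(index + i) % len(target_brokers)] for i in range(rf)
        ]
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": new_replicas}
        )
    return {"version": 1, "partitions": partitions}


# Hypothetical layout: brokers 1-3 live in A, brokers 4-6 live in B.
current_assignment = {
    ("orders", 0): [1, 2, 3],
    ("orders", 1): [2, 3, 1],
    ("payments", 0): [3, 1],
}
plan = build_reassignment(current_assignment, target_brokers=[4, 5, 6])
print(json.dumps(plan, indent=2))
```

The printed JSON would be saved to a file and passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute, then checked with --verify.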

Thanks in advance.

Regards,
Rijo S Roy



Re: Help needed to migrate from one infra to another without downtime

Posted by Rijo Roy <rj...@yahoo.com.INVALID>.
Thanks for taking the time to lend a helping hand! Much appreciated!

It looks like the easiest way here is to use MirrorMaker 2, which serves my purpose of migrating data from cluster A to cluster B without having to worry about ZooKeeper downtime at all.
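
As a rough illustration of that approach, the sketch below renders a minimal MirrorMaker 2 properties file for a one-way A -> B flow. The cluster aliases and bootstrap addresses are hypothetical placeholders, and a real setup would likely need more settings (replication factors for internal topics, security, and so on). Note that with the default replication policy, MirrorMaker 2 prefixes replicated topic names with the source cluster alias (for example "A.orders"), which matters when clients are cut over.

```python
def render_mm2_properties(source_alias, source_bootstrap,
                          target_alias, target_bootstrap, topics=".*"):
    """Render a minimal MirrorMaker 2 config for a one-way source -> target flow."""
    flow = f"{source_alias}->{target_alias}"
    lines = [
        f"clusters = {source_alias}, {target_alias}",
        f"{source_alias}.bootstrap.servers = {source_bootstrap}",
        f"{target_alias}.bootstrap.servers = {target_bootstrap}",
        f"{flow}.enabled = true",       # replicate only in the A -> B direction
        f"{flow}.topics = {topics}",    # regex of topics to replicate
    ]
    return "\n".join(lines) + "\n"


# Hypothetical broker addresses for the old (A) and new (B) clusters.
mm2_config = render_mm2_properties(
    "A", "kafka-a1:9092,kafka-a2:9092,kafka-a3:9092",
    "B", "kafka-b1:9092,kafka-b2:9092,kafka-b3:9092",
)
print(mm2_config)
```

The resulting file would be passed to the connect-mirror-maker.sh script that ships with Apache Kafka.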
On Saturday, 23 October 2021, 09:01:39 am IST, Haruki Okada <oc...@gmail.com> wrote:

> Hi, Rijo.
>
> These slides might help you put together a procedure to migrate the ZK
> ensemble without downtime.
> https://speakerdeck.com/line_developers/split-brain-free-online-zookeeper-migration
>
> The slides are based on ZooKeeper 3.4, so in your environment (3.5) the
> procedure might be simpler thanks to dynamic reconfiguration.
>
> Thanks,


Re: Help needed to migrate from one infra to another without downtime

Posted by Haruki Okada <oc...@gmail.com>.
Hi, Rijo.

These slides might help you put together a procedure to migrate the ZK
ensemble without downtime.
https://speakerdeck.com/line_developers/split-brain-free-online-zookeeper-migration

The slides are based on ZooKeeper 3.4, so in your environment (3.5) the
procedure might be simpler thanks to dynamic reconfiguration.
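
To illustrate what a dynamic-reconfiguration based move could look like, here is a minimal sketch that only generates the sequence of zkCli "reconfig" commands and tracks the voter count and majority after each step. The server IDs, host names, and ports are hypothetical, and this is not the procedure from the slides. It assumes the ensemble runs with reconfigEnabled=true and that the caller is authorized to run reconfig. The point is that each change goes through the ensemble itself, so the voter set (and therefore the majority) is updated step by step instead of half of the voters simply disappearing.

```python
def reconfig_plan(old_ids, new_servers):
    """Return (command, voters_after, quorum_after) tuples for an A -> B move.

    old_ids: server IDs of the ZooKeeper nodes in A
    new_servers: dict of server ID -> "host:quorumPort:electionPort;clientPort"
    """
    voters = set(old_ids)
    plan = []
    # First grow the ensemble with the B nodes, one at a time.
    for sid in sorted(new_servers):
        voters.add(sid)
        command = f"reconfig -add server.{sid}={new_servers[sid]}"
        plan.append((command, len(voters), len(voters) // 2 + 1))
    # Then shrink it by removing the A nodes, one at a time.
    for sid in sorted(old_ids):
        voters.remove(sid)
        plan.append((f"reconfig -remove {sid}", len(voters), len(voters) // 2 + 1))
    return plan


# Hypothetical IDs and hosts: servers 1-3 run in A, servers 4-6 run in B.
b_servers = {
    4: "zk-b1:2888:3888;2181",
    5: "zk-b2:2888:3888;2181",
    6: "zk-b3:2888:3888;2181",
}
steps = reconfig_plan([1, 2, 3], b_servers)
for command, voters_after, quorum_after in steps:
    print(f"{command:45} voters={voters_after} quorum={quorum_after}")
```

In practice each step would be confirmed (for example with the "config" command in zkCli) before moving on, and the Kafka brokers' zookeeper.connect setting would need to point at the B nodes before the A nodes are removed.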


Thanks,

On Thu, Oct 21, 2021 at 4:46, Ran Lupovich <ra...@gmail.com> wrote:

> One thing that comes to mind after reading your explanation: a ZK ensemble
> should have an odd number of nodes, and you stated you have six ZooKeepers.
> I would suggest checking this and using 3, 5, 7, etc.


-- 
========================
Okada Haruki
ocadaruma@gmail.com
========================

Re: Help needed to migrate from one infra to another without downtime

Posted by Ran Lupovich <ra...@gmail.com>.
One thing that comes to mind after reading your explanation: a ZK ensemble
should have an odd number of nodes, and you stated you have six ZooKeepers.
I would suggest checking this and using 3, 5, 7, etc.
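
The arithmetic behind this can be sketched in a few lines. ZooKeeper needs a strict majority of the voting members to be up, so the example below (plain arithmetic, no ZooKeeper API involved) shows why a 6-node ensemble tolerates no more failures than a 5-node one, and why stopping 3 of 6 nodes at once loses quorum.

```python
def quorum_size(voters):
    """Smallest strict majority of an ensemble with the given number of voters."""
    return voters // 2 + 1


def tolerated_failures(voters):
    """How many voters can be down while a majority is still available."""
    return voters - quorum_size(voters)


for n in (3, 4, 5, 6, 7):
    print(f"ensemble={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")

# The scenario from the original post: 6 voters, the 3 nodes in A are stopped.
remaining = 6 - 3
has_quorum = remaining >= quorum_size(6)
print(f"remaining={remaining} needed={quorum_size(6)} has_quorum={has_quorum}")
```

With 6 voters the majority is 4, so the 3 remaining B nodes cannot form a quorum on their own, whichever node held leadership.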
