Posted to user@flink.apache.org by "Sofer, Tovi " <to...@citi.com> on 2018/07/09 11:48:11 UTC

high availability with automated disaster recovery using zookeeper

Hi all,

We are now examining how to achieve high availability for Flink, and also to support automatic recovery in a disaster scenario, when an entire DC goes down.
We have DC1, where we usually want the work to be done, and DC2, which is more remote; we want work to go there only when DC1 is down.

We examined a few options and would be glad to hear feedback or a suggestion for another way to achieve this.

*         Two separate ZooKeeper and Flink clusters, one in each data center.
Only the cluster in DC1 is running, and state is copied to DC2 in an offline process.

To achieve automatic recovery we need to use some kind of watchdog which will check DC1 availability and, if it is down, will start DC2 (and likewise later if DC2 is down); see the sketch after this list.

Is there a recommended tool for this?

*         A ZooKeeper "stretch cluster" across the data centers, with 2 nodes in DC1, 2 nodes in DC2, and one observer node; a zoo.cfg sketch appears after this list.

The Flink cluster would have jobmanager1 in DC1 and jobmanager2 in DC2.

This way, when DC1 is down, ZooKeeper will notice this automatically and will transfer work to jobmanager2 in DC2.

However, we would like the ZooKeeper leader and the Flink JobManager leader (the primary one) to be in DC1, unless it is down.

Is there a way to achieve this?
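
The watchdog we have in mind would do something like the following minimal sketch for the first option; jobmanager1 and start_dc2_cluster() are placeholders for our environment, and the health check uses Flink's REST overview endpoint:

    # Poll the DC1 JobManager REST endpoint; trigger DC2 if DC1 stops answering.
    import time
    import urllib.request

    DC1_URL = "http://jobmanager1:8081/overview"  # placeholder host

    def dc1_is_healthy() -> bool:
        try:
            with urllib.request.urlopen(DC1_URL, timeout=5) as resp:
                return resp.status == 200
        except OSError:
            return False

    def start_dc2_cluster() -> None:
        # Placeholder: bring up the standby cluster in DC2, restoring
        # from the last state copied over by the offline process.
        pass

    failures = 0
    while True:
        failures = 0 if dc1_is_healthy() else failures + 1
        if failures >= 3:  # require consecutive failures to avoid flapping
            start_dc2_cluster()
            break
        time.sleep(10)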
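
For the stretch-cluster option, the zoo.cfg server list would look roughly like this (hostnames are placeholders). One caveat: for the DC2 side to retain quorum when DC1 fails, the fifth node must be a voting participant, ideally in a third site, since a ZooKeeper observer does not count toward quorum:

    # zoo.cfg server section for a 2+2+1 stretch ensemble (hostnames are placeholders)
    server.1=zk1-dc1:2888:3888
    server.2=zk2-dc1:2888:3888
    server.3=zk1-dc2:2888:3888
    server.4=zk2-dc2:2888:3888
    # tiebreaker in a third site; must be a voter, not an observer, for failover
    server.5=zk5-site3:2888:3888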

Thanks and regards,
Tovi Sofer
Software Engineer
+972 (3) 7405756


Re: high availability with automated disaster recovery using zookeeper

Posted by Till Rohrmann <tr...@apache.org>.
Hi Tovi,

You can define hard host attribute constraints for the TaskManagers. See
the configuration section [1] for more information.

If you want to run the JobManager/cluster entry point on Mesos as well,
then I recommend starting it with Marathon [2]. This will also give you HA
for the master process. I assume that you can provide Marathon with similar
constraints in order to control on which Mesos agents the JobManager/cluster
entry point process is allocated.

[1]
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mesos.html#configuration-parameters
[2]
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mesos.html#high-availability
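
For illustration, a minimal Marathon app definition pinning the JobManager to DC1 could look like the sketch below, and the TaskManagers can be pinned with Flink's hard host attribute constraint; the "datacenter" attribute and the "dc1" value are placeholders and assume such an attribute is set on the Mesos agents:

    {
      "id": "flink-jobmanager",
      "cmd": "$FLINK_HOME/bin/mesos-appmaster.sh",
      "instances": 1,
      "constraints": [["datacenter", "CLUSTER", "dc1"]]
    }

    # flink-conf.yaml: hard host attribute constraint for the TaskManagers
    mesos.constraints.hard.hostattribute: datacenter:dc1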

Cheers,
Till

RE: high availability with automated disaster recovery using zookeeper

Posted by "Sofer, Tovi " <to...@citi.com>.
To add one thing to the Mesos question:
My assumption that constraints on the JobManager can work is based on the following sentence from the link below:
“When running Flink with Marathon, the whole Flink cluster including the job manager will be run as Mesos tasks in the Mesos cluster.”
https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/mesos.html

[Not sure this is accurate, since it seems to contradict the image in the link below:
https://mesosphere.com/blog/apache-flink-on-dcos-and-apache-mesos ]


RE: high availability with automated disaster recovery using zookeeper

Posted by "Sofer, Tovi " <to...@citi.com>.
Hi Till, group,

Thank you for your response.
After reading further online about Mesos: can't Mesos fulfill the requirement of running the JobManager in the primary data center?
By using: "constraints": [["datacenter", "CLUSTER", "main"]]
(See http://www.stratio.com/blog/mesos-multi-data-center-architecture-for-disaster-recovery/ )

Is this supported by a Flink cluster on Mesos?

Thanks again
Tovi



Re: high availability with automated disaster recovery using zookeeper

Posted by Till Rohrmann <tr...@apache.org>.
Hi Tovi,

that is an interesting use case you are describing here. I think, however,
it depends mainly on the capabilities of ZooKeeper to produce the intended
behavior. Flink itself relies on ZooKeeper for leader election in HA mode
but does not expose any means to influence the leader election process. To
be more precise, ZK is used as a black box which simply tells a JobManager
that it is now the leader, independent of any data center preferences. I'm
not sure whether it is possible to tell ZooKeeper about these preferences.
If not, then an alternative could be to implement your own high
availability services which take such preferences into account.
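
For context, ZooKeeper HA mode is enabled with settings along these lines in flink-conf.yaml (the quorum addresses and paths are placeholders):

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk1-dc1:2181,zk2-dc1:2181,zk1-dc2:2181
    high-availability.storageDir: hdfs:///flink/ha/
    high-availability.cluster-id: /flink-prod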

Cheers,
Till


Re: FW: high availability with automated disaster recovery using zookeeper

Posted by Scott Kidder <ki...@gmail.com>.
Hi Tovi, we run all services (Flink, ZooKeeper, Hadoop HDFS, and Consul) in
a Kubernetes cluster in each data center. Kubernetes will automatically
restart/reschedule any services that crash or become unhealthy. This is a
little outside the scope of Flink, and I'd be happy to discuss it further
off-list.
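
To illustrate the "unhealthy" part, the JobManager container carries a liveness probe against Flink's REST endpoint, roughly like this sketch (image tag and port are placeholders):

    # Pod spec fragment: restart the JobManager container if the probe fails
    containers:
    - name: jobmanager
      image: flink:1.5
      ports:
      - containerPort: 8081
      livenessProbe:
        httpGet:
          path: /overview
          port: 8081
        initialDelaySeconds: 30
        periodSeconds: 10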

Best,

--Scott Kidder


FW: high availability with automated disaster recovery using zookeeper

Posted by "Sofer, Tovi " <to...@citi.com>.
Thank you Scott,
Looks like a very elegant solution.

How did you manage high availability within a single data center?

Thanks,
Tovi



Re: high availability with automated disaster recovery using zookeeper

Posted by Scott Kidder <ki...@gmail.com>.
I've used a multi-datacenter Consul cluster to coordinate service
discovery. When a service starts up in the primary DC, it registers
itself in Consul with a key that has a TTL that must be periodically
renewed. If the service shuts down or terminates abruptly, the key expires
and is removed from Consul. A standby service in another DC can be started
automatically after detecting the absence of the key in Consul in the
primary DC. This could lead to submitting a job to the standby Flink
cluster from the most recent savepoint that was copied by the offline
process you mentioned. It should be pretty easy to automate all of this. I
would not recommend setting up a multi-datacenter Zookeeper cluster; in my
experience, Consul is much easier to work with.
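
In sketch form, the TTL-key pattern against Consul's HTTP API looks like this (the key name, session TTL, and datacenter name are placeholders):

    # Primary DC: create a TTL session and acquire a key with it; if the
    # session is not renewed in time, Consul deletes the key.
    import requests

    CONSUL = "http://localhost:8500"

    session_id = requests.put(
        CONSUL + "/v1/session/create",
        json={"Name": "flink-primary", "TTL": "15s", "Behavior": "delete"},
    ).json()["ID"]
    requests.put(CONSUL + "/v1/kv/flink/primary-alive?acquire=" + session_id)

    # ...renew periodically while healthy:
    requests.put(CONSUL + "/v1/session/renew/" + session_id)

    # Standby DC: poll the key; HTTP 404 means the primary is gone.
    resp = requests.get(CONSUL + "/v1/kv/flink/primary-alive?dc=dc1")
    primary_alive = resp.status_code == 200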

Best,

--
Scott Kidder

