Posted to dev@flink.apache.org by Srinath Perera <he...@gmail.com> on 2017/10/17 09:53:20 UTC

Minimal HA Setup for Apache Flink

Hi All,

I am trying to write an article comparing minimal HA (highly available)
deployments of different stream processing systems.

Basically, the question is: if an organization has a limited workload, such
as 10k events per second, which might grow in the future, what is the
minimal setup they can use to run a highly available stream processor?

Could someone help answer the following questions?

   1. How many nodes does a minimal Apache Flink HA setup need? As I understood
   from [2], it is the ZooKeeper nodes + 2 JobManagers without YARN, or 1
   JobManager with YARN, + worker nodes. Is this correct?
   2. As per [1], ZooKeeper needs a minimum of 3 nodes to provide HA. Is there a
   way to run Apache Flink without HA?
   3. If someone runs Apache Flink without HA, but uses state snapshots, how
   fast can it recover after a failure? (ballpark figure)

Thanks
Srinath


   1.
   https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html#Deploying_ZooKeeper
   2.
   https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#standalone-cluster-high-availability


-- 
============================
Srinath Perera, Ph.D.
   http://people.apache.org/~hemapani/
   http://srinathsview.blogspot.com/

Re: Minimal HA Setup for Apache Flink

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

Your assumptions are mostly correct.

1. This is correct, but you can also run a non-YARN setup where you only have one JobManager if you have a system that will make sure to restart/keep alive this JobManager. This could either be some supervisor, or Kubernetes, or Mesos. You also probably need to factor in the distributed filesystem (or similar thing) that you need for state snapshots.

2. You can run Flink without HA but then a failure will bring the complete cluster down, meaning any state checkpoints/snapshots will be lost. You can get around this by enabling externalised checkpoints [1]. With this, you can restore from a checkpoint even after the cluster failed.

3. In order to recover from failures you always need state snapshots. HA only makes the JobManager resilient to failure. That being said, restarting the cluster after a failure and recovering from an externalised checkpoint should probably take a couple of minutes if you don't have too many nodes.
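
To put rough numbers on the answers above, here is a minimal Python sketch of the node arithmetic. It is only an illustration under stated assumptions: every ZooKeeper node, JobManager and worker is counted as its own machine (in practice they can be co-located), and the machines needed for the distributed filesystem, or for YARN/Kubernetes/Mesos themselves, are not counted. The function names and parameters are hypothetical and not part of any Flink API.

```python
def zookeeper_failures_tolerated(ensemble_size: int) -> int:
    """Number of ZooKeeper nodes that can fail while a strict majority survives."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return (ensemble_size - 1) // 2


def minimal_node_count(workers: int,
                       zookeeper_nodes: int = 3,
                       restart_managed_externally: bool = False) -> int:
    """Rough node count for a minimal HA setup, one process per machine.

    restart_managed_externally=True models the cases from the thread where
    YARN, Kubernetes, Mesos or a supervisor restarts a failed JobManager,
    so a single JobManager is enough. Otherwise a standby JobManager is
    counted, giving two.
    Assumption: DFS nodes and the resource manager's own nodes are excluded.
    """
    if workers < 1:
        raise ValueError("at least one worker node is needed")
    job_managers = 1 if restart_managed_externally else 2
    return zookeeper_nodes + job_managers + workers


if __name__ == "__main__":
    print("failures tolerated by 3 ZooKeeper nodes:", zookeeper_failures_tolerated(3))
    print("standalone HA, 2 workers:", minimal_node_count(workers=2))
    print("externally restarted JobManager, 2 workers:",
          minimal_node_count(workers=2, restart_managed_externally=True))
```

With two workers this gives 7 machines for a standalone HA cluster and 6 when something external restarts the JobManager. Co-locating processes would lower both figures.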

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#externalized-checkpoints

Best,
Aljoscha
