Posted to user@flink.apache.org by "Koffman, Noa (Nokia - IL/Kfar Sava)" <no...@nokia.com> on 2022/02/03 14:11:44 UTC

Flink High-Availability and Job-Manager recovery

Hi all,
We are currently deploying Flink on a 3-node k8s cluster, with 1 job-manager and 3 task-managers.
We are trying to understand the recommended deployment setup, more specifically how to recover from a job-manager failure, and have some questions about that:


  1.  If we use the Flink HA solution (either Kubernetes HA or ZooKeeper), the documentation states we should define ‘high-availability.storageDir’.

In the examples we found, there is mostly hdfs or s3 storage.

We were wondering if we could use Kubernetes PersistentVolumes and PersistentVolumeClaims. If we do, can each job-manager have its own volume, or must it be shared?

  2.  Is there a solution for job-manager recovery without HA? With the way our Flink is currently configured, killing the job-manager pod loses all the jobs.

Is there a way to configure the job-manager so that if it goes down and k8s restarts it, it will continue from the same state (restart all the tasks, etc…)?

For this, can a Persistent Volume be used, without HDFS or external solutions?

  3.  Regarding the deployment mode: we are working with Beam + Flink, and Flink is running in session mode; we have a few long-running streaming pipelines deployed (fewer than 10).

Is ‘session’ mode the right deployment mode for our type of deployment? Or should we consider switching to something different? (Per-job/application)



Thanks

Re: Flink High-Availability and Job-Manager recovery

Posted by "Koffman, Noa (Nokia - IL/Kfar Sava)" <no...@nokia.com>.
Hi, thanks for your reply, it was very helpful.

We tried to go with the 2nd approach, enabling HA mode, and added these conf values:
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-noa-edge-infra:2181
    high-availability.zookeeper.path.root: /flink
    high-availability.cluster-id: /flink
    high-availability.storageDir: /flink_state
    high-availability.jobmanager.port: 6150

for the storageDir, we are using a k8s persistent volume with ReadWriteOnce
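
For reference, the claim behind that volume looks roughly like this (names and sizes are illustrative, not our exact manifest):

```yaml
# Illustrative PVC backing high-availability.storageDir.
# Note: ReadWriteOnce means the volume can only be mounted by pods
# on a single node; pods on other nodes cannot read it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flink-ha-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```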



Recovery from job-manager failure is working now, but it looks like there are issues with the task-managers:

The same configuration file is used in the task-managers as well, and there are a lot of errors in the task-managers’ logs:

java.io.FileNotFoundException: /flink_state/flink/blob/job_9f4be579c7ab79817e25ed56762b7623/blob_p-5cf39313e388d9120c235528672fd267105be0e0-938e4347a98aa6166dc2625926fdab56 (No such file or directory)



It seems that the task-managers are trying to access the job-manager’s storage dir: can this be avoided?

The task-managers do not have access to the job-manager’s persistent volume; is shared access mandatory?

If we don’t have the option to use shared storage, is there a way to make ZooKeeper hold and manage the job state itself, instead of using shared storage?



Thanks,

Noa






From: bastien dine <ba...@gmail.com>
Date: Friday, 4 February 2022 at 10:56
To: Koffman, Noa (Nokia - IL/Kfar Sava) <no...@nokia.com>
Cc: user@flink.apache.org <us...@flink.apache.org>
Subject: Re: Flink High-Availability and Job-Manager recovery
Hello,

On k8s the current recommendation is to set up 1 job-manager with HA enabled, so that the cluster does not lose state upon a crash.

1. The storage dir can for sure be on a kube PV; the directory should be shared among all JMs. You will need to map the volume to the same local directory (e.g. /data) so that the configuration is the same across JMs.
2. You can have only 1 JM, but you still need to enable HA, since HA will write the cluster state into ZK & the storage dir.
3. I don't know anything about Beam, so I cannot help you with that, but per-job mode is not available on k8s (neither native nor standalone kube): https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/standalone/kubernetes/#per-job-mode & https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/#per-job-mode; you will need YARN for that (I think Mesos is deprecated).
Application mode can be a bit tricky to understand: it will "move" the submission of the job inside the JM.
The chosen solution will depend on your deployment needs; I can't tell you without knowing more, but going with session mode + streaming job deployment is pretty standard, and you can easily emulate "one cluster per job" with it (better for ops & tuning than a cluster with multiple jobs).
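
To illustrate point 1, the shared claim would be mounted at the same path in every pod spec, along these lines (all names here are illustrative, not a definitive manifest):

```yaml
# Illustrative fragment of a jobmanager pod template: every JM (and TM)
# pod mounts the same shared claim at the same local path, and
# flink-conf.yaml points high-availability.storageDir at that path.
spec:
  containers:
    - name: jobmanager
      image: flink:1.13.5
      volumeMounts:
        - name: ha-storage
          mountPath: /data        # must match high-availability.storageDir
  volumes:
    - name: ha-storage
      persistentVolumeClaim:
        claimName: flink-ha-storage   # a ReadWriteMany claim shared by all pods
```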

Hope this can help,
Regards,
Bastien

